2406.10851
Cited By
Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities
16 June 2024
Byung-Doh Oh
William Schuler
Papers citing
"Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities"
8 / 8 papers shown
Lost in Space: Optimizing Tokens for Grammar-Constrained Decoding
Sil Hamilton
David Mimno
76
0
0
24 Feb 2025
Large Language Models Are Human-Like Internally
Tatsuki Kuribayashi
Yohei Oseki
Souhaib Ben Taieb
Kentaro Inui
Timothy Baldwin
71
4
0
03 Feb 2025
From Language Models over Tokens to Language Models over Characters
Tim Vieira
Ben LeBrun
Mario Giulianelli
Juan Luis Gastaldi
Brian DuSell
John Terilla
Timothy J. O'Donnell
Ryan Cotterell
75
8
0
04 Dec 2024
Towards a Similarity-adjusted Surprisal Theory
Clara Meister
Mario Giulianelli
Tiago Pimentel
36
3
0
23 Oct 2024
Reverse-Engineering the Reader
Samuel Kiegeland
Ethan Gotlieb Wilcox
Afra Amini
David Robert Reich
Ryan Cotterell
23
0
0
16 Oct 2024
Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned
Cassandra L. Jacobs
Loïc Grobol
Alvin Tsang
21
0
0
15 Oct 2024
Linear Recency Bias During Training Improves Transformers' Fit to Reading Times
Christian Clark
Byung-Doh Oh
William Schuler
39
3
0
17 Sep 2024
How to Compute the Probability of a Word
Tiago Pimentel
Clara Meister
37
14
0
20 Jun 2024