arXiv: 2505.14172
The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models

20 May 2025
Adrian Cosma
Stefan Ruseti
Emilian Radoi
Mihai Dascalu
Author Contacts:
ioan_adrian.cosma@upb.ro
stefan.ruseti@upb.ro
emilian.radoi@upb.ro
mihai.dascalu@upb.ro
    LRM

Papers citing "The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models"

22 / 22 papers shown
What Really is Commonsense Knowledge?
Quyet V. Do
Junze Li
Tung-Duong Vuong
Zhaowei Wang
Yangqiu Song
Xiaojuan Ma
59
1
0
06 Nov 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
80
14
0
08 Oct 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
86
20
0
27 Jun 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
64
35
0
27 May 2024
Large Language Models Lack Understanding of Character Composition of Words
Andrew Shin
Kunitake Kaneko
64
11
0
18 May 2024
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
67
139
0
15 Jun 2023
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu
Yuanzhi Li
97
20
0
23 May 2023
Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Oluwasanmi Koyejo
LRM
107
429
0
28 Apr 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
326
288
0
11 Mar 2023
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Zihao Wang
Shaofei Cai
Guanzhou Chen
Hoang Trung-Dung
Xiaojian Ma
Yitao Liang
LM&Ro, LLMAG
85
335
0
03 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
302
556
0
01 Nov 2022
An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Ilias Chalkidis
Xiang Dai
Manos Fergadiotis
Prodromos Malakasiotis
Desmond Elliott
61
34
0
11 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
316
516
0
24 Sep 2022
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning
Md. Mofijul Islam
Gustavo Aguilar
Pragaash Ponnusamy
Clint Solomon Mathialagan
Chengyuan Ma
Chenlei Guo
VLM
127
10
0
22 Apr 2022
Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot
Szymon Tworkowski
Michał Tyrolski
Lukasz Kaiser
Yuhuai Wu
Christian Szegedy
Henryk Michalewski
68
65
0
26 Oct 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
108
159
0
23 Jun 2021
Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
103
68
0
02 Jun 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Linting Xue
Aditya Barua
Noah Constant
Rami Al-Rfou
Sharan Narang
Mihir Kale
Adam Roberts
Colin Raffel
98
504
0
28 May 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
71
1,478
0
27 Mar 2021
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM, VLM
171
4,071
0
10 Apr 2020
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
R. Speer
Joshua Chin
Catherine Havasi
201
2,900
0
12 Dec 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
221
7,745
0
31 Aug 2015