arXiv: 2504.11336
Looking beyond the next token

15 April 2025
Abitha Thankaraj
Yiding Jiang
J. Zico Kolter
Yonatan Bisk
    LRM

Papers citing "Looking beyond the next token"

31 / 31 papers shown
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan
Chen Henry Wu
Charles Ding
Aditi Raghunathan
102
0
0
21 Apr 2025
Forking Paths in Neural Text Generation
Eric J. Bigelow
Ari Holtzman
Hidenori Tanaka
T. Ullman
98
6
0
10 Dec 2024
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund
LRM
97
4
0
17 Oct 2024
Semformer: Transformer Language Models with Semantic Planning
Yongjing Yin
Junran Ding
Kai Song
Yue Zhang
112
5
0
17 Sep 2024
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
O. Kitouni
Niklas Nolte
Diane Bouchacourt
Adina Williams
Mike Rabbat
Mark Ibrahim
LRM CLL
81
12
0
07 Jun 2024
The CLRS-Text Algorithmic Reasoning Language Benchmark
Larisa Markeeva
Sean McLeish
Borja Ibarz
Wilfried Bounsi
Olga Kozlova
Alex Vitvitskyi
Charles Blundell
Tom Goldstein
Avi Schwarzschild
Petar Veličković
LRM
75
15
0
06 Jun 2024
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle
Badr Youbi Idrissi
Baptiste Rozière
David Lopez-Paz
Gabriele Synnaeve
99
120
0
30 Apr 2024
σ-GPTs: A New Approach to Autoregressive Models
Arnaud Pannatier
Evann Courdier
François Fleuret
AI4TS
61
10
0
15 Apr 2024
Rho-1: Not All Tokens Are What You Need
Zhenghao Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
109
75
0
11 Apr 2024
The pitfalls of next-token prediction
Gregor Bachmann
Vaishnavh Nagarajan
97
80
0
11 Mar 2024
Think before you speak: Training Language Models With Pause Tokens
Sachin Goyal
Ziwei Ji
A. S. Rawat
A. Menon
Sanjiv Kumar
Vaishnavh Nagarajan
LRM
105
121
0
03 Oct 2023
Learning to Model the World with Language
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&Ro SyDa
95
54
0
31 Jul 2023
Autoregressive Modeling with Lookahead Attention
Li Du
Hongyuan Mei
Jason Eisner
63
6
0
20 May 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa LRM
79
266
0
12 May 2023
Mastering Diverse Domains through World Models
Danijar Hafner
J. Pašukonis
Jimmy Ba
Timothy Lillicrap
77
612
0
10 Jan 2023
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
Shansan Gong
Mukai Li
Jiangtao Feng
Zhiyong Wu
Lingpeng Kong
93
333
0
17 Oct 2022
Efficient Training of Language Models to Fill in the Middle
Mohammad Bavarian
Heewoo Jun
Nikolas Tezak
John Schulman
C. McLeavey
Jerry Tworek
Mark Chen
76
197
0
28 Jul 2022
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Kushal Arora
Layla El Asri
Hareesh Bahuleyan
Jackie C.K. Cheung
61
82
0
03 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro LRM AI4CE ReLM
843
9,644
0
28 Jan 2022
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Maxwell Nye
Anders Andreassen
Guy Gur-Ari
Henryk Michalewski
Jacob Austin
...
Aitor Lewkowycz
Maarten Bosma
D. Luan
Charles Sutton
Augustus Odena
ReLM LRM
183
753
0
30 Nov 2021
GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause
Akhilesh Deepak Gotmare
Bryan McCann
N. Keskar
Shafiq Joty
R. Socher
Nazneen Rajani
134
408
0
14 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
874
42,379
0
28 May 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
93
450
0
13 Jan 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
147
978
0
04 Dec 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
472
20,317
0
23 Oct 2019
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
Lav Varshney
Caiming Xiong
R. Socher
AI4CE
130
1,239
0
11 Sep 2019
Video Representation Learning by Dense Predictive Coding
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
98
361
0
10 Sep 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
236
8,447
0
19 Jun 2019
Non-Monotonic Sequential Text Generation
Sean Welleck
Kianté Brantley
Hal Daumé
Kyunghyun Cho
73
130
0
05 Feb 2019
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
107
797
0
07 Nov 2017
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Stéphane Ross
Geoffrey J. Gordon
J. Andrew Bagnell
OffRL
244
3,233
0
02 Nov 2010