ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.06963
  4. Cited By
The pitfalls of next-token prediction

The pitfalls of next-token prediction

11 March 2024
Gregor Bachmann
Vaishnavh Nagarajan
ArXivPDFHTML

Papers citing "The pitfalls of next-token prediction"

37 / 37 papers shown
Title
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
81
0
0
09 May 2025
Looking beyond the next token
Looking beyond the next token
Abitha Thankaraj
Yiding Jiang
J. Zico Kolter
Yonatan Bisk
LRM
76
1
0
15 Apr 2025
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion
Chang Chen
Hany Hamed
Doojin Baek
Taegu Kang
Yoshua Bengio
Sungjin Ahn
73
0
0
25 Mar 2025
Reasoning Bias of Next Token Prediction Training
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
147
2
0
21 Feb 2025
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models
Haoyang Li
Xuejia Chen
Zhanchao Xu
Darian Li
Nicole Hu
...
Yongbin Li
Luyu Qiu
C. Zhang
Qing Li
Lei Chen
ELM
LRM
80
1
0
16 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
107
6
0
03 Feb 2025
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Dianbo Sui
AI4CE
102
24
0
23 Oct 2024
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Zhepeng Cen
Yao Liu
Siliang Zeng
Pratik Chaudhar
Huzefa Rangwala
George Karypis
Rasool Fakoor
SyDa
AIFin
83
3
0
18 Oct 2024
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye
Jiahui Gao
Shansan Gong
Lin Zheng
Xin Jiang
Zhiyu Li
Dianbo Sui
DiffM
LRM
117
20
0
18 Oct 2024
The Mystery of the Pathological Path-star Task for Language Models
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund
LRM
53
4
0
17 Oct 2024
Transformers, parallel computation, and logarithmic depth
Transformers, parallel computation, and logarithmic depth
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
36
33
0
14 Feb 2024
Think before you speak: Training Language Models With Pause Tokens
Think before you speak: Training Language Models With Pause Tokens
Sachin Goyal
Ziwei Ji
A. S. Rawat
A. Menon
Sanjiv Kumar
Vaishnavh Nagarajan
LRM
82
114
0
03 Oct 2023
Evaluating Cognitive Maps and Planning in Large Language Models with
  CogEval
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Ida Momennejad
Hosein Hasanbeig
Felipe Vieira Frujeri
Hiteshi Sharma
Robert Osazuwa Ness
Nebojsa Jojic
Hamid Palangi
Jonathan Larson
ELM
LLMAG
LRM
54
69
0
25 Sep 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
100
16
0
23 Aug 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRM
AI4CE
LM&Ro
96
654
0
18 Aug 2023
Faith and Fate: Limits of Transformers on Compositionality
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jian
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
79
367
0
29 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical
  Perspective
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
77
236
0
24 May 2023
Autoregressive Modeling with Lookahead Attention
Autoregressive Modeling with Lookahead Attention
Li Du
Hongyuan Mei
Jason Eisner
45
6
0
20 May 2023
Adaptive Computation with Elastic Input Sequence
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
50
20
0
30 Jan 2023
Language models are better than humans at next-token prediction
Language models are better than humans at next-token prediction
Buck Shlegeris
Fabien Roger
Lawrence Chan
Euan McLean
ELM
LRM
48
11
0
21 Dec 2022
A Measure-Theoretic Characterization of Tight Language Models
A Measure-Theoretic Characterization of Tight Language Models
Li Du
Lucas Torroba Hennigen
Tiago Pimentel
Clara Meister
Jason Eisner
Ryan Cotterell
58
32
0
20 Dec 2022
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
Shansan Gong
Mukai Li
Jiangtao Feng
Zhiyong Wu
Lingpeng Kong
81
325
0
17 Oct 2022
In-context Learning and Induction Heads
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
303
494
0
24 Sep 2022
Exploring Length Generalization in Large Language Models
Exploring Length Generalization in Large Language Models
Cem Anil
Yuhuai Wu
Anders Andreassen
Aitor Lewkowycz
Vedant Misra
V. Ramasesh
Ambrose Slone
Guy Gur-Ari
Ethan Dyer
Behnam Neyshabur
ReLM
LRM
65
164
0
11 Jul 2022
STaR: Bootstrapping Reasoning With Reasoning
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
83
468
0
28 Mar 2022
Locating and Editing Factual Associations in GPT
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
170
1,308
0
10 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
616
9,009
0
28 Jan 2022
Show Your Work: Scratchpads for Intermediate Computation with Language
  Models
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Maxwell Nye
Anders Andreassen
Guy Gur-Ari
Henryk Michalewski
Jacob Austin
...
Aitor Lewkowycz
Maarten Bosma
D. Luan
Charles Sutton
Augustus Odena
ReLM
LRM
145
728
0
30 Nov 2021
Teaching Autoregressive Language Models Complex Tasks By Demonstration
Teaching Autoregressive Language Models Complex Tasks By Demonstration
Gabriel Recchia
53
22
0
05 Sep 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the
  Order of Reasoning
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Pikekos
Henryk Michalewski
Mateusz Malinowski
42
28
0
07 Jun 2021
Consistency of a Recurrent Language Model With Respect to Incomplete
  Decoding
Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Sean Welleck
Ilia Kulikov
Jaedeok Kim
Richard Yuanzhe Pang
Kyunghyun Cho
57
66
0
06 Feb 2020
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
434
1,664
0
18 Sep 2019
A Deep Reinforced Model for Abstractive Summarization
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
159
1,551
0
11 May 2017
Program Induction by Rationale Generation : Learning to Solve and
  Explain Algebraic Word Problems
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Wang Ling
Dani Yogatama
Chris Dyer
Phil Blunsom
AIMat
61
701
0
11 May 2017
On the Sample Complexity of End-to-end Training vs. Semantic Abstraction
  Training
On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training
Shai Shalev-Shwartz
Amnon Shashua
43
59
0
23 Apr 2016
Sequence Level Training with Recurrent Neural Networks
Sequence Level Training with Recurrent Neural Networks
MarcÁurelio Ranzato
S. Chopra
Michael Auli
Wojciech Zaremba
87
1,611
0
20 Nov 2015
Knowledge Matters: Importance of Prior Information for Optimization
Knowledge Matters: Importance of Prior Information for Optimization
Çağlar Gülçehre
Yoshua Bengio
114
166
0
17 Jan 2013
1