The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · 17 October 2024 · arXiv:2410.13779 · LRM
Papers citing "The Mystery of the Pathological Path-star Task for Language Models" (34 of 34 papers shown):

| Title | Authors | Tags | Citations | Date |
|-------|---------|------|-----------|------|
| Multi-Token Prediction Needs Registers | Anastasios Gerontopoulos, Spyros Gidaris, N. Komodakis | | 0 | 15 May 2025 |
| Looking beyond the next token | Abitha Thankaraj, Yiding Jiang, J. Zico Kolter, Yonatan Bisk | LRM | 1 | 15 Apr 2025 |
| Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More | Arvid Frydenlund | LRM | 0 | 13 Mar 2025 |
| Reverse Training to Nurse the Reversal Curse | O. Yu. Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar | | 36 | 20 Mar 2024 |
| The pitfalls of next-token prediction | Gregor Bachmann, Vaishnavh Nagarajan | | 75 | 11 Mar 2024 |
| Why are Sensitive Functions Hard for Transformers? | Michael Hahn, Mark Rofin | | 29 | 15 Feb 2024 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | Albert Gu, Tri Dao | Mamba | 2,636 | 01 Dec 2023 |
| PaSS: Parallel Speculative Sampling | Giovanni Monea, Armand Joulin, Edouard Grave | MoE | 34 | 22 Nov 2023 |
| What Algorithms can Transformers Learn? A Study in Length Generalization | Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran | | 121 | 24 Oct 2023 |
| Think before you speak: Training Language Models With Pause Tokens | Sachin Goyal, Ziwei Ji, A. S. Rawat, A. Menon, Sanjiv Kumar, Vaishnavh Nagarajan | LRM | 116 | 03 Oct 2023 |
| The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" | Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans | LRM | 267 | 21 Sep 2023 |
| The Impact of Positional Encoding on Length Generalization in Transformers | Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan N. Ramamurthy, Payel Das, Siva Reddy | | 195 | 31 May 2023 |
| On the Planning Abilities of Large Language Models: A Critical Investigation | Karthik Valmeekam, Matthew Marquez, S. Sreedharan, Subbarao Kambhampati | LLMAG, LRM | 229 | 25 May 2023 |
| Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts | Mohna Chakraborty, Adithya Kulkarni, Qi Li | VLM | 10 | 25 May 2023 |
| Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions | S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom | | 48 | 22 Nov 2022 |
| On the Relation between Sensitivity and Accuracy in In-context Learning | Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He | | 80 | 16 Sep 2022 |
| Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation | Kushal Arora, Layla El Asri, Hareesh Bahuleyan, Jackie C.K. Cheung | | 81 | 03 Apr 2022 |
| Transformer Language Models without Positional Encodings Still Learn Positional Information | Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy | | 123 | 30 Mar 2022 |
| Thinking Like Transformers | Gail Weiss, Yoav Goldberg, Eran Yahav | AI4CE | 134 | 13 Jun 2021 |
| Sensitivity as a Complexity Measure for Sequence Classification Tasks | Michael Hahn, Dan Jurafsky, Richard Futrell | | 22 | 21 Apr 2021 |
| Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade | Jiatao Gu, X. Kong | | 137 | 31 Dec 2020 |
| Glancing Transformer for Non-Autoregressive Neural Machine Translation | Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, Lei Li | | 157 | 18 Aug 2020 |
| Language Models are Few-Shot Learners | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | BDL | 41,736 | 28 May 2020 |
| Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order | Yi-Lun Liao, Xin Jiang, Qun Liu | | 40 | 24 Apr 2020 |
| Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel | Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov | | 254 | 30 Aug 2019 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le | AI4CE | 8,415 | 19 Jun 2019 |
| fairseq: A Fast, Extensible Toolkit for Sequence Modeling | Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli | VLM, FaML | 3,147 | 01 Apr 2019 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 94,511 | 11 Oct 2018 |
| Semi-Autoregressive Neural Machine Translation | Chunqi Wang, Ji Zhang, Haiqing Chen | | 88 | 26 Aug 2018 |
| Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | Jason D. Lee, Elman Mansimov, Kyunghyun Cho | DiffM, BDL | 456 | 19 Feb 2018 |
| Non-Autoregressive Neural Machine Translation | Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, R. Socher | | 795 | 07 Nov 2017 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 130,942 | 12 Jun 2017 |
| Sequence Level Training with Recurrent Neural Networks | Marc'Aurelio Ranzato, S. Chopra, Michael Auli, Wojciech Zaremba | | 1,614 | 20 Nov 2015 |
| Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks | Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam M. Shazeer | | 2,031 | 09 Jun 2015 |