
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · 17 October 2024 · LRM

Papers citing "The Mystery of the Pathological Path-star Task for Language Models"

34 papers shown

  • Multi-Token Prediction Needs Registers
    Anastasios Gerontopoulos, Spyros Gidaris, N. Komodakis · 15 May 2025 · 0 citations
  • Looking beyond the next token
    Abitha Thankaraj, Yiding Jiang, J. Zico Kolter, Yonatan Bisk · LRM · 15 Apr 2025 · 1 citation
  • Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
    Arvid Frydenlund · LRM · 13 Mar 2025 · 0 citations
  • Reverse Training to Nurse the Reversal Curse
    O. Yu. Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar · 20 Mar 2024 · 36 citations
  • The pitfalls of next-token prediction
    Gregor Bachmann, Vaishnavh Nagarajan · 11 Mar 2024 · 75 citations
  • Why are Sensitive Functions Hard for Transformers?
    Michael Hahn, Mark Rofin · 15 Feb 2024 · 29 citations
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu, Tri Dao · Mamba · 01 Dec 2023 · 2,636 citations
  • PaSS: Parallel Speculative Sampling
    Giovanni Monea, Armand Joulin, Edouard Grave · MoE · 22 Nov 2023 · 34 citations
  • What Algorithms can Transformers Learn? A Study in Length Generalization
    Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran · 24 Oct 2023 · 121 citations
  • Think before you speak: Training Language Models With Pause Tokens
    Sachin Goyal, Ziwei Ji, A. S. Rawat, A. Menon, Sanjiv Kumar, Vaishnavh Nagarajan · LRM · 03 Oct 2023 · 116 citations
  • The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
    Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans · LRM · 21 Sep 2023 · 267 citations
  • The Impact of Positional Encoding on Length Generalization in Transformers
    Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan N. Ramamurthy, Payel Das, Siva Reddy · 31 May 2023 · 195 citations
  • On the Planning Abilities of Large Language Models: A Critical Investigation
    Karthik Valmeekam, Matthew Marquez, S. Sreedharan, Subbarao Kambhampati · LLMAG, LRM · 25 May 2023 · 229 citations
  • Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts
    Mohna Chakraborty, Adithya Kulkarni, Qi Li · VLM · 25 May 2023 · 10 citations
  • Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
    S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom · 22 Nov 2022 · 48 citations
  • On the Relation between Sensitivity and Accuracy in In-context Learning
    Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He · 16 Sep 2022 · 80 citations
  • Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
    Kushal Arora, Layla El Asri, Hareesh Bahuleyan, Jackie C.K. Cheung · 03 Apr 2022 · 81 citations
  • Transformer Language Models without Positional Encodings Still Learn Positional Information
    Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy · 30 Mar 2022 · 123 citations
  • Thinking Like Transformers
    Gail Weiss, Yoav Goldberg, Eran Yahav · AI4CE · 13 Jun 2021 · 134 citations
  • Sensitivity as a Complexity Measure for Sequence Classification Tasks
    Michael Hahn, Dan Jurafsky, Richard Futrell · 21 Apr 2021 · 22 citations
  • Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
    Jiatao Gu, X. Kong · 31 Dec 2020 · 137 citations
  • Glancing Transformer for Non-Autoregressive Neural Machine Translation
    Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, Lei Li · 18 Aug 2020 · 157 citations
  • Language Models are Few-Shot Learners
    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · BDL · 28 May 2020 · 41,736 citations
  • Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
    Yi-Lun Liao, Xin Jiang, Qun Liu · 24 Apr 2020 · 40 citations
  • Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
    Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov · 30 Aug 2019 · 254 citations
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
    Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le · AI4CE · 19 Jun 2019 · 8,415 citations
  • fairseq: A Fast, Extensible Toolkit for Sequence Modeling
    Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli · VLM, FaML · 01 Apr 2019 · 3,147 citations
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · VLM, SSL, SSeg · 11 Oct 2018 · 94,511 citations
  • Semi-Autoregressive Neural Machine Translation
    Chunqi Wang, Ji Zhang, Haiqing Chen · 26 Aug 2018 · 88 citations
  • Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
    Jason D. Lee, Elman Mansimov, Kyunghyun Cho · DiffM, BDL · 19 Feb 2018 · 456 citations
  • Non-Autoregressive Neural Machine Translation
    Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, R. Socher · 07 Nov 2017 · 795 citations
  • Attention Is All You Need
    Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin · 3DV · 12 Jun 2017 · 130,942 citations
  • Sequence Level Training with Recurrent Neural Networks
    Marc'Aurelio Ranzato, S. Chopra, Michael Auli, Wojciech Zaremba · 20 Nov 2015 · 1,614 citations
  • Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
    Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam M. Shazeer · 09 Jun 2015 · 2,031 citations