Transformers Learn Shortcuts to Automata (arXiv:2210.10749)
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
19 October 2022 · OffRL, LRM

Papers citing "Transformers Learn Shortcuts to Automata" (36 of 136 papers shown)

Uncovering mesa-optimization algorithms in Transformers
J. Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, ..., Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
11 Sep 2023

Sparks of Large Audio Models: A Survey and Outlook
S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller
24 Aug 2023 · LM&MA, AuLLM

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits
Lena Strobl
06 Aug 2023

What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
21 Jul 2023 · MLT

Large Language Models
Michael R Douglas
11 Jul 2023 · LLMAG, LM&MA

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind V. Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
07 Jul 2023 · MLT

Trainable Transformer in Transformer
A. Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
03 Jul 2023 · VLM

Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang, Spencer Frei, Peter L. Bartlett
16 Jun 2023

Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lazaro-Gredilla, Dileep George
16 Jun 2023

Representational Strengths and Limitations of Transformers
Clayton Sanford, Daniel J. Hsu, Matus Telgarsky
05 Jun 2023

Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
01 Jun 2023 · LRM

Birth of a Transformer: A Memory Viewpoint
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou
01 Jun 2023

Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning
Yingcong Li, Kartik K. Sreenivasan, Angeliki Giannou, Dimitris Papailiopoulos, Samet Oymak
30 May 2023 · LRM

Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, ..., Sean Welleck, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, Yejin Choi
29 May 2023 · ReLM, LRM

A Closer Look at In-Context Learning under Distribution Shifts
Kartik Ahuja, David Lopez-Paz
26 May 2023

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
24 May 2023 · LRM

The Crucial Role of Normalization in Sharpness-Aware Minimization
Yan Dai, Kwangjun Ahn, S. Sra
24 May 2023

Can Transformers Learn to Solve Problems Recursively?
Shizhuo Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer
24 May 2023

Large Language Model Programs
Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li
09 May 2023 · LRM

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge
05 May 2023 · LRM

Can LLMs Capture Human Preferences?
Ali Goli, Amandeep Singh
04 May 2023

The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
Micah Goldblum, Marc Finzi, K. Rowan, A. Wilson
11 Apr 2023 · UQCV, FedML

An Over-parameterized Exponential Regression
Yeqi Gao, Sridhar Mahadevan, Zhao-quan Song
29 Mar 2023

Do Transformers Parse while Predicting the Masked Word?
Haoyu Zhao, A. Panigrahi, Rong Ge, Sanjeev Arora
14 Mar 2023

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li, Yuanzhi Li, Andrej Risteski
07 Mar 2023

Looped Transformers as Programmable Computers
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
30 Jan 2023

LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xinyuan Xu, Deepak Ramachandran
20 Dec 2022 · LRM

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

A Logic for Expressing Log-Precision Transformers
William Merrill, Ashish Sabharwal
06 Oct 2022 · ReLM, NAI, LRM

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
18 Jul 2022

Neural Networks and the Chomsky Hierarchy
Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, L. Wenliang, ..., Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
05 Jul 2022 · UQCV

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022 · LM&Ro, LRM, AI4CE, ReLM

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
M. Bronstein, Joan Bruna, Taco S. Cohen, Petar Veličković
27 Apr 2021 · GNN

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016 · AIMat

Benefits of depth in neural networks
Matus Telgarsky
14 Feb 2016