Overcoming a Theoretical Limitation of Self-Attention
arXiv:2202.12172
24 February 2022
David Chiang
Peter A. Cholak

Papers citing "Overcoming a Theoretical Limitation of Self-Attention"

Showing 50 of 68 citing papers
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
LRM
30
0
0
13 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
46
0
0
02 May 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
92
0
0
24 Apr 2025
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns
43
0
0
21 Apr 2025
Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao
Yanming Lai
Defeng Sun
Yang Wang
Bokai Yan
29
0
0
16 Apr 2025
Unique Hard Attention: A Tale of Two Sides
Selim Jerad
Anej Svete
Jiaoda Li
Ryan Cotterell
56
0
0
18 Mar 2025
AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
Yixiong Fang
Tianran Sun
Yuling Shi
Xiaodong Gu
61
0
0
13 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
48
0
0
13 Mar 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
91
3
0
24 Feb 2025
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee
Hayoung Lee
Seongsoo Heo
Wonik Choi
HILM
93
0
0
12 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
180
0
0
04 Feb 2025
A completely uniform transformer for parity
Alexander Kozachinskiy
Tomasz Steifer
35
1
0
07 Jan 2025
Theoretical limitations of multi-layer Transformer
Lijie Chen
Binghui Peng
Hongxun Wu
AI4CE
72
6
0
04 Dec 2024
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi
Christopher D. Manning
Shikhar Murty
74
0
0
28 Nov 2024
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi
Ghazal Khalighinejad
Anej Svete
Josef Valvoda
Ryan Cotterell
Brian DuSell
NAI
41
2
0
11 Nov 2024
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao
Zhijin Fang
Shu Li
Shaohui Yang
Shichao He
37
2
0
30 Oct 2024
Counting Ability of Large Language Models and Impact of Tokenization
Xiang Zhang
Juntai Cao
Chenyu You
LRM
35
5
0
25 Oct 2024
Extracting Finite State Machines from Transformers
Rik Adriaensen
Jaron Maene
AI4CE
31
0
0
08 Oct 2024
Fundamental Limitations on Subquadratic Alternatives to Transformers
Josh Alman
Hantao Yu
23
1
0
05 Oct 2024
ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li
Pat Verga
Priyanka Sen
Bowen Yang
Vijay Viswanathan
Patrick Lewis
Taro Watanabe
Yixuan Su
RALM
LRM
46
7
0
04 Oct 2024
softmax is not enough (for sharp out-of-distribution)
Petar Veličković
Christos Perivolaropoulos
Federico Barbero
Razvan Pascanu
42
18
0
01 Oct 2024
Improvements to SDXL in NovelAI Diffusion V3
Juan Ossa
Eren Doğan
Alex Birch
F. Johnson
37
1
0
24 Sep 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong-jia Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
52
28
0
22 Jul 2024
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue
Avishree Khare
Rajeev Alur
Surbhi Goel
Eric Wong
55
2
0
21 Jun 2024
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak
Anej Svete
Alexandra Butoi
Ryan Cotterell
ReLM
LRM
48
12
0
20 Jun 2024
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang
Yonatan Bisk
LRM
32
5
0
30 May 2024
The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof
Yana Veitsman
Michael Hahn
Mamba
32
8
0
27 May 2024
Rethinking Transformers in Solving POMDPs
Chenhao Lu
Ruizhe Shi
Yuyao Liu
Kaizhe Hu
Simon S. Du
Huazhe Xu
AI4CE
32
3
0
27 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
37
5
0
16 May 2024
Transformers Can Represent $n$-gram Language Models
Anej Svete
Ryan Cotterell
32
17
0
23 Apr 2024
Length Generalization of Causal Transformers without Position Encoding
Jie Wang
Tao Ji
Yuanbin Wu
Hang Yan
Tao Gui
Qi Zhang
Xuanjing Huang
Xiaoling Wang
VLM
49
15
0
18 Apr 2024
LongEmbed: Extending Embedding Models for Long Context Retrieval
Dawei Zhu
Liang Wang
Nan Yang
Yifan Song
Wenhao Wu
Furu Wei
Sujian Li
RALM
43
21
0
18 Apr 2024
TEL'M: Test and Evaluation of Language Models
G. Cybenko
Joshua Ackerman
Paul Lintilhac
ALM
ELM
40
0
0
16 Apr 2024
MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong
Yanwei Fu
25
19
0
07 Apr 2024
Transformers as Transducers
Lena Strobl
Dana Angluin
David Chiang
Jonathan Rawski
Ashish Sabharwal
27
5
0
02 Apr 2024
Simulating Weighted Automata over Sequences and Trees with Transformers
Michael Rizvi
M. Lizaire
Clara Lacroce
Guillaume Rabusseau
AI4CE
53
0
0
12 Mar 2024
Why are Sensitive Functions Hard for Transformers?
Michael Hahn
Mark Rofin
41
25
0
15 Feb 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui
Jie Ren
Pengfei He
Jiliang Tang
Yue Xing
37
12
0
30 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
28
26
0
22 Jan 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
71
212
0
22 Jan 2024
Extending LLMs' Context Window with 100 Samples
Yikai Zhang
Junlong Li
Pengfei Liu
31
11
0
13 Jan 2024
Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition
Chengxin Chen
Pengyuan Zhang
28
5
0
26 Dec 2023
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT
23
76
0
28 Nov 2023
Addressing the Length Bias Problem in Document-Level Neural Machine Translation
Zhuocheng Zhang
Shuhao Gu
Min Zhang
Yang Feng
23
0
0
20 Nov 2023
The Transient Nature of Emergent In-Context Learning in Transformers
Aaditya K. Singh
Stephanie C. Y. Chan
Ted Moskovitz
Erin Grant
Andrew M. Saxe
Felix Hill
67
32
0
14 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
39
722
0
09 Nov 2023
What Formal Languages Can Transformers Express? A Survey
Lena Strobl
William Merrill
Gail Weiss
David Chiang
Dana Angluin
AI4CE
20
48
0
01 Nov 2023
Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Shikhar Murty
Pratyusha Sharma
Jacob Andreas
Christopher D. Manning
AI4CE
46
13
0
29 Oct 2023
Unraveling Feature Extraction Mechanisms in Neural Networks
Xiaobing Sun
Jiaxi Li
Wei Lu
18
0
0
25 Oct 2023
What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou
Arwen Bradley
Etai Littwin
Noam Razin
Omid Saremi
Josh Susskind
Samy Bengio
Preetum Nakkiran
34
110
0
24 Oct 2023