Sumformer: Universal Approximation for Efficient Transformers
Silas Alberti, Niclas Dern, Laura Thesing, Gitta Kutyniok
5 July 2023 · arXiv:2307.02301
Papers citing "Sumformer: Universal Approximation for Efficient Transformers" (15 of 15 papers shown):
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Ziran Qin, Yuchen Cao, Mingbao Lin, Wen Hu, Shixuan Fan, Ke Cheng, Weiyao Lin, Jianguo Li (16 Mar 2025)

Exact Sequence Classification with Hardmax Transformers
Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua (04 Feb 2025)

How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, ZeLin Li, Liwei Wang (17 Oct 2024) [LRM]

Identification of Mean-Field Dynamics using Transformers
Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia (06 Oct 2024) [AI4CE]

Transformers are Universal In-context Learners
Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré (02 Aug 2024)

Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks
Xingcheng Xu, Zibo Zhao, Haipeng Zhang, Yanqing Yang (25 Jul 2024) [LRM]

A Survey on Universal Approximation Theorems
Midhun T. Augustine (17 Jul 2024)

Clustering in pure-attention hardmax transformers and its role in sentiment analysis
Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua (26 Jun 2024)

How Out-of-Distribution Detection Learning Theory Enhances Transformer: Learnability and Reliability
Yijin Zhou, Yuguang Wang, Xiaowen Dong, Yuguang Wang (13 Jun 2024)

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
Cai Zhou, Rose Yu, Yusu Wang (04 Apr 2024)

DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu (06 Mar 2024) [AI4TS, AI4CE]

Prompting a Pretrained Transformer Can Be a Universal Approximator
Aleksandar Petrov, Philip H. S. Torr, Adel Bibi (22 Feb 2024)

Do Efficient Transformers Really Save Computation?
Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang (21 Feb 2024)

A mathematical perspective on Transformers
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet (17 Dec 2023) [EDL, AI4CE]

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed (28 Jul 2020) [VLM]