2410.09397
Cited By
Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
12 October 2024
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
Yufa Zhou
Papers citing
"Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes"
34 / 34 papers shown
SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance
Teerapong Panboonyuen
105
0
0
12 Jun 2025
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen
Jiayan Huo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
122
14
0
03 Jan 2025
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
203
22
0
14 Oct 2024
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Jerry Yao-Chieh Hu
Maojiang Su
En-Jui Kuo
Zhao Song
Han Liu
53
27
0
05 Jun 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
108
513
0
31 May 2024
Linearizing Large Language Models
Jean Mercat
Igor Vasiljevic
Sedrick Scott Keh
Kushal Arora
Achal Dave
Adrien Gaidon
Thomas Kollar
93
24
0
10 May 2024
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun
Zhuoming Chen
Xinyu Yang
Yuandong Tian
Beidi Chen
91
61
0
18 Apr 2024
Nonparametric Modern Hopfield Models
Jerry Yao-Chieh Hu
Bo-Yu Chen
Dennis Wu
Feng Ruan
Han Liu
50
17
0
05 Apr 2024
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
Josh Alman
Zhao Song
51
14
0
07 Feb 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
66
54
0
06 Feb 2024
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng
Kangwook Lee
96
62
0
26 Oct 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
Zhiheng Xi
Wenxiang Chen
Xin Guo
Wei He
Yiwen Ding
...
Wenjuan Qin
Yongyan Zheng
Xipeng Qiu
Xuanjing Huang
Tao Gui
LM&MA
LM&Ro
3DV
AI4CE
110
934
0
14 Sep 2023
LongCoder: A Long-Range Pre-trained Language Model for Code Completion
Daya Guo
Canwen Xu
Nan Duan
Jian Yin
Julian McAuley
61
88
0
26 Jun 2023
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jiang
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
138
377
0
29 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
Streaming Kernel PCA Algorithm With Small Space
Yichuan Deng
Zhao Song
Zifan Wang
Hangke Zhang
72
4
0
08 Mar 2023
Near Optimal Memory-Regret Tradeoff for Online Learning
Binghui Peng
A. Rubinstein
CLL
62
10
0
03 Mar 2023
Fast Attention Requires Bounded Entries
Josh Alman
Zhao Song
74
85
0
26 Feb 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
316
516
0
24 Sep 2022
Online Prediction in Sub-linear Space
Binghui Peng
Fred Zhang
64
16
0
16 Jul 2022
Strong Memory Lower Bounds for Learning Natural Models
Gavin Brown
Mark Bun
Adam D. Smith
68
12
0
09 Jun 2022
Memory Bounds for the Experts Problem
Vaidehi Srinivas
David P. Woodruff
Ziyu Xu
Samson Zhou
33
19
0
21 Apr 2022
Long Time No See! Open-Domain Conversation with Long-Term Persona Memory
Xinchao Xu
Zhibin Gou
Wenquan Wu
Zheng-Yu Niu
Hua Wu
Haifeng Wang
Shihang Wang
RALM
66
114
0
11 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
817
9,576
0
28 Jan 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
477
10,367
0
17 Jun 2021
Bounded Memory Active Learning through Enriched Queries
Max Hopkins
D. Kane
Shachar Lovett
Michal Moshkovitz
20
7
0
09 Feb 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li
Percy Liang
246
4,261
0
01 Jan 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
546
2,086
0
28 Jul 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
174
4,071
0
10 Apr 2020
Sparse Sinkhorn Attention
Yi Tay
Dara Bahri
Liu Yang
Donald Metzler
Da-Cheng Juan
86
340
0
26 Feb 2020
Estimating Entropy of Distributions in Constant Space
Jayadev Acharya
Sourbh Bhadane
Piotr Indyk
Ziteng Sun
55
11
0
18 Nov 2019
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
107
974
0
10 Nov 2019
Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
76
285
0
19 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
129
1,899
0
23 Apr 2019