FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
arXiv:2205.14135 · 27 May 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness" (50 of 1,510 papers shown)
- Extended Mind Transformers (04 Jun 2024). Phoebe Klett, Thomas Ahle. [RALM]
- A Study of Optimizations for Fine-tuning Large Language Models (04 Jun 2024). Arjun Singh, Nikhil Pandey, Anup Shirgaonkar, Pavan Manoj, Vijay Aski.
- Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning (04 Jun 2024). Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu. [Mamba]
- GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security (04 Jun 2024). Xuanqing Liu, Luyang Kong, Runhui Wang, Patrick Song, Austin Nevins, Henrik Johnson, Nimish Amlathe, Davor Golac.
- ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (04 Jun 2024). Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Widyadewi Soedarmadji, ..., Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang. [MQ, VGen]
- Sparsity-Accelerated Training for Large Language Models (03 Jun 2024). Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu. [LRM]
- DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion (03 Jun 2024). Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun.
- Achieving Sparse Activation in Small Language Models (03 Jun 2024). Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao.
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models (02 Jun 2024). Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, ..., Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou. [LLMAG, SyDa]
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models (01 Jun 2024). Zicheng Liu, Jiahui Li, Siyuan Li, Z. Zang, Cheng Tan, Yufei Huang, Yajing Bai, Stan Z. Li. [ELM]
- Recurrent neural networks: vanishing and exploding gradients are not the end of the story (31 May 2024). Nicolas Zucchet, Antonio Orvieto. [ODL, AAML]
- Sharing Key Semantics in Transformer Makes Efficient Image Restoration (30 May 2024). Bin Ren, Yawei Li, Christos Sakaridis, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, N. Sebe.
- Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (30 May 2024). Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu.
- Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning (30 May 2024). Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu.
- SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (30 May 2024). Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, Sifa Zheng.
- Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution (29 May 2024). Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo. [LLMAG]
- Understanding and Minimising Outlier Features in Neural Network Training (29 May 2024). Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann.
- Spatio-Spectral Graph Neural Networks (29 May 2024). Simon Geisler, Arthur Kosmala, Daniel Herbst, Stephan Günnemann.
- Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding (29 May 2024). Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny.
- Wavelet-Based Image Tokenizer for Vision Transformers (28 May 2024). Zhenhai Zhu, Radu Soricut. [ViT]
- ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention (28 May 2024). Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang.
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (28 May 2024). Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra, Martin Jaggi.
- ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (28 May 2024). Junda Zhu, Lingyong Yan, Haibo Shi, D. Yin, Lei Sha. [RALM]
- Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention (27 May 2024). Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong.
- SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs (27 May 2024). Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra.
- CARL: A Framework for Equivariant Image Registration (27 May 2024). Hastings Greer, Lin Tian, François-Xavier Vialard, Roland Kwitt, R. Estépar, Marc Niethammer. [3DPC, MedIm]
- On Bits and Bandits: Quantifying the Regret-Information Trade-off (26 May 2024). Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor.
- Disentangling and Integrating Relational and Sensory Information in Transformer Architectures (26 May 2024). Awni Altabaa, John Lafferty.
- Accelerating Transformers with Spectrum-Preserving Token Merging (25 May 2024). Hoai-Chau Tran, D. M. Nguyen, Duy M. Nguyen, Trung Thanh Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert.
- MoEUT: Mixture-of-Experts Universal Transformers (25 May 2024). Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning. [MoE]
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (24 May 2024). Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yonghyun Ro. [MLLM, LRM]
- Pipeline Parallelism with Controllable Memory (24 May 2024). Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin.
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training (24 May 2024). Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Songlin Yang, Reynold Cheng, Yike Guo, Jie Fu.
- Quality-aware Masked Diffusion Transformer for Enhanced Music Generation (24 May 2024). Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang, J. Gao, Feng Ma.
- Emergence of a High-Dimensional Abstraction Phase in Language Transformers (24 May 2024). Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni.
- Scalable Optimization in the Modular Norm (23 May 2024). Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein.
- Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics (23 May 2024). Jonas Spinner, Victor Bresó, P. D. Haan, Tilman Plehn, Jesse Thaler, Johann Brehmer. [AI4CE]
- Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs (23 May 2024). Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie. [MQ]
- Base of RoPE Bounds Context Length (23 May 2024). Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen.
- MiniCache: KV Cache Compression in Depth Dimension for Large Language Models (23 May 2024). Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang. [MQ]
- ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (23 May 2024). Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang. [MQ]
- Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers (22 May 2024). Xin Cheng, Preslav Nakov, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan. [AI4TS]
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (21 May 2024). William Brandon, Mayank Mishra, Aniruddha Nrusimha, Yikang Shen, Jonathan Ragan-Kelley. [MQ]
- RecGPT: Generative Pre-training for Text-based Recommendation (21 May 2024). Hoang Ngo, Dat Quoc Nguyen. [LRM]
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering (21 May 2024). Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang. [MedIm, HILM, LM&MA]
- The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving (18 May 2024). Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan. [LLMAG]
- The Future of Large Language Model Pre-training is Federated (17 May 2024). Lorenzo Sani, Alexandru Iacob, Zeyu Cao, Bill Marino, Yan Gao, ..., Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane. [AI4CE]
- Layer-Condensed KV Cache for Efficient Inference of Large Language Models (17 May 2024). Haoyi Wu, Kewei Tu. [MQ]
- Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers (17 May 2024). Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan.
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models (16 May 2024). Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martín Brualla, Pratul P. Srinivasan, Jonathan T. Barron, Ben Poole.