FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
arXiv:2205.14135 · 27 May 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness" (showing 50 of 1,439 papers)
Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning (30 May 2024)
  Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (30 May 2024)
  Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, Sifa Zheng
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution (29 May 2024)
  Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo [LLMAG]
Understanding and Minimising Outlier Features in Neural Network Training (29 May 2024)
  Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
Spatio-Spectral Graph Neural Networks (29 May 2024)
  Simon Geisler, Arthur Kosmala, Daniel Herbst, Stephan Günnemann
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding (29 May 2024)
  Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny
Wavelet-Based Image Tokenizer for Vision Transformers (28 May 2024)
  Zhenhai Zhu, Radu Soricut [ViT]
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention (28 May 2024)
  Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (28 May 2024)
  Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra, Martin Jaggi
ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator (28 May 2024)
  Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha [RALM]
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention (27 May 2024)
  Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs (27 May 2024)
  Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra
CARL: A Framework for Equivariant Image Registration (27 May 2024)
  Hastings Greer, Lin Tian, François-Xavier Vialard, Roland Kwitt, R. Estépar, Marc Niethammer [3DPC, MedIm]
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures (26 May 2024)
  Awni Altabaa, John Lafferty
On Bits and Bandits: Quantifying the Regret-Information Trade-off (26 May 2024)
  Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
Accelerating Transformers with Spectrum-Preserving Token Merging (25 May 2024)
  Hoai-Chau Tran, D. M. Nguyen, Duy M. Nguyen, Trung Thanh Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert
MoEUT: Mixture-of-Experts Universal Transformers (25 May 2024)
  Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning [MoE]
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (24 May 2024)
  Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yonghyun Ro [MLLM, LRM]
Pipeline Parallelism with Controllable Memory (24 May 2024)
  Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training (24 May 2024)
  Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Songlin Yang, Reynold Cheng, Yike Guo, Jie Fu
Emergence of a High-Dimensional Abstraction Phase in Language Transformers (24 May 2024)
  Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation (24 May 2024)
  Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang, J. Gao, Feng Ma
Scalable Optimization in the Modular Norm (23 May 2024)
  Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein
Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics (23 May 2024)
  Jonas Spinner, Victor Bresó, P. D. Haan, Tilman Plehn, Jesse Thaler, Johann Brehmer [AI4CE]
Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs (23 May 2024)
  Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie [MQ]
Base of RoPE Bounds Context Length (23 May 2024)
  Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models (23 May 2024)
  Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang [MQ]
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (23 May 2024)
  Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang [MQ]
Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers (22 May 2024)
  Xin Cheng, Preslav Nakov, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan [AI4TS]
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (21 May 2024)
  William Brandon, Mayank Mishra, Aniruddha Nrusimha, Yikang Shen, Jonathan Ragan-Kelley [MQ]
RecGPT: Generative Pre-training for Text-based Recommendation (21 May 2024)
  Hoang Ngo, Dat Quoc Nguyen [LRM]
OLAPH: Improving Factuality in Biomedical Long-form Question Answering (21 May 2024)
  Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang [MedIm, HILM, LM&MA]
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving (18 May 2024)
  Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan [LLMAG]
The Future of Large Language Model Pre-training is Federated (17 May 2024)
  Lorenzo Sani, Alexandru Iacob, Zeyu Cao, Bill Marino, Yan Gao, ..., Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane [AI4CE]
Layer-Condensed KV Cache for Efficient Inference of Large Language Models (17 May 2024)
  Haoyi Wu, Kewei Tu [MQ]
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers (17 May 2024)
  Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan
CAT3D: Create Anything in 3D with Multi-View Diffusion Models (16 May 2024)
  Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martín Brualla, Pratul P. Srinivasan, Jonathan T. Barron, Ben Poole
Libra: Building Decoupled Vision System on Large Language Models (16 May 2024)
  Yifan Xu, Xiaoshan Yang, Y. Song, Changsheng Xu [MLLM, VLM]
An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding (15 May 2024)
  Renqi Chen, Wenwei Han, Haohao Zhang, Haoyang Su, Zhefan Wang, Xiaolei Liu, Hao Jiang, Wanli Ouyang, Nanqing Dong
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding (14 May 2024)
  Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, ..., Wei Liu, Dingyong Wang, Yong Yang, Jie Jiang, Qinglin Lu [ViT]
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning (14 May 2024)
  Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters
Improving Transformers with Dynamically Composable Multi-Head Attention (14 May 2024)
  Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
Self-Distillation Improves DNA Sequence Inference (14 May 2024)
  Tong Yu, Lei Cheng, Ruslan Khalitov, Erland Brandser Olsson, Zhirong Yang [SyDa]
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling (13 May 2024)
  Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI (13 May 2024)
  Jiarui Fang, Shangchun Zhao
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models (13 May 2024)
  Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts (13 May 2024)
  R. Prabhakar, R. Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, ..., Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, K. Olukotun [MoE]
DEPTH: Discourse Education through Pre-Training Hierarchically (13 May 2024)
  Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov
CaFA: Global Weather Forecasting with Factorized Attention on Sphere (12 May 2024)
  Zijie Li, Anthony Zhou, Saurabh Patil, A. Farimani
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning (09 May 2024)
  Junzhi Chen, Juhao Liang, Benyou Wang [LLMAG]