2205.14135
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
277
1,216
0
21 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
152
273
0
21 Dec 2023
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schroppel
Christopher Wewer
J. E. Lenssen
Eddy Ilg
Thomas Brox
68
9
0
21 Dec 2023
Exploring Multimodal Large Language Models for Radiology Report Error-checking
Jinge Wu
Yunsoo Kim
Eva C. Keller
Jamie Chow
Adam P. Levine
Nikolas Pontikos
Zina M. Ibrahim
Paul Taylor
Michelle C. Williams
Honghan Wu
LM&MA
43
4
0
20 Dec 2023
Language Resources for Dutch Large Language Modelling
Bram Vanroy
MoE
ALM
57
9
0
20 Dec 2023
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
146
14
0
20 Dec 2023
Optimizing Distributed Training on Frontier for Large Language Models
Sajal Dash
Isaac Lyngaas
Junqi Yin
Xiao Wang
Romain Egele
Guojing Cong
Feiyi Wang
Prasanna Balaprakash
ALM
MoE
180
16
0
20 Dec 2023
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library
Ganesh Bikshandi
Jay Shah
55
7
0
19 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
108
12
0
19 Dec 2023
Efficient LLM inference solution on Intel GPU
Hui Wu
Yi Gan
Feng Yuan
Jing Ma
Wei Zhu
...
Hong Zhu
Yuhua Zhu
Xiaoli Liu
Jinghui Gu
Peng Zhao
63
3
0
19 Dec 2023
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
Harsh Sharma
Pratyush Dhingra
J. Doppa
Ümit Y. Ogras
P. Pande
83
7
0
18 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
57
1
0
18 Dec 2023
Linear Attention via Orthogonal Memory
Jun Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
91
3
0
18 Dec 2023
SPT: Fine-Tuning Transformer-based Language Models Efficiently with Sparsification
Yuntao Gui
Xiao Yan
Peiqi Yin
Han Yang
James Cheng
89
2
0
16 Dec 2023
Extending Context Window of Large Language Models via Semantic Compression
Weizhi Fei
Xueyan Niu
Pingyi Zhou
Lu Hou
Bo Bai
Lei Deng
Wei Han
81
28
0
15 Dec 2023
Marathon: A Race Through the Realm of Long Context with Large Language Models
Lei Zhang
Yunshui Li
Ziqiang Liu
Jiaxi Yang
Junhao Liu
Longze Chen
Run Luo
Min Yang
OffRL
LRM
79
6
0
15 Dec 2023
Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
60
0
0
14 Dec 2023
Motion Flow Matching for Human Motion Synthesis and Editing
Vincent Tao Hu
Wenzhe Yin
Pingchuan Ma
Yunlu Chen
Basura Fernando
Yuki M. Asano
E. Gavves
Pascal Mettes
Bjorn Ommer
Cees G. M. Snoek
DiffM
93
20
0
14 Dec 2023
TigerBot: An Open Multilingual Multitask LLM
Ye Chen
Wei Cai
Liangming Wu
Xiaowei Li
Zhanxuan Xin
Cong Fu
271
11
0
14 Dec 2023
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song
Xiaoyang Wang
Sangwoo Cho
Xiaoman Pan
Dong Yu
88
7
0
14 Dec 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
55
16
0
13 Dec 2023
On a Foundation Model for Operating Systems
Divyanshu Saxena
Nihal Sharma
Donghyun Kim
Rohit Dwivedula
Jiayi Chen
...
Alex Dimakis
P. B. Godfrey
Daehyeok Kim
Chris Rossbach
Gang Wang
61
2
0
13 Dec 2023
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng
Liangsheng Yin
Zhiqiang Xie
Chuyue Sun
Jeff Huang
...
Christos Kozyrakis
Ion Stoica
Joseph E. Gonzalez
Clark W. Barrett
Ying Sheng
LRM
137
174
0
12 Dec 2023
DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
S. Chandy
Varun Gangal
Yi Yang
Gabriel Maggiotti
59
0
0
11 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
124
180
0
11 Dec 2023
DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers
Aaron Mir
Eduardo Alonso
Esther Mondragón
DiffM
93
2
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
74
44
0
11 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan
Yuzhang Shang
Yue Song
Qiang Wu
Yan Yan
Guangyu Sun
MQ
124
61
0
10 Dec 2023
Batched Low-Rank Adaptation of Foundation Models
Yeming Wen
Swarat Chaudhuri
OffRL
100
21
0
09 Dec 2023
Stateful Large Language Model Serving with Pensieve
Lingfan Yu
Jinyang Li
RALM
KELM
LLMAG
80
14
0
09 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Susav Shrestha
Narasimha Reddy
Zongwang Li
71
7
0
09 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
101
33
0
08 Dec 2023
Trajeglish: Traffic Modeling as Next-Token Prediction
Jonah Philion
Xue Bin Peng
Sanja Fidler
39
25
0
07 Dec 2023
A Hardware Evaluation Framework for Large Language Model Inference
Hengrui Zhang
August Ning
R. Prabhakar
D. Wentzlaff
ELM
84
18
0
05 Dec 2023
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
Xinyu Crystina Zhang
Sebastian Hofstatter
Patrick Lewis
Raphael Tang
Jimmy J. Lin
LRM
KELM
ELM
RALM
ALM
86
8
0
05 Dec 2023
Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data
Yu Yang
Aaditya K. Singh
Mostafa Elhoushi
Anas Mahmoud
Kushal Tirumala
Fabian Gloeckle
Baptiste Rozière
Carole-Jean Wu
Ari S. Morcos
Newsha Ardalani
AAML
SyDa
96
10
0
05 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
Wenjie Wang
99
46
0
05 Dec 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin
Abhilasha Ravichander
Ximing Lu
Nouha Dziri
Melanie Sclar
Khyathi Chandu
Chandra Bhagavatula
Yejin Choi
68
199
0
04 Dec 2023
Recurrent Distance Filtering for Graph Representation Learning
Yuhui Ding
Antonio Orvieto
Bobby He
Thomas Hofmann
GNN
136
8
0
03 Dec 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
James Enouen
Hootan Nakhost
Sayna Ebrahimi
Sercan O. Arik
Yan Liu
Tomas Pfister
98
7
0
03 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
92
41
0
02 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
121
24
0
01 Dec 2023
Nonparametric Variational Regularisation of Pretrained Transformers
Fabio Fehr
James Henderson
85
0
0
01 Dec 2023
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Kai Lv
Shuo Zhang
Tianle Gu
Shuhao Xing
Jiawei Hong
...
Tengxiao Liu
Yu Sun
Penousal Machado
Hang Yan
Xipeng Qiu
87
7
0
01 Dec 2023
Splitwise: Efficient generative LLM inference using phase splitting
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
96
247
0
30 Nov 2023
HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
Maciej Besta
Afonso Claudino Catarino
Lukas Gianinazzi
Nils Blach
Piotr Nyczyk
H. Niewiadomski
Torsten Hoefler
116
8
0
30 Nov 2023
Perceptual Group Tokenizer: Building Perception with Iterative Grouping
Zhiwei Deng
Ting Chen
Yang Li
ViT
VLM
75
2
0
30 Nov 2023
Diffusion Models Without Attention
Jing Nathan Yan
Jiatao Gu
Alexander M. Rush
105
69
0
30 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
119
29
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
166
507
0
28 Nov 2023