2205.14135
Cited By
v1
v2 (latest)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing
"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
39
4
0
01 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
100
64
0
01 Nov 2023
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai
Junru Shao
Siyuan Feng
Steven Lyubomirsky
Bohan Hou
...
Sunghyun Park
Prakalp Srivastava
Jared Roesch
T. Mowry
Tianqi Chen
109
11
0
01 Nov 2023
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
176
164
0
31 Oct 2023
HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding
Giorgio Angelotti
52
0
0
30 Oct 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
93
102
0
30 Oct 2023
Punica: Multi-Tenant LoRA Serving
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luis Ceze
Arvind Krishnamurthy
90
35
0
28 Oct 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
111
45
0
27 Oct 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu
Jue Wang
Tri Dao
Dinesh Manocha
Binhang Yuan
...
Anshumali Shrivastava
Ce Zhang
Yuandong Tian
Christopher Ré
Beidi Chen
BDL
123
221
0
26 Oct 2023
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou
Chenlin Meng
Stefano Ermon
DiffM
100
116
0
25 Oct 2023
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Aaron Gokaslan
A. Feder Cooper
Jasmine Collins
Landan Seguin
Austin Jacobson
Mihir Patel
Jonathan Frankle
Cory Stephenson
Volodymyr Kuleshov
DiffM
89
18
0
25 Oct 2023
Learning Generalizable Program and Architecture Representations for Performance Modeling
Lingda Li
T. Flynn
A. Hoisie
57
2
0
25 Oct 2023
CLEX: Continuous Length Extrapolation for Large Language Models
Guanzheng Chen
Xin Li
Zaiqiao Meng
Shangsong Liang
Li Bing
102
32
0
25 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
43
10
0
24 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
95
12
0
24 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
63
2
0
24 Oct 2023
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
Wei Liu
Aaron Courville
Yoon Kim
Kewei Tu
111
2
0
23 Oct 2023
Large Search Model: Redefining Search Stack in the Era of LLMs
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
LRM
KELM
96
15
0
23 Oct 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRM
ReLM
104
22
0
20 Oct 2023
Character-level Chinese Backpack Language Models
Hao Sun
John Hewitt
59
0
0
19 Oct 2023
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Qingru Zhang
Dhananjay Ram
Cole Hawkins
Sheng Zha
Tuo Zhao
105
16
0
19 Oct 2023
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
77
3
0
18 Oct 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
281
783
0
17 Oct 2023
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
78
119
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
110
20
0
16 Oct 2023
In-context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi
Sewon Min
Maria Lomeli
Chunting Zhou
Margaret Li
...
Victoria Lin
Noah A. Smith
Luke Zettlemoyer
Scott Yih
Mike Lewis
LRM
RALM
SyDa
135
56
0
16 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
137
79
0
16 Oct 2023
An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records
Fabio Azzalini
Tommaso Dolci
Marco Vagaggini
OOD
181
1
0
16 Oct 2023
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby
Linxi Fan
Yuke Zhu
OffRL
LM&Ro
121
33
0
15 Oct 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
188
22
0
13 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
102
0
0
11 Oct 2023
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Raphael Tang
Xinyu Crystina Zhang
Xueguang Ma
Jimmy Lin
Ferhan Ture
LRM
114
18
0
11 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
114
31
0
11 Oct 2023
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Yuhan Liu
Hanchen Li
Yihua Cheng
Siddhant Ray
Yuyang Huang
...
Ganesh Ananthanarayanan
Michael Maire
Henry Hoffmann
Ari Holtzman
Junchen Jiang
128
53
0
11 Oct 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
69
13
0
10 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
154
2,261
0
10 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
141
647
0
10 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
125
310
0
10 Oct 2023
Latent Diffusion Counterfactual Explanations
Karim Farid
Simon Schrodi
Max Argus
Thomas Brox
DiffM
99
14
0
10 Oct 2023
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
Yong Liu
Tengge Hu
Haoran Zhang
Haixu Wu
Shiyu Wang
Lintao Ma
Mingsheng Long
AI4TS
167
565
0
10 Oct 2023
Humans and language models diverge when predicting repeating text
Aditya R. Vaidya
Javier S. Turek
Alexander G. Huth
69
6
0
10 Oct 2023
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
Peng Di
Jianguo Li
Hang Yu
Wei Jiang
Wenting Cai
...
Zelin Zhao
Xunjin Zheng
Hailian Zhou
Lifu Zhu
Xianying Zhu
ELM
ALM
AI4CE
100
14
0
10 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
137
325
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
112
91
0
09 Oct 2023
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu
Hang Yan
Shuo Zhang
Chen An
Xipeng Qiu
Dahua Lin
89
89
0
08 Oct 2023
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
Megha Chakraborty
S.M. Towhidul Islam Tonmoy
S. M. Mehedi
Krish Sharma
Niyar R. Barman
...
Tanay Kumar
Vinija Jain
Aman Chadha
Amit P. Sheth
Amitava Das
DeLMO
82
21
0
08 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
146
86
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Aman Chadha
Mikel Artetxe
Punit Daniel Simig
HILM
94
131
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
64
5
0
07 Oct 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Shuaiwen Leon Song
Bonnie Kruft
Minjia Zhang
Conglong Li
Shiyang Chen
...
Arash Vahdat
Chaowei Xiao
Thomas Gibbs
Anima Anandkumar
R. Stevens
112
14
0
06 Oct 2023