ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14135
  4. Cited By
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM
ArXivPDFHTML

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,436 papers shown
Title
Skywork: A More Open Bilingual Foundation Model
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
38
93
0
30 Oct 2023
Punica: Multi-Tenant LoRA Serving
Punica: Multi-Tenant LoRA Serving
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luis Ceze
Arvind Krishnamurthy
44
34
0
28 Oct 2023
FP8-LM: Training FP8 Large Language Models
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng-Wei Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
59
40
0
27 Oct 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu
Jue Wang
Tri Dao
Dinesh Manocha
Binhang Yuan
...
Anshumali Shrivastava
Ce Zhang
Yuandong Tian
Christopher Ré
Beidi Chen
BDL
25
192
0
26 Oct 2023
Discrete Diffusion Modeling by Estimating the Ratios of the Data
  Distribution
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou
Chenlin Meng
Stefano Ermon
DiffM
33
68
0
25 Oct 2023
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons
  Images
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Aaron Gokaslan
A. Feder Cooper
Jasmine Collins
Landan Seguin
Austin Jacobson
Mihir Patel
Jonathan Frankle
Cory Stephenson
Volodymyr Kuleshov
DiffM
17
16
0
25 Oct 2023
Learning Generalizable Program and Architecture Representations for
  Performance Modeling
Learning Generalizable Program and Architecture Representations for Performance Modeling
Lingda Li
T. Flynn
A. Hoisie
32
1
0
25 Oct 2023
CLEX: Continuous Length Extrapolation for Large Language Models
CLEX: Continuous Length Extrapolation for Large Language Models
Guanzheng Chen
Xin Li
Zaiqiao Meng
Shangsong Liang
Li Bing
32
29
0
25 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch,
  Evaluations and Domain Applications
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
16
8
0
24 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive
  Survey and Evaluation
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
37
10
0
24 Oct 2023
How Much Context Does My Attention-Based ASR System Need?
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
32
1
0
24 Oct 2023
Simple Hardware-Efficient PCFGs with Independent Left and Right
  Productions
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
Wei Liu
Aaron Courville
Yoon Kim
Kewei Tu
48
1
0
23 Oct 2023
Large Search Model: Redefining Search Stack in the Era of LLMs
Large Search Model: Redefining Search Stack in the Era of LLMs
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
LRM
KELM
45
13
0
23 Oct 2023
Teaching Language Models to Self-Improve through Interactive
  Demonstrations
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRM
ReLM
38
20
0
20 Oct 2023
Character-level Chinese Backpack Language Models
Character-level Chinese Backpack Language Models
Hao Sun
John Hewitt
27
0
0
19 Oct 2023
Efficient Long-Range Transformers: You Need to Attend More, but Not
  Necessarily at Every Layer
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Qingru Zhang
Dhananjay Ram
Cole Hawkins
Sheng Zha
Tuo Zhao
35
15
0
19 Oct 2023
CLARA: Multilingual Contrastive Learning for Audio Representation
  Acquisition
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
26
3
0
18 Oct 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through
  Self-Reflection
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
176
647
0
17 Oct 2023
BitNet: Scaling 1-bit Transformers for Large Language Models
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
34
98
0
17 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
27
18
0
16 Oct 2023
In-context Pretraining: Language Modeling Beyond Document Boundaries
In-context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi
Sewon Min
Maria Lomeli
Chunting Zhou
Margaret Li
...
Victoria Lin
Noah A. Smith
Luke Zettlemoyer
Scott Yih
Mike Lewis
LRM
RALM
SyDa
34
48
0
16 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method
  for Aligning Large Language Models
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
27
52
0
16 Oct 2023
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby
Linxi Fan
Yuke Zhu
OffRL
LM&Ro
38
10
0
15 Oct 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language
  Models
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
126
22
0
13 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for
  Parameter-efficient Multi-head Attention
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
45
0
0
11 Oct 2023
Found in the Middle: Permutation Self-Consistency Improves Listwise
  Ranking in Large Language Models
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Raphael Tang
Xinyu Crystina Zhang
Xueguang Ma
Jimmy Lin
Ferhan Ture
LRM
39
15
0
11 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
39
23
0
11 Oct 2023
CacheGen: KV Cache Compression and Streaming for Fast Language Model
  Serving
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Yuhan Liu
Hanchen Li
Yihua Cheng
Siddhant Ray
Yuyang Huang
...
Ganesh Ananthanarayanan
Michael Maire
Henry Hoffmann
Ari Holtzman
Junchen Jiang
50
42
0
11 Oct 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
35
8
0
10 Oct 2023
Mistral 7B
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
23
2,014
0
10 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik R. Narasimhan
ELM
34
473
0
10 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured
  Pruning
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
40
268
0
10 Oct 2023
Latent Diffusion Counterfactual Explanations
Latent Diffusion Counterfactual Explanations
Karim Farid
Simon Schrodi
Max Argus
Thomas Brox
DiffM
46
12
0
10 Oct 2023
iTransformer: Inverted Transformers Are Effective for Time Series
  Forecasting
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
Yong Liu
Tengge Hu
Haoran Zhang
Haixu Wu
Shiyu Wang
Lintao Ma
Mingsheng Long
AI4TS
24
447
0
10 Oct 2023
Humans and language models diverge when predicting repeating text
Humans and language models diverge when predicting repeating text
Aditya R. Vaidya
Javier S. Turek
Alexander G. Huth
25
6
0
10 Oct 2023
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
Peng Di
Jianguo Li
Hang Yu
Wei Jiang
Wenting Cai
...
Zelin Zhao
Xunjin Zheng
Hailian Zhou
Lifu Zhu
Xianying Zhu
ELM
ALM
AI4CE
35
12
0
10 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
32
284
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
79
0
09 Oct 2023
Scaling Laws of RoPE-based Extrapolation
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu
Hang Yan
Shuo Zhang
Chen An
Xipeng Qiu
Dahua Lin
23
83
0
08 Oct 2023
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as
  You May Think -- Introducing AI Detectability Index
Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think -- Introducing AI Detectability Index
Megha Chakraborty
S.M. Towhidul Islam Tonmoy
S. M. Mehedi
Krish Sharma
Niyar R. Barman
...
Tanay Kumar
Vinija Jain
Aman Chadha
Amit P. Sheth
Amitava Das
DeLMO
22
21
0
08 Oct 2023
Walking Down the Memory Maze: Beyond Context Limit through Interactive
  Reading
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
Howard Chen
Ramakanth Pasunuru
Jason Weston
Asli Celikyilmaz
RALM
68
73
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An
  Extensive Definition, Quantification, and Prescriptive Remediations
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Aman Chadha
Mikel Artetxe
Punit Daniel Simig
HILM
43
119
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
25
5
0
07 Oct 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery
  through Sophisticated AI System Technologies
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Shuaiwen Leon Song
Bonnie Kruft
Minjia Zhang
Conglong Li
Shiyang Chen
...
Arash Vahdat
Chaowei Xiao
Thomas Gibbs
Anima Anandkumar
R. Stevens
48
13
0
06 Oct 2023
A Comprehensive Performance Study of Large Language Models on Novel AI
  Accelerators
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
M. Emani
Sam Foreman
Varuni K. Sastry
Zhen Xie
Siddhisanket Raskar
William Arnold
R. Thakur
V. Vishwanath
M. Papka
ELM
32
10
0
06 Oct 2023
How to Capture Higher-order Correlations? Generalizing Matrix Softmax
  Attention to Kronecker Computation
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman
Zhao Song
38
32
0
06 Oct 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
  Reasoning
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang
Houxing Ren
Aojun Zhou
Zimu Lu
Sichun Luo
Weikang Shi
Renrui Zhang
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM
LRM
SyDa
30
95
0
05 Oct 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context
  LLMs Training
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li
Rulin Shao
Anze Xie
Eric P. Xing
Xuezhe Ma
Ion Stoica
Joseph E. Gonzalez
Hao Zhang
32
18
0
05 Oct 2023
Retrieval meets Long Context Large Language Models
Retrieval meets Long Context Large Language Models
Peng Xu
Ming-Yu Liu
Xianchao Wu
Lawrence C. McAfee
Chen Zhu
Zihan Liu
Sandeep Subramanian
Evelina Bakhturina
M. Shoeybi
Bryan Catanzaro
RALM
LRM
14
82
0
04 Oct 2023
Never Train from Scratch: Fair Comparison of Long-Sequence Models
  Requires Data-Driven Priors
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Ido Amos
Jonathan Berant
Ankit Gupta
33
25
0
04 Oct 2023
Previous
123...232425...272829
Next