ResearchTrend.AI
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,438 papers shown
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
42
63
0
19 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
24
0
0
18 Jan 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Zdeněk Kasner
Ondrej Dusek
33
8
0
18 Jan 2024
Towards Principled Graph Transformers
Luis Muller
Daniel Kusuma
Blai Bonet
Christopher Morris
33
3
0
18 Jan 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong
Shengyu Liu
Junda Chen
Jianbo Hu
Yibo Zhu
Xuanzhe Liu
Xin Jin
Hao Zhang
44
179
0
18 Jan 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Yao Lu
Song Bian
Lequn Chen
Yongjun He
Yulong Hui
...
Huanchen Zhang
Minjia Zhang
Qizhen Zhang
Tianyi Zhou
Danyang Zhuo
37
7
0
17 Jan 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Peng Sun
57
8
0
17 Jan 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
86
10
0
16 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
69
35
0
16 Jan 2024
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Saurav Pawar
S.M. Towhidul Islam Tonmoy
S. M. M. Zaman
Vinija Jain
Aman Chadha
Amitava Das
39
28
0
15 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Yongqi Li
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
38
105
0
15 Jan 2024
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization
Ninglu Shao
Shitao Xiao
Zheng Liu
Peitian Zhang
36
4
0
15 Jan 2024
Extending LLMs' Context Window with 100 Samples
Yikai Zhang
Junlong Li
Pengfei Liu
37
11
0
13 Jan 2024
DocFinQA: A Long-Context Financial Reasoning Dataset
Varshini Reddy
Rik Koncel-Kedziorski
Viet Dac Lai
Michael Krumdick
Charles Lovering
Chris Tanner
RALM
35
16
0
12 Jan 2024
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Mingdao Liu
Aohan Zeng
Bowen Wang
Peng Zhang
Jie Tang
Yuxiao Dong
72
8
0
12 Jan 2024
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning
Yutao Zhu
Peitian Zhang
Chenghao Zhang
Yifei Chen
Binyu Xie
Zheng Liu
Ji-Rong Wen
Zhicheng Dou
21
15
0
12 Jan 2024
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen
Xingyi Cheng
Pan Li
Yangli-ao Geng
Jing Gong
...
Chiming Liu
Aohan Zeng
Yuxiao Dong
Jie Tang
Leo T. Song
42
102
0
11 Jan 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong
Zhiqi Li
Yuntao Chen
Feng Wang
Xizhou Zhu
...
Hongsheng Li
Yu Qiao
Lewei Lu
Jie Zhou
Jifeng Dai
36
51
0
11 Jan 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
72
22
0
09 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
54
36
0
09 Jan 2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Connor Holmes
Masahiro Tanaka
Michael Wyatt
A. A. Awan
Jeff Rasley
...
Reza Yazdani Aminabadi
Heyang Qin
Arash Bakhtiari
Lev Kurilenko
Yuxiong He
27
63
0
09 Jan 2024
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
66
3
0
08 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
314
0
05 Jan 2024
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Gabriel Lino Garcia
P. H. Paiola
Luis Henrique Morelli
Giovani Candido
Arnaldo Cândido Júnior
D. Jodas
Luis C. S. Afonso
I. R. Guilherme
B. Penteado
João Paulo Papa
29
11
0
05 Jan 2024
Large Language Models in Plant Biology
H. Lam
Xing Er Ong
Marek Mutwil
11
16
0
05 Jan 2024
A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE
Ikumi Okubo
Keisuke Sugiura
Hiroki Matsutani
36
2
0
05 Jan 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Bin Lin
Chen Zhang
Tao Peng
Hanyu Zhao
Wencong Xiao
...
Shen Li
Zhigang Ji
Tao Xie
Yong Li
Wei Lin
52
48
0
05 Jan 2024
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Linyuan Gong
Mostafa Elhoushi
Alvin Cheung
34
14
0
05 Jan 2024
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
Mincong Huang
Chao Wang
Chi Ma
Yineng Zhang
Peng Zhang
Lei Yu
33
1
0
04 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
35
65
0
04 Jan 2024
IoT in the Era of Generative AI: Vision and Challenges
Xin Wang
Zhongwei Wan
Arvin Hekmati
M. Zong
Samiul Alam
Mi Zhang
Bhaskar Krishnamachari
32
15
0
03 Jan 2024
Transformer Neural Autoregressive Flows
Massimiliano Patacchiola
Aliaksandra Shysheya
Katja Hofmann
Richard Turner
TPM
14
1
0
03 Jan 2024
PLLaMa: An Open-source Large Language Model for Plant Science
Xianjun Yang
Junfeng Gao
Wenxin Xue
Erik Alexandersson
43
19
0
03 Jan 2024
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Ernest Perkowski
Rui Pan
Tuan Dung Nguyen
Yuan-Sen Ting
Sandor Kruk
...
Michael J. Smith
Huiling Liu
Kevin Schawinski
K. Iyer
I. Ciucă
AI4MH
20
12
0
03 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Xia Hu
42
101
0
02 Jan 2024
Quokka: An Open-source Large Language Model ChatBot for Material Science
Xianjun Yang
Stephen D. Wilson
Linda R. Petzold
OSLM
37
2
0
02 Jan 2024
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He
Ruihuang Li
Guowen Zhang
Lei Zhang
35
5
0
01 Jan 2024
Building Efficient Universal Classifiers with Natural Language Inference
Moritz Laurer
W. Atteveldt
Andreu Casas
Kasper Welbers
38
8
0
29 Dec 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob P. Portes
Alex Trott
Sam Havens
Daniel King
Abhinav Venigalla
Moin Nadeem
Nikhil Sardana
D. Khudia
Jonathan Frankle
28
17
0
29 Dec 2023
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
20
14
0
28 Dec 2023
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu
Yuanfan Guo
Jianhua Han
Minzhe Niu
Yihan Zeng
Songcen Xu
Zeyi Huang
Zhao Zhong
Wei Zhang
Hang Xu
39
4
0
27 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi-Xin Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
39
17
0
25 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
65
35
0
23 Dec 2023
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLM
DiffM
37
8
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
961
0
21 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
20
240
0
21 Dec 2023
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schroppel
Christopher Wewer
J. E. Lenssen
Eddy Ilg
Thomas Brox
35
9
0
21 Dec 2023
Exploring Multimodal Large Language Models for Radiology Report Error-checking
Jinge Wu
Yunsoo Kim
Eva C. Keller
Jamie Chow
Adam P. Levine
Nikolas Pontikos
Zina M. Ibrahim
Paul Taylor
Michelle C. Williams
Honghan Wu
LM&MA
22
3
0
20 Dec 2023
Language Resources for Dutch Large Language Modelling
Bram Vanroy
MoE
ALM
31
7
0
20 Dec 2023
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
70
12
0
20 Dec 2023