ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
    VLM

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,432 papers shown
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
Matteo Pagliardini
Daniele Paliotta
Martin Jaggi
François Fleuret
LRM
20
22
0
01 Jun 2023
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Guilherme Penedo
Quentin Malartic
Daniel Hesslow
Ruxandra-Aimée Cojocaru
Alessandro Cappelli
Hamza Alobeidli
B. Pannier
Ebtesam Almazrouei
Julien Launay
32
751
0
01 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
45
160
0
01 Jun 2023
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Yanyu Li
Huan Wang
Qing Jin
Ju Hu
Pavlo Chemerys
Yun Fu
Yanzhi Wang
Sergey Tulyakov
Jian Ren
VLM
26
152
0
01 Jun 2023
Coneheads: Hierarchy Aware Attention
Albert Tseng
Tao Yu
Toni J.B. Liu
Chris De Sa
3DPC
17
5
0
01 Jun 2023
Protein Design with Guided Discrete Diffusion
Nate Gruver
Samuel Stanton
Nathan C. Frey
Tim G. J. Rudner
I. Hotzel
J. Lafrance-Vanasse
A. Rajpal
Kyunghyun Cho
A. Wilson
DiffM
42
101
0
31 May 2023
Self-Verification Improves Few-Shot Clinical Information Extraction
Zelalem Gero
Chandan Singh
Hao Cheng
Tristan Naumann
Michel Galley
Jianfeng Gao
Hoifung Poon
40
54
0
30 May 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
49
11
0
30 May 2023
Likelihood-Based Diffusion Language Models
Ishaan Gulrajani
Tatsunori B. Hashimoto
DiffM
29
51
0
30 May 2023
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Marten Lienen
David Lüdke
Jan Hansen-Palmus
Stephan Günnemann
DiffM
AI4CE
26
24
0
29 May 2023
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics
A. Ardakani
Altan Haan
Shangyin Tan
Doru-Thom Popovici
Alvin Cheung
Costin Iancu
Koushik Sen
24
2
0
29 May 2023
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages
Wen Yang
Chong Li
Jiajun Zhang
Chengqing Zong
LRM
25
46
0
29 May 2023
Geometric Algebra Transformer
Johann Brehmer
P. D. Haan
S. Behrends
Taco S. Cohen
41
26
0
28 May 2023
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Soyoung Yoon
Chaeeun Kim
Hyunji Lee
Joel Jang
Sohee Yang
Minjoon Seo
24
3
0
27 May 2023
Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
Zihao Yu
Haoyang Li
Fangcheng Fu
Xupeng Miao
Bin Cui
DiffM
30
8
0
27 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
41
179
0
27 May 2023
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu
Aditya Desai
Fangshuo Liao
Weitao Wang
Victor Xie
Zhaozhuo Xu
Anastasios Kyrillidis
Anshumali Shrivastava
28
202
0
26 May 2023
Backpack Language Models
John Hewitt
John Thickstun
Christopher D. Manning
Percy Liang
KELM
19
16
0
26 May 2023
Imitating Task and Motion Planning with Visuomotor Transformers
Murtaza Dalal
Ajay Mandlekar
Caelan Reed Garrett
Ankur Handa
Ruslan Salakhutdinov
Dieter Fox
55
54
0
25 May 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
27
149
0
25 May 2023
Online learning of long-range dependencies
Nicolas Zucchet
Robert Meier
Simon Schug
Asier Mujika
João Sacramento
CLL
46
18
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
Manifold Diffusion Fields
Ahmed A. A. Elhag
Yuyang Wang
J. Susskind
Miguel Angel Bautista
DiffM
AI4CE
36
6
0
24 May 2023
Focus Your Attention (with Adaptive IIR Filters)
Shahar Lutati
Itamar Zimerman
Lior Wolf
32
9
0
24 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
30
3
0
24 May 2023
Adapting Language Models to Compress Contexts
Alexis Chevalier
Alexander Wettig
Anirudh Ajith
Danqi Chen
LLMAG
16
174
0
24 May 2023
Dual Path Transformer with Partition Attention
Zhengkai Jiang
Liang Liu
Jiangning Zhang
Yabiao Wang
Mingang Chen
Chengjie Wang
ViT
36
2
0
24 May 2023
BinaryViT: Towards Efficient and Accurate Binary Vision Transformers
Junrui Xiao
Zhikai Li
Lianwei Yang
Qingyi Gu
MQ
ViT
35
2
0
24 May 2023
Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
Yinghan Long
Sayeed Shafayet Chowdhury
Kaushik Roy
40
1
0
24 May 2023
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina J. Semnani
Violet Z. Yao
He Zhang
M. Lam
KELM
AI4MH
30
72
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
26
2
0
23 May 2023
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
32
4
0
22 May 2023
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
Abhinav Jangda
Saeed Maleki
M. Dehnavi
Madan Musuvathi
Olli Saarikivi
24
5
0
22 May 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
26
586
0
22 May 2023
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Zangwei Zheng
Xiaozhe Ren
Fuzhao Xue
Yang Luo
Xin Jiang
Yang You
42
55
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
90
562
0
22 May 2023
Non-Autoregressive Document-Level Machine Translation
Guangsheng Bao
Zhiyang Teng
Hao Zhou
Jianhao Yan
Yue Zhang
44
0
0
22 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
29
12
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
41
6
0
21 May 2023
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
Xue Wang
Tian Zhou
Qingsong Wen
Jinyang Gao
Bolin Ding
Rong Jin
AI4TS
26
39
0
20 May 2023
Efficient ConvBN Blocks for Transfer Learning and Beyond
Kaichao You
Guo Qin
Anchang Bao
Mengsi Cao
Ping-Chia Huang
Jiulong Shan
Mingsheng Long
31
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
Less is More! A slim architecture for optimal language translation
Luca Herranz-Celotti
E. Rrapaj
36
0
0
18 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
Qing Guo
DiffM
54
7
0
18 May 2023
Ray-Patch: An Efficient Querying for Light Field Transformers
T. B. Martins
Javier Civera
ViT
39
0
0
16 May 2023
Diffusion Models for Imperceptible and Transferable Adversarial Attack
Jianqi Chen
H. Chen
Keyan Chen
Yilan Zhang
Zhengxia Zou
Z. Shi
DiffM
32
57
0
14 May 2023
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu
Houwen Peng
Ningxin Zheng
Yuqing Yang
Han Hu
Yixuan Yuan
ViT
25
277
0
11 May 2023
StarCoder: may the source be with you!
Raymond Li
Loubna Ben Allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
62
719
0
09 May 2023
CaloClouds: Fast Geometry-Independent Highly-Granular Calorimeter Simulation
E. Buhmann
S. Diefenbacher
E. Eren
F. Gaede
Gregor Kasieczka
A. Korol
W. Korcari
K. Krüger
Peter McKeown
DiffM
23
45
0
08 May 2023
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Deepak Narayanan
Keshav Santhanam
Peter Henderson
Rishi Bommasani
Tony Lee
Percy Liang
145
3
0
03 May 2023