2205.14135
Cited By
v1
v2 (latest)
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
27 May 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
Papers citing
"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
50 / 1,508 papers shown
Title
LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers
Abdur Rahman Bin Md Faizullah
Ashok Urlana
Rahul Mishra
68
7
0
22 Mar 2024
Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
Yunqi Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
124
3
0
22 Mar 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu
S. A. Baumann
Ming Gui
Olga Grebenkova
Pingchuan Ma
Johannes S. Fischer
Björn Ommer
120
46
0
20 Mar 2024
Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts
Guangzeng Han
Weisi Liu
Xiaolei Huang
Brian Borsari
76
22
0
20 Mar 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
157
558
0
20 Mar 2024
Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks
Bo-Ru Lu
Nikita Haduong
Chien-Yu Lin
Hao Cheng
Noah A. Smith
Mari Ostendorf
AI4CE
73
0
0
19 Mar 2024
MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis
Kleomenis Katevas
Lorenzo Minto
Hamed Haddadi
95
24
0
19 Mar 2024
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Runwei Guan
Liye Jia
Fengyufan Yang
Shanliang Yao
Erick Purwanto
...
Eng Gee Lim
Jeremy S. Smith
Ka Lok Man
Xuming Hu
Yutao Yue
119
9
0
19 Mar 2024
HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching
Ying Chen
Yong-Jin Liu
Kai Wu
Qiang Nie
Shang Xu
Huifang Ma
Bing Wang
Chengjie Wang
VLM
66
1
0
19 Mar 2024
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi
Yuxiang Chen
Kang Zhao
Kaijun Zheng
Jianfei Chen
Jun Zhu
MQ
99
23
0
19 Mar 2024
HDLdebugger: Streamlining HDL debugging with Large Language Models
Xufeng Yao
Haoyang Li
T. H. Chan
Wenyi Xiao
Mingxuan Yuan
Yu Huang
Lei Chen
Bei Yu
71
23
0
18 Mar 2024
Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure
Ziyi Chen
Mengyuan Zhang
M. M. Ahmed
Yi Guo
T. George
Jiang Bian
Yonghui Wu
79
2
0
18 Mar 2024
FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He
Jidong Zhai
96
35
0
18 Mar 2024
NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications
Vladimir Korviakov
Denis Koposov
56
0
0
17 Mar 2024
Is Mamba Effective for Time Series Forecasting?
Zihan Wang
Fanheng Kong
Shi Feng
Ming Wang
Xiaocui Yang
Han Zhao
Daling Wang
Yifei Zhang
Mamba
AI4TS
85
75
0
17 Mar 2024
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
Tushar Kataria
Beatrice Knudsen
Shireen Y. Elhabian
DiffM
MedIm
105
10
0
17 Mar 2024
EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
Abu Zahid Bin Aziz
Mokshagna Sai Teja Karanam
Tushar Kataria
Shireen Y. Elhabian
ViT
MedIm
69
2
0
16 Mar 2024
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices
Zhiyong Zhang
Huaizu Jiang
H. Singh
79
6
0
15 Mar 2024
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh
Kihong Kim
Jaemin Kim
Sungkyun Kim
Junyeol Lee
Du-Seong Chang
Jiwon Seo
91
36
0
15 Mar 2024
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
Yiqing Shen
Jingxing Li
Xinyuan Shao
Blanca Inigo Romillo
Ankush Jindal
David Dreizin
Mathias Unberath
MedIm
88
15
0
14 Mar 2024
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot
Adrian Łańcucki
Marcin Chochowski
David Tarjan
Edoardo Ponti
94
56
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
135
78
0
14 Mar 2024
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You
Runsheng Bai
Meng Cao
Jianmin Wang
Ion Stoica
Mingsheng Long
VLM
77
0
0
14 Mar 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
GNN
69
8
0
14 Mar 2024
PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest
Jiajun Deng
Sha Zhang
Feras Dayoub
Wanli Ouyang
Yanyong Zhang
Ian Reid
3DPC
124
4
0
14 Mar 2024
SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration
Yanfei Song
Bangzheng Pu
Peng Wang
Hongxu Jiang
Dong Dong
Yongxiang Cao
Yiqing Shen
VLM
86
13
0
14 Mar 2024
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Muhammad Adnan
Akhil Arunkumar
Gaurav Jain
Prashant J. Nair
Ilya Soloveychik
Purushotham Kamath
112
62
0
14 Mar 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
62
1
0
13 Mar 2024
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun
Sujan Kumar Gonugondla
Sanjay Krishna Gouda
Haifeng Qian
Hantian Ding
...
Liangfu Chen
Parminder Bhatia
Ramesh Nallapati
Sudipta Sengupta
Bing Xiang
88
4
0
13 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
183
48
0
13 Mar 2024
StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Jia-Nan Li
Quan Tu
Cunli Mao
Zhengtao Yu
Ji-Rong Wen
Rui Yan
OffRL
76
4
0
13 Mar 2024
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal
Bilge Acun
Basil Homer
Mostafa Elhoushi
Yejin Lee
Shivaram Venkataraman
Dimitris Papailiopoulos
Carole-Jean Wu
107
11
0
12 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei
Xi Chen
Linzi Luo
ELM
ALM
LRM
63
8
0
12 Mar 2024
Characterization of Large Language Model Development in the Datacenter
Qi Hu
Zhisheng Ye
Zerui Wang
Guoteng Wang
Mengdie Zhang
...
Dahua Lin
Xiaolin Wang
Yingwei Luo
Yonggang Wen
Tianwei Zhang
94
49
0
12 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
121
155
0
11 Mar 2024
Algorithmic progress in language models
Anson Ho
T. Besiroglu
Ege Erdil
David Owen
Robi Rahman
Zifan Carl Guo
David Atkinson
Neil Thompson
J. Sevilla
67
18
0
09 Mar 2024
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang
Qingru Zhang
Souvik Kundu
Geonhwa Jeong
Zaoxing Liu
Tushar Krishna
Tuo Zhao
MQ
175
94
0
08 Mar 2024
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang
Xingyi He
Sida Peng
Dongli Tan
Xiaowei Zhou
3DV
84
53
0
07 Mar 2024
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Ali Hassani
Wen-mei W. Hwu
Humphrey Shi
66
9
0
07 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
317
576
0
07 Mar 2024
Low-Resource Court Judgment Summarization for Common Law Systems
Shuaiqi Liu
Jiannong Cao
Yicong Li
Ruosong Yang
Zhiyuan Wen
ELM
AILaw
63
3
0
07 Mar 2024
Mastering Memory Tasks with World Models
Mohammad Reza Samsami
Artem Zholus
Janarthanan Rajendran
Sarath Chandar
CLL
OffRL
102
28
0
07 Mar 2024
RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting
Tianfeng Wang
Gaojie Cui
AI4TS
99
0
0
07 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
114
75
0
06 Mar 2024
AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs
Victor Akinwande
J. Zico Kolter
CML
86
1
0
06 Mar 2024
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
Zexuan Qiu
Jingjing Li
Shijue Huang
Wanjun Zhong
Irwin King
ELM
ALM
109
6
0
06 Mar 2024
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
S. S. Mondal
Jonathan D. Cohen
Taylor W. Webb
OCL
75
9
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
116
62
0
05 Mar 2024
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen
Sebastián M. Palacio
Andreas Dengel
110
3
0
05 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
102
81
0
04 Mar 2024