Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.08691
Cited By
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
17 July 2023
Tri Dao
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"
50 / 230 papers shown
Title
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
151
5
0
04 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
97
4
0
28 Nov 2024
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Chenjie Cao
Chaohui Yu
Shang Liu
Fan Wang
Xiangyang Xue
Yanwei Fu
97
1
0
25 Nov 2024
Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Zheng Ma
Zeping Mao
Ruixue Zhang
Jiazhen Chen
L. Xin
Paul Shan
A. Ghodsi
Ming Li
75
0
0
24 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
75
0
0
20 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
254
3
0
20 Nov 2024
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
33
1
0
08 Nov 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Joey Tianyi Zhou
VLM
35
10
0
07 Nov 2024
Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs
Chengxin Hu
Hao Li
Yihe Yuan
Jing Li
Ivor Tsang
51
0
0
07 Nov 2024
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
Laura Cabello
Carmen Martin-Turrero
Uchenna Akujuobi
Anders Søgaard
Carlos Bobed
AI4MH
157
1
0
06 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
67
4
0
06 Nov 2024
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Zehua Wang
Hui Xiong
Hui Xiong
LLMAG
41
5
0
05 Nov 2024
Context Parallelism for Scalable Million-Token Inference
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jeremy Reizenstein
Jongsoo Park
Jianyu Huang
MoE
LRM
72
5
0
04 Nov 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
88
16
0
28 Oct 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Lingpeng Kong
AI4CE
78
16
0
23 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
85
6
0
23 Oct 2024
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan
Songlin Yang
Aaron Courville
Rameswar Panda
Yikang Shen
30
4
0
23 Oct 2024
Markov Chain of Thought for Efficient Mathematical Reasoning
Wen Yang
Kai Fan
Minpeng Liao
LRM
47
3
0
23 Oct 2024
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Shaonan Wu
Shuai Lu
Y. Gong
Nan Duan
Ping Wei
AIMat
45
0
0
21 Oct 2024
Comprehensive benchmarking of large language models for RNA secondary structure prediction
L. I. Zablocki
L. A. Bugnon
M. Gerard
L. Di Persia
G. Stegmayer
D. H. Milone
AI4TS
31
3
0
21 Oct 2024
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu
Haoyi Wu
Kewei Tu
34
3
0
18 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
52
19
0
17 Oct 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
28
7
0
17 Oct 2024
In-context KV-Cache Eviction for LLMs via Attention-Gate
Zihao Zeng
Bokai Lin
Tianqi Hou
Hao Zhang
Zhijie Deng
38
1
0
15 Oct 2024
Liger Kernel: Efficient Triton Kernels for LLM Training
Pin-Lun Hsu
Yun Dai
Vignesh Kothapalli
Qingquan Song
Shao Tang
Siyu Zhu
Steven Shimizu
Shivam Sahni
Haowen Ning
Yanning Chen
53
30
0
14 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J.N. Zhang
ALM
LRM
74
4
0
11 Oct 2024
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang
VLM
31
16
0
10 Oct 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
53
1
0
09 Oct 2024
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
52
1
0
09 Oct 2024
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer
Lukas Hauzenberger
Thomas Schmied
Benedikt Alkin
Marc Peter Deisenroth
Sepp Hochreiter
42
4
0
09 Oct 2024
Differential Transformer
Tianzhu Ye
Li Dong
Yuqing Xia
Yutao Sun
Yi Zhu
Gao Huang
Furu Wei
203
0
0
07 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
53
5
0
07 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
17
0
06 Oct 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Pengcheng Jiang
Cao Xiao
Minhao Jiang
Parminder Bhatia
Taha A. Kass-Hout
Jimeng Sun
Jiawei Han
RALM
AI4MH
51
4
0
06 Oct 2024
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
55
8
0
05 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
51
0
0
04 Oct 2024
ToolGen: Unified Tool Retrieval and Calling via Generation
Renxi Wang
Xudong Han
Lei Ji
Shu Wang
Timothy Baldwin
Haonan Li
LLMAG
75
6
0
04 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
62
26
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
84
19
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
39
0
03 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
87
1
0
02 Oct 2024
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang
Jinle Zeng
Xiyuan Xiao
Siming Wu
Jiabin Yang
Lujing Zheng
Zeyu Chen
Jiang Bian
Dianhai Yu
Haifeng Wang
166
2
0
02 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
46
32
0
24 Sep 2024
KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
Neel Rajani
Lilli Kiessling
Aleksandr Ogaltsov
Claus Lang
ALM
33
0
0
13 Sep 2024
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
Zhi Chen
Qiguang Chen
Libo Qin
Qipeng Guo
Haijun Lv
Yicheng Zou
Wanxiang Che
Hang Yan
K. Chen
Dahua Lin
SyDa
56
4
0
03 Sep 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
47
0
0
02 Sep 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao
Sam Ade Jacobs
Masahiro Tanaka
Olatunji Ruwase
Hari Subramoni
D. Panda
33
2
0
30 Aug 2024
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale
Anton Andreychuk
Konstantin Yakovlev
Aleksandr I. Panov
A. Skrynnik
AI4CE
65
3
0
29 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
54
1
0
23 Aug 2024
Previous
1
2
3
4
5
Next