ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.08691
  4. Cited By
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

17 July 2023
Tri Dao
    LRM
ArXiv (abs)PDFHTML

Papers citing "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"

50 / 329 papers shown
Title
LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch
Jan Pfister
Julia Wunderle
Andreas Hotho
120
0
0
17 Nov 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
Squeezed Attention: Accelerating Long Context Length LLM Inference
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Monishwaran Maheswaran
June Paik
Michael W. Mahoney
Kemal Kurniawan
Amir Gholami
Amir Gholami
178
16
0
14 Nov 2024
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
123
4
0
08 Nov 2024
Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs
Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs
Chengxin Hu
Hao Li
Yihe Yuan
Jing Li
Ivor Tsang
127
1
0
07 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
170
5
0
06 Nov 2024
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
Laura Cabello
Carmen Martin-Turrero
Uchenna Akujuobi
Anders Søgaard
Carlos Bobed
AI4MH
346
1
0
06 Nov 2024
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Zehua Wang
Hui Xiong
Hui Xiong
LLMAG
178
7
0
05 Nov 2024
Context Parallelism for Scalable Million-Token Inference
Context Parallelism for Scalable Million-Token Inference
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jeremy Reizenstein
Jongsoo Park
Jianyu Huang
MoELRM
177
7
0
04 Nov 2024
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
Günter Klambauer
Razvan Pascanu
Sepp Hochreiter
203
7
0
29 Oct 2024
GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems
GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems
Wilson Wongso
Hao Xue
Flora D. Salim
59
2
0
28 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
165
21
0
28 Oct 2024
Markov Chain of Thought for Efficient Mathematical Reasoning
Markov Chain of Thought for Efficient Mathematical Reasoning
Wen Yang
Kai Fan
Minpeng Liao
LRM
54
5
0
23 Oct 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Dianbo Sui
AI4CE
140
27
0
23 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
183
11
0
23 Oct 2024
Comprehensive benchmarking of large language models for RNA secondary structure prediction
Comprehensive benchmarking of large language models for RNA secondary structure prediction
L. I. Zablocki
L. A. Bugnon
M. Gerard
L. Di Persia
G. Stegmayer
D. H. Milone
AI4TS
66
6
0
21 Oct 2024
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Shaonan Wu
Shuai Lu
Yeyun Gong
Nan Duan
Ping Wei
AIMat
100
1
0
21 Oct 2024
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu
Haoyi Wu
Kewei Tu
81
3
0
18 Oct 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLMRALM
210
7
0
17 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
148
28
0
17 Oct 2024
In-context KV-Cache Eviction for LLMs via Attention-Gate
In-context KV-Cache Eviction for LLMs via Attention-Gate
Zihao Zeng
Bokai Lin
Tianqi Hou
Hao Zhang
Zhijie Deng
123
2
0
15 Oct 2024
Liger Kernel: Efficient Triton Kernels for LLM Training
Liger Kernel: Efficient Triton Kernels for LLM Training
Pin-Lun Hsu
Yun Dai
Vignesh Kothapalli
Qingquan Song
Shao Tang
Siyu Zhu
Steven Shimizu
Shivam Sahni
Haowen Ning
Yanning Chen
127
46
0
14 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
94
0
0
14 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J.N. Zhang
ALMLRM
215
7
0
11 Oct 2024
Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Haiyue Ma
Jian Liu
Ronny Krashinsky
50
0
0
10 Oct 2024
CursorCore: Assist Programming through Aligning Anything
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
136
1
0
09 Oct 2024
Learning Evolving Tools for Large Language Models
Learning Evolving Tools for Large Language Models
Guoxin Chen
Zhong Zhang
Xin Cong
Fangda Guo
Yesai Wu
Yankai Lin
Wenzheng Feng
Yasheng Wang
KELM
143
2
0
09 Oct 2024
Differential Transformer
Differential Transformer
Tianzhu Ye
Li Dong
Yuqing Xia
Yutao Sun
Yi Zhu
Gao Huang
Furu Wei
513
0
0
07 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Presto! Distilling Steps and Layers for Accelerating Music Generation
Cheng-i Wang
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
128
7
0
07 Oct 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Pengcheng Jiang
Cao Xiao
Minhao Jiang
Parminder Bhatia
Taha A. Kass-Hout
Jimeng Sun
Jiawei Han
RALMAI4MH
193
7
0
06 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
189
20
0
06 Oct 2024
LongGenBench: Long-context Generation Benchmark
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
103
9
0
05 Oct 2024
System 2 Reasoning Capabilities Are Nigh
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLMLRM
98
0
0
04 Oct 2024
ToolGen: Unified Tool Retrieval and Calling via Generation
ToolGen: Unified Tool Retrieval and Calling via Generation
Renxi Wang
Xudong Han
Lei Ji
Shu Wang
Timothy Baldwin
Haonan Li
LLMAG
162
9
0
04 Oct 2024
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Kyuyoung Kim
Ah Jeong Seo
Hao Liu
Jinwoo Shin
Kimin Lee
50
5
0
04 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALMELM
141
37
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLMMQ
182
39
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
202
48
0
03 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
168
3
0
02 Oct 2024
Extending Context Window of Large Language Models from a Distributional
  Perspective
Extending Context Window of Large Language Models from a Distributional Perspective
Yingsheng Wu
Yuxuan Gu
Xiaocheng Feng
Weihong Zhong
Dongliang Xu
Qing Yang
Hongtao Liu
Bing Qin
42
2
0
02 Oct 2024
FlashMask: Efficient and Rich Mask Extension of FlashAttention
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang
Jinle Zeng
Xiyuan Xiao
Siming Wu
Jiabin Yang
Lujing Zheng
Zeyu Chen
Jiang Bian
Dianhai Yu
Haifeng Wang
386
3
0
02 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLMAuLLM
173
12
0
26 Sep 2024
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi
Shiyu Wang
Yuqi Nie
Dianqi Li
Zhou Ye
Qingsong Wen
Ming Jin
AI4TS
167
56
0
24 Sep 2024
PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
Zeyu Zhang
Haiying Shen
VLM
80
1
0
23 Sep 2024
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
Zhi Chen
Qiguang Chen
Libo Qin
Qipeng Guo
Haijun Lv
Yicheng Zou
Wanxiang Che
Hang Yan
Kai Chen
Dahua Lin
SyDa
126
4
0
03 Sep 2024
Masked Mixers for Language Generation and Retrieval
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
167
0
0
02 Sep 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao
Sam Ade Jacobs
Masahiro Tanaka
Olatunji Ruwase
Hari Subramoni
D. Panda
102
2
0
30 Aug 2024
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale
Anton Andreychuk
Konstantin Yakovlev
Aleksandr I. Panov
A. Skrynnik
AI4CE
159
5
0
29 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
179
1
0
23 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLMVLM
185
17
0
23 Aug 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen
Vashisth Tiwari
Ranajoy Sadhukhan
Zhuoming Chen
Jinyuan Shi
Ian En-Hsu Yen
Ian En-Hsu Yen
Avner May
Tianqi Chen
Beidi Chen
LRM
156
32
0
20 Aug 2024
Previous
1234567
Next