ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.08691
  4. Cited By
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

17 July 2023
Tri Dao
    LRM
ArXiv (abs)PDFHTML

Papers citing "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning"

50 / 329 papers shown
Title
Slamming: Training a Speech Language Model on One GPU in a Day
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
95
3
0
19 Feb 2025
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Zhiyuan Liu
Yanchen Luo
Han Huang
Enzhi Zhang
Changhao Nai
Sihang Li
Yaorui Shi
Xiang Wang
Kenji Kawaguchi
Tat-Seng Chua
198
4
0
18 Feb 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Junda Zhu
Lingyong Yan
Shuaiqiang Wang
Dawei Yin
Lei Sha
AAMLLRM
98
6
0
18 Feb 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo
Zefan Cai
Hanshi Sun
Jinqi Xiao
Bo Yuan
Wen Xiao
Junjie Hu
Jiawei Zhao
Beidi Chen
Anima Anandkumar
126
2
0
18 Feb 2025
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
Yuxiang Huang
Mingye Li
Xu Han
Chaojun Xiao
Weilin Zhao
Sun Ao
Hao Zhou
Jie Zhou
Zhiyuan Liu
Maosong Sun
126
0
0
17 Feb 2025
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
Jingcheng Deng
Zhongtao Jiang
Liang Pang
Liwei Chen
Kun Xu
Zihao Wei
Huawei Shen
Xueqi Cheng
112
3
0
17 Feb 2025
Strada-LLM: Graph LLM for traffic prediction
Strada-LLM: Graph LLM for traffic prediction
Seyed Mohamad Moghadas
Yangxintong Lyu
Bruno Cornelis
Alexandre Alahi
Adrian Munteanu
AI4TS
104
1
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
219
15
0
17 Feb 2025
Low-Rank Thinning
Low-Rank Thinning
Annabelle Michael Carrell
Albert Gong
Abhishek Shetty
Raaz Dwivedi
Lester W. Mackey
110
0
0
17 Feb 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen
Yifeng Gao
Weijia Li
Conghui He
Linfeng Zhang
LRM
176
6
0
17 Feb 2025
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
140
2
0
16 Feb 2025
KernelBench: Can LLMs Write Efficient GPU Kernels?
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang
Simon Guo
Simran Arora
Alex L. Zhang
William Hu
Christopher Ré
Azalia Mirhoseini
ALM
137
4
0
14 Feb 2025
GoRA: Gradient-driven Adaptive Low Rank Adaptation
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He
Peng Ye
Yuchen Ren
Yuan Yuan
Luyang Zhou
Shucun Ju
Lei Chen
AI4TSAI4CE
472
1
0
13 Feb 2025
Typhoon T1: An Open Thai Reasoning Model
Typhoon T1: An Open Thai Reasoning Model
Pittawat Taveekitworachai
Potsawee Manakul
Kasima Tharnpipitchai
Kunat Pipatanakul
OffRLLRM
276
0
0
13 Feb 2025
Self-Training Large Language Models for Tool-Use Without Demonstrations
Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo
Aryo Pradipta Gema
Xuanli He
Emile van Krieken
Pietro Lesci
Pasquale Minervini
LLMAG
156
2
0
09 Feb 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Twilight: Adaptive Attention Sparsity with Hierarchical Top-ppp Pruning
C. Lin
Jiaming Tang
Shuo Yang
Hanshuo Wang
Tian Tang
Boyu Tian
Ion Stoica
Enze Xie
Mingyu Gao
177
5
0
04 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
559
0
0
04 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
473
7
0
04 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Nathaniel Tomczak
Sanmukh Kuppannagari
214
0
0
31 Jan 2025
KVDirect: Distributed Disaggregated LLM Inference
Shiyang Chen
Rain Jiang
Dezhi Yu
Jinlai Xu
Mengyuan Chao
Fanlong Meng
Chenyu Jiang
Wei Xu
Hang Liu
92
2
0
28 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
146
7
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
Yun Dai
ALM
145
4
0
27 Jan 2025
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling
Kang Liu
Kai Yan
Yue Yang
Weijian Lin
Ting-Han Fan
Lingfeng Shen
Zhengyin Du
Jiecao Chen
ReLMELMLRM
107
8
0
25 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
195
3
0
24 Jan 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang
Alexander Sax
Kevin J. Liang
Mikael Henaff
Hao Tang
Ang Cao
J. Chai
Franziska Meier
Matt Feiszli
3DGS
193
31
0
23 Jan 2025
NExtLong: Toward Effective Long-Context Training without Long Documents
NExtLong: Toward Effective Long-Context Training without Long Documents
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
186
2
0
22 Jan 2025
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Junyu Chen
Han Cai
Junsong Chen
Enze Xie
Shang Yang
Haotian Tang
Zhekai Zhang
Yaojie Lu
Song Han
DiffM
176
7
0
20 Jan 2025
From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords
From Arabic Text to Puzzles: LLM-Driven Development of Arabic Educational Crosswords
Kamyar Zeinalipour
M. Saad
Marco Maggini
Marco Gori
86
2
0
19 Jan 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLMMedIm
105
2
0
17 Jan 2025
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
Xueyan Li
Xinyan Chen
Yazhe Niu
Shuai Hu
Yu Liu
OffRL
141
3
0
17 Jan 2025
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
Tong Liu
Xiao Yu
Wenxuan Zhou
Jindong Gu
Volker Tresp
82
1
0
11 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
115
4
0
10 Jan 2025
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization
Jie Cao
Dian Jiao
Qiang Yan
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
90
1
0
08 Jan 2025
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Hadi Pouransari
Chun-Liang Li
Jen-Hao Rick Chang
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Oncel Tuzel
95
11
0
08 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
433
6
0
05 Jan 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye
Lequn Chen
Ruihang Lai
Wuwei Lin
Yineng Zhang
...
Tianqi Chen
Baris Kasikci
Vinod Grover
Arvind Krishnamurthy
Luis Ceze
142
35
0
02 Jan 2025
DiC: Rethinking Conv3x3 Designs in Diffusion Models
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian
Jing Han
Chengcheng Wang
Yuchen Liang
Chao Xu
Hanting Chen
DiffM
149
2
0
31 Dec 2024
VMamba: Visual State Space Model
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
318
725
0
31 Dec 2024
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Rui Chen
Jianfeng Zhang
Yixun Liang
Guan Luo
Weiyu Li
Jiarui Liu
Xiu Li
Xiaoxiao Long
Jiashi Feng
P. Tan
235
18
0
23 Dec 2024
FlexCache: Flexible Approximate Cache System for Video Diffusion
FlexCache: Flexible Approximate Cache System for Video Diffusion
Desen Sun
Henry Tian
Tim Lu
Sihang Liu
DiffM
164
1
0
18 Dec 2024
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
Luca Zancato
Benjamin Bowman
Aditya Golatkar
Wei Xia
Stefano Soatto
217
4
0
17 Dec 2024
Wonderland: Navigating 3D Scenes from a Single Image
Wonderland: Navigating 3D Scenes from a Single Image
Hanwen Liang
Junli Cao
Vidit Goel
Guocheng Qian
Sergei Korolev
Demetri Terzopoulos
Konstantinos N. Plataniotis
Sergey Tulyakov
Jian Ren
VGen
208
14
0
16 Dec 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Zhao Jin
Dacheng Tao
VGen
256
1
0
16 Dec 2024
What Makes Cryptic Crosswords Challenging for LLMs?
What Makes Cryptic Crosswords Challenging for LLMs?
Abdelrahman Sadallah
Daria Kotova
Ekaterina Kochmar
AAML
164
0
0
12 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
286
7
0
04 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
195
4
0
28 Nov 2024
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Chenjie Cao
Chaohui Yu
Shang Liu
Fan Wang
Xiangyang Xue
Yanwei Fu
150
2
0
25 Nov 2024
Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Zheng Ma
Zeping Mao
Ruixue Zhang
Jiazhen Chen
L. Xin
Paul Shan
A. Ghodsi
Ming Li
96
0
0
24 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
168
0
0
20 Nov 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
553
4
0
20 Nov 2024
Previous
1234567
Next