ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
27 May 2022 · VLM
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

Papers citing "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"

50 / 1,439 papers shown
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
Zhufeng Li, S. S. Cranganore, Nicholas D. Youngblut, Niki Kilbertus
09 May 2024

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
08 May 2024

Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
Emre Can Acikgoz, Mete Erdogan, Deniz Yuret
07 May 2024

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
07 May 2024 · VLM

Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Songlin Yang, Aditya Prasad, ..., Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Yikang Shen
07 May 2024 · AI4TS

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava
07 May 2024 · MQ

Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin
05 May 2024

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
05 May 2024

Is Flash Attention Stable?
Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, ..., Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu
05 May 2024

DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, N. Jha, Yilin Shen, Hongxia Jin
01 May 2024 · AI4CE
DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting
Yihang Fu, Mingyu Zhou, Luyao Zhang
01 May 2024 · AI4TS

Lightplane: Highly-Scalable Components for Neural 3D Fields
Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny
30 Apr 2024

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu, Enmao Diao
30 Apr 2024

Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
J. Michaelov, Catherine Arnett, Benjamin Bergen
30 Apr 2024

Building a Large Japanese Web Corpus for Large Language Models
Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki
27 Apr 2024

Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Jiachen Liu, Zhiyu Wu, Jae-Won Chung, Fan Lai, Myungjin Lee, Mosharaf Chowdhury
25 Apr 2024

CORM: Cache Optimization with Recent Message for Large Language Model Inference
Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi
24 Apr 2024

Nyonic Technical Report
Junfeng Tian, Rui Wang, Cong Li, Yudong Zhou, Jun Liu, Jun Wang
24 Apr 2024

Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
Elijah Pelofske, Vincent Urias, L. Liebrock
24 Apr 2024

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari
24 Apr 2024 · VLM, CLIP
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
João Monteiro, Étienne Marcotte, Pierre-Andre Noel, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
23 Apr 2024

Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers
Elijah Pelofske, Vincent Urias, L. Liebrock
23 Apr 2024

OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, ..., Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
22 Apr 2024 · OSLM, AIFin

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica
22 Apr 2024

SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
22 Apr 2024

SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
22 Apr 2024 · VLM

A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang
22 Apr 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Sam Ade Jacobs, A. A. Awan, J. Aneja, Ahmed Hassan Awadallah, ..., Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
22 Apr 2024 · LRM, ALM
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Wei Niu, Md. Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren
21 Apr 2024

Large Language Models for Next Point-of-Interest Recommendation
Peibo Li, Maarten de Rijke, Hao Xue, Shuang Ao, Yang Song, Flora D. Salim
19 Apr 2024

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
Kang Zeng, Haowen Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang
19 Apr 2024 · Mamba

EdgeFusion: On-Device Text-to-Image Generation
Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim
18 Apr 2024 · VLM

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen
18 Apr 2024

Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
A. Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, ..., Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang
17 Apr 2024

Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
Yushuo Chen, Tianyi Tang, Erge Xiang, Linjiang Li, Wayne Xin Zhao, Jing Wang, Yunpeng Chai, Ji-Rong Wen
17 Apr 2024

In-Context Learning State Vector with Inner and Momentum Optimization
Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
17 Apr 2024

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours
Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, S. Eryilmaz, June Yang, M. Andersch
17 Apr 2024
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning
Xiao Wang, Tianze Chen, Xianjun Yang, Qi Zhang, Xun Zhao, Dahua Lin
16 Apr 2024 · ELM

Referring Flexible Image Restoration
Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy S. Smith, Eng Gee Lim, Weiping Ding, Yutao Yue
16 Apr 2024

Long-form music generation with latent diffusion
Zach Evans, Julian Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
16 Apr 2024 · MGen, DiffM

Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search And Fine-Tuning
Yilin Gao, Arava Sai Kumar, Yancheng Li, James W. Snyder
16 Apr 2024 · AI4MH

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Z. Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, ..., Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
16 Apr 2024

I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey
Noah Lewis, J. L. Bez, Suren Byna
16 Apr 2024

Adaptive Patching for High-resolution Image Segmentation with Transformers
Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Mohamed Wahib, M. Munetomo
15 Apr 2024 · MedIm

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
Bingya Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
15 Apr 2024 · RALM

Exploring and Improving Drafts in Blockwise Parallel Decoding
Taehyeon Kim, A. Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton
14 Apr 2024 · AI4TS

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
12 Apr 2024

Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Patrice Béchard, Orlando Marquez Ayala
12 Apr 2024 · LLMAG

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
11 Apr 2024

Behavior Trees Enable Structured Programming of Language Model Agents
Richard Kelley
11 Apr 2024 · AI4CE, LM&Ro, LLMAG