Striped Attention: Faster Ring Attention for Causal Transformers (arXiv:2311.09431)
15 November 2023
William Brandon, Aniruddha Nrusimha, Kevin Qian, Zack Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley
Papers citing "Striped Attention: Faster Ring Attention for Causal Transformers" (8 papers)
Efficient Pretraining Length Scaling
Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou
21 Apr 2025

LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang, Shivaram Venkataraman
VLM
04 Feb 2025

Context Parallelism for Scalable Million-Token Inference
Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang
MoE, LRM
04 Nov 2024

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Rühle, Saravan Rajmohan
17 May 2024

World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu, Wilson Yan, Matei A. Zaharia, Pieter Abbeel
VGen
13 Feb 2024

RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
20 Apr 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019

Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 Apr 2019