ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:1904.10509 (Cited By)
Generating Long Sequences with Sparse Transformers

R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019

Papers citing "Generating Long Sequences with Sparse Transformers"

Showing 50 of 1,140 citing papers.

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman, Zhao Song
17 May 2025

Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain
Hyowon Wi, Jeongwhan Choi, Noseong Park
13 May 2025

Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville
13 May 2025 · LRM

Fused3S: Fast Sparse Attention on Tensor Cores
Zitong Li, Aparna Chandramowlishwaran
12 May 2025 · GNN

A Split-then-Join Approach to Abstractive Summarization for Very Long Documents in a Low Resource Setting
Lhuqita Fazry
11 May 2025 · VLM

Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta, Eric Lundy, Priscilla Burity
09 May 2025

Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou, Wei Long, Jingbo Lu, Shiyin Jiang, Weiyi You, Haifeng Wu, Shuhang Gu
04 May 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
01 May 2025 · MoE, VLM

Polysemy of Synthetic Neurons: Towards a New Type of Explanatory Categorical Vector Spaces
Michael Pichat, William Pogrund, Paloma Pichat, Judicael Poumay, Armanouche Gasparian, Samuel Demarchi, Martin Corbet, Alois Georgeon, Michael Veillet-Guillem
30 Apr 2025 · MILM

From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
Andrew Kiruluta
29 Apr 2025

Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren, Yong Liu
26 Apr 2025

The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg, Shaina Raza, Shebuti Rayana, Xingyi Liu, Sunghwan Sohn
23 Apr 2025 · LM&MA, AILaw

Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu
23 Apr 2025 · Mamba

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light
Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, ..., Bing Xu, Haicheng Wu, Wen-mei W. Hwu, Xuan Li, Humphrey Shi
23 Apr 2025

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
22 Apr 2025

Efficient Pretraining Length Scaling
Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou
21 Apr 2025

AttentionDrop: A Novel Regularization Method for Transformer Models
Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah
16 Apr 2025

Analysis of Attention in Video Diffusion Transformers
Yuxin Wen, Jim Wu, Ajay Jain, Tom Goldstein, Ashwinee Panda
14 Apr 2025

Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD
Byunggun Kim, Younghun Kwon
12 Apr 2025 · MedIm

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li, Shulei Ji, Zihao Wang, Songruoyao Wu, Jiaxing Yu, Kaipeng Zhang
01 Apr 2025 · MGen, VGen

SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu, Akash Srivastava
31 Mar 2025 · MQ

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hao Zhang, R. Su, Zhihang Yuan, Pengtao Chen, Mingzhu Shen, Yibo Fan, Shengen Yan, Guohao Dai, Yu Wang
28 Mar 2025

Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie, Jian Sun, Wei Ma
27 Mar 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Enze Xie
20 Mar 2025

DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen, Chenyang Liu, Bowen Chen, Wenyuan Li, Zhengxia Zou, Zhenwei Shi
20 Mar 2025

Intra-neuronal attention within language models: Relationships between activation and semantics
Michael Pichat, William Pogrund, Paloma Pichat, Armanouche Gasparian, Samuel Demarchi, Corbet Alois Georgeon, Michael Veillet-Guillem
17 Mar 2025 · MILM

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Ziran Qin, Yuchen Cao, Mingbao Lin, Wen Hu, Shixuan Fan, Ke Cheng, Weiyao Lin, Jianguo Li
16 Mar 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga, Wei Jie, Fan Wu, Vardan K. Voskanyan, Fateme Dinmohammadi, P. Brookes, Jingzhi Gong, Zheng Wang
13 Mar 2025

Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li
13 Mar 2025 · LRM

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
11 Mar 2025 · BDL

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
10 Mar 2025 · MoE

TokenButler: Token Importance is Predictable
Yash Akhauri, Ahmed F. AbouElhamayed, Yifei Gao, Chi-chih Chang, Nilesh Jain, Mohamed S. Abdelfattah
10 Mar 2025

Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts
Aref Farhadipour, Hossein Ranjbar, Masoumeh Chapariniya, Teodora Vukovic, Sarah Ebling, Volker Dellwo
09 Mar 2025

Spectral Informed Mamba for Robust Point Cloud Processing
Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, David Osowiechi, G. A. V. Hakim, Farzad Beizaee, Ismail ben Ayed, Christian Desrosiers
06 Mar 2025 · Mamba, 3DPC

SED2AM: Solving Multi-Trip Time-Dependent Vehicle Routing Problem using Deep Reinforcement Learning
Arash Mozhdehi, Yansen Wang, Sun Sun, Xin Wang
06 Mar 2025 · AI4TS

L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo, Marin Soljacic
06 Mar 2025

Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao, Sid Kiblawi, Naoto Usuyama, Ho Hin Lee, Sam Preston, Hoifung Poon, Mu-Hsin Wei
04 Mar 2025 · MedIm

Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Yujiao Yang, Jing Lian, Linhui Li
04 Mar 2025 · MoE

Attention Condensation via Sparsity Induced Regularized Training
Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd
03 Mar 2025

Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners
Yuxin Wang, Botian Jiang, Yiran Guo, Quan Gan, David Wipf, Xuanjing Huang, Xipeng Qiu
03 Mar 2025 · AI4CE

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen
03 Mar 2025 · Mamba

Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
K. A. Kinfu, René Vidal
28 Feb 2025 · ViT

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia, Suhan Ling, Fangcheng Fu, Yijiao Wang, Huixia Li, Xuefeng Xiao, Bin Cui
28 Feb 2025 · VGen

FANformer: Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong, Ge Li, Xue Jiang, Yongding Tao, Kechi Zhang, ..., Huanyu Liu, Jiazheng Ding, Jia Li, Jinliang Deng, Hong Mei
28 Feb 2025 · AI4TS

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou
28 Feb 2025

Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Sandeep Silwal, David P. Woodruff, Qiuyi Zhang
27 Feb 2025

Sliding Window Attention Training for Efficient Large Language Models
Zichuan Fu, Wentao Song, Yansen Wang, X. Wu, Yefeng Zheng, Yingying Zhang, Derong Xu, Xuetao Wei, Tong Xu, Xiangyu Zhao
26 Feb 2025

Self-Adjust Softmax
Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Zhiyu Li, Yu Li
25 Feb 2025

The Role of Sparsity for Length Generalization in Transformers
Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach
24 Feb 2025

RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention
Bochao Zou, Zizheng Guo, Jiansheng Chen, Junbao Zhuo, Weiran Huang, Huimin Ma
21 Feb 2025 · ViT, AI4TS