1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,140 papers shown
RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention
Bochao Zou
Zizheng Guo
Jiansheng Chen
Junbao Zhuo
Weiran Huang
Huimin Ma
ViT
AI4TS
115
0
0
21 Feb 2025
Protein Large Language Models: A Comprehensive Survey
Yijia Xiao
Wanjia Zhao
Junkai Zhang
Yiqiao Jin
Han Zhang
...
Xiao Luo
Yu-Jie Zhang
James Zou
Yizhou Sun
Wei Wang
LM&MA
AI4CE
73
3
0
21 Feb 2025
Neural Attention Search
Difan Deng
Marius Lindauer
93
0
0
21 Feb 2025
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
Bowen Pang
Kai Li
Ruifeng She
Feifan Wang
OffRL
51
2
0
14 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
62
3
0
11 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
102
0
0
10 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou
Mirella Lapata
MoMe
262
1
0
03 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Nathaniel Tomczak
Sanmukh Kuppannagari
98
0
0
31 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
57
1
0
24 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Liang Feng
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
187
0
0
21 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
59
0
0
21 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
145
2
0
20 Jan 2025
Simplified and Generalized Masked Diffusion for Discrete Data
Jiaxin Shi
Kehang Han
Zehao Wang
Arnaud Doucet
Michalis K. Titsias
DiffM
90
63
0
17 Jan 2025
Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps
Henry Li
Ronen Basri
Y. Kluger
DiffM
62
2
0
13 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
96
9
0
11 Jan 2025
Hidden Entity Detection from GitHub Leveraging Large Language Models
Lu Gan
Martin Blum
Danilo Dessi
Brigitte Mathiak
Ralf Schenkel
Stefan Dietze
38
1
0
08 Jan 2025
Powerful Design of Small Vision Transformer on CIFAR10
Gent Wu
ViT
47
0
0
07 Jan 2025
Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
Hanbin Bae
Byungjun Kang
Jiwon Kim
Jaeyong Hwang
Hosang Sung
Hoon-Young Cho
3DV
30
0
0
06 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yong-Jin Liu
51
0
0
06 Jan 2025
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
...
Kaipeng Zhang
Chong Chen
Fan Yang
Yuqing Yang
Lili Qiu
60
30
0
03 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang
Yan Wang
Xinting Huang
Tianqing Fang
Han Zhang
Chenlong Deng
Shuaiyi Li
Dong Yu
90
2
0
21 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
86
0
0
13 Dec 2024
Non-Normal Diffusion Models
Henry Li
VLM
DiffM
116
1
0
10 Dec 2024
Knowledge-Enhanced Conversational Recommendation via Transformer-based Sequential Modelling
Jie Zou
Aixin Sun
Cheng Long
Evangelos Kanoulas
LMTD
103
4
0
03 Dec 2024
Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks
Mohsen Dehghankar
Abolfazl Asudeh
69
1
0
30 Nov 2024
Does Self-Attention Need Separate Weights in Transformers?
Md. Kowsher
Nusrat Jahan Prottasha
Chun-Nam Yu
O. Garibay
Niloofar Yousefi
259
0
0
30 Nov 2024
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training
Kaustubh Ponkshe
Venkatapathy Subramanian
Natwar Modani
Ganesh Ramakrishnan
72
0
0
25 Nov 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang
Xiangyu Chang
Mingchen Li
A. Roy-Chowdhury
Jiacheng Chen
Samet Oymak
78
3
0
19 Nov 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Monishwaran Maheswaran
June Paik
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
63
9
0
14 Nov 2024
TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models
Matheus Simão
Fabiano Prado
Omar Abdul Wahab
Anderson Avila
26
0
0
11 Nov 2024
SPARTAN: A Sparse Transformer Learning Local Causation
Anson Lei
Bernhard Schölkopf
Ingmar Posner
52
2
0
11 Nov 2024
Reducing Distraction in Long-Context Language Models by Focused Learning
Zijun Wu
Bingyuan Liu
Ran Yan
Lei Chen
Thomas Delteil
RALM
44
2
0
08 Nov 2024
kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
39
0
0
06 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
43
1
0
05 Nov 2024
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu
Jianfeng Wang
Zhiyong Yang
Linjie Li
Kevin Qinghong Lin
Marc Niethammer
Lijuan Wang
VOS
57
1
0
05 Nov 2024
The Evolution of RWKV: Advancements in Efficient Language Modeling
Akul Datta
VLM
50
1
0
05 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
84
13
0
04 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
56
0
0
02 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
38
0
0
31 Oct 2024
ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
Youpeng Zhao
Jun Wang
37
0
0
31 Oct 2024
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Junqi Zhao
Zhijin Fang
Shu Li
Shaohui Yang
Shichao He
42
2
0
30 Oct 2024
Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde
Artem Lukoianov
Anastasis Kratsios
Michael M. Bronstein
Xiaowen Dong
GNN
43
1
0
29 Oct 2024
Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning
Aosong Feng
Rex Ying
Leandros Tassiulas
32
2
0
28 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
29
0
0
28 Oct 2024
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
27
1
0
24 Oct 2024
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li
22
0
0
24 Oct 2024
TabDPT: Scaling Tabular Foundation Models
Junwei Ma
Valentin Thomas
Rasa Hosseinzadeh
Hamidreza Kamkari
Alex Labach
Jesse C. Cresswell
Keyvan Golestan
Guangwei Yu
M. Volkovs
Anthony L. Caterini
LMTD
36
4
0
23 Oct 2024
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
29
3
0
21 Oct 2024
HyQE: Ranking Contexts with Hypothetical Query Embeddings
Weichao Zhou
Jiaxin Zhang
Hilaf Hasson
Anu Singh
Wenchao Li
RALM
30
1
0
20 Oct 2024
MoDification: Mixture of Depths Made Easy
C. Zhang
M. Zhong
Qimeng Wang
Xuantao Lu
Zheyu Ye
...
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
Dawei Song
VLM
MoE
38
2
0
18 Oct 2024