Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.14062
Cited By
Big Bird: Transformers for Longer Sequences
28 July 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
Santiago Ontanon
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Big Bird: Transformers for Longer Sequences"
50 / 312 papers shown
Title
A Split-then-Join Approach to Abstractive Summarization for Very Long Documents in a Low Resource Setting
Lhuqita Fazry
VLM
25
0
0
11 May 2025
Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning
Hang Gao
Chenhao Zhang
Tie Wang
Junsuo Zhao
Fengge Wu
Changwen Zheng
Huaping Liu
LRM
29
0
0
09 May 2025
T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction
Kun Peng
Chaodong Tong
Cong Cao
Hao Peng
Q. Li
Guanlin Wu
Lei Jiang
Yanbing Liu
Philip S. Yu
LMTD
48
0
0
08 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
96
1
0
01 May 2025
SFi-Former: Sparse Flow Induced Attention for Graph Transformer
Z. Li
J. Q. Shi
X. Zhang
Miao Zhang
B. Li
44
0
0
29 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
127
0
0
21 Apr 2025
A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization
Nevidu Jayatilleke
Ruvan Weerasinghe
AILaw
77
0
0
13 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberd
175
0
0
03 Mar 2025
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Yu Bo
Weian Mao
Yanjun Shao
Weiqiang Bai
Peng Ye
Xinzhu Ma
Junbo Zhao
Hao Chen
Chunhua Shen
3DV
58
1
0
25 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
41
0
0
24 Feb 2025
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
Feiyang Chen
Yu Cheng
Lei Wang
Yuqing Xia
Ziming Miao
...
Fan Yang
J. Xue
Zhi Yang
M. Yang
H. Chen
71
1
0
24 Feb 2025
A Survey of Graph Transformers: Architectures, Theories and Applications
Chaohao Yuan
Kangfei Zhao
Ercan Engin Kuruoglu
Liang Wang
Tingyang Xu
Wenbing Huang
Deli Zhao
Hong Cheng
Yu Rong
51
4
0
23 Feb 2025
AttentionSmithy: A Modular Framework for Rapid Transformer Development and Customization
Caleb Cranney
Jesse G. Meyer
85
0
0
13 Feb 2025
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
Wanqi Yang
Y. Li
Meng Fang
L. Chen
59
1
0
09 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
162
0
0
04 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Nathaniel Tomczak
Sanmukh Kuppannagari
91
0
0
31 Jan 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
59
0
0
31 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
132
2
0
20 Jan 2025
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
...
K. Zhang
C. L. P. Chen
Fan Yang
Y. Yang
Lili Qiu
46
29
0
03 Jan 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
S. Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
49
11
0
31 Dec 2024
ForPKG: A Framework for Constructing Forestry Policy Knowledge Graph and Application Analysis
Jingyun Sun
Zhongze Luo
42
0
0
17 Nov 2024
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
Wei Yu Wu
Zhuoshi Pan
Chao Wang
L. Chen
Y. Bai
Kun Fu
Z. Wang
Hui Xiong
Hui Xiong
LLMAG
34
5
0
05 Nov 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
21
3
0
19 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
50
18
0
17 Oct 2024
Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah
Ravish Kumar
Honnesh Rohmetra
Ehsan Saboori
ViT
21
0
0
12 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
80
0
0
09 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
59
15
0
06 Oct 2024
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
49
8
0
03 Oct 2024
House of Cards: Massive Weights in LLMs
Jaehoon Oh
Seungjun Shin
Dokwan Oh
35
1
0
02 Oct 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao
Sam Ade Jacobs
Masahiro Tanaka
Olatunji Ruwase
A. Shafi
D. Panda
28
2
0
30 Aug 2024
Snuffy: Efficient Whole Slide Image Classifier
Hossein Jafarinia
Alireza Alipanah
Danial Hamdi
Saeed Razavi
Nahal Mirzaie
M. Rohban
3DH
42
1
0
15 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
44
0
0
11 Aug 2024
Explanation Regularisation through the Lens of Attributions
Pedro Ferreira
Wilker Aziz
Ivan Titov
36
1
0
23 Jul 2024
Genomic Language Models: Opportunities and Challenges
Gonzalo Benegas
Chengzhong Ye
C. Albors
Jianan Canal Li
Yun S. Song
AI4CE
LM&MA
ELM
41
18
0
16 Jul 2024
HDT: Hierarchical Document Transformer
Haoyu He
Markus Flicke
Jan Buchmann
Iryna Gurevych
Andreas Geiger
35
0
0
11 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Min-man Wu
Min Wu
Xiaoli Li
Weisi Lin
ViT
VLM
43
7
0
02 Jul 2024
eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure
Hoorieh Sabzevari
Mohammadmostafa Rostamkhani
Sauleh Eetemadi
AILaw
ELM
34
0
0
24 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
44
58
0
14 Jun 2024
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
Zijian Hei
Weiling Liu
Wenjie Ou
Juyi Qiao
Junming Jiao
Guowen Song
Ting Tian
Yi Lin
RALM
38
5
0
11 Jun 2024
Leveraging Large Language Models for Efficient Failure Analysis in Game Development
Leonardo Marini
Linus Gisslén
Alessandro Sestini
43
0
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
66
55
0
11 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
100
16
0
06 Jun 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
42
3
0
28 May 2024
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs
Zhenyu Bai
Pranav Dangi
Huize Li
Tulika Mitra
29
5
0
27 May 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan
Alexander J. Root
Trevor Gale
David Broman
Fredrik Kjolstad
25
0
0
27 May 2024
Exploration of Masked and Causal Language Modelling for Text Generation
Nicolo Micheletti
Samuel Belkadi
Lifeng Han
Goran Nenadic
44
6
0
21 May 2024
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity
Zhufeng Li
S. S. Cranganore
Nicholas D. Youngblut
Niki Kilbertus
47
2
0
09 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
30
0
0
30 Apr 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
35
1
0
19 Apr 2024
Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning
Zeyuan Ma
Jiacheng Chen
Hongshu Guo
Yining Ma
Yue-jiao Gong
38
6
0
12 Apr 2024
1
2
3
4
5
6
7
Next