
arXiv: 2007.14062
Big Bird: Transformers for Longer Sequences


28 July 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
Santiago Ontanon
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
    VLM

Papers citing "Big Bird: Transformers for Longer Sequences"

50 / 345 papers shown
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie
Chen Henry Wu
Peng Shi
Ruiqi Zhong
Torsten Scholak
...
Lingpeng Kong
Rui Zhang
Noah A. Smith
Luke Zettlemoyer
Tao Yu
LMTD
28
297
0
16 Jan 2022
Datasheet for the Pile
Stella Biderman
Kieran Bicheno
Leo Gao
52
35
0
13 Jan 2022
A Unified Review of Deep Learning for Automated Medical Coding
Shaoxiong Ji
Wei Sun
Xiaobo Li
Hang Dong
Ara Taalas
Yijia Zhang
Honghan Wu
Esa Pitkänen
Pekka Marttinen
AI4TS
MedIm
24
27
0
08 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
166
156
0
08 Jan 2022
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks
Lei Cheng
Ruslan Khalitov
Tong Yu
Zhirong Yang
25
32
0
06 Jan 2022
Measuring Attribution in Natural Language Generation Models
Hannah Rashkin
Vitaly Nikolaev
Matthew Lamm
Lora Aroyo
Michael Collins
Dipanjan Das
Slav Petrov
Gaurav Singh Tomar
Iulia Turc
David Reitter
25
172
0
23 Dec 2021
Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
Andrei-Marius Avram
Darius Catrina
Dumitru-Clementin Cercel
Mihai Dascălu
Traian Rebedea
Vasile Păiș
Dan Tufiș
22
12
0
23 Dec 2021
Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
Md Tahmid Rahman Laskar
Enamul Hoque
J. Huang
28
44
0
22 Dec 2021
UNIREX: A Unified Learning Framework for Language Model Rationale Extraction
Aaron Chan
Maziar Sanjabi
Lambert Mathias
L Tan
Shaoliang Nie
Xiaochang Peng
Xiang Ren
Hamed Firooz
38
41
0
16 Dec 2021
Long Context Question Answering via Supervised Contrastive Learning
Avi Caciularu
Ido Dagan
Jacob Goldberger
Arman Cohan
RALM
19
23
0
16 Dec 2021
Unsupervised Matching of Data and Text
N. Ahmadi
H. Sand
Paolo Papotti
32
19
0
16 Dec 2021
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo
Joshua Ainslie
David C. Uthus
Santiago Ontanon
Jianmo Ni
Yun-hsuan Sung
Yinfei Yang
VLM
31
307
0
15 Dec 2021
Roof-Transformer: Divided and Joined Understanding with Knowledge Enhancement
Wei-Lin Liao
Chengwei Su
Wei-Yun Ma
22
0
0
13 Dec 2021
Discourse-Aware Soft Prompting for Text Generation
Marjan Ghazvininejad
Vladimir Karpukhin
Vera Gor
Asli Celikyilmaz
25
6
0
10 Dec 2021
Self-attention Does Not Need $O(n^2)$ Memory
M. Rabe
Charles Staats
LRM
18
139
0
10 Dec 2021
Couplformer: Rethinking Vision Transformer with Coupling Attention Map
Hai Lan
Xihao Wang
Xian Wei
ViT
28
3
0
10 Dec 2021
Linear algebra with transformers
François Charton
AIMat
29
56
0
03 Dec 2021
MultiVerS: Improving scientific claim verification with weak supervision and full-document context
David Wadden
Bertie Vidgen
Lucy Lu Wang
Dirk Hovy
J. Pierrehumbert
Hannaneh Hajishirzi
27
151
0
02 Dec 2021
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Tri Dao
Beidi Chen
Kaizhao Liang
Jiaming Yang
Zhao-quan Song
Atri Rudra
Christopher Ré
30
75
0
30 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Can Vision Transformers Perform Convolution?
Shanda Li
Xiangning Chen
Di He
Cho-Jui Hsieh
ViT
41
19
0
02 Nov 2021
Comparative Study of Long Document Classification
Vedangi Wagh
Snehal Khandve
Isha Joshi
Apurva Wani
Geetanjali Kale
Raviraj Joshi
24
25
0
01 Nov 2021
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li
Hongxin Liu
Zhengda Bian
Boxiang Wang
Haichen Huang
Fan Cui
Chuan-Qing Wang
Yang You
GNN
19
143
0
28 Oct 2021
Transformer Acceleration with Dynamic Sparse Attention
Liu Liu
Zheng Qu
Zhaodong Chen
Yufei Ding
Yuan Xie
19
20
0
21 Oct 2021
Contrastive Document Representation Learning with Graph Attention Networks
Peng-Tao Xu
Xinchi Chen
Xiaofei Ma
Zhiheng Huang
Bing Xiang
14
9
0
20 Oct 2021
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
73
172
0
16 Oct 2021
Hey AI, Can You Solve Complex Tasks by Talking to Agents?
Tushar Khot
Kyle Richardson
Daniel Khashabi
Ashish Sabharwal
RALM
LRM
13
14
0
16 Oct 2021
Coherence boosting: When your pretrained language model is not paying enough attention
Nikolay Malkin
Zhen Wang
Nebojsa Jojic
RALM
19
35
0
15 Oct 2021
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
Linhan Zhang
Qian Chen
Wen Wang
Chong Deng
Shiliang Zhang
Bing Li
Wei Wang
Xin Cao
45
56
0
13 Oct 2021
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
30
33
0
12 Oct 2021
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
76
66
0
08 Oct 2021
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Narsimha Chilkuri
Eric Hunsberger
Aaron R. Voelker
G. Malik
C. Eliasmith
30
7
0
05 Oct 2021
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
27
117
0
05 Oct 2021
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
Ilias Chalkidis
Abhik Jana
D. Hartung
M. Bommarito
Ion Androutsopoulos
Daniel Martin Katz
Nikolaos Aletras
AILaw
ELM
130
248
0
03 Oct 2021
MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization
Xinnuo Xu
Ondrej Dusek
Shashi Narayan
Verena Rieser
Ioannis Konstas
HILM
28
6
0
22 Sep 2021
Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun
Kalpesh Krishna
Andrew Mattarella-Micke
Mohit Iyyer
RALM
25
80
0
19 Sep 2021
SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
Kentaro Inui
233
45
0
13 Sep 2021
An Exploratory Study on Long Dialogue Summarization: What Works and What's Next
Yusen Zhang
Ansong Ni
Tao Yu
Rui Zhang
Chenguang Zhu
Budhaditya Deb
Asli Celikyilmaz
Ahmed Hassan Awadallah
Dragomir R. Radev
RALM
72
56
0
10 Sep 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Potsawee Manakul
Mark J. F. Gales
13
5
0
08 Sep 2021
ProoFVer: Natural Logic Theorem Proving for Fact Verification
Amrith Krishna
Sebastian Riedel
Andreas Vlachos
21
61
0
25 Aug 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Gabriel Bénédict
Vincent Koops
Daan Odijk
Maarten de Rijke
29
30
0
24 Aug 2021
Regularizing Transformers With Deep Probabilistic Layers
Aurora Cobo Aguilera
Pablo Martínez Olmos
Antonio Artés-Rodríguez
Fernando Pérez-Cruz
21
7
0
23 Aug 2021
Fastformer: Additive Attention Can Be All You Need
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
Xing Xie
46
117
0
20 Aug 2021
Making Transformers Solve Compositional Tasks
Santiago Ontañón
Joshua Ainslie
Vaclav Cvicek
Zachary Kenneth Fisher
33
70
0
09 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lu Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
32
257
0
31 Jul 2021
PKSpell: Data-Driven Pitch Spelling and Key Signature Estimation
Francesco Foscarin
Nicolas Audebert
Raphaël Fournier-S’niehotta
15
11
0
27 Jul 2021
Graph-free Multi-hop Reading Comprehension: A Select-to-Guide Strategy
Bohong Wu
Zhuosheng Zhang
Hai Zhao
24
20
0
25 Jul 2021
Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
Haopeng Li
Lingbo Liu
Kunlin Yang
Shinan Liu
Junyuan Gao
Bin Zhao
Rui Zhang
Jun Hou
44
14
0
19 Jul 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren
H. Dai
Zihang Dai
Mengjiao Yang
J. Leskovec
Dale Schuurmans
Bo Dai
76
77
0
12 Jul 2021