Compressive Transformers for Long-Range Sequence Modelling

13 November 2019
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
    RALM · VLM · KELM
ArXiv (abs) · PDF · HTML

Papers citing "Compressive Transformers for Long-Range Sequence Modelling"

50 / 232 papers shown
General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
115
66
0
15 Feb 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu
Yanghao Li
K. Mangalam
Haoqi Fan
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
127
201
0
20 Jan 2022
Datasheet for the Pile
Stella Biderman
Kieran Bicheno
Leo Gao
102
36
0
13 Jan 2022
Simple Local Attentions Remain Competitive for Long-Context Tasks
Wenhan Xiong
Barlas Ouguz
Anchit Gupta
Xilun Chen
Diana Liskovich
Omer Levy
Wen-tau Yih
Yashar Mehdad
99
29
0
14 Dec 2021
Couplformer: Rethinking Vision Transformer with Coupling Attention Map
Hai Lan
Xihao Wang
Xian Wei
ViT
89
3
0
10 Dec 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
60
7
0
23 Nov 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng
Shi Zong
Xiaoya Li
Xiaofei Sun
Tianwei Zhang
Leilei Gan
Jiwei Li
LRM
127
39
0
17 Oct 2021
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
58
34
0
12 Oct 2021
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
143
71
0
08 Oct 2021
ABC: Attention with Bounded-memory Control
Hao Peng
Jungo Kasai
Nikolaos Pappas
Dani Yogatama
Zhaofeng Wu
Lingpeng Kong
Roy Schwartz
Noah A. Smith
125
22
0
06 Oct 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM · ViT
125
45
0
21 Sep 2021
Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun
Kalpesh Krishna
Andrew Mattarella-Micke
Mohit Iyyer
RALM
91
84
0
19 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
277
156
0
17 Sep 2021
Space Time Recurrent Memory Network
Hung-Cuong Nguyen
Chanho Kim
Fuxin Li
139
3
0
14 Sep 2021
∞-former: Infinite Memory Transformer
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
98
11
0
01 Sep 2021
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
Jian Guan
Zhuoer Feng
Yamei Chen
Ru He
Xiaoxi Mao
Changjie Fan
Minlie Huang
120
33
0
30 Aug 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
350
779
0
27 Aug 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
103
12
0
24 Aug 2021
Making Transformers Solve Compositional Tasks
Santiago Ontañón
Joshua Ainslie
Vaclav Cvicek
Zachary Kenneth Fisher
109
74
0
09 Aug 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
130
64
0
13 Jul 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren
H. Dai
Zihang Dai
Mengjiao Yang
J. Leskovec
Dale Schuurmans
Bo Dai
166
80
0
12 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Zhuowen Tu
Stefano Soatto
ViT
154
137
0
07 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
Mohammad Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT · VLM
119
133
0
05 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
187
995
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
86
437
0
01 Jul 2021
Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network
Zhibin Duan
Dongsheng Wang
Bo Chen
Chaojie Wang
Wenchao Chen
Yewen Li
Jie Ren
Mingyuan Zhou
BDL
98
42
0
30 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
77
79
0
15 Jun 2021
An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates
Zhuohao Chen
Nikolaos Flemotomos
Karan Singla
Torrey A. Creed
David C. Atkins
Shrikanth Narayanan
65
5
0
15 Jun 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
118
64
0
11 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
202
1,150
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zhangyang Wang
ViT
70
224
0
08 Jun 2021
Luna: Linear Unified Nested Attention
Xuezhe Ma
Xiang Kong
Sinong Wang
Chunting Zhou
Jonathan May
Hao Ma
Luke Zettlemoyer
91
113
0
03 Jun 2021
Learning to Rehearse in Long Sequence Memorization
Zhu Zhang
Chang Zhou
Jianxin Ma
Zhijie Lin
Jingren Zhou
Hongxia Yang
Zhou Zhao
RALM
33
9
0
02 Jun 2021
Less is More: Pay Less Attention in Vision Transformers
Zizheng Pan
Bohan Zhuang
Haoyu He
Jing Liu
Jianfei Cai
ViT
141
87
0
29 May 2021
An Attention Free Transformer
Shuangfei Zhai
Walter A. Talbott
Nitish Srivastava
Chen Huang
Hanlin Goh
Ruixiang Zhang
J. Susskind
ViT
94
132
0
28 May 2021
Towards mental time travel: a hierarchical memory for reinforcement learning agents
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Andrea Banino
Felix Hill
92
47
0
28 May 2021
Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction
Minghan Shen
Pratyay Banerjee
Chitta Baral
SSL
55
5
0
26 May 2021
ReadTwice: Reading Very Large Documents with Memories
Yury Zemlyanskiy
Joshua Ainslie
Michiel de Jong
Philip Pham
Ilya Eckstein
Fei Sha
AIMat · RALM
88
18
0
10 May 2021
Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Emilio Parisotto
Ruslan Salakhutdinov
108
46
0
04 Apr 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
116
337
0
29 Mar 2021
A Practical Survey on Faster and Lighter Transformers
Quentin Fournier
G. Caron
Daniel Aloise
137
104
0
26 Mar 2021
Finetuning Pretrained Transformers into RNNs
Jungo Kasai
Hao Peng
Yizhe Zhang
Dani Yogatama
Gabriel Ilharco
Nikolaos Pappas
Yi Mao
Weizhu Chen
Noah A. Smith
114
67
0
24 Mar 2021
Scalable Vision Transformers with Hierarchical Pooling
Zizheng Pan
Bohan Zhuang
Jing Liu
Haoyu He
Jianfei Cai
ViT
95
130
0
19 Mar 2021
Hurdles to Progress in Long-form Question Answering
Kalpesh Krishna
Aurko Roy
Mohit Iyyer
72
200
0
10 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM · ViT · MDE
214
1,030
0
04 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
136
362
0
03 Mar 2021
Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal
Aniket Didolkar
Alex Lamb
Kartikeya Badola
Nan Rosemary Ke
Nasim Rahaman
Jonathan Binas
Charles Blundell
Michael C. Mozer
Yoshua Bengio
226
99
0
01 Mar 2021
Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag
Kazuki Irie
Jürgen Schmidhuber
153
252
0
22 Feb 2021
Dynamic Memory based Attention Network for Sequential Recommendation
Qiaoyu Tan
Jianwei Zhang
Ninghao Liu
Xiao Shi Huang
Hongxia Yang
Jingren Zhou
Helen Zhou
HAI
168
66
0
18 Feb 2021
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Priyabrata Karmakar
S. Teng
Guojun Lu
50
27
0
14 Feb 2021