arXiv: 1911.05507
Compressive Transformers for Long-Range Sequence Modelling
13 November 2019
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
Tags: RALM, VLM, KELM
Papers citing "Compressive Transformers for Long-Range Sequence Modelling" (50 of 232 papers shown)
The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C.K. Cheung
14 Feb 2023 · 66 / 2 / 0

Symbolic Discovery of Optimization Algorithms
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le
13 Feb 2023 · 178 / 383 / 0

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023 · 111 / 38 / 0

GLADIS: A General and Large Acronym Disambiguation Benchmark
Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek
03 Feb 2023 · ELM · 105 / 4 / 0

Exploring the Constructicon: Linguistic Analysis of a Computational CxG
Jonathan Dunn
30 Jan 2023 · 71 / 5 / 0

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition
Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang
30 Dec 2022 · RALM · 88 / 0 / 0

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré
28 Dec 2022 · 159 / 404 / 0

Scalable Adaptive Computation for Iterative Generation
Allan Jabri, David Fleet, Ting-Li Chen
22 Dec 2022 · DiffM · 78 / 117 / 0

Training Trajectories of Language Models Across Scales
Mengzhou Xia, Mikel Artetxe, Chunting Zhou, Xi Lin, Ramakanth Pasunuru, Danqi Chen, Luke Zettlemoyer, Ves Stoyanov
19 Dec 2022 · AIFin, LRM · 98 / 64 / 0

Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler
15 Dec 2022 · ViT · 49 / 10 / 0

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang, Jaemin Cho, Jie Lei, Joey Tianyi Zhou
21 Nov 2022 · VLM · 84 / 9 / 0

Token Turing Machines
Michael S. Ryoo, K. Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
16 Nov 2022 · 61 / 21 / 0

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
Roshan S. Sharma, Bhiksha Raj
29 Oct 2022 · 50 / 3 / 0

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
19 Oct 2022 · MGen · 104 / 60 / 0

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Dianbo Sui
14 Oct 2022 · 3DV · 197 / 9 / 0

LSG Attention: Extrapolation of pretrained Transformers to long sequences
Charles Condevaux, S. Harispe
13 Oct 2022 · 84 / 24 / 0

Memory transformers for full context and high-resolution 3D Medical Segmentation
Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler
11 Oct 2022 · ViT, MedIm · 53 / 5 / 0

Memory in humans and deep language models: Linking hypotheses for model augmentation
Omri Raccah, Pheobe Chen, Ted Willke, David Poeppel, Vy A. Vo
04 Oct 2022 · RALM · 79 / 1 / 0

Grouped self-attention mechanism for a memory-efficient Transformer
Bumjun Jung, Yusuke Mukuta, Tatsuya Harada
02 Oct 2022 · AI4TS · 26 / 3 / 0

AudioGen: Textually Guided Audio Generation
Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi
30 Sep 2022 · DiffM · 139 / 309 / 0

Mega: Moving Average Equipped Gated Attention
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
21 Sep 2022 · 143 / 185 / 0

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling
Qingyang Wu, Zhou Yu
15 Sep 2022 · RALM · 29 / 0 / 0

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang
01 Sep 2022 · 84 / 33 / 0

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
31 Aug 2022 · 156 / 114 / 0

A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao, Weihua Luo, Bairui Wang, Wei Emma Zhang, Lin Ma
30 Aug 2022 · 87 / 6 / 0

SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
Peng Qi, Guangtao Wang, Jing Huang
03 Aug 2022 · 48 / 0 / 0

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
20 Jul 2022 · VGen · 89 / 74 / 0

Recurrent Memory Transformer
Aydar Bulatov, Yuri Kuratov, Andrey Kravchenko
14 Jul 2022 · CLL · 49 / 112 / 0

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei, Pengcheng Guo, Ning Jiang
02 Jul 2022 · 84 / 11 / 0

Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
27 Jun 2022 · Mamba · 145 / 243 / 0

Language Models are General-Purpose Interfaces
Y. Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
13 Jun 2022 · MLLM · 78 / 102 / 0

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
27 May 2022 · VLM · 445 / 2,297 / 0

Fast Vision Transformers with HiLo Attention
Zizheng Pan, Jianfei Cai, Bohan Zhuang
26 May 2022 · 67 / 168 / 0

Training Language Models with Memory Augmentation
Zexuan Zhong, Tao Lei, Danqi Chen
25 May 2022 · RALM · 334 / 133 / 0

RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna, Yapei Chang, John Wieting, Mohit Iyyer
19 May 2022 · AIMat · 83 / 69 / 0

A Corpus for Understanding and Generating Moral Stories
Jian Guan, Ziqi Liu, Minlie Huang
20 Apr 2022 · 75 / 10 / 0

Efficient Linear Attention for Fast and Accurate Keypoint Matching
Suwichaya Suwanwimolkul, S. Komorita
16 Apr 2022 · 3DPC, 3DV · 72 / 11 / 0

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin G. Anthony, Leo Gao, ..., Shivanshu Purohit, Laria Reynolds, J. Tow, Benqi Wang, Samuel Weinbach
14 Apr 2022 · 189 / 841 / 0

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
11 Apr 2022 · 104 / 9 / 0

Can language models learn from explanations in context?
Andrew Kyle Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill
05 Apr 2022 · LRM, ReLM · 190 / 302 / 0

A Fast Transformer-based General-Purpose Lossless Compressor
Yushun Mao, Yufei Cui, Tei-Wei Kuo, Chun Jason Xue
30 Mar 2022 · ViT, AI4CE · 89 / 34 / 0

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
29 Mar 2022 · AI4TS · 217 / 1,993 / 0

Unsupervised Learning of Temporal Abstractions with Slot-based Transformers
Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste
25 Mar 2022 · OffRL · 114 / 16 / 0

Better Language Model with Hypernym Class Prediction
Richard He Bai, Tong Wang, Alessandro Sordoni, Peng Shi
21 Mar 2022 · 139 / 16 / 0

Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang
18 Mar 2022 · VOS · 100 / 79 / 0

Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
16 Mar 2022 · RALM · 109 / 179 / 0

Recurrence-in-Recurrence Networks for Video Deblurring
J. Park, Seungjun Nah, Kyoung Mu Lee
12 Mar 2022 · 52 / 5 / 0

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022 · 103 / 100 / 0

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
21 Feb 2022 · 147 / 233 / 0

The NLP Task Effectiveness of Long-Range Transformers
Guanghui Qin, Yukun Feng, Benjamin Van Durme
16 Feb 2022 · 67 / 30 / 0