Generating Long Sequences with Sparse Transformers
arXiv: 1904.10509 · 23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,140 papers shown
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM · 255 / 4,805 / 0 · 24 Feb 2021

Unsupervised Brain Anomaly Detection and Segmentation with Transformers
W. H. Pinaya, Petru-Daniel Tudosiu, Robert J. Gray, G. Rees, P. Nachev, Sebastien Ourselin, M. Jorge Cardoso
ViT, MedIm · 30 / 59 / 0 · 23 Feb 2021

TransMask: A Compact and Fast Speech Separation Model Based on Transformer
Zining Zhang, Bingsheng He, Zhenjie Zhang
36 / 21 / 0 · 19 Feb 2021

Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal
DiffM · 60 / 3,541 / 0 · 18 Feb 2021

Composable Generative Models
Johan Leduc, Nicolas Grislain
SyDa · 38 / 4 / 0 · 18 Feb 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
281 / 179 / 0 · 17 Feb 2021

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
Yi Ding, Shiyu Chang, Zhangyang Wang
ViT · 29 / 383 / 0 · 14 Feb 2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation
G. Sun, C. Zhang, P. Woodland
76 / 32 / 0 · 12 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT · 283 / 1,989 / 0 · 09 Feb 2021
Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner
ViT · 232 / 156 / 0 · 08 Feb 2021

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan Yuille, Yuyin Zhou
ViT, MedIm · 21 / 3,366 / 0 · 08 Feb 2021

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh
47 / 508 / 0 · 07 Feb 2021

Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou, A. Kuncoro, E. Gribovskaya, Devang Agrawal, Adam Liska, ..., Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom
VLM · 41 / 207 / 0 · 03 Feb 2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
Chunxing Yin, Bilge Acun, Xing Liu, Carole-Jean Wu
50 / 103 / 0 · 25 Jan 2021

Maximum Likelihood Training of Score-Based Diffusion Models
Yang Song, Conor Durkan, Iain Murray, Stefano Ermon
DiffM · 64 / 627 / 0 · 22 Jan 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke, Abdalla Ahmed, Christian Wolf, P. Aarabi, Graham W. Taylor
VOS · 22 / 164 / 0 · 21 Jan 2021

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer
HongChien Yu, Zhuyun Dai, Jamie Callan
16 / 22 / 0 · 20 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE · 11 / 2,087 / 0 · 11 Jan 2021

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang
118 / 181 / 0 · 07 Jan 2021

Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah
ViT · 227 / 2,434 / 0 · 04 Jan 2021

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis
230 / 89 / 0 · 31 Dec 2020

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
73 / 52 / 0 · 31 Dec 2020

RealFormer: Transformer Likes Residual Attention
Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie
27 / 108 / 0 · 21 Dec 2020
Sub-Linear Memory: How to Make Performers SLiM
Valerii Likhosherstov, K. Choromanski, Jared Davis, Xingyou Song, Adrian Weller
23 / 19 / 0 · 21 Dec 2020

Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Bjorn Ommer
ViT · 64 / 2,837 / 0 · 17 Dec 2020

SceneFormer: Indoor Scene Generation with Transformers
Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner
ViT, 3DPC · 29 / 147 / 0 · 17 Dec 2020

Revisiting Linformer with a modified self-attention with linear complexity
Madhusudan Verma
8 / 8 / 0 · 16 Dec 2020

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma
DiffM · 16 / 124 / 0 · 15 Dec 2020

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
ViT · 43 / 527 / 0 · 01 Dec 2020

Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images
Rui Li, Shunyi Zheng, Chenxi Duan, Jianlin Su, Ce Zhang
35 / 187 / 0 · 29 Nov 2020

Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents
E. Guiraud, Jakob Drefs, Jörg Lücke
DRL · 40 / 3 / 0 · 27 Nov 2020

A Survey of Deep Learning Approaches for OCR and Document Understanding
Nishant Subramani, Alexandre Matton, Malcolm Greaves, Adrian Lam
19 / 48 / 0 · 27 Nov 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child
BDL, VLM · 56 / 339 / 0 · 20 Nov 2020

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic
26 / 0 / 0 · 20 Nov 2020

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Jun Huang, Deng Cai, Wei Lin
VLM, SyDa · 39 / 20 / 0 · 18 Nov 2020

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang, Huaiping Ming, Lei He, Frank Soong
19 / 5 / 0 · 17 Nov 2020

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler
53 / 694 / 0 · 08 Nov 2020

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
MDE, ViT · 50 / 278 / 0 · 05 Nov 2020

Deep Learning in Computer-Aided Diagnosis and Treatment of Tumors: A Survey
Dan Zhao, Guizhi Xu, Xu Zhenghua, Thomas Lukasiewicz, Minmin Xue, Zhigang Fu
OOD · 16 / 2 / 0 · 02 Nov 2020
Scaling Laws for Autoregressive Generative Modeling
T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
53 / 408 / 0 · 28 Oct 2020

Memory Optimization for Deep Networks
Aashaka Shah, Chaoxia Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krahenbuhl
19 / 24 / 0 · 27 Oct 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Minjia Zhang, Yuxiong He
AI4CE · 13 / 100 / 0 · 26 Oct 2020

Long Document Ranking with Query-Directed Sparse Transformer
Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
33 / 25 / 0 · 23 Oct 2020

Limitations of Autoregressive Models and Their Alternatives
Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner
29 / 58 / 0 · 22 Oct 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT · 41 / 39,428 / 0 · 22 Oct 2020

N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
Aaron Baier-Reinio, H. Sterck
24 / 9 / 0 · 22 Oct 2020

Open Question Answering over Tables and Text
Wenhu Chen, Ming-Wei Chang, Eva Schlinger, Wenjie Wang, William W. Cohen
LMTD, RALM · 31 / 194 / 0 · 20 Oct 2020

Rethinking Document-level Neural Machine Translation
Zewei Sun, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Shujian Huang, Jiajun Chen, Lei Li
VLM · 83 / 47 / 0 · 18 Oct 2020

Adaptive Feature Selection for End-to-End Speech Translation
Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich
13 / 40 / 0 · 16 Oct 2020

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
Alex Lamb, Anirudh Goyal, A. Slowik, Michael C. Mozer, Philippe Beaudoin, Yoshua Bengio
11 / 3 / 0 · 15 Oct 2020