Universal Transformers
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser
arXiv:1807.03819, 10 July 2018
Papers citing "Universal Transformers" (50 of 459 shown)

Semantic Communication with Adaptive Universal Transformer. Qingyang Zhou, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Honggang Zhang. 20 Aug 2021.
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing [VLM, LM&MA]. Katikapalli Subramanyam Kalyan, A. Rajasekharan, S. Sangeetha. 12 Aug 2021.
Go Wider Instead of Deeper [ViT, MoE]. Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You. 25 Jul 2021.
The Benchmark Lottery. Mostafa Dehghani, Yi Tay, A. Gritsenko, Zhe Zhao, N. Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals. 14 Jul 2021.
Learning Algebraic Recombination for Compositional Generalization [CoGe]. Chenyao Liu, Shengnan An, Zeqi Lin, Qian Liu, Bei Chen, Jian-Guang Lou, Lijie Wen, Nanning Zheng, Dongmei Zhang. 14 Jul 2021.
Transformer-F: A Transformer network with effective methods for learning universal sentence representation. Yu Shi. 02 Jul 2021.
Probabilistic Attention for Interactive Segmentation. Prasad Gabbur, Manjot Bilkhu, J. Movellan. 23 Jun 2021.
End-to-End Task-Oriented Dialog Modeling with Semi-Structured Knowledge Management. Silin Gao, Ryuichi Takanobu, Antoine Bosselut, Minlie Huang. 22 Jun 2021.
Improving Compositional Generalization in Classification Tasks via Structure Annotations [CoGe]. Juyong Kim, Pradeep Ravikumar, Joshua Ainslie, Santiago Ontañón. 19 Jun 2021.
Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation. Raj Dabre, Atsushi Fujita. 18 Jun 2021.
Deep Learning Through the Lens of Example Difficulty. R. Baldock, Hartmut Maennel, Behnam Neyshabur. 17 Jun 2021.
Thinking Like Transformers [AI4CE]. Gail Weiss, Yoav Goldberg, Eran Yahav. 13 Jun 2021.
Modeling Hierarchical Structures with Continuous Recursive Neural Networks. Jishnu Ray Chowdhury, Cornelia Caragea. 10 Jun 2021.
FastSeq: Make Sequence Generation Faster [VLM]. Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, Ruifei Zhang. 08 Jun 2021.
A Survey of Transformers [ViT]. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. 08 Jun 2021.
Staircase Attention for Recurrent Processing of Sequences. Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston. 08 Jun 2021.
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings [OOD, ViT]. Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, R. Collobert, A. Rogozhnikov. 06 Jun 2021.
Scalable Transformers for Neural Machine Translation. Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li. 04 Jun 2021.
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models [MILM]. Ulme Wennberg, G. Henter. 03 Jun 2021.
Luna: Linear Unified Nested Attention. Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer. 03 Jun 2021.
Choose a Transformer: Fourier or Galerkin. Shuhao Cao. 31 May 2021.
Cascaded Head-colliding Attention. Lin Zheng, Zhiyong Wu, Lingpeng Kong. 31 May 2021.
Early Exiting with Ensemble Internal Classifiers. Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu. 28 May 2021.
The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization. Taiki Miyagawa, Akinori F. Ebihara. 28 May 2021.
Continual Learning for Real-World Autonomous Systems: Algorithms, Challenges and Frameworks [CLL]. Khadija Shaheen, Muhammad Abdullah Hanif, Osman Hasan, Muhammad Shafique. 26 May 2021.
TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [MQ]. Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun. 25 May 2021.
Self-Attention Networks Can Process Bounded Hierarchical Languages. Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan. 24 May 2021.
Relative Positional Encoding for Transformers with Linear Complexity. Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard. 18 May 2021.
Consistent Accelerated Inference via Confident Adaptive Transformers [AI4TS]. Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay. 18 Apr 2021.
TransVG: End-to-End Visual Grounding with Transformers [ViT]. Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wen-gang Zhou, Houqiang Li. 17 Apr 2021.
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks [MLLM]. Hung Le, Nancy F. Chen, S. Hoi. 16 Apr 2021.
Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations [AI4CE]. Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang. 15 Apr 2021.
Lessons on Parameter Sharing across Layers in Transformers. Sho Takase, Shun Kiyono. 13 Apr 2021.
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost. Zhen Wu, Lijun Wu, Qi Meng, Yingce Xia, Shufang Xie, Tao Qin, Xinyu Dai, Tie-Yan Liu. 11 Apr 2021.
Extended Parallel Corpus for Amharic-English Machine Translation. A. Gezmu, A. Nürnberger, T. Bati. 08 Apr 2021.
Revisiting Simple Neural Probabilistic Language Models. Simeng Sun, Mohit Iyyer. 08 Apr 2021.
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation. Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu. 06 Apr 2021.
Attention, please! A survey of Neural Attention Models in Deep Learning [HAI]. Alana de Santana Correia, Esther Luna Colombini. 31 Mar 2021.
ViViT: A Video Vision Transformer [ViT]. Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid. 29 Mar 2021.
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS [SSL]. Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu. 28 Mar 2021.
A Practical Survey on Faster and Lighter Transformers. Quentin Fournier, G. Caron, Daniel Aloise. 26 Mar 2021.
Finetuning Pretrained Transformers into RNNs. Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith. 24 Mar 2021.
PAC-learning gains of Turing machines over circuits and neural networks [AI4CE]. Brieuc Pinon, Raphaël Jungers, Jean-Charles Delvenne. 23 Mar 2021.
Set-to-Sequence Methods in Machine Learning: a Review [BDL]. Mateusz Jurewicz, Leon Derczynski. 17 Mar 2021.
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth. Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas. 05 Mar 2021.
Perceiver: General Perception with Iterative Attention [VLM, ViT, MDE]. Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira. 04 Mar 2021.
Random Feature Attention. Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 03 Mar 2021.
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [AIMat]. Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu. 02 Mar 2021.
OmniNet: Omnidirectional Representations from Transformers. Yi Tay, Mostafa Dehghani, V. Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler. 01 Mar 2021.
Transformers with Competitive Ensembles of Independent Mechanisms [MoE]. Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio. 27 Feb 2021.