ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Universal Transformers (arXiv:1807.03819)

10 July 2018
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser

Papers citing "Universal Transformers"

50 / 459 papers shown
Semantic Communication with Adaptive Universal Transformer
Qingyang Zhou
Rongpeng Li
Zhifeng Zhao
Chenghui Peng
Honggang Zhang
22
88
0
20 Aug 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
26
260
0
12 Aug 2021
Go Wider Instead of Deeper
Fuzhao Xue
Ziji Shi
Futao Wei
Yuxuan Lou
Yong Liu
Yang You
ViT
MoE
17
80
0
25 Jul 2021
The Benchmark Lottery
Mostafa Dehghani
Yi Tay
A. Gritsenko
Zhe Zhao
N. Houlsby
Fernando Diaz
Donald Metzler
Oriol Vinyals
39
89
0
14 Jul 2021
Learning Algebraic Recombination for Compositional Generalization
Chenyao Liu
Shengnan An
Zeqi Lin
Qian Liu
Bei Chen
Jian-Guang Lou
Lijie Wen
Nanning Zheng
Dongmei Zhang
CoGe
196
36
0
14 Jul 2021
Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
13
1
0
02 Jul 2021
Probabilistic Attention for Interactive Segmentation
Prasad Gabbur
Manjot Bilkhu
J. Movellan
26
13
0
23 Jun 2021
End-to-End Task-Oriented Dialog Modeling with Semi-Structured Knowledge Management
Silin Gao
Ryuichi Takanobu
Antoine Bosselut
Minlie Huang
19
3
0
22 Jun 2021
Improving Compositional Generalization in Classification Tasks via Structure Annotations
Juyong Kim
Pradeep Ravikumar
Joshua Ainslie
Santiago Ontañón
CoGe
16
18
0
19 Jun 2021
Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation
Raj Dabre
Atsushi Fujita
14
1
0
18 Jun 2021
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
44
155
0
17 Jun 2021
Thinking Like Transformers
Gail Weiss
Yoav Goldberg
Eran Yahav
AI4CE
35
127
0
13 Jun 2021
Modeling Hierarchical Structures with Continuous Recursive Neural Networks
Jishnu Ray Chowdhury
Cornelia Caragea
18
15
0
10 Jun 2021
FastSeq: Make Sequence Generation Faster
Yu Yan
Fei Hu
Jiusheng Chen
Nikhil Bhendawade
Ting Ye
Yeyun Gong
Nan Duan
Desheng Cui
Bingyu Chi
Ruifei Zhang
VLM
24
15
0
08 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
32
1,087
0
08 Jun 2021
Staircase Attention for Recurrent Processing of Sequences
Da Ju
Stephen Roller
Sainbayar Sukhbaatar
Jason Weston
24
11
0
08 Jun 2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko
Qiantong Xu
Gabriel Synnaeve
R. Collobert
A. Rogozhnikov
OOD
ViT
25
54
0
06 Jun 2021
Scalable Transformers for Neural Machine Translation
Peng Gao
Shijie Geng
Yu Qiao
Xiaogang Wang
Jifeng Dai
Hongsheng Li
31
13
0
04 Jun 2021
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Ulme Wennberg
G. Henter
MILM
27
21
0
03 Jun 2021
Luna: Linear Unified Nested Attention
Xuezhe Ma
Xiang Kong
Sinong Wang
Chunting Zhou
Jonathan May
Hao Ma
Luke Zettlemoyer
25
114
0
03 Jun 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
39
221
0
31 May 2021
Cascaded Head-colliding Attention
Lin Zheng
Zhiyong Wu
Lingpeng Kong
11
2
0
31 May 2021
Early Exiting with Ensemble Internal Classifiers
Tianxiang Sun
Yunhua Zhou
Xiangyang Liu
Xinyu Zhang
Hao Jiang
Zhao Cao
Xuanjing Huang
Xipeng Qiu
24
30
0
28 May 2021
The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization
Taiki Miyagawa
Akinori F. Ebihara
17
3
0
28 May 2021
Continual Learning for Real-World Autonomous Systems: Algorithms, Challenges and Frameworks
Khadija Shaheen
Muhammad Abdullah Hanif
Osman Hasan
Muhammad Shafique
CLL
11
90
0
26 May 2021
TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference
Deming Ye
Yankai Lin
Yufei Huang
Maosong Sun
MQ
14
63
0
25 May 2021
Self-Attention Networks Can Process Bounded Hierarchical Languages
Shunyu Yao
Binghui Peng
Christos H. Papadimitriou
Karthik Narasimhan
31
73
0
24 May 2021
Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus
Ondřej Cífka
Shih-Lun Wu
Umut Simsekli
Yi-Hsuan Yang
Gaël Richard
25
44
0
18 May 2021
Consistent Accelerated Inference via Confident Adaptive Transformers
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
184
69
0
18 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
28
329
0
17 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
18
19
0
16 Apr 2021
Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations
Jonathan Herzig
Peter Shaw
Ming-Wei Chang
Kelvin Guu
Panupong Pasupat
Yuan Zhang
AI4CE
8
67
0
15 Apr 2021
Lessons on Parameter Sharing across Layers in Transformers
Sho Takase
Shun Kiyono
17
84
0
13 Apr 2021
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost
Zhen Wu
Lijun Wu
Qi Meng
Yingce Xia
Shufang Xie
Tao Qin
Xinyu Dai
Tie-Yan Liu
10
22
0
11 Apr 2021
Extended Parallel Corpus for Amharic-English Machine Translation
A. Gezmu
A. Nürnberger
T. Bati
6
16
0
08 Apr 2021
Revisiting Simple Neural Probabilistic Language Models
Simeng Sun
Mohit Iyyer
24
14
0
08 Apr 2021
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation
Bei Li
Quan Du
Tao Zhou
Shuhan Zhou
Xin Zeng
Tong Xiao
Jingbo Zhu
15
22
0
06 Apr 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
23
175
0
31 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
30
2,087
0
29 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
22
81
0
28 Mar 2021
A Practical Survey on Faster and Lighter Transformers
Quentin Fournier
G. Caron
Daniel Aloise
14
93
0
26 Mar 2021
Finetuning Pretrained Transformers into RNNs
Jungo Kasai
Hao Peng
Yizhe Zhang
Dani Yogatama
Gabriel Ilharco
Nikolaos Pappas
Yi Mao
Weizhu Chen
Noah A. Smith
28
63
0
24 Mar 2021
PAC-learning gains of Turing machines over circuits and neural networks
Brieuc Pinon
Raphaël Jungers
Jean-Charles Delvenne
AI4CE
11
1
0
23 Mar 2021
Set-to-Sequence Methods in Machine Learning: a Review
Mateusz Jurewicz
Leon Derczynski
BDL
27
9
0
17 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
34
373
0
05 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
59
973
0
04 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
19
348
0
03 Mar 2021
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics
Qing Li
Siyuan Huang
Yining Hong
Yixin Zhu
Ying Nian Wu
Song-Chun Zhu
AIMat
19
6
0
02 Mar 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
39
26
0
01 Mar 2021
Transformers with Competitive Ensembles of Independent Mechanisms
Alex Lamb
Di He
Anirudh Goyal
Guolin Ke
Chien-Feng Liao
Mirco Ravanelli
Yoshua Bengio
MoE
21
23
0
27 Feb 2021