Universal Transformers
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser
10 July 2018 · arXiv:1807.03819
Papers citing "Universal Transformers"

Showing 50 of 459 citing papers. Each entry lists the title, authors, date, and topic tags where available.
• Depth-Adaptive Transformer · Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli · 22 Oct 2019
• Stabilizing Transformers for Reinforcement Learning · Emilio Parisotto, H. F. Song, Jack W. Rae, Razvan Pascanu, Çağlar Gülçehre, ..., Aidan Clark, Seb Noury, M. Botvinick, N. Heess, R. Hadsell · 13 Oct 2019 · OffRL
• Neural Language Priors · Joseph Enguehard, Dan Busbridge, V. Zhelezniak, Nils Y. Hammerla · 04 Oct 2019
• Universal Graph Transformer Self-Attention Networks · D. Q. Nguyen, T. Nguyen, Dinh Q. Phung · 26 Sep 2019 · ViT
• TinyBERT: Distilling BERT for Natural Language Understanding · Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu · 23 Sep 2019 · VLM
• Adaptively Aligned Image Captioning via Adaptive Attention Time · Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen · 19 Sep 2019
• Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds · Marco Dinarelli, Loïc Grobol · 16 Sep 2019 · 3DV
• How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations · Betty van Aken, B. Winter, Alexander Löser, Felix Alexander Gers · 11 Sep 2019
• Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems · Mansour Saffar Mehrjardi, Amine Trabelsi, Osmar R. Zaiane · 11 Sep 2019 · LRM
• Logic and the 2-Simplicial Transformer · James Clift, D. Doryn, Daniel Murfet, James Wallbridge · 02 Sep 2019 · NAI
• Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure · Vikas K. Garg, Inderjit S. Dhillon, Hsiang-Fu Yu · 27 Aug 2019
• Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation · Tianyu He, Xu Tan, Tao Qin · 17 Aug 2019
• On Identifiability in Transformers · Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer · 12 Aug 2019 · ViT
• Universal Transforming Geometric Network · Jin Li · 02 Aug 2019
• Agglomerative Attention · Matthew Spellings · 15 Jul 2019
• R-Transformer: Recurrent Neural Network Enhanced Transformer · Z. Wang, Yao Ma, Zitao Liu, Jiliang Tang · 12 Jul 2019 · ViT
• Object Detection in Video with Spatial-temporal Context Aggregation · Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang · 11 Jul 2019
• Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts · Denis Emelin, Ivan Titov, Rico Sennrich · 28 Jun 2019
• Self Multi-Head Attention for Speaker Recognition · Miquel India, Pooyan Safari, Javier Hernando · 24 Jun 2019
• A Tensorized Transformer for Language Modeling · Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou · 24 Jun 2019
• ParNet: Position-aware Aggregated Relation Network for Image-Text Matching · Yaxian Xia, Lun Huang, Wenmin Wang, Xiao-Yong Wei, Jie Chen · 17 Jun 2019
• Theoretical Limitations of Self-Attention in Neural Sequence Models · Michael Hahn · 16 Jun 2019
• Scalable Syntax-Aware Language Models Using Knowledge Distillation · A. Kuncoro, Chris Dyer, Laura Rimell, S. Clark, Phil Blunsom · 14 Jun 2019
• Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks · Yi Tay, Aston Zhang, Anh Tuan Luu, J. Rao, Shuai Zhang, Shuohang Wang, Jie Fu, S. Hui · 11 Jun 2019
• CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification · Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Jamin Shin, Yan Xu, Peng-Tao Xu, Pascale Fung · 10 Jun 2019
• Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers · Manjot Bilkhu, Siyang Wang, Tushar Dobhal · 06 Jun 2019 · ViT
• Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View · Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu · 06 Jun 2019 · AI4CE
• Exploiting Sentential Context for Neural Machine Translation · Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi · 04 Jun 2019
• Educating Text Autoencoders: Latent Representation Guidance via Denoising · T. Shen, Jonas W. Mueller, Regina Barzilay, Tommi Jaakkola · 29 May 2019
• Graph Attention Auto-Encoders · Amin Salehi, H. Davulcu · 26 May 2019 · GNN
• An Explicitly Relational Neural Network Architecture · Murray Shanahan, Kyriacos Nikiforou, Antonia Creswell, Christos Kaplanis, David Barrett, M. Garnelo · 24 May 2019 · NAI, 3DV, GAN
• Lightweight Network Architecture for Real-Time Action Recognition · Alexander Kozlov, Vadim Andronov, Y. Gritsenko · 21 May 2019 · ViT
• Latent Universal Task-Specific BERT · A. Rozental, Zohar Kelrich, Daniel Fleischer · 16 May 2019 · SSL
• Language Modeling with Deep Transformers · Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney · 10 May 2019 · KELM
• Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control · Róbert Csordás, Jürgen Schmidhuber · 23 Apr 2019
• Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge · Yu-Ping Ruan, Xiao-Dan Zhu, Zhenhua Ling, Zhan Shi, Quan Liu, Si Wei · 22 Apr 2019
• Recurrent Space-time Graph Neural Networks · Andrei Liviu Nicolicioiu, Iulia Duta, Marius Leordeanu · 11 Apr 2019 · GNN
• Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling · Marco Dinarelli, Loïc Grobol · 09 Apr 2019
• Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion · Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu · 06 Apr 2019
• Information Aggregation for Multi-Head Attention with Routing-by-Agreement · Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu · 05 Apr 2019
• Modeling Recurrence for Transformer · Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu · 05 Apr 2019
• An Attentive Survey of Attention Models · S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath · 05 Apr 2019
• A Holistic Representation Guided Attention Network for Scene Text Recognition · L. Yang, Yuyang Deng, Peng Wang, Hui Li, Zhen Li, Yanning Zhang · 02 Apr 2019
• Interoperability and machine-to-machine translation model with mappings to machine learning tasks · Jacob Nilsson, Fredrik Sandin, J. Delsing · 26 Mar 2019 · AI4CE
• Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond · Ashish Agarwal, Igor Ganichev · 08 Mar 2019
• Star-Transformer · Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng-Wei Zhang · 25 Feb 2019
• Self-Attentive Model for Headline Generation · Daniil Gavrilov, Pavel Kalaidin, Valentin Malykh · 23 Jan 2019 · LRM
• On the Turing Completeness of Modern Neural Network Architectures · Jorge A. Pérez, Javier Marinkovic, Pablo Barceló · 10 Jan 2019 · BDL
• Layer Flexible Adaptive Computational Time · Lida Zhang, Abdolghani Ebrahimi, Diego Klabjan · 06 Dec 2018 · AI4CE
• Attending to Mathematical Language with Transformers · A. Wangperawong · 05 Dec 2018