1807.03819
Cited By
Universal Transformers
10 July 2018
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
Papers citing "Universal Transformers"
50 / 459 papers shown
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Li
LRM
11
122
0
25 Feb 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang
Hyung Won Chung
Yi Tay
W. Fedus
Thibault Févry
...
Wei Li
Nan Ding
Jake Marcus
Adam Roberts
Colin Raffel
25
126
0
23 Feb 2021
Position Information in Transformers: An Overview
Philipp Dufter
Martin Schmitt
Hinrich Schütze
13
139
0
22 Feb 2021
On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers
Kenji Kawaguchi
PINN
28
42
0
15 Feb 2021
Dynamic Neural Networks: A Survey
Yizeng Han
Gao Huang
Shiji Song
Le Yang
Honghui Wang
Yulin Wang
3DH
AI4TS
AI4CE
18
621
0
09 Feb 2021
Distilling Large Language Models into Tiny and Effective Students using pQRNN
P. Kaliamoorthi
Aditya Siddhant
Edward Li
Melvin Johnson
MQ
19
17
0
21 Jan 2021
To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph
Sufeng Duan
Hai Zhao
MILM
22
0
0
16 Jan 2021
Neural Sequence-to-grid Module for Learning Symbolic Rules
Segwang Kim
Hyoungwook Nam
Joonyoung Kim
Kyomin Jung
NAI
64
11
0
13 Jan 2021
Of Non-Linearity and Commutativity in BERT
Sumu Zhao
Damian Pascual
Gino Brunner
Roger Wattenhofer
28
16
0
12 Jan 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
99
341
0
05 Jan 2021
An Efficient Transformer Decoder with Compressed Sub-layers
Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
25
29
0
03 Jan 2021
Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers
Machel Reid
Edison Marrese-Taylor
Y. Matsuo
MoE
14
48
0
01 Jan 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
142
221
0
31 Dec 2020
Implicit Feature Pyramid Network for Object Detection
Tiancai Wang
Xinming Zhang
Jian Sun
ObjD
13
27
0
25 Dec 2020
*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task
Dmitry Tsarkov
Tibor Tihon
Nathan Scales
Nikola Momchev
Danila Sinopalnikov
Nathanael Schärli
16
17
0
15 Dec 2020
The Style-Content Duality of Attractiveness: Learning to Write Eye-Catching Headlines via Disentanglement
Li Mingzhe
Xiuying Chen
Min Yang
Shen Gao
Dongyan Zhao
Rui Yan
21
19
0
14 Dec 2020
Improving Task-Agnostic BERT Distillation with Layer Mapping Search
Xiaoqi Jiao
Huating Chang
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
Fang Wang
Qun Liu
21
12
0
11 Dec 2020
On the Binding Problem in Artificial Neural Networks
Klaus Greff
Sjoerd van Steenkiste
Jürgen Schmidhuber
OCL
224
254
0
09 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
26
76
0
08 Dec 2020
Designing a Prospective COVID-19 Therapeutic with Reinforcement Learning
Marcin J. Skwark
Nicolás López Carranza
Thomas Pierrot
Joe Phillips
Slim Said
Alexandre Laterre
Amine Kerkeni
Uğur Şahin
Karim Beguir
9
4
0
03 Dec 2020
Learning Associative Inference Using Fast Weight Memory
Imanol Schlag
Tsendsuren Munkhdalai
Jürgen Schmidhuber
KELM
22
44
0
16 Nov 2020
A Hybrid Approach for Improved Low Resource Neural Machine Translation using Monolingual Data
Idris Abdulmumin
B. Galadanci
Abubakar Isa
Habeebah Adamu Kakudi
Ismaila Idris Sinan
8
6
0
14 Nov 2020
Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
Jonah Casebeer
Jamshed Kaikaus
Paris Smaragdis
19
5
0
14 Nov 2020
Don't Read Too Much into It: Adaptive Computation for Open-Domain Question Answering
Yuxiang Wu
Sebastian Riedel
Pasquale Minervini
Pontus Stenetorp
19
8
0
10 Nov 2020
From Eye-blinks to State Construction: Diagnostic Benchmarks for Online Representation Learning
Banafsheh Rafiee
Zaheer Abbas
Sina Ghiassian
Raksha Kumaraswamy
R. Sutton
Elliot A. Ludvig
Adam White
OffRL
14
17
0
09 Nov 2020
Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT
Rik van Noord
Antonio Toral
Johan Bos
15
4
0
09 Nov 2020
N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
Aaron Baier-Reinio
H. Sterck
16
9
0
22 Oct 2020
Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation
Yan Zhang
Zhijiang Guo
Zhiyang Teng
Wei Lu
Shay B. Cohen
Zuozhu Liu
Lidong Bing
GNN
24
18
0
09 Oct 2020
Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
Róbert Csordás
Sjoerd van Steenkiste
Jürgen Schmidhuber
42
87
0
05 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia
Shijie Wu
Benjamin Van Durme
26
50
0
02 Oct 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
13
1,520
0
30 Sep 2020
Attention that does not Explain Away
Nan Ding
Xinjie Fan
Zhenzhong Lan
Dale Schuurmans
Radu Soricut
19
3
0
29 Sep 2020
TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang
Lu Hou
Yichun Yin
Lifeng Shang
Xiao Chen
Xin Jiang
Qun Liu
MQ
25
208
0
27 Sep 2020
Aggressive Language Detection with Joint Text Normalization via Adversarial Multi-task Learning
Shengqiong Wu
Hao Fei
Donghong Ji
23
5
0
19 Sep 2020
Current Limitations of Language Models: What You Need is Retrieval
Aran Komatsuzaki
LRM
6
3
0
15 Sep 2020
A Study of Genetic Algorithms for Hyperparameter Optimization of Neural Networks in Machine Translation
Keshav Ganapathy
6
8
0
15 Sep 2020
Efficient Transformers: A Survey
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
88
1,101
0
14 Sep 2020
DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning
Yushan Zhu
Wen Zhang
Mingyang Chen
Hui Chen
Xu-Xin Cheng
Wei Zhang
Huajun Chen
14
27
0
13 Sep 2020
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Xinsong Zhang
Pengshuai Li
Hang Li
14
51
0
27 Aug 2020
SoDA: Multi-Object Tracking with Soft Data Association
Wei-Chih Hung
Henrik Kretzschmar
Nayeon Lee
Yuning Chai
Ruichi Yu
Ming-Hsuan Yang
Drago Anguelov
VOT
34
16
0
18 Aug 2020
An Ensemble of Knowledge Sharing Models for Dynamic Hand Gesture Recognition
K. Lai
Svetlana Yanushkevich
SLR
14
9
0
13 Aug 2020
Compression of Deep Learning Models for Text: A Survey
Manish Gupta
Puneet Agrawal
VLM
MedIm
AI4CE
12
115
0
12 Aug 2020
Self-attention encoding and pooling for speaker recognition
Pooyan Safari
Miquel India
Javier Hernando
ViT
14
81
0
03 Aug 2020
Distributed Associative Memory Network with Memory Refreshing Loss
Taewon Park
Inchul Choi
Minho Lee
CLL
23
6
0
21 Jul 2020
Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures
Daniel Furrer
Marc van Zee
Nathan Scales
Nathanael Schärli
CoGe
18
113
0
17 Jul 2020
Hopfield Networks is All You Need
Hubert Ramsauer
Bernhard Schäfl
Johannes Lehner
Philipp Seidl
Michael Widrich
...
David P. Kreil
Michael K Kopp
G. Klambauer
Johannes Brandstetter
Sepp Hochreiter
24
412
0
16 Jul 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
39
1,665
0
29 Jun 2020
Compositional Generalization by Learning Analytical Expressions
Qian Liu
Shengnan An
Jian-Guang Lou
Bei Chen
Zeqi Lin
Yan Gao
Bin Zhou
Nanning Zheng
Dongmei Zhang
CoGe
NAI
20
72
0
18 Jun 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
19
16
0
18 Jun 2020
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
Jungo Kasai
Nikolaos Pappas
Hao Peng
James Cross
Noah A. Smith
30
134
0
18 Jun 2020