Are Transformers universal approximators of sequence-to-sequence functions? (arXiv:1912.10077)
20 December 2019
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar

Papers citing "Are Transformers universal approximators of sequence-to-sequence functions?"
46 / 246 papers shown
Each entry: title; authors; topic tags where present; three unlabeled site counters; publication date.

Abstraction, Reasoning and Deep Learning: A Study of the "Look and Say" Sequence
Wlodek Zadrozny
19 · 1 · 0 · 27 Sep 2021

Cross-lingual Transfer of Monolingual Models
Evangelia Gogoulou, Ariel Ekgren, T. Isbister, Magnus Sahlgren
31 · 18 · 0 · 15 Sep 2021

MATE: Multi-view Attention for Table Transformer Efficiency
Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen
LMTD · 75 · 95 · 0 · 09 Sep 2021

Semantic-Based Self-Critical Training For Question Generation
Loïc Kwate Dassi
27 · 0 · 0 · 26 Aug 2021

Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
24 · 35 · 0 · 24 Aug 2021

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
Colin Wei, Yining Chen, Tengyu Ma
21 · 88 · 0 · 28 Jul 2021

Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li, Menglong Xu, Xiao-Lei Zhang
18 · 9 · 0 · 13 Jul 2021

Introducing Self-Attention to Target Attentive Graph Neural Networks
Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh
21 · 9 · 0 · 04 Jul 2021

Saturated Transformers are Constant-Depth Threshold Circuits
William Merrill, Ashish Sabharwal, Noah A. Smith
25 · 97 · 0 · 30 Jun 2021

ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister
23 · 26 · 0 · 21 Jun 2021

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
26 · 8 · 0 · 16 Jun 2021

Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
AI4CE · 35 · 128 · 0 · 13 Jun 2021

Rethinking Graph Transformers with Spectral Attention
Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau, Prudencio Tossou
46 · 508 · 0 · 07 Jun 2021

On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov, K. Choromanski, Adrian Weller
37 · 34 · 0 · 07 Jun 2021

Learning and Generalization in RNNs
A. Panigrahi, Navin Goyal
27 · 3 · 0 · 31 May 2021

Choose a Transformer: Fourier or Galerkin
Shuhao Cao
42 · 227 · 0 · 31 May 2021

Universal Adder Neural Networks
Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang
37 · 3 · 0 · 29 May 2021

Self-Attention Networks Can Process Bounded Hierarchical Languages
Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan
39 · 74 · 0 · 24 May 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
46 · 2,204 · 0 · 20 Apr 2021

When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting
Vít Novotný, Michal Štefánik, E. F. Ayetiran, Petr Sojka, Radim Řehůřek
19 · 4 · 0 · 19 Apr 2021

A Simple and Effective Positional Encoding for Transformers
Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
61 · 62 · 0 · 18 Apr 2021

SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok
23 · 54 · 0 · 25 Feb 2021

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder
Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul N. Bennett, Tie-Yan Liu, Arnold Overwijk
RALM · 43 · 11 · 0 · 18 Feb 2021

Axial Residual Networks for CycleGAN-based Voice Conversion
J. You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae
16 · 3 · 0 · 16 Feb 2021

Neural Machine Translation: A Review of Methods, Resources, and Tools
Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu
3DV · AI4TS · 30 · 105 · 0 · 31 Dec 2020

Hurricane Forecasting: A Novel Multimodal Machine Learning Framework
L. Boussioux, C. Zeng, Théo Guénais, Dimitris Bertsimas
16 · 38 · 0 · 11 Nov 2020

Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith
AI4CE · 19 · 36 · 0 · 19 Oct 2020

Inductive Entity Representations from Text via Link Prediction
Daniel Daza, Michael Cochez, Paul T. Groth
19 · 109 · 0 · 07 Oct 2020

Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review
Yuqi Si, Jingcheng Du, Zhao Li, Xiaoqian Jiang, T. Miller, Fei Wang, W. J. Zheng, Kirk Roberts
OOD · 21 · 154 · 0 · 06 Oct 2020

Attention-Based Clustering: Learning a Kernel from Context
Samuel Coward, Erik Visse-Martindale, Chithrupa Ramesh
6 · 3 · 0 · 02 Oct 2020

On the Ability and Limitations of Transformers to Recognize Formal Languages
S. Bhattamishra, Kabir Ahuja, Navin Goyal
11 · 10 · 0 · 23 Sep 2020

Trojaning Language Models for Fun and Profit
Xinyang Zhang, Zheng-Wei Zhang, Shouling Ji, Ting Wang
SILM · AAML · 22 · 132 · 0 · 01 Aug 2020

UNIPoint: Universally Approximating Point Processes Intensities
Alexander Soen, A. Mathews, Daniel Grixti-Cheng, Lexing Xie
3DPC · 6 · 0 · 0 · 28 Jul 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM · 288 · 2,023 · 0 · 28 Jul 2020

Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures
Daniel Furrer, Marc van Zee, Nathan Scales, Nathanael Schärli
CoGe · 26 · 113 · 0 · 17 Jul 2020

Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt, Samuel R. Bowman
AI4CE · 20 · 53 · 0 · 14 Jul 2020

14 Jul 2020
An EM Approach to Non-autoregressive Conditional Sequence Generation
Zhiqing Sun
Yiming Yang
14
42
0
29 Jun 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
30 · 45 · 0 · 22 Jun 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
33 · 65 · 0 · 16 Jun 2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
6 · 78 · 0 · 08 Jun 2020

Normalized Attention Without Probability Cage
Oliver Richter, Roger Wattenhofer
14 · 21 · 0 · 19 May 2020

TraDE: Transformers for Density Estimation
Rasool Fakoor, Pratik Chaudhari, Jonas W. Mueller, Alex Smola
20 · 30 · 0 · 06 Apr 2020

Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
16 · 107 · 0 · 13 Mar 2020

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
24 · 94 · 0 · 17 Feb 2020

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning
Yaodong Yang, Jianye Hao, B. Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang
OffRL · 19 · 185 · 0 · 10 Feb 2020

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
218 · 7,929 · 0 · 17 Aug 2015