ResearchTrend.AI
Are Transformers universal approximators of sequence-to-sequence functions?

20 December 2019
Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar

Papers citing "Are Transformers universal approximators of sequence-to-sequence functions?"

46 / 246 papers shown
1. Abstraction, Reasoning and Deep Learning: A Study of the "Look and Say" Sequence. Wlodek Zadrozny. 27 Sep 2021.
2. Cross-lingual Transfer of Monolingual Models. Evangelia Gogoulou, Ariel Ekgren, T. Isbister, Magnus Sahlgren. 15 Sep 2021.
3. MATE: Multi-view Attention for Table Transformer Efficiency. Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen. 09 Sep 2021. [LMTD]
4. Semantic-Based Self-Critical Training For Question Generation. Loïc Kwate Dassi. 26 Aug 2021.
5. Auto-Parsing Network for Image Captioning and Visual Question Answering. Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai. 24 Aug 2021.
6. Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers. Colin Wei, Yining Chen, Tengyu Ma. 28 Jul 2021.
7. Conformer-based End-to-end Speech Recognition With Rotary Position Embedding. Shengqiang Li, Menglong Xu, Xiao-Lei Zhang. 13 Jul 2021.
8. Introducing Self-Attention to Target Attentive Graph Neural Networks. Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh. 04 Jul 2021.
9. Saturated Transformers are Constant-Depth Threshold Circuits. William Merrill, Ashish Sabharwal, Noah A. Smith. 30 Jun 2021.
10. ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction. Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister. 21 Jun 2021.
11. Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation. Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit. 16 Jun 2021.
12. Thinking Like Transformers. Gail Weiss, Yoav Goldberg, Eran Yahav. 13 Jun 2021. [AI4CE]
13. Rethinking Graph Transformers with Spectral Attention. Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau, Prudencio Tossou. 07 Jun 2021.
14. On the Expressive Power of Self-Attention Matrices. Valerii Likhosherstov, K. Choromanski, Adrian Weller. 07 Jun 2021.
15. Learning and Generalization in RNNs. A. Panigrahi, Navin Goyal. 31 May 2021.
16. Choose a Transformer: Fourier or Galerkin. Shuhao Cao. 31 May 2021.
17. Universal Adder Neural Networks. Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang. 29 May 2021.
18. Self-Attention Networks Can Process Bounded Hierarchical Languages. Shunyu Yao, Binghui Peng, Christos H. Papadimitriou, Karthik Narasimhan. 24 May 2021.
19. RoFormer: Enhanced Transformer with Rotary Position Embedding. Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu. 20 Apr 2021.
20. When FastText Pays Attention: Efficient Estimation of Word Representations using Constrained Positional Weighting. Vít Novotný, Michal Štefánik, E. F. Ayetiran, Petr Sojka, Radim Řehůřek. 19 Apr 2021.
21. A Simple and Effective Positional Encoding for Transformers. Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng. 18 Apr 2021.
22. SparseBERT: Rethinking the Importance Analysis in Self-attention. Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok. 25 Feb 2021.
23. Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder. Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul N. Bennett, Tie-Yan Liu, Arnold Overwijk. 18 Feb 2021. [RALM]
24. Axial Residual Networks for CycleGAN-based Voice Conversion. J. You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae. 16 Feb 2021.
25. Neural Machine Translation: A Review of Methods, Resources, and Tools. Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu. 31 Dec 2020. [3DV, AI4TS]
26. Hurricane Forecasting: A Novel Multimodal Machine Learning Framework. L. Boussioux, C. Zeng, Théo Guénais, Dimitris Bertsimas. 11 Nov 2020.
27. Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent. William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith. 19 Oct 2020. [AI4CE]
28. Inductive Entity Representations from Text via Link Prediction. Daniel Daza, Michael Cochez, Paul T. Groth. 07 Oct 2020.
29. Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review. Yuqi Si, Jingcheng Du, Zhao Li, Xiaoqian Jiang, T. Miller, Fei Wang, W. J. Zheng, Kirk Roberts. 06 Oct 2020. [OOD]
30. Attention-Based Clustering: Learning a Kernel from Context. Samuel Coward, Erik Visse-Martindale, Chithrupa Ramesh. 02 Oct 2020.
31. On the Ability and Limitations of Transformers to Recognize Formal Languages. S. Bhattamishra, Kabir Ahuja, Navin Goyal. 23 Sep 2020.
32. Trojaning Language Models for Fun and Profit. Xinyang Zhang, Zheng-Wei Zhang, Shouling Ji, Ting Wang. 01 Aug 2020. [SILM, AAML]
33. UNIPoint: Universally Approximating Point Processes Intensities. Alexander Soen, A. Mathews, Daniel Grixti-Cheng, Lexing Xie. 28 Jul 2020. [3DPC]
34. Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 28 Jul 2020. [VLM]
35. Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures. Daniel Furrer, Marc van Zee, Nathan Scales, Nathanael Scharli. 17 Jul 2020. [CoGe]
36. Can neural networks acquire a structural bias from raw linguistic data? Alex Warstadt, Samuel R. Bowman. 14 Jul 2020. [AI4CE]
37. An EM Approach to Non-autoregressive Conditional Sequence Generation. Zhiqing Sun, Yiming Yang. 29 Jun 2020.
38. The Depth-to-Width Interplay in Self-Attention. Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua. 22 Jun 2020.
39. On the Computational Power of Transformers and its Implications in Sequence Modeling. S. Bhattamishra, Arkil Patel, Navin Goyal. 16 Jun 2020.
40. $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers. Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar. 08 Jun 2020.
41. Normalized Attention Without Probability Cage. Oliver Richter, Roger Wattenhofer. 19 May 2020.
42. TraDE: Transformers for Density Estimation. Rasool Fakoor, Pratik Chaudhari, Jonas W. Mueller, Alex Smola. 06 Apr 2020.
43. Learning to Encode Position for Transformer with Continuous Dynamical Model. Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh. 13 Mar 2020.
44. Low-Rank Bottleneck in Multi-head Attention Models. Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar. 17 Feb 2020.
45. Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning. Yaodong Yang, Jianye Hao, B. Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang. 10 Feb 2020. [OffRL]
46. Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.