Transformers Are Universally Consistent
Sagar Ghosh, Kushal Bose, Swagatam Das
arXiv:2505.24531 · 30 May 2025
Papers citing "Transformers Are Universally Consistent" (24 of 24 papers shown)
- On the Universal Statistical Consistency of Expansive Hyperbolic Deep Convolutional Neural Networks. Sagar Ghosh, Kushal Bose, Swagatam Das. 15 Nov 2024. 1 citation.
- A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models. Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song. 14 Jan 2022. 219 citations.
- Universal Consistency of Deep Convolutional Neural Networks. Shao-Bo Lin, Kaidong Wang, Yao Wang, Ding-Xuan Zhou. 23 Jun 2021. 23 citations.
- A Survey of Transformers [ViT]. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. 08 Jun 2021. 1,101 citations.
- Decision Transformer: Reinforcement Learning via Sequence Modeling [OffRL]. Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, A. Srinivas, Igor Mordatch. 02 Jun 2021. 1,608 citations.
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT]. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby. 22 Oct 2020. 40,217 citations.
- Big Bird: Transformers for Longer Sequences [VLM]. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 28 Jul 2020. 2,051 citations.
- Conformer: Convolution-augmented Transformer for Speech Recognition. Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang. 16 May 2020. 3,082 citations.
- Longformer: The Long-Document Transformer [RALM, VLM]. Iz Beltagy, Matthew E. Peters, Arman Cohan. 10 Apr 2020. 3,996 citations.
- Sparse Sinkhorn Attention. Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan. 26 Feb 2020. 336 citations.
- Are Transformers universal approximators of sequence-to-sequence functions? Chulhee Yun, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar. 20 Dec 2019. 347 citations.
- Axial Attention in Multidimensional Transformers. Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans. 20 Dec 2019. 525 citations.
- Improving Multi-Head Attention with Capsule Networks. Shuhao Gu, Yang Feng. 31 Aug 2019. 13 citations.
- What Does BERT Look At? An Analysis of BERT's Attention [MILM]. Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. 11 Jun 2019. 1,586 citations.
- Learning Deep Transformer Models for Machine Translation. Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao. 05 Jun 2019. 666 citations.
- Generating Long Sequences with Sparse Transformers. R. Child, Scott Gray, Alec Radford, Ilya Sutskever. 23 Apr 2019. 1,880 citations.
- Universal approximations of permutation invariant/equivariant functions by deep neural networks. Akiyoshi Sannai, Yuuki Takai, Matthieu Cordonnier. 05 Mar 2019. 68 citations.
- Star-Transformer. Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang. 25 Feb 2019. 264 citations.
- Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. P. Schwaller, Teodoro Laino, John McGuinness, A. Horváth, Constantine Bekas, A. Lee. 06 Nov 2018. 730 citations.
- ResNet with one-neuron hidden layers is a Universal Approximator. Hongzhou Lin, Stefanie Jegelka. 28 Jun 2018. 227 citations.
- Universality of Deep Convolutional Neural Networks [HAI, PINN]. Ding-Xuan Zhou. 28 May 2018. 514 citations.
- The Expressive Power of Neural Networks: A View from the Width. Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang. 08 Sep 2017. 886 citations.
- Attention Is All You Need [3DV]. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017. 129,831 citations.
- Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Christopher Liaw, Abbas Mehrabian. 08 Mar 2017. 427 citations.