Universal Transformers

10 July 2018
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
arXiv:1807.03819

Papers citing "Universal Transformers"

50 / 459 papers shown
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
31
3
0
23 Feb 2024
When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination
Martin A Benfeghoul
Umais Zahid
Qinghai Guo
Z. Fountas
OffRL
LRM
32
2
0
23 Feb 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
49
1
0
23 Feb 2024
Head-wise Shareable Attention for Large Language Models
Zouying Cao
Yifei Yang
Hai Zhao
41
4
0
19 Feb 2024
Limits of Transformer Language Models on Learning to Compose Algorithms
Jonathan Thomm
Aleksandar Terzić
Giacomo Camposampiero
Michael Hersche
Bernhard Schölkopf
Abbas Rahimi
39
3
0
08 Feb 2024
Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
Samuel Stevens
Emily Wenger
C. Li
Niklas Nolte
Eshika Saxena
François Charton
Kristin E. Lauter
AAML
36
6
0
02 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
MQ
29
12
0
27 Jan 2024
Freely Long-Thinking Transformer (FraiLT)
Akbay Tabak
15
0
0
21 Jan 2024
Interplay of Semantic Communication and Knowledge Learning
Fei Ni
Bingyan Wang
Rongpeng Li
Zhifeng Zhao
Honggang Zhang
29
0
0
18 Jan 2024
Preparing Lessons for Progressive Training on Language Models
Yu Pan
Ye Yuan
Yichun Yin
Jiaxin Shi
Zenglin Xu
Ming Zhang
Lifeng Shang
Xin Jiang
Qun Liu
21
9
0
17 Jan 2024
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen
Tan-Minh Nguyen
Richard G. Baraniuk
21
8
0
01 Dec 2023
INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion
Hengchao Shang
Zongyao Li
Daimeng Wei
Jiaxin Guo
Minghan Wang
Xiaoyu Chen
Lizhi Lei
Hao-Yu Yang
19
0
0
30 Nov 2023
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu
Kewei Tu
132
3
0
26 Nov 2023
The Impact of Depth on Compositional Generalization in Transformer Language Models
Jackson Petty
Sjoerd van Steenkiste
Ishita Dasgupta
Fei Sha
Daniel H Garrette
Tal Linzen
AI4CE
VLM
23
16
0
30 Oct 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE
AI4CE
16
0
0
23 Oct 2023
The Locality and Symmetry of Positional Encodings
Lihu Chen
Gaël Varoquaux
Fabian M. Suchanek
36
0
0
19 Oct 2023
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models
Weize Chen
Xiaoyue Xu
Xu Han
Yankai Lin
Ruobing Xie
Zhiyuan Liu
Maosong Sun
Jie Zhou
19
0
0
19 Oct 2023
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
Shuaiyi Li
Yang Deng
Wai Lam
30
2
0
19 Oct 2023
CoTFormer: More Tokens With Attention Make Up For Less Depth
Amirkeivan Mohtashami
Matteo Pagliardini
Martin Jaggi
8
0
0
16 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
22
18
0
16 Oct 2023
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Samira Abnar
Omid Saremi
Laurent Dinh
Shantel Wilson
Miguel Angel Bautista
...
Vimal Thilak
Etai Littwin
Jiatao Gu
Josh Susskind
Samy Bengio
34
5
0
13 Oct 2023
Counting and Algorithmic Generalization with Transformers
Simon Ouellette
Rolf Pfister
Hansueli Jud
22
4
0
12 Oct 2023
The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models
Ariel Goldstein
Eric Ham
Mariano Schain
Samuel A. Nastase
Zaid Zada
...
Avinatan Hassidim
O. Devinsky
A. Flinker
Omer Levy
Uri Hasson
AI4CE
15
10
0
11 Oct 2023
Sparse Universal Transformer
Shawn Tan
Yikang Shen
Zhenfang Chen
Aaron Courville
Chuang Gan
MoE
32
13
0
11 Oct 2023
Optimizing Large Language Models to Expedite the Development of Smart Contracts
Nii Osae Osae Dade
Margaret Lartey-Quaye
Emmanuel Teye-Kofi Odonkor
Paul Ammah
27
4
0
08 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
26
10
0
05 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
Yiming Wang
Jinyu Li
16
4
0
03 Oct 2023
Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency
Long Tan Le
Tuan Dung Nguyen
Tung-Anh Nguyen
Choong Seon Hong
Nguyen H. Tran
FedML
26
0
0
27 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
29
53
0
21 Sep 2023
A Data Source for Reasoning Embodied Agents
Jack Lanchantin
Sainbayar Sukhbaatar
Gabriel Synnaeve
Yuxuan Sun
Kavya Srinet
Arthur Szlam
LM&Ro
LRM
25
5
0
14 Sep 2023
One Wide Feedforward is All You Need
Telmo Pires
António V. Lopes
Yannick Assogba
Hendra Setiawan
35
12
0
04 Sep 2023
Exemplar-Free Continual Transformer with Convolutions
Anurag Roy
Vinay K. Verma
Sravan Voonna
Kripabandhu Ghosh
Saptarshi Ghosh
Abir Das
CLL
BDL
15
10
0
22 Aug 2023
CausalLM is not optimal for in-context learning
Nan Ding
Tomer Levinboim
Jialin Wu
Sebastian Goodman
Radu Soricut
24
23
0
14 Aug 2023
Layer-wise Representation Fusion for Compositional Generalization
Yafang Zheng
Lei Lin
Shantao Liu
Binling Wang
Zhaohong Lai
Wenhao Rao
Biao Fu
Yidong Chen
Xiaodon Shi
AI4CE
43
2
0
20 Jul 2023
Efficient Beam Tree Recursion
Jishnu Ray Chowdhury
Cornelia Caragea
29
3
0
20 Jul 2023
R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut
Yingjie Niu
Ming Ding
Maoning Ge
Robin Karlsson
Yuxiao Zhang
K. Takeda
ViT
26
3
0
18 Jul 2023
Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks
Haobo Song
Soumajit Majumder
Tao R. Lin
VLM
20
0
0
16 Jul 2023
Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text
Zhun Yang
Adam Ishay
Joohyung Lee
LRM
ELM
33
51
0
15 Jul 2023
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
AIMat
LRM
37
1
0
29 Jun 2023
Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
Tianhong Huang
Victor Agostinelli
Lizhong Chen
MQ
13
0
0
24 Jun 2023
Large Sequence Models for Sequential Decision-Making: A Survey
Muning Wen
Runji Lin
Hanjing Wang
Yaodong Yang
Ying Wen
Luo Mai
J. Wang
Haifeng Zhang
Weinan Zhang
LM&Ro
LRM
37
35
0
24 Jun 2023
LightGlue: Local Feature Matching at Light Speed
Philipp Lindenberger
Paul-Edouard Sarlin
Marc Pollefeys
3DV
VLM
14
394
0
23 Jun 2023
Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
34
38
0
23 Jun 2023
SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets
Cathy Li
Emily Wenger
Zeyuan Allen-Zhu
François Charton
Kristin E. Lauter
AAML
25
10
0
20 Jun 2023
Trained Transformers Learn Linear Models In-Context
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
26
173
0
16 Jun 2023
Understanding Parameter Sharing in Transformers
Ye Lin
Mingxuan Wang
Zhexi Zhang
Xiaohui Wang
Tong Xiao
Jingbo Zhu
MoE
16
2
0
15 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
25
1
0
07 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao-quan Song
Tianyi Zhou
22
23
0
04 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
27
46
0
01 Jun 2023