Universal Transformers

10 July 2018
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
ArXiv · PDF · HTML

Papers citing "Universal Transformers"

50 / 459 papers shown
M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis
Zhizhuo Yin
Yuk Hang Tsui
Pan Hui
SLR
VGen
19
0
0
13 May 2025
Signatures of human-like processing in Transformer forward passes
Jennifer Hu
Michael A. Lepori
Michael Franke
AI4CE
153
0
0
18 Apr 2025
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi
Felix Sarnthein
Nicola Muca Cirone
Antonio Orvieto
48
2
0
13 Mar 2025
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill
Ashish Sabharwal
55
4
0
05 Mar 2025
Revisiting Kernel Attention with Correlated Gaussian Process Representation
Long Minh Bui
Tho Tran Huu
Duy-Tung Dinh
T. Nguyen
Trong Nghia Hoang
46
2
0
27 Feb 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi
Nishanth Dikkala
Zhiyuan Li
Sanjiv Kumar
Sashank J. Reddi
OffRL
LRM
AI4CE
56
10
0
24 Feb 2025
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li
Chenyang Zhang
Xingwu Chen
Yuan Cao
Difan Zou
72
0
0
24 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao-quan Song
Yufa Zhou
96
18
0
21 Feb 2025
Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction
Yudong Xu
Wenhao Li
Scott Sanner
Elias Boutros Khalil
41
0
0
18 Feb 2025
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu
Zhenyu He
Sijie Li
Xun Zhou
Jun Zhang
Jingjing Xu
Di He
OffRL
LRM
89
4
0
12 Feb 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
39
5
0
28 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
A novel framework for MCDM based on Z numbers and soft likelihood function
Yuanpeng He
43
0
0
26 Dec 2024
Learning Elementary Cellular Automata with Transformers
Mikhail Burtsev
75
1
0
02 Dec 2024
Scaling LLM Inference with Optimized Sample Compute Allocation
Kexun Zhang
Shang Zhou
Danqing Wang
William Yang Wang
Lei Li
50
9
0
29 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
78
5
0
28 Oct 2024
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Mehul Damani
Idan Shenfeld
Andi Peng
Andreea Bobu
Jacob Andreas
39
16
0
07 Oct 2024
Learning Semantic Structure through First-Order-Logic Translation
Akshay Chaturvedi
Nicholas Asher
LRM
23
0
0
04 Oct 2024
Autoregressive Large Language Models are Computationally Universal
Dale Schuurmans
Hanjun Dai
Francesco Zanini
33
2
0
04 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang
Yu-Guang Chen
Ing-Chao Lin
Bing Li
Grace Li Zhang
35
4
0
02 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
39
3
0
02 Oct 2024
On the Inductive Bias of Stacking Towards Improving Reasoning
Nikunj Saunshi
Stefani Karp
Shankar Krishnan
Sobhan Miryoosefi
Sashank J. Reddi
Sanjiv Kumar
LRM
AI4CE
34
4
0
27 Sep 2024
On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury
Cornelia Caragea
32
0
0
03 Sep 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
37
7
0
29 Jul 2024
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe
Oyvind Tafjord
Yonatan Belinkov
Hanna Hajishirzi
Ashish Sabharwal
50
3
0
21 Jul 2024
Search, Examine and Early-Termination: Fake News Detection with Annotation-Free Evidences
Yuzhou Yang
Yangming Zhou
Qichao Ying
Zhenxing Qian
Xinpeng Zhang
52
1
0
10 Jul 2024
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
Zhuocheng Gong
Ang Lv
Jian-Yu Guan
Junxi Yan
Wei Yu Wu
Huishuai Zhang
Minlie Huang
Dongyan Zhao
Rui Yan
MoE
52
6
0
09 Jul 2024
Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt
Subbarao Kambhampati
LRM
56
0
0
06 Jul 2024
Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning
Haobo Song
Hao Zhao
Soumajit Majumder
Tao Lin
25
3
0
01 Jul 2024
Papez: Resource-Efficient Speech Separation with Auditory Working Memory
Hyunseok Oh
Juheon Yi
Youngki Lee
19
2
0
01 Jul 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
38
33
0
27 Jun 2024
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network
Badr AlKhamissi
Greta Tuckute
Antoine Bosselut
Martin Schrimpf
73
6
0
21 Jun 2024
Elliptical Attention
Stefan K. Nielsen
Laziz U. Abdullaev
R. Teo
Tan M. Nguyen
23
3
0
19 Jun 2024
Learning Iterative Reasoning through Energy Diffusion
Yilun Du
Jiayuan Mao
Joshua B. Tenenbaum
LRM
PINN
48
6
0
17 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho
Sangmin Bae
Taehyeon Kim
Hyunjik Jo
Yireun Kim
Tal Schuster
Adam Fisch
James Thorne
Se-Young Yun
45
8
0
04 Jun 2024
THREAD: Thinking Deeper with Recursive Spawning
Philip Schroeder
Nathaniel Morgan
Hongyin Luo
James R. Glass
LRM
LLMAG
ReLM
40
1
0
27 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
50
28
0
27 May 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
30
45
0
26 May 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
45
5
0
25 May 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Boshi Wang
Xiang Yue
Yu-Chuan Su
Huan Sun
LRM
29
41
0
23 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu
Baihe Huang
Shaolun Zhang
Michael I. Jordan
Jiantao Jiao
Yuandong Tian
Stuart Russell
LRM
AI4CE
49
13
0
07 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
35
0
0
30 Apr 2024
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
Hung Le
D. Nguyen
Kien Do
Svetha Venkatesh
T. Tran
30
0
0
18 Apr 2024
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
Yijin Liu
Fandong Meng
Jie Zhou
AI4CE
27
7
0
10 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
53
7
0
07 Apr 2024
On Difficulties of Attention Factorization through Shared Memory
Uladzislau Yorsh
Martin Holevna
Ondrej Bojar
David Herel
23
0
0
31 Mar 2024
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Tian Xu
Yun Fu
38
11
0
31 Mar 2024
Towards Understanding the Relationship between In-context Learning and Compositional Generalization
Sungjun Han
Sebastian Padó
CoGe
21
2
0
18 Mar 2024
Subhomogeneous Deep Equilibrium Models
Pietro Sittoni
Francesco Tudisco
29
0
0
01 Mar 2024