ResearchTrend.AI
Universal Transformers (arXiv:1807.03819)

10 July 2018
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser

Papers citing "Universal Transformers"

50 / 459 papers shown
1. Monotonic Location Attention for Length Generalization
   Jishnu Ray Chowdhury, Cornelia Caragea · LLMAG · 31 May 2023
2. The Tunnel Effect: Building Data Representations in Deep Neural Networks
   Wojciech Masarczyk, M. Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzciński · 31 May 2023
3. Randomized Positional Encodings Boost Length Generalization of Transformers
   Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Abbana Bennani, Shane Legg, J. Veness · LLMAG · 26 May 2023
4. Scaling Data-Constrained Language Models
   Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, S. Pyysalo, Thomas Wolf, Colin Raffel · ALM · 25 May 2023
5. Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
   Yuandong Tian, Yiping Wang, Beidi Chen, S. Du · MLT · 25 May 2023
6. Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
   Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang · LRM · 24 May 2023
7. Revisiting Token Dropping Strategy in Efficient BERT Pretraining
   Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao · VLM · 24 May 2023
8. Can Transformers Learn to Solve Problems Recursively?
   Shizhuo Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer · 24 May 2023
9. To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
   Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You · 22 May 2023
10. F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
    Xiangxiang Gao, Wei-wei Zhu, Jiasheng Gao, Congrui Yin · VLM · 21 May 2023
11. Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
    Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shantao Liu, Yidong Chen, Xiaodon Shi · CoGe · 20 May 2023
12. Soft Prompt Decoding for Multilingual Dense Retrieval
    Zhiqi Huang, Hansi Zeng, Hamed Zamani, James Allan · RALM · 15 May 2023
13. Code Execution with Pre-trained Language Models
    Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan · ELM · 08 May 2023
14. Leveraging Synthetic Targets for Machine Translation
    Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev · 07 May 2023
15. Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
    Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge · LRM · 05 May 2023
16. Approximating CKY with Transformers
    Ghazal Khalighinejad, Ollie Liu, Sam Wiseman · 03 May 2023
17. Learning to Reason and Memorize with Self-Notes
    Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar · ReLM, LRM, LLMAG · 01 May 2023
18. The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
    Shuai Li, Zhao-quan Song, Yu Xia, Tong Yu, Tianyi Zhou · 26 Apr 2023
19. Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
    Guangyong Wei, Zhikui Duan, Shiren Li, Guangguang Yang, Xinmei Yu, Junhua Li · 11 Apr 2023
20. Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
    Z. Fu, W. Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier · AuLLM · 08 Apr 2023
21. SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers
    Alberto Marchisio, David Durà, Maurizio Capra, Maurizio Martina, Guido Masera, Muhammad Shafique · 08 Apr 2023
22. An Over-parameterized Exponential Regression
    Yeqi Gao, Sridhar Mahadevan, Zhao-quan Song · 29 Mar 2023
23. Solving Regularized Exp, Cosh and Sinh Regression Problems
    Zhihang Li, Zhao-quan Song, Tianyi Zhou · 28 Mar 2023
24. Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
    Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li · 23 Mar 2023
25. CerviFormer: A Pap-smear based cervical cancer classification method using cross attention and latent transformer
    Bhaswati Singha Deo, M. Pal, P. Panigrahi, A. Pradhan · MedIm · 17 Mar 2023
26. RenewNAT: Renewing Potential Translation for Non-Autoregressive Transformer
    Pei Guo, Yisheng Xiao, Juntao Li, M. Zhang · 14 Mar 2023
27. On the Expressiveness and Generalization of Hypergraph Neural Networks
    Zhezheng Luo, Jiayuan Mao, J. Tenenbaum, L. Kaelbling · NAI, AI4CE · 09 Mar 2023
28. SALSA PICANTE: a machine learning attack on LWE with binary secrets
    Cathy Li, Jana Sotáková, Emily Wenger, Mohamed Malhou, Evrard Garcelon, François Charton, Kristin E. Lauter · AAML · 07 Mar 2023
29. Depression Detection Using Digital Traces on Social Media: A Knowledge-aware Deep Learning Approach
    Wenli Zhang, Jiaheng Xie, Zhuocheng Zhang, Xiang Liu · 06 Mar 2023
30. Neural Algorithmic Reasoning with Causal Regularisation
    Beatrice Bevilacqua, Kyriacos Nikiforou, Borja Ibarz, Ioana Bica, Michela Paganini, Charles Blundell, Jovana Mitrović, Petar Veličković · OOD, CML, NAI · 20 Feb 2023
31. Neural Attention Memory
    Hyoungwook Nam, S. Seo · HAI · 18 Feb 2023
32. A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
    Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen · ViT, MLT · 12 Feb 2023
33. Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
    Vanessa Mehlin, Sigurd Schacht, Carsten Lanquillon · HAI, MedIm · 05 Feb 2023
34. Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
    K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller · 03 Feb 2023
35. Entity-Agnostic Representation Learning for Parameter-Efficient Knowledge Graph Embedding
    Mingyang Chen, Wen Zhang, Zhen Yao, Yushan Zhu, Yang Gao, Jeff Z. Pan, Hua-zeng Chen · 03 Feb 2023
36. Looped Transformers as Programmable Computers
    Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos · 30 Jan 2023
37. Adaptive Computation with Elastic Input Sequence
    Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, N. Houlsby, Mostafa Dehghani, Yang You · 30 Jan 2023
38. Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
    Xiaoxia Wu, Cheng-rong Li, Reza Yazdani Aminabadi, Z. Yao, Yuxiong He · MQ · 27 Jan 2023
39. Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs
    Garett Cunningham, Razvan C. Bunescu, D. Juedes · LRM · 05 Jan 2023
40. Circumventing interpretability: How to defeat mind-readers
    Lee D. Sharkey · 21 Dec 2022
41. Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning
    Christopher T. Lengerich, Gabriel Synnaeve, Amy Zhang, Hugh Leather, Kurt Shuster, François Charton, Charysse Redwood · SSL, OffRL · 21 Dec 2022
42. Semantics-Empowered Communication: A Tutorial-cum-Survey
    Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, E. Hossain, Zhifeng Zhao, Honggang Zhang · 16 Dec 2022
43. P-Transformer: Towards Better Document-to-Document Neural Machine Translation
    Yachao Li, Junhui Li, Jing Jiang, Shimin Tao, Hao-Yu Yang, M. Zhang · ViT · 12 Dec 2022
44. Vision Transformer Computation and Resilience for Dynamic Inference
    Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, S. Keckler, M. Horowitz · 06 Dec 2022
45. Language Models as Agent Models
    Jacob Andreas · LLMAG · 03 Dec 2022
46. Lightweight and Flexible Deep Equilibrium Learning for CSI Feedback in FDD Massive MIMO
    Yifan Ma, Wentao Yu, Xianghao Yu, Jun Zhang, Shenghui Song, Khaled B. Letaief · 28 Nov 2022
47. Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
    Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger C. Grosse · 18 Nov 2022
48. Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
    Juan Pisula, Katarzyna Bozek · VLM, MedIm · 14 Nov 2022
49. Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
    Ting Hu, Christoph Meinel, Haojin Yang · MQ · 29 Oct 2022
50. Benchmarking Language Models for Code Syntax Understanding
    Da Shen, Xinyun Chen, Chenguang Wang, Koushik Sen, Dawn Song · ELM · 26 Oct 2022