Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2109.15142
Cited By
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
30 September 2021
Subhabrata Dutta
Tanya Gautam
Soumen Chakrabarti
Tanmoy Chakraborty
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems"
11 / 11 papers shown
Title
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
74
1
0
03 Mar 2025
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
113
1,548
0
30 Sep 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
446
2,051
0
28 Jul 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
141
1,678
0
08 Jun 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
65
1,230
0
25 Feb 2020
Axial Attention in Multidimensional Transformers
Jonathan Ho
Nal Kalchbrenner
Dirk Weissenborn
Tim Salimans
54
525
0
20 Dec 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
80
7,386
0
02 Oct 2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
Yiping Lu
Zhuohan Li
Di He
Zhiqing Sun
Bin Dong
Tao Qin
Liwei Wang
Tie-Yan Liu
AI4CE
35
170
0
06 Jun 2019
Are Sixteen Heads Really Better than One?
Paul Michel
Omer Levy
Graham Neubig
MoE
43
1,049
0
25 May 2019
Neural Ordinary Differential Equations
T. Chen
Yulia Rubanova
J. Bettencourt
David Duvenaud
AI4CE
179
5,024
0
19 Jun 2018
Deep Neural Networks Motivated by Partial Differential Equations
Lars Ruthotto
E. Haber
AI4CE
53
488
0
12 Apr 2018
1