ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2505.13499 · Cited By
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

16 May 2025
Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

Papers citing "Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency" (25 papers shown)
1. Emergence of meta-stable clustering in mean-field transformer models (30 Oct 2024)
   Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi · 8 citations

2. A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model (03 May 2024)
   Weiqi Zhang, Jiexia Ye, Ke Yi, Yongzi Yu, Ziyue Li, Jia Li, Fugee Tsung · Tags: AI4TS, AI4CE · 24 citations

3. A mathematical perspective on Transformers (17 Dec 2023)
   Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet · Tags: EDL, AI4CE · 39 citations

4. GPT-4 Technical Report (15 Mar 2023)
   OpenAI (Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph) · Tags: LLMAG, MLLM · 13,788 citations

5. Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme (30 Nov 2022)
   Alexander Vidal, Samy Wu Fung, Luis Tenorio, Stanley Osher, L. Nurbekyan · 18 citations

6. Vision Transformers: State of the Art and Research Challenges (07 Jul 2022)
   Bo-Kai Ruan, Hong-Han Shuai, Wen-Huang Cheng · Tags: ViT · 18 citations

7. Transformer for Partial Differential Equations' Operator Learning (26 May 2022)
   Zijie Li, Kazem Meidani, A. Farimani · 155 citations

8. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (12 Apr 2022)
   Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel · 171 citations

9. ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation (17 Mar 2022)
   Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang · 32 citations

10. Multivariate Quantile Function Forecaster (23 Feb 2022)
    Kelvin K. Kan, François-Xavier Aubet, Tim Januschowski, Youngsuk Park, Konstantinos Benidis, Lars Ruthotto, Jan Gasthaus · Tags: AI4TS · 23 citations

11. LaMDA: Language Models for Dialog Applications (20 Jan 2022)
    R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le · Tags: ALM · 1,577 citations

12. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (13 Dec 2021)
    Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, ..., Kun Zhang, Quoc V. Le, Yonghui Wu, Zhiwen Chen, Claire Cui · Tags: ALM, MoE · 794 citations

13. The Pile: An 800GB Dataset of Diverse Text for Language Modeling (31 Dec 2020)
    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy · Tags: AIMat · 2,051 citations

14. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (22 Oct 2020)
    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby · Tags: ViT · 40,217 citations

15. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (20 Jun 2020)
    Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli · Tags: SSL · 5,734 citations

16. Linformer: Self-Attention with Linear Complexity (08 Jun 2020)
    Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma · 1,678 citations

17. Language Models are Few-Shot Learners (28 May 2020)
    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · Tags: BDL · 41,106 citations

18. On Layer Normalization in the Transformer Architecture (12 Feb 2020)
    Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu · Tags: AI4CE · 973 citations

19. Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data (13 Aug 2019)
    Mikaela Angelina Uy, Quang Pham, Binh-Son Hua, D. Nguyen, Sai-Kit Yeung · Tags: 3DV, 3DPC · 773 citations

20. Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View (06 Jun 2019)
    Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Liwei Wang, Tie-Yan Liu · Tags: AI4CE · 170 citations

21. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (09 Jan 2019)
    Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov · Tags: VLM · 3,714 citations

22. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (11 Oct 2018)
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · Tags: VLM, SSL, SSeg · 93,936 citations

23. Neural Ordinary Differential Equations (19 Jun 2018)
    T. Chen, Yulia Rubanova, J. Bettencourt, David Duvenaud · Tags: AI4CE · 5,024 citations

24. Deep Neural Networks Motivated by Partial Differential Equations (12 Apr 2018)
    Lars Ruthotto, E. Haber · Tags: AI4CE · 488 citations

25. Attention Is All You Need (12 Jun 2017)
    Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin · Tags: 3DV · 129,831 citations