The Evolved Transformer
David R. So, Chen Liang, Quoc V. Le
arXiv:1901.11117 · 30 January 2019
Tags: ViT

Papers citing "The Evolved Transformer" (50 of 111 papers shown)
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Martin Mundt, Anaelia Ovalle, Felix Friedrich, A Pranav, Subarnaduti Paul, Manuel Brack, Kristian Kersting, William Agnew
05 Feb 2025

Evolutionary Optimization of Model Merging Recipes
Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha
Tags: MoMe
28 Jan 2025

An Evolved Universal Transformer Memory
Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang
17 Oct 2024

AgentSquare: Automatic LLM Agent Search in Modular Design Space
Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiaheng Liu, Fengli Xu, Yong Li
Tags: LLMAG
08 Oct 2024

SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Fanglong Yao, Xian Sun, Kun Fu
28 Feb 2024

DistDNAS: Search Efficient Feature Interactions within 2 Hours
Tunhou Zhang, W. Wen, Igor Fedorov, Xi Liu, Buyun Zhang, ..., Wen-Yen Chen, Yiping Han, Feng Yan, Hai Helen Li, Yiran Chen
01 Nov 2023
Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing
Shangshang Yang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma, Xingyi Zhang
Tags: ViT, KELM, AI4Ed
02 Oct 2023

Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Dustin Wright, Christian Igel, Gabrielle Samuel, Raghavendra Selvan
05 Sep 2023

Text Analysis Using Deep Neural Networks in Digital Humanities and Information Science
Omri Suissa, Avshalom Elmalech, M. Zhitomirsky-Geffet
Tags: AI4CE
30 Jul 2023

Layer-wise Representation Fusion for Compositional Generalization
Yafang Zheng, Lei Lin, Shantao Liu, Binling Wang, Zhaohong Lai, Wenhao Rao, Biao Fu, Yidong Chen, Xiaodon Shi
Tags: AI4CE
20 Jul 2023

Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shantao Liu, Yidong Chen, Xiaodon Shi
Tags: CoGe
20 May 2023

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee
16 Mar 2023

Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova, H. Dai, Dale Schuurmans
Tags: SyDa
07 Mar 2023

AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers
Shikhar Tuli, N. Jha
28 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
Tags: MQ
27 Feb 2023

Symbolic Discovery of Optimization Algorithms
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le
13 Feb 2023

A Survey on Efficient Training of Transformers
Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
02 Feb 2023

Adaptive Neural Networks Using Residual Fitting
N. Ford, J. Winder, Josh Mcclellan
13 Jan 2023

Convolution-enhanced Evolving Attention Networks
Yujing Wang, Yaming Yang, Zhuowan Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jiahao Yu, Ce Zhang, Gao Huang, Yu Tong
Tags: ViT
16 Dec 2022

HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong, Mengshu Sun, Alec Lu, Yanyue Xie, Li-Yu Daisy Liu, ..., Xin Meng, ZeLin Li, Xue Lin, Zhenman Fang, Yanzhi Wang
Tags: ViT
15 Nov 2022
15 Nov 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
36
3
0
14 Nov 2022
Search to Pass Messages for Temporal Knowledge Graph Completion
Zhen Wang
Haotong Du
Quanming Yao
Xuelong Li
26
11
0
30 Oct 2022
Categorizing Semantic Representations for Neural Machine Translation
Yongjing Yin
Yafu Li
Fandong Meng
Jie Zhou
Yue Zhang
24
6
0
13 Oct 2022
LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds
Chenxi Liu
Zhaoqi Leng
Peigen Sun
Shuyang Cheng
C. Qi
Yin Zhou
Mingxing Tan
Drago Anguelov
3DPC
3DV
37
5
0
10 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
61
105
0
30 Sep 2022
Searching a High-Performance Feature Extractor for Text Recognition Network
Hui Zhang
Quanming Yao
James T. Kwok
X. Bai
30
7
0
27 Sep 2022
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey
Dalin Zhang
Kaixuan Chen
Yan Zhao
B. Yang
Li-Ping Yao
Christian S. Jensen
48
3
0
22 Aug 2022
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
34
9
0
01 Aug 2022
Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
28 Jul 2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, W. Fedus, J. Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
Tags: AI4CE
21 Jul 2022

Born for Auto-Tagging: Faster and better with new objective functions
Chiung-ju Liu, Huang-Ting Shieh
15 Jun 2022

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
Tags: 3DV
27 Apr 2022

SepViT: Separable Vision Transformer
Wei Li, Xing Wang, Xin Xia, Jie Wu, Jiashi Li, Xuefeng Xiao, Min Zheng, Shiping Wen
Tags: ViT
29 Mar 2022

Token Dropping for Efficient BERT Pretraining
Le Hou, Richard Yuanzhe Pang, Dinesh Manocha, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou
24 Mar 2022

Training-free Transformer Architecture Search
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
Tags: ViT
23 Mar 2022

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
29 Jan 2022
A Literature Survey of Recent Advances in Chatbots
Guendalina Caldarini, Sardar F. Jaf, K. McGarry
Tags: AI4CE
17 Jan 2022

Video Transformers: A Survey
Javier Selva, A. S. Johansen, Sergio Escalera, Kamal Nasrollahi, T. Moeslund, Albert Clapés
Tags: ViT
16 Jan 2022

Automated Deep Learning: Neural Architecture Search Is Not the End
Xuanyi Dong, D. Kedziora, Katarzyna Musial, Bogdan Gabrys
16 Dec 2021

Transformer-based Korean Pretrained Language Models: A Survey on Three Years of Progress
Kichang Yang
Tags: KELM, VLM
25 Nov 2021

Grounded Graph Decoding Improves Compositional Generalization in Question Answering
Yu Gai, Paras Jain, Wendi Zhang, Joseph E. Gonzalez, D. Song, Ion Stoica
Tags: BDL, OOD
05 Nov 2021

SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning
Mattia Atzeni, Jasmina Bogojeska, Andreas Loukas
Tags: ReLM, LRM
27 Oct 2021

ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-cost Proxies
Yu Shen, Yang Li, Jian Zheng, Wentao Zhang, Peng Yao, Jixiang Li, Sen Yang, Ji Liu, Cui Bin
Tags: AI4CE
20 Oct 2021

NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters
Yoichi Hirose, Nozomu Yoshinari, Shinichi Shirakawa
19 Oct 2021
Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
Panjie Qi, E. Sha, Qingfeng Zhuge, Hongwu Peng, Shaoyi Huang, Zhenglun Kong, Yuhong Song, Bingbing Li
19 Oct 2021

Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, T. Zhao, Jianfeng Gao
Tags: MoE
08 Oct 2021

An Analysis of Super-Net Heuristics in Weight-Sharing NAS
Kaicheng Yu, René Ranftl, Mathieu Salzmann
04 Oct 2021

Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, M. Lyu
Tags: MQ
30 Sep 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
22 Sep 2021

Primer: Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mañke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le
Tags: VLM
17 Sep 2021