GLU Variants Improve Transformer
Noam M. Shazeer · 12 February 2020 · arXiv: 2002.05202
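The paper proposes replacing the standard Transformer feed-forward layer with gated linear unit (GLU) variants such as ReGLU, GEGLU, and SwiGLU. As a minimal sketch, the SwiGLU block below follows the paper's stated form FFN_SwiGLU(x) = (Swish_1(xW) ⊗ xV)W2 with bias terms omitted; the dimensions and variable names in the toy usage are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish / SiLU activation: x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

def ffn_swiglu(x, W, V, W2):
    """SwiGLU feed-forward block: (Swish_1(x W) * (x V)) W2, biases omitted."""
    return (swish(x @ W) * (x @ V)) @ W2

# Toy usage with hypothetical sizes (d_model=8, d_ff=32).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # (tokens, d_model)
W = rng.standard_normal((8, 32))
V = rng.standard_normal((8, 32))
W2 = rng.standard_normal((32, 8))
print(ffn_swiglu(x, W, V, W2).shape)  # (4, 8)
```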
Papers citing "GLU Variants Improve Transformer"
50 / 652 papers shown
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Yixin Song
Haotong Xie
Zhengyan Zhang
Bo Wen
Li Ma
Zeyu Mi
Haibo Chen
MoE
48
22
0
10 Jun 2024
Attention as a Hypernetwork
Simon Schug
Seijin Kobayashi
Yassir Akram
João Sacramento
Razvan Pascanu
GNN
37
3
0
09 Jun 2024
Accelerating evolutionary exploration through language model-based transfer learning
M. Reissmann
Yuan Fang
Andrew S. H. Ooi
R. D. Sandberg
47
2
0
07 Jun 2024
Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis
Juanhua Zhang
Ruodan Yan
Alessandro Perelli
Xi Chen
Chao Li
MedIm
DiffM
58
5
0
05 Jun 2024
Xmodel-LM Technical Report
Yichuan Wang
Yang Liu
Yu Yan
Qun Wang
Xucheng Huang
Ling Jiang
OSLM
ALM
35
1
0
05 Jun 2024
Scalable MatMul-free Language Modeling
Rui-Jie Zhu
Yu Zhang
Ethan Sifferman
Tyler Sheaves
Yiqiao Wang
Dustin Richmond
P. Zhou
Jason Eshraghian
33
17
0
04 Jun 2024
Decoupled Alignment for Robust Plug-and-Play Adaptation
Haozheng Luo
Jiahao Yu
Wenxin Zhang
Jialong Li
Jerry Yao-Chieh Hu
Xingyu Xing
Han Liu
56
11
0
03 Jun 2024
LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments
Zikun Ye
Hema Yoganarasimhan
Yufeng Zheng
52
2
0
03 Jun 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Tianwen Wei
Bo Zhu
Liang Zhao
Cheng Cheng
Biye Li
...
Yutuan Ma
Rui Hu
Shuicheng Yan
Han Fang
Yahui Zhou
MoE
54
24
0
03 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
70
8
0
01 Jun 2024
You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Zhen Qin
Yuxin Mao
Xuyang Shen
Dong Li
Jing Zhang
Yuchao Dai
Yiran Zhong
58
1
0
31 May 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Bo Tang
Zhiyu Li
E. Weinan
Lei Wu
45
7
0
31 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya Zhang
Yu Wang
Yanfeng Wang
29
3
0
30 May 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
28
1
0
30 May 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Ge Zhang
Scott Qu
Jiaheng Liu
Chenchen Zhang
Chenghua Lin
...
Zi-Kai Zhao
Jiajun Zhang
Wanli Ouyang
Wenhao Huang
Wenhu Chen
ELM
43
44
0
29 May 2024
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan
Da Li
Timothy M. Hospedales
44
15
0
29 May 2024
Transformers as Neural Operators for Solutions of Differential Equations with Finite Regularity
Benjamin Shih
Ahmad Peyvan
Zhongqiang Zhang
George Karniadakis
AI4CE
51
11
0
29 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Hongsheng Li
CLIP
32
0
0
29 May 2024
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Clayton Sanford
Bahare Fatemi
Ethan Hall
Anton Tsitsulin
Seyed Mehran Kazemi
Jonathan J. Halcrow
Bryan Perozzi
Vahab Mirrokni
46
31
0
28 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
57
4
0
28 May 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
40
35
0
28 May 2024
2BP: 2-Stage Backpropagation
Christopher Rae
Joseph K. L. Lee
James Richings
MoE
MQ
47
0
0
28 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
53
30
0
27 May 2024
The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof
Yana Veitsman
Michael Hahn
Mamba
38
8
0
27 May 2024
Are Self-Attentions Effective for Time Series Forecasting?
Dongbin Kim
Jinseong Park
Jaewook Lee
Hoki Kim
AI4TS
45
4
0
27 May 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
37
3
0
26 May 2024
Expanded Gating Ranges Improve Activation Functions
Allen Hao Huang
AI4CE
29
1
0
25 May 2024
Activator: GLU Activation Function as the Core Component of a Vision Transformer
Abdullah Nazhat Abdullah
Tarkan Aydin
ViT
43
0
0
24 May 2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nolan Dey
Shane Bergsma
Joel Hestness
38
5
0
24 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
49
26
0
24 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
Alessandro Laio
Marco Baroni
112
11
0
24 May 2024
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
Xianzhi Du
Tom Gunter
Xiang Kong
Mark Lee
Zirui Wang
Aonan Zhang
Nan Du
Ruoming Pang
MoE
25
0
0
23 May 2024
Aya 23: Open Weight Releases to Further Multilingual Progress
Viraat Aryabumi
John Dang
Dwarak Talupuru
Saurabh Dash
David Cairuz
...
Aidan Gomez
Phil Blunsom
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
OSLM
60
76
0
23 May 2024
Neural Pfaffians: Solving Many Many-Electron Schrödinger Equations
Nicholas Gao
Stephan Günnemann
33
4
0
23 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
47
12
0
23 May 2024
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
58
4
0
23 May 2024
360Zhinao Technical Report
360Zhinao Team
40
0
0
22 May 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Ting Jiang
Shaohan Huang
Shengyue Luo
Zihan Zhang
Haizhen Huang
...
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
37
35
0
20 May 2024
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
George Shaikovski
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
40
25
0
16 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
62
265
0
16 May 2024
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
43
114
0
15 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
34
3
0
14 May 2024
CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM
Urjitkumar Patel
Fang-Chun Yeh
Chinmay Gondhalekar
29
3
0
10 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
40
0
0
09 May 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Yutao Sun
Li Dong
Yi Zhu
Shaohan Huang
Wenhui Wang
Shuming Ma
Quanlu Zhang
Jianyong Wang
Furu Wei
VLM
38
54
0
08 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
42
5
0
08 May 2024
ChuXin: 1.6B Technical Report
Xiaomin Zhuang
Yufan Jiang
Qiaozhi He
Zhihua Wu
ALM
43
0
0
08 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Songlin Yang
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Yikang Shen
AI4TS
63
58
0
07 May 2024
Learning Linear Block Error Correction Codes
Yoni Choukroun
Lior Wolf
31
6
0
07 May 2024
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong
Mengzhou Xia
Danqi Chen
Mike Lewis
MoE
57
15
0
06 May 2024