Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.05202
Cited By
GLU Variants Improve Transformer
12 February 2020
Noam M. Shazeer
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GLU Variants Improve Transformer"
50 / 648 papers shown
Title
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingchen Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
73
26
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
84
13
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
49
3
0
04 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
38
0
0
31 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
37
1
0
30 Oct 2024
BongLLaMA: LLaMA for Bangla Language
Abdullah Khan Zehady
Safi Al Mamun
Naymul Islam
Santu Karmaker
ALM
35
1
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
84
5
0
28 Oct 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
53
2
0
25 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yunfan LU
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Zihan Wang
MoE
33
4
0
24 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
34
1
0
24 Oct 2024
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
27
1
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min-Bin Lin
Chongxuan Li
AI4CE
55
14
0
24 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
29
0
0
23 Oct 2024
PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden
26
1
0
22 Oct 2024
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset
Ruikun Zhang
Hao Yang
Yan Yang
Ying Fu
Liyuan Pan
43
3
0
21 Oct 2024
Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning
Arijit Das
26
1
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
35
3
0
21 Oct 2024
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert
M. Safaryan
Ionut-Vlad Modoranu
Dan Alistarh
ODL
36
2
0
21 Oct 2024
Comprehensive benchmarking of large language models for RNA secondary structure prediction
L. I. Zablocki
L. A. Bugnon
M. Gerard
L. Di Persia
G. Stegmayer
D. H. Milone
AI4TS
31
3
0
21 Oct 2024
Lossless KV Cache Compression to 2%
Zhen Yang
Jizong Han
Kan Wu
Ruobing Xie
An Wang
Xingchen Sun
Zhanhui Kang
VLM
MQ
36
2
0
20 Oct 2024
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
52
0
0
20 Oct 2024
Quanta Video Restoration
Prateek Chennuri
Yiheng Chi
Enze Jiang
G. M. Dilshan Godaliyadda
Abhiram Gnanasambandam
Hamid R. Sheikh
I. Gyongy
Stanley H. Chan
24
0
0
19 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
and Lei Bai
76
3
0
17 Oct 2024
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
Lingxiao Luo
Bingda Tang
Xuanzhong Chen
Rong Han
Ting Chen
VLM
29
2
0
16 Oct 2024
Neuron-based Personality Trait Induction in Large Language Models
Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Wayne Xin Zhao
Zhicheng Dou
38
1
0
16 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
40
3
0
15 Oct 2024
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
41
15
0
15 Oct 2024
Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations
Seongho Kim
Jihyun Moon
Juntaek Oh
Insu Choi
Joon-Sung Yang
21
0
0
15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
20
1
0
15 Oct 2024
Rethinking Graph Transformer Architecture Design for Node Classification
Jiajun Zhou
Xuanze Chen
Chenxuan Xie
Yu Shanqing
Qi Xuan
Xiaoniu Yang
28
0
0
15 Oct 2024
Transfer Learning with Foundational Models for Time Series Forecasting using Low-Rank Adaptations
M. Germán-Morales
A. J. Rivera-Rivas
M. J. del Jesus Díaz
C. J. Carmona
AI4TS
AI4CE
56
0
0
15 Oct 2024
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter
Shrimai Prabhumoye
John Kamalu
S. Satheesh
Eric Nyberg
M. Patwary
M. Shoeybi
Bryan Catanzaro
LRM
SyDa
ReLM
109
1
0
15 Oct 2024
Liger Kernel: Efficient Triton Kernels for LLM Training
Pin-Lun Hsu
Yun Dai
Vignesh Kothapalli
Qingquan Song
Shao Tang
Siyu Zhu
Steven Shimizu
Shivam Sahni
Haowen Ning
Yanning Chen
53
27
0
14 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning
Yongxin Xu
Ruizhe Zhang
Xinke Jiang
Yujie Feng
Yuzhen Xiao
Xinyu Ma
Runchuan Zhu
Xu Chu
Junfeng Zhao
Yasha Wang
KELM
22
4
0
14 Oct 2024
Diffusion Models Need Visual Priors for Image Generation
Xiaoyu Yue
Zidong Wang
Zeyu Lu
S. Sun
Meng Wei
Wanli Ouyang
Junlin Wu
Luping Zhou
VLM
53
1
0
11 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
39
13
0
10 Oct 2024
Upcycling Large Language Models into Mixture of Experts
Ethan He
Abhinav Khattar
R. Prenger
V. Korthikanti
Zijie Yan
Tong Liu
Shiqing Fan
Ashwath Aithal
M. Shoeybi
Bryan Catanzaro
MoE
45
9
0
10 Oct 2024
Pixtral 12B
Pravesh Agrawal
Szymon Antoniak
Emma Bou Hanna
Baptiste Bout
Devendra Singh Chaplot
...
Joachim Studnia
Sandeep Subramanian
Sagar Vaze
Thomas Wang
Sophia Yang
VLM
MLLM
39
49
0
09 Oct 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Siyuan Li
Juanxi Tian
Zedong Wang
Luyuan Zhang
Zicheng Liu
Weiyang Jin
Yang Liu
Baigui Sun
Stan Z. Li
34
0
0
08 Oct 2024
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges
Anian Ruoss
Joel Veness
Tim Genewein
32
1
0
07 Oct 2024
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Kosuke Nishida
Kyosuke Nishida
Kuniko Saito
36
1
0
07 Oct 2024
Differential Transformer
Tianzhu Ye
Li Dong
Yuqing Xia
Yutao Sun
Yi Zhu
Gao Huang
Furu Wei
171
0
0
07 Oct 2024
A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition
David-Gabriel Ion
Razvan-Alexandru Smadu
Dumitru-Clementin Cercel
Florin-Catalin Pop
Mihaela-Claudia Cercel
28
0
0
06 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li
Jianhang Hong
Yuanzhuo Wu
Snehal Adbol
Zonglin Li
MQ
29
1
0
06 Oct 2024
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang
Chaojun Xiao
Qiujieli Qin
Yankai Lin
Zhiyuan Zeng
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
64
3
0
04 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
44
1
0
04 Oct 2024
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI
Ahmad Elawady
Gunjan Chhablani
Ram Ramrakhya
Karmesh Yadav
Dhruv Batra
Z. Kira
Andrew Szot
OffRL
28
0
0
03 Oct 2024
Previous
1
2
3
4
5
...
11
12
13
Next