ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.05202
  4. Cited By
GLU Variants Improve Transformer

GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer
ArXivPDFHTML

Papers citing "GLU Variants Improve Transformer"

50 / 652 papers shown
Title
Diffusion Guided Language Modeling
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
44
6
0
08 Aug 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from
  Speech
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai Le-Duc
Quy-Anh Dang
Tan-Hanh Pham
Truong-Son Hy
32
0
0
08 Aug 2024
EXAONE 3.0 7.8B Instruction Tuned Language Model
EXAONE 3.0 7.8B Instruction Tuned Language Model
LG AI Research
:
Soyoung An
Kyunghoon Bae
Eunbi Choi
...
Boseong Seo
Sihoon Yang
Heuiyeen Yeen
Kyungjae Yoo
Hyeongu Yun
ELM
ALM
57
10
0
07 Aug 2024
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech
  Separation and Enhancement
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
Kohei Saijo
Gordon Wichern
François G. Germain
Zexu Pan
Jonathan Le Roux
46
7
0
06 Aug 2024
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial
  Optimization
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization
Andoni I. Garmendia
Quentin Cappart
Josu Ceberio
A. Mendiburu
40
2
0
05 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
82
48
0
05 Aug 2024
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing
  Models As Data
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data
Mingshu Li
44
3
0
01 Aug 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Gemma Team Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
42
681
0
31 Jul 2024
UniProcessor: A Text-induced Unified Low-level Image Processor
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan
Xiongkuo Min
Sijing Wu
Wei Shen
Guangtao Zhai
DiffM
47
8
0
30 Jul 2024
A federated large language model for long-term time series forecasting
A federated large language model for long-term time series forecasting
Raed Abdel Sater
A. B. Hamza
AI4TS
37
2
0
30 Jul 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models
  for Multilingual Text Retrieval
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Xin Zhang
Yanzhao Zhang
Dingkun Long
Wen Xie
Ziqi Dai
...
Pengjun Xie
Fei Huang
Meishan Zhang
Wenjie Li
Min Zhang
42
78
0
29 Jul 2024
Enhancing Model Performance: Another Approach to Vision-Language
  Instruction Tuning
Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning
Vedanshu
M. M. Tripathi
Bhavnesh Jaint
MLLM
VLM
45
0
0
25 Jul 2024
How Lightweight Can A Vision Transformer Be
How Lightweight Can A Vision Transformer Be
Jen Hong Tan
ViT
MoE
65
0
0
25 Jul 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
u-μ\muμP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
61
10
0
24 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a
  Micro-Budget
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
47
9
0
22 Jul 2024
Inverted Activations
Inverted Activations
Georgii Sergeevich Novikov
Ivan Oseledets
26
0
0
22 Jul 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao
Fandong Meng
Jie Zhou
53
1
0
17 Jul 2024
Any-Property-Conditional Molecule Generation with Self-Criticism using
  Spanning Trees
Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees
Alexia Jolicoeur-Martineau
A. Baratin
Kisoo Kwon
Boris Knyazev
Yan Zhang
45
1
0
12 Jul 2024
Flash normalization: fast normalization for LLMs
Flash normalization: fast normalization for LLMs
Nils Graef
Matthew Clapp
Andrew Wasielewski
23
0
0
12 Jul 2024
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective
  Weight-Activation Quantization
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Xijie Huang
Zechun Liu
Shih-yang Liu
Kwang-Ting Cheng
MQ
43
8
0
10 Jul 2024
Toto: Time Series Optimized Transformer for Observability
Toto: Time Series Optimized Transformer for Observability
Ben Cohen
Emaad Khwaja
Kan Wang
Charles Masson
Elise Ramé
Youssef Doubli
Othmane Abou-Amal
AI4TS
43
3
0
10 Jul 2024
How Effective are State Space Models for Machine Translation?
How Effective are State Space Models for Machine Translation?
Hugo Pitorro
Pavlo Vasylenko
Marcos Vinícius Treviso
André F. T. Martins
Mamba
45
3
0
07 Jul 2024
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer
  Architectures and Cross-dataset Stem Augmentation
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
Sungkyun Chang
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
39
3
0
05 Jul 2024
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun
Xinhao Li
Karan Dalal
Jiarui Xu
Arjun Vikram
...
Xinlei Chen
Xiaolong Wang
Sanmi Koyejo
Tatsunori Hashimoto
Carlos Guestrin
71
93
0
05 Jul 2024
Mixture of A Million Experts
Mixture of A Million Experts
Xu Owen He
MoE
46
26
0
04 Jul 2024
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures
  Reveal Internal Characteristics of Meta's Llama 2 Model
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model
Brenden Smith
Dallin Baker
Clayton Chase
Myles Barney
Kaden Parker
Makenna Allred
Peter Hu
Alex Evans
Nancy Fulda
40
0
0
04 Jul 2024
Enhancing Translation Accuracy of Large Language Models through
  Continual Pre-Training on Parallel Data
Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data
Minato Kondo
T. Utsuro
Masaaki Nagata
CLL
44
4
0
03 Jul 2024
52B to 1T: Lessons Learned via Tele-FLM Series
52B to 1T: Lessons Learned via Tele-FLM Series
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
ALM
LRM
49
3
0
03 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models:
  Enhancing Performance and Reducing Inference Costs
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
62
5
0
01 Jul 2024
YuLan: An Open-source Large Language Model
YuLan: An Open-source Large Language Model
Yutao Zhu
Kun Zhou
Kelong Mao
Wentong Chen
Yiding Sun
...
Wenbing Huang
Ze-Feng Gao
Yueguo Chen
Weizheng Lu
Ji-Rong Wen
ALM
ELM
44
1
0
28 Jun 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
63
21
0
27 Jun 2024
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse
  Gradients
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed
Oscar Li
David Woodruff
Mona Diab
Virginia Smith
61
8
0
25 Jun 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual
  Pre-training
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Tong Zhu
Xiaoye Qu
Daize Dong
Jiacheng Ruan
Jingqi Tong
Conghui He
Yu Cheng
MoE
ALM
54
72
0
24 Jun 2024
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to
  construct Observer-Thinker-Conceiver-Expresser
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser
Jingze Shi
Ting Xie
Bingheng Wu
Chunjun Zheng
Kai Wang
27
2
0
24 Jun 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing
  Backpropagation
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
40
1
0
24 Jun 2024
Unsupervised Extraction of Dialogue Policies from Conversations
Unsupervised Extraction of Dialogue Policies from Conversations
Makesh Narsimhan Sreedhar
Traian Rebedea
Christopher Parisien
OffRL
20
2
0
21 Jun 2024
RouteFinder: Towards Foundation Models for Vehicle Routing Problems
RouteFinder: Towards Foundation Models for Vehicle Routing Problems
Federico Berto
Chuanbo Hua
Nayeli Gast Zepeda
André Hottung
N. Wouda
Leon Lan
Kevin Tierney
J. Park
Jinkyoo Park
61
10
0
21 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All
  Tools
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
79
515
0
18 Jun 2024
MCSD: An Efficient Language Model with Diverse Fusion
MCSD: An Efficient Language Model with Diverse Fusion
Hua Yang
Duohai Li
Shiman Li
35
2
0
18 Jun 2024
LiLiuM: eBay's Large Language Models for e-commerce
LiLiuM: eBay's Large Language Models for e-commerce
Christian Herold
Michael Kozielski
Leonid Ekimov
Pavel Petrushkov
P. Vandenbussche
Shahram Khadivi
43
1
0
17 Jun 2024
Optimized Speculative Sampling for GPU Hardware Accelerators
Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner
Seanie Lee
Ilja Baumann
Philipp Seeberger
Korbinian Riedhammer
Tobias Bocklet
48
3
0
16 Jun 2024
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen
Lizhang Chen
Bo Liu
Qiang Liu
30
3
0
14 Jun 2024
GEB-1.3B: Open Lightweight Large Language Model
GEB-1.3B: Open Lightweight Large Language Model
Jie Wu
Yufeng Zhu
Lei Shen
Xuqing Lu
ALM
37
0
0
14 Jun 2024
Alleviating Distortion in Image Generation via Multi-Resolution
  Diffusion Models
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Qihao Liu
Zhanpeng Zeng
Ju He
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
53
21
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
49
14
0
13 Jun 2024
An Empirical Study of Mamba-based Language Models
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
63
66
0
12 Jun 2024
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document
  Retrieval
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
Adrià Molina
O. R. Terrades
Josep Lladós
48
0
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
77
57
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image
  Generation
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
68
230
0
10 Jun 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue
Yixin Song
Zeyu Mi
Le Chen
Yubin Xia
Haibo Chen
59
42
0
10 Jun 2024
Previous
123...567...121314
Next