GLU Variants Improve Transformer

12 February 2020
Noam M. Shazeer

Papers citing "GLU Variants Improve Transformer"

50 / 650 papers shown
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma
Hongyu Wang
Lingxiao Ma
Lei Wang
Wenhui Wang
Shaohan Huang
Lifeng Dong
Ruiping Wang
Jilong Xue
Furu Wei
MQ
45
207
0
27 Feb 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
64
46
0
26 Feb 2024
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo
Jinmin Li
Tao Dai
Zhihao Ouyang
Xudong Ren
Shu-Tao Xia
Mamba
58
202
0
23 Feb 2024
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Bruno Gavranovic
Paul Lessard
Andrew Dudzik
Tamara von Glehn
J. G. Araújo
Petar Velickovic
40
8
0
23 Feb 2024
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li
Gangwei Jiang
Hong Xie
Linqi Song
Defu Lian
Ying Wei
LRM
58
22
0
22 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
43
10
0
21 Feb 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
...
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun
53
24
0
21 Feb 2024
Transformer tricks: Precomputing the first layer
Nils Graef
MoE
32
4
0
20 Feb 2024
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu
Zeyu Huang
Youcheng Huang
Jie Fu
KELM
30
5
0
19 Feb 2024
Can Large Multimodal Models Uncover Deep Semantics Behind Images?
Yixin Yang
Zheng Li
Qingxiu Dong
Heming Xia
Zhifang Sui
VLM
35
9
0
17 Feb 2024
PointMamba: A Simple State Space Model for Point Cloud Analysis
Dingkang Liang
Xin Zhou
Wei Xu
Xingkui Zhu
Zhikang Zou
Xiaoqing Ye
Xinyu Wang
Xiang Bai
89
93
0
16 Feb 2024
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust
Bowen Shi
Skyler Wang
Necati Cihan Camgöz
Jean Maillard
SLR
47
14
0
14 Feb 2024
Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou
Uri Alon
Xinyun Chen
Xuezhi Wang
Rishabh Agarwal
Denny Zhou
52
36
0
14 Feb 2024
Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda
64
16
0
14 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM
MLLM
27
12
0
13 Feb 2024
Learn To be Efficient: Build Structured Sparsity in Large Language Models
Haizhong Zheng
Xiaoyan Bai
Xueshen Liu
Z. Morley Mao
Beidi Chen
Fan Lai
Atul Prakash
51
11
0
09 Feb 2024
Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation
Ziyang Wang
Jian-Qing Zheng
Yichi Zhang
Ge Cui
Lei Li
Mamba
38
125
0
07 Feb 2024
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
22
31
0
06 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
41
28
0
05 Feb 2024
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini
Amirkeivan Mohtashami
F. Fleuret
Martin Jaggi
40
6
0
04 Feb 2024
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo
Chenghao Liu
Akshat Kumar
Caiming Xiong
Silvio Savarese
Doyen Sahoo
AI4TS
120
165
0
04 Feb 2024
Leveraging Continuously Differentiable Activation Functions for Learning in Quantized Noisy Environments
Vivswan Shah
Nathan Youngblood
41
2
0
04 Feb 2024
Learning Structure-Aware Representations of Dependent Types
Konstantinos Kogkalidis
Orestis Melkonian
Jean-Philippe Bernardy
NAI
26
1
0
03 Feb 2024
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
Bharat Runwal
Tejaswini Pedapati
Pin-Yu Chen
MoE
49
5
0
02 Feb 2024
Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
27
96
0
02 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
41
1
0
01 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
141
359
0
01 Feb 2024
BlackMamba: Mixture of Experts for State-Space Models
Quentin G. Anthony
Yury Tokpanov
Paolo Glorioso
Beren Millidge
33
21
0
01 Feb 2024
LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec
Song Duong
Mathieu Ravaut
Alexandre Allauzen
Nancy F. Chen
Vincent Guigue
Alberto Lumbreras
Laure Soulier
Patrick Gallinari
48
8
0
31 Jan 2024
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
35
20
0
30 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
30
9
0
30 Jan 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue
Zian Zheng
Yao Fu
Jinjie Ni
Zangwei Zheng
Wangchunshu Zhou
Yang You
MoE
36
88
0
29 Jan 2024
Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization
Jianfei Xiao
Yancan Chen
Yimin Ou
Hanyi Yu
Kai Shu
Yiyong Xiao
ALM
31
11
0
27 Jan 2024
The Case for Co-Designing Model Architectures with Hardware
Quentin G. Anthony
Jacob Hatef
Deepak Narayanan
Stella Biderman
Stas Bekman
Junqi Yin
Hari Subramoni
Dhabaleswar Panda
3DV
19
4
0
25 Jan 2024
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Gökçe Uludoğan
Zeynep Yirmibeşoğlu Balal
Furkan Akkurt
Melikşah Türker
Onur Güngör
S. Üsküdarlı
39
12
0
25 Jan 2024
A Survey of Deep Learning and Foundation Models for Time Series Forecasting
John A. Miller
Mohammed Aldosari
Farah Saeed
Nasid Habib Barna
Subas Rana
I. Arpinar
Ninghao Liu
AI4TS
AI4CE
43
20
0
25 Jan 2024
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye
Heinrich Jiang
Afshin Rostamizadeh
Ayan Chakrabarti
Giulia DeSalvo
Jean-François Kagy
Lazaros Karydas
Gui Citovsky
Sanjiv Kumar
36
0
0
24 Jan 2024
In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRM
ReLM
48
42
0
23 Jan 2024
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
34
26
0
22 Jan 2024
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Katherine Crowson
Stefan Andreas Baumann
Alex Birch
Tanishq Mathew Abraham
Daniel Z. Kaplan
Enrico Shippole
29
48
0
21 Jan 2024
A Study on Training and Developing Large Language Models for Behavior Tree Generation
Fu Li
Xueying Wang
Bin Li
Yunlong Wu
Yanzhen Wang
Xiaodong Yi
14
4
0
16 Jan 2024
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian
Andrei Panferov
Denis Kuznedelev
Elias Frantar
Artem Babenko
Dan Alistarh
MQ
100
91
0
11 Jan 2024
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen
Xingyi Cheng
Pan Li
Yangli-ao Geng
Jing Gong
...
Chiming Liu
Aohan Zeng
Yuxiao Dong
Jie Tang
Leo T. Song
42
101
0
11 Jan 2024
FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
Zirui Liu
Qingquan Song
Q. Xiao
Sathiya Keerthi Selvaraj
Rahul Mazumder
Aman Gupta
Xia Hu
35
4
0
08 Jan 2024
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
66
3
0
08 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
26
7
0
06 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
309
0
05 Jan 2024
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Gabriel Lino Garcia
P. H. Paiola
Luis Henrique Morelli
Giovani Candido
Arnaldo Cândido Júnior
D. Jodas
Luis C. S. Afonso
I. R. Guilherme
B. Penteado
João Paulo Papa
24
11
0
05 Jan 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
54
359
0
04 Jan 2024
Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction
Yunpeng Qu
Zhilin Lu
Rui Zeng
Jintao Wang
Jian Wang
42
8
0
02 Jan 2024