2402.14905
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
22 February 2024
Zechun Liu
Changsheng Zhao
Forrest N. Iandola
Chen Lai
Yuandong Tian
Igor Fedorov
Yunyang Xiong
Ernie Chang
Yangyang Shi
Raghuraman Krishnamoorthi
Liangzhen Lai
Vikas Chandra
ALM
Papers citing
"MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases"
28 / 28 papers shown
Title
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Minki Kang
Jongwon Jeong
Seanie Lee
Jaewoong Cho
Sung Ju Hwang
LRM
120
1
0
23 May 2025
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
Quang P.M. Pham
Khoi T.N. Nguyen
Nhi H. Doan
Cuong Pham
Kentaro Inui
Dezhen Song
146
0
0
01 May 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matthew J Morse
Raghavv Goel
Mingu Lee
Chris Lott
51
0
0
21 Apr 2025
Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases
Teng Lin
47
0
0
08 Apr 2025
SpinQuant: LLM quantization with learned rotations
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
167
105
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
56
3
0
19 Feb 2025
DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
Ting Sun
Penghan Wang
Fan Lai
43
0
0
17 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
132
1
0
10 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
105
7
0
04 Feb 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
89
0
0
10 Jan 2025
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
95
6
0
28 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
95
5
0
02 Oct 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
72
49
0
09 Jul 2024
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
Nicy Scaria
Silvester John Joseph Kennedy
Deepak N. Subramani
MU
43
2
0
01 Jul 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
109
381
0
04 Jan 2024
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu
Ruihao Gong
Xiuying Wei
Zhiwei Dong
Jianfei Cai
Bohan Zhuang
MQ
44
52
0
12 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
157
1,756
0
28 Sep 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
236
4,186
0
09 Jun 2023
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Ganesh Jawahar
Haichuan Yang
Yunyang Xiong
Zechun Liu
Dilin Wang
...
Barlas Oğuz
Muhammad Abdul-Mageed
L. Lakshmanan
Raghuraman Krishnamoorthi
Vikas Chandra
52
4
0
08 Jun 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
165
585
0
22 May 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
280
2,364
0
09 Nov 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
122
820
0
14 Apr 2022
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
121
1,734
0
29 Jun 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
451
4,662
0
23 Jan 2020
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
121
17,950
0
28 May 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
177
1,475
0
24 May 2019
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
173
2,610
0
09 May 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
195
2,377
0
23 Dec 2016