2401.02385
Cited By
TinyLlama: An Open-Source Small Language Model
4 January 2024
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
Papers citing
"TinyLlama: An Open-Source Small Language Model"
50 / 266 papers shown
Title
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Siddharth Singh
Prajwal Singhania
Aditya K. Ranjan
John Kirchenbauer
Jonas Geiping
...
Abhimanyu Hans
Manli Shu
Aditya Tomar
Tom Goldstein
A. Bhatele
105
2
0
12 Feb 2025
Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
Shanshan Han
Salman Avestimehr
Chaoyang He
76
1
0
12 Feb 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Zhilin Wang
Muneeza Azmat
Ang Li
R. Horesh
Mikhail Yurochkin
118
1
0
11 Feb 2025
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu
Shu Li
Tianyu Zheng
Zhaoxiang Zhang
281
0
0
10 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
67
1
0
10 Feb 2025
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
Sukmin Cho
S. Choi
T. Hwang
Jeongyeon Seo
Soyeong Jeong
Huije Lee
Hoyun Song
Jong C. Park
Youngjin Kwon
51
0
0
08 Feb 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
113
1
0
03 Feb 2025
Nearly Lossless Adaptive Bit Switching
Haiduo Huang
Zhenhua Liu
Tian Xia
Wenzhe Zhao
Pengju Ren
MQ
68
0
0
03 Feb 2025
Vision-centric Token Compression in Large Language Model
Ling Xing
Alex Jinpeng Wang
Rui Yan
Xiangbo Shu
Jinhui Tang
VLM
65
0
0
02 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
84
1
0
02 Feb 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing
Kou Misaki
Han Bao
Sho Yokoi
Takuya Akiba
VLM
57
1
0
28 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Zhicheng Dou
MQ
46
0
0
22 Jan 2025
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework
Yushen Lin
Ruichen Zhang
Wenqi Huang
Kaidi Wang
Z. Ding
Daniel K. C. So
Dusit Niyato
76
0
0
17 Jan 2025
On the uncertainty principle of neural networks
Jun-Jie Zhang
Dong-xiao Zhang
Jian-Nan Chen
L. Pang
Deyu Meng
59
2
0
17 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
221
0
0
30 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
119
1
0
18 Dec 2024
Learning to Reason via Self-Iterative Process Feedback for Small Language Models
Kaiyuan Chen
Jin Wang
Xuejie Zhang
LRM
ReLM
90
2
0
11 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
77
0
0
02 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
105
5
0
02 Dec 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
Yufei Guo
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
258
0
0
25 Nov 2024
Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang
Liqun Ma
Yiming Li
Mingjie Sun
Zhiqiang Shen
Mamba
81
3
0
18 Nov 2024
LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
Jan Pfister
Julia Wunderle
Andreas Hotho
28
2
0
17 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
39
0
0
15 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
44
0
0
12 Nov 2024
LLM-NEO: Parameter Efficient Knowledge Distillation for Large Language Models
Runming Yang
Taiqiang Wu
Jiahao Wang
Pengfei Hu
Ngai Wong
Yujiu Yang
208
1
0
11 Nov 2024
Privacy Risks of Speculative Decoding in Large Language Models
Jiankun Wei
Abdulrahman Abdulrazzag
Tianchen Zhang
Adel Muursepp
Gururaj Saileshwar
37
2
0
01 Nov 2024
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model
Subhadip Nandi
Neeraj Agrawal
36
0
0
01 Nov 2024
MoD: A Distribution-Based Approach for Merging Large Language Models
Quy-Anh Dang
Chris Ngo
MoMe
VLM
31
0
0
01 Nov 2024
MESS+: Energy-Optimal Inferencing in Language Model Zoos with Service Level Guarantees
Ryan Zhang
Herbert Woisetschläger
Shiqiang Wang
Hans-Arno Jacobsen
26
0
0
31 Oct 2024
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
Jiayi Wang
Yao Lu
Maurice Weber
Max Ryabinin
Yihong Chen
Raphael Tang
Pontus Stenetorp
LRM
47
1
0
31 Oct 2024
Mobility-LLM: Learning Visiting Intentions and Travel Preferences from Human Mobility Data with Large Language Models
Letian Gong
Yan Lin
Xinyue Zhang
Yiwen Lu
Xuedi Han
Yichen Liu
Shengnan Guo
Youfang Lin
Huaiyu Wan
54
5
0
29 Oct 2024
Transferable Post-training via Inverse Value Learning
Xinyu Lu
Xueru Wen
Yaojie Lu
Bowen Yu
Hongyu Lin
Haiyang Yu
Le Sun
Xianpei Han
Yongbin Li
25
1
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
84
5
0
28 Oct 2024
Computational Bottlenecks of Training Small-scale Large Language Models
Saleh Ashkboos
Iman Mirzadeh
Keivan Alizadeh
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
Fartash Faghri
26
0
0
25 Oct 2024
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
27
1
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min Lin
Chongxuan Li
AI4CE
63
14
0
24 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
73
5
0
22 Oct 2024
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu
Haoyi Wu
Kewei Tu
34
3
0
18 Oct 2024
BenTo: Benchmark Task Reduction with In-Context Transferability
Hongyu Zhao
Ming Li
Lichao Sun
Tianyi Zhou
35
0
0
17 Oct 2024
Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL
Qihuang Zhong
Kunfeng Chen
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
44
0
0
15 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
42
4
0
14 Oct 2024
Reverse Modeling in Large Language Models
S. Yu
Yuanchen Xu
Cunxiao Du
Yanying Zhou
Minghui Qiu
Q. Sun
Hao Zhang
Jiawei Wu
41
2
0
13 Oct 2024
CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device
Yicheng Fu
R. Anantha
Jianpeng Cheng
LRM
LLMAG
28
2
0
12 Oct 2024
Generation with Dynamic Vocabulary
Yanting Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Xiaoling Wang
45
0
0
11 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
39
2
0
10 Oct 2024
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers
Jianing Qi
Hao Tang
Zhigang Zhu
OffRL
LRM
38
4
0
10 Oct 2024
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
32
4
0
09 Oct 2024
Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy
Tagore Rao Kosireddy
Jeffrey D. Wall
Evan Lucas
34
1
0
09 Oct 2024
Personal Intelligence System UniLM: Hybrid On-Device Small Language Model and Server-Based Large Language Model for Malay Nusantara
Azree Nazri
Olalekan Agbolade
Faisal Aziz
30
0
0
09 Oct 2024