2502.02737
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
4 February 2025
Loubna Ben Allal
Anton Lozhkov
Elie Bakouch
Gabriel Martín Blázquez
Guilherme Penedo
Lewis Tunstall
Andrés Marafioti
Hynek Kydlíček
Agustín Piqueres Lajarín
Vaibhav Srivastav
Joshua Lochner
Caleb Fahlgren
Xuan-Son Nguyen
Clémentine Fourrier
Ben Burtenshaw
Hugo Larcher
Haojun Zhao
Cyril Zakka
Mathieu Morlon
Colin Raffel
Leandro von Werra
Thomas Wolf
MoE
Papers citing "SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model"
28 / 28 papers shown
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
58
0
0
28 May 2025
NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities
Abdellah El Mekki
Houdaifa Atou
Omer Nacar
Shady Shehata
Muhammad Abdul-Mageed
49
0
0
23 May 2025
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
Bohan Zhou
Yi Zhan
Zhongbin Zhang
Zongqing Lu
47
0
0
22 May 2025
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
Issey Sukeda
Takuro Fujii
Kosei Buma
Shunsuke Sasaki
Shinnosuke Ono
ELM
44
0
0
22 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
79
0
0
19 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
107
0
0
18 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
75
2
0
15 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
65
0
0
12 May 2025
FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
Sanghyeon Park
Soo-Mook Moon
62
0
0
07 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Yixiao Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
369
1
0
07 May 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
89
6
0
20 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
94
0
0
19 Apr 2025
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
72
0
0
14 Apr 2025
UNDO: Understanding Distillation as Optimization
Kushal Kumar Jain
Piyushi Goyal
Kumar Shridhar
63
0
0
03 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
99
1
0
03 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
Presented at ResearchTrend Connect | VLM on 04 Jun 2025
151
4
0
01 Apr 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
135
51
0
18 Mar 2025
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde
Tassilo Wald
Tobias Schumacher
Klaus H. Maier-Hein
Markus Strohmaier
Adriana Iamnitchi
AI4TS
VLM
203
6
0
13 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
91
3
0
11 Mar 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Viktor Gsteiger
Ana Klimovic
MoE
141
0
0
27 Feb 2025
On Pruning State-Space LLMs
Tamer Ghattas
Michael Hassid
Roy Schwartz
80
1
0
26 Feb 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
96
2
0
26 Feb 2025
Machine-generated text detection prevents language model collapse
George Drayson
Emine Yilmaz
Vasileios Lampos
DeLMO
138
1
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
66
3
0
19 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
135
0
0
16 Feb 2025
Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning
Jialu Tang
Tong Xia
Yuan Lu
Cecilia Mascolo
Aaqib Saeed
AI4MH
70
3
0
18 Oct 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
108
0
0
02 Sep 2024
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
128
73
0
10 May 2023