2505.16570
Cited By
URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training
22 May 2025
Dongyang Fan
Vinko Sabolčec
Martin Jaggi
Papers citing
"URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training"
4 / 4 papers shown
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi
Ryotaro Kawata
Naoki Nishikawa
Kazusato Oko
Shoichiro Yamaguchi
Sosuke Kobayashi
Seiya Tokui
K. Hayashi
Daisuke Okanohara
Taiji Suzuki
AI4CE
83
1
0
24 Apr 2025
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
Alexander Wettig
Kyle Lo
Sewon Min
Hannaneh Hajishirzi
Danqi Chen
Luca Soldaini
152
16
0
14 Feb 2025
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
124
20
0
12 Jun 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
151
56
0
23 Mar 2024