Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2001.08361
Cited By
Scaling Laws for Neural Language Models
23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Laws for Neural Language Models"
50 / 966 papers shown
Title
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
125
1
0
24 Feb 2025
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan
Haozheng Luo
Manling Li
Han Liu
LRM
48
14
0
24 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
68
0
0
24 Feb 2025
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Tian Jin
Ellie Y. Cheng
Zack Ankner
Nikunj Saunshi
Blake M. Elias
Amir Yazdanbakhsh
Jonathan Ragan-Kelley
Suvinay Subramanian
Michael Carbin
60
3
0
24 Feb 2025
Selective Prompt Anchoring for Code Generation
Yuan Tian
Tianyi Zhang
91
3
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
43
0
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
50
0
0
24 Feb 2025
Beyond Release: Access Considerations for Generative AI Systems
Irene Solaiman
Rishi Bommasani
Dan Hendrycks
Ariel Herbert-Voss
Yacine Jernite
Aviya Skowron
Andrew Trask
62
1
0
23 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
58
2
0
21 Feb 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Yong-Jin Liu
Jing Lin
Yiwu Yao
Rongrong Ji
95
1
0
21 Feb 2025
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale
Axel Højmark
Jérémy Scheurer
Marius Hobbhahn
LLMAG
ELM
46
1
0
21 Feb 2025
A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
Simeon Campos
Henry Papadatos
Fabien Roger
Chloé Touzet
Malcolm Murray
Otter Quarks
92
2
0
20 Feb 2025
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Raphael T. Husistein
Markus Reiher
Marco Eckhoff
142
1
0
20 Feb 2025
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora
Tristan Karch
Luca Engel
Philippe Schwaller
Frédéric Kaplan
82
0
0
19 Feb 2025
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco
ELM
60
2
0
18 Feb 2025
Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table
Haoyuan Wu
Haisheng Zheng
Shoubo Hu
Zhuolun He
Bei Yu
50
0
0
18 Feb 2025
Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications
Li Qiao
Mahdi Boloursaz Mashhadi
Zhen Gao
Rahim Tafazolli
Mehdi Bennis
Dusit Niyato
84
6
0
17 Feb 2025
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
73
0
0
17 Feb 2025
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
Songpei Xu
Shijia Wang
Da Guo
Xianwen Guo
Qiang Xiao
Fangjian Li
Chuanjiang Luo
80
0
0
17 Feb 2025
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng
Zhaoyang Yu
Quan Shi
Jiayi Zhang
Chenglin Wu
Yuyu Luo
MU
LRM
54
13
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
102
15
0
17 Feb 2025
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Kaikai Zhao
Zhaoxiang Liu
Xuejiao Lei
Rongjia Du
Zhenhong Long
...
Minjie Hua
Kai Wang
Wei Liu
Ning Wang
Kai Wang
ELM
LRM
60
1
0
16 Feb 2025
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
112
15
0
14 Feb 2025
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data
Leonardo Berti
Gjergji Kasneci
AI4TS
42
0
0
12 Feb 2025
A Large-Scale Benchmark for Vietnamese Sentence Paraphrases
Sang Quang Nguyen
Kiet Van Nguyen
62
0
0
11 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
64
1
0
10 Feb 2025
EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks
Michael Arbel
David Salinas
Frank Hutter
70
2
0
10 Feb 2025
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Pablo Piantanida
Elisabeth Gassiat
50
0
0
10 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
38
0
0
09 Feb 2025
FuXi-
α
\alpha
α
: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Y. Liu
Ruiming Tang
Defu Lian
Enhong Chen
97
2
0
05 Feb 2025
Modular Training of Neural Networks aids Interpretability
Satvik Golechha
Maheep Chaudhary
Joan Velja
Alessandro Abate
Nandi Schoots
79
0
0
04 Feb 2025
Explaining Context Length Scaling and Bounds for Language Models
Jingzhe Shi
Qinwei Ma
Hongyi Liu
Hang Zhao
Jeng-Neng Hwang
Serge Belongie
LRM
79
2
0
03 Feb 2025
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Isha Puri
Shivchander Sudalairaj
Guangxuan Xu
Kai Xu
Akash Srivastava
LRM
76
3
0
03 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
86
2
0
02 Feb 2025
FinchGPT: a Transformer based language model for birdsong analysis
Kosei Kobayashi
Kosuke Matsuzaki
Masaya Taniguchi
Keisuke Sakaguchi
Kentaro Inui
Kentaro Abe
70
0
0
01 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALM
BDL
93
6
0
31 Jan 2025
LLMs can be Fooled into Labelling a Document as Relevant (best caf\é near me; this paper is perfectly relevant)
Marwah Alaofi
Paul Thomas
Falk Scholer
Mark Sanderson
48
15
0
29 Jan 2025
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models
A. Benazir
Felix Xiaozhu Lin
47
0
0
29 Jan 2025
Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts
Wenju Sun
Qingyong Li
Wen Wang
Yangli-ao Geng
Boyang Li
38
2
0
28 Jan 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing
Kou Misaki
Han Bao
Sho Yokoi
Takuya Akiba
VLM
57
1
0
28 Jan 2025
Token Democracy: The Architectural Limits of Alignment in Transformer-Based Language Models
Robin Young
49
0
0
28 Jan 2025
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Ekaterina Artemova
Akim Tsvigun
Dominik Schlechtweg
Natalia Fedorova
Konstantin Chernyshev
Sergei Tilga
Boris Obmoroshev
SyDa
VLM
125
0
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
93
154
0
28 Jan 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
39
5
0
28 Jan 2025
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
Hamed Firooz
Maziar Sanjabi
Adrian Englhardt
Aman Gupta
Ben Levine
...
Xiaoling Zhai
Ya Xu
Yu Wang
Yun Dai
Yun Dai
ALM
42
3
0
27 Jan 2025
Scaling laws for decoding images from brain activity
Hubert J. Banville
Yohann Benchetrit
Stéphane DÁscoli
Jérémy Rapin
J. King
MedIm
52
0
0
25 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
69
4
0
24 Jan 2025
Top Ten Challenges Towards Agentic Neural Graph Databases
Jiaxin Bai
Zehua Wang
Yukun Zhou
hang Yin
Weizhi Fei
...
Binhang Yuan
Wei Wang
Lei Chen
Xiaofang Zhou
Yangqiu Song
109
0
0
24 Jan 2025
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu
Jingyang Zhang
Y. Li
Jiwei Li
Qing-Wu Guo
Han Qiu
Tianwei Zhang
WIGM
VGen
81
4
0
24 Jan 2025
Learning Versatile Optimizers on a Compute Diet
A. Moudgil
Boris Knyazev
Guillaume Lajoie
Eugene Belilovsky
147
0
0
22 Jan 2025
Previous
1
2
3
4
5
...
18
19
20
Next