ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.07965
  4. Cited By
Rho-1: Not All Tokens Are What You Need

Rho-1: Not All Tokens Are What You Need

11 April 2024
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
Ruochen Xu
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
    CLL
ArXivPDFHTML

Papers citing "Rho-1: Not All Tokens Are What You Need"

50 / 121 papers shown
Title
Textbooks Are All You Need
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuan-Fang Li
AI4CE
ALM
SyDa
62
405
0
20 Jun 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
139
1,122
0
31 May 2023
Scaling Data-Constrained Language Models
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
73
215
0
25 May 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong
Liang Ding
Juhua Liu
Xuebo Liu
Min Zhang
Bo Du
Dacheng Tao
VLM
44
9
0
24 May 2023
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
103
81
0
22 May 2023
A Pretrainer's Guide to Training Data: Measuring the Effects of Data
  Age, Domain Coverage, Quality, & Toxicity
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Shayne Longpre
Gregory Yauney
Emily Reif
Katherine Lee
Adam Roberts
...
Denny Zhou
Jason W. Wei
Kevin Robinson
David M. Mimno
Daphne Ippolito
88
160
0
22 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
95
197
0
17 May 2023
Emergent and Predictable Memorization in Large Language Models
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
62
123
0
21 Apr 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
68
528
0
13 Apr 2023
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual
  Benchmarking on HumanEval-X
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
Qinkai Zheng
Xiao Xia
Xu Zou
Yuxiao Dong
Shanshan Wang
...
Andi Wang
Yang Li
Teng Su
Zhilin Yang
Jie Tang
ELM
ALM
SyDa
96
330
0
30 Mar 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Ziheng Qin
Kaidi Wang
Zangwei Zheng
Jianyang Gu
Xiang Peng
...
Daquan Zhou
Lei Shang
Baigui Sun
Xuansong Xie
Yang You
150
51
0
08 Mar 2023
Data Selection for Language Models via Importance Resampling
Data Selection for Language Models via Importance Resampling
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Percy Liang
94
189
0
06 Feb 2023
Specializing Smaller Language Models towards Multi-Step Reasoning
Specializing Smaller Language Models towards Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Litu Ou
Ashish Sabharwal
Tushar Khot
ReLM
LRM
86
256
0
30 Jan 2023
Training Trajectories of Language Models Across Scales
Training Trajectories of Language Models Across Scales
Mengzhou Xia
Mikel Artetxe
Chunting Zhou
Xi Lin
Ramakanth Pasunuru
Danqi Chen
Luke Zettlemoyer
Ves Stoyanov
AIFin
LRM
70
61
0
19 Dec 2022
Using Selective Masking as a Bridge between Pre-training and Fine-tuning
Using Selective Masking as a Bridge between Pre-training and Fine-tuning
Tanish Lad
Himanshu Maheshwari
Shreyas Kottukkal
R. Mamidi
49
3
0
24 Nov 2022
PAL: Program-aided Language Models
PAL: Program-aided Language Models
Luyu Gao
Aman Madaan
Shuyan Zhou
Uri Alon
Pengfei Liu
Yiming Yang
Jamie Callan
Graham Neubig
ReLM
LRM
93
442
0
18 Nov 2022
Contrastive Decoding: Open-ended Text Generation as Optimization
Contrastive Decoding: Open-ended Text Generation as Optimization
Xiang Lisa Li
Ari Holtzman
Daniel Fried
Percy Liang
Jason Eisner
Tatsunori Hashimoto
Luke Zettlemoyer
M. Lewis
89
357
0
27 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
241
1,091
0
17 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
140
289
0
29 Sep 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence
  Scaling?
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
89
102
0
21 Jul 2022
Solving Quantitative Reasoning Problems with Language Models
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM
ELM
LRM
138
827
0
29 Jun 2022
Emergent Abilities of Large Language Models
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
261
2,462
0
15 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and
  Not Yet Learnt
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
84
160
0
14 Jun 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of
  Large Language Models
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
99
191
0
22 May 2022
Scaling Laws and Interpretability of Learning from Repeated Data
Scaling Laws and Interpretability of Learning from Repeated Data
Danny Hernandez
Tom B. Brown
Tom Conerly
Nova Dassarma
Dawn Drain
...
Catherine Olsson
Dario Amodei
Nicholas Joseph
Jared Kaplan
Sam McCandlish
55
114
0
21 May 2022
Training Compute-Optimal Large Language Models
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
180
1,936
0
29 Mar 2022
Token Dropping for Efficient BERT Pretraining
Token Dropping for Efficient BERT Pretraining
Le Hou
Richard Yuanzhe Pang
Dinesh Manocha
Yuexin Wu
Xinying Song
Xiaodan Song
Denny Zhou
68
46
0
24 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
754
12,835
0
04 Mar 2022
Should You Mask 15% in Masked Language Modeling?
Should You Mask 15% in Masked Language Modeling?
Alexander Wettig
Tianyu Gao
Zexuan Zhong
Danqi Chen
CVBM
53
164
0
16 Feb 2022
Quantifying Memorization Across Neural Language Models
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
100
614
0
15 Feb 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
88
286
0
14 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
738
9,267
0
28 Jan 2022
Grokking: Generalization Beyond Overfitting on Small Algorithmic
  Datasets
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Alethea Power
Yuri Burda
Harrison Edwards
Igor Babuschkin
Vedant Misra
73
354
0
06 Jan 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
225
4,354
0
27 Oct 2021
Challenges in Detoxifying Language Models
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
289
197
0
15 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
67
28
0
13 Sep 2021
Program Synthesis with Large Language Models
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
165
1,925
0
16 Aug 2021
Deduplicating Training Data Makes Language Models Better
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
348
623
0
14 Jul 2021
A Diverse Corpus for Evaluating and Developing English Math Word Problem
  Solvers
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
Shen-Yun Miao
Chao-Chun Liang
Keh-Yih Su
62
338
0
30 Jun 2021
RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised
  Learning
RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning
Krishnateja Killamsetty
Xujiang Zhao
F. Chen
Rishabh K. Iyer
62
83
0
14 Jun 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean
  Crawled Corpus
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
96
443
0
18 Apr 2021
Probing Across Time: What Does RoBERTa Know and When?
Probing Across Time: What Does RoBERTa Know and When?
Leo Z. Liu
Yizhong Wang
Jungo Kasai
Hannaneh Hajishirzi
Noah A. Smith
KELM
74
85
0
16 Apr 2021
Are NLP Models really able to Solve Simple Math Word Problems?
Are NLP Models really able to Solve Simple Math Word Problems?
Arkil Patel
S. Bhattamishra
Navin Goyal
ReLM
LRM
78
825
0
12 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
147
2,220
0
05 Mar 2021
Scaling Laws for Transfer
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
69
243
0
02 Feb 2021
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
157
4,377
0
07 Sep 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
596
41,736
0
28 May 2020
Train No Evil: Selective Masking for Task-Guided Pre-Training
Train No Evil: Selective Masking for Task-Guided Pre-Training
Yuxian Gu
Zhengyan Zhang
Xiaozhi Wang
Zhiyuan Liu
Maosong Sun
97
59
0
21 Apr 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in
  Typologically Diverse Languages
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
130
607
0
10 Mar 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
522
4,773
0
23 Jan 2020
Previous
123
Next