ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.14168
  4. Cited By
Training Verifiers to Solve Math Word Problems
v1v2 (latest)

Training Verifiers to Solve Math Word Problems

27 October 2021
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
Lukasz Kaiser
Matthias Plappert
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
    ReLMOffRLLRM
ArXiv (abs)PDFHTML

Papers citing "Training Verifiers to Solve Math Word Problems"

50 / 490 papers shown
Title
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Jiahao Xu
Tian Liang
Pinjia He
Zhaopeng Tu
91
30
0
12 Jul 2024
Benchmarking Language Model Creativity: A Case Study on Code Generation
Benchmarking Language Model Creativity: A Case Study on Code Generation
Yining Lu
Dixuan Wang
Tianjian Li
Dongwei Jiang
Daniel Khashabi
Meng Jiang
Daniel Khashabi
LRM
97
15
0
12 Jul 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang
Yang Yue
Rui Lu
Jingxin Shi
Andrew Zhao
Shenzhi Wang
Shiji Song
Gao Huang
LM&RoKELM
114
0
0
11 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
131
9
1
10 Jul 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
131
12
0
09 Jul 2024
On Speeding Up Language Model Evaluation
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
114
2
0
08 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min Lin
MoE
156
52
1
01 Jul 2024
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Yue Zhou
Henry Peng Zou
Barbara Di Eugenio
Yang Zhang
LRMHILM
80
6
0
01 Jul 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
194
18
0
30 Jun 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Boyao Wang
Dylan Zhang
Hanning Zhang
Xingyuan Pan
Minrui Xu
Jipeng Zhang
Renjie Pi
Xiaoyu Wang
Tong Zhang
106
10
0
28 Jun 2024
RouteLLM: Learning to Route LLMs with Preference Data
RouteLLM: Learning to Route LLMs with Preference Data
Isaac Ong
Amjad Almahairi
Vincent Wu
Wei-Lin Chiang
Tianhao Wu
Joseph E. Gonzalez
M. W. Kadous
Ion Stoica
119
102
0
26 Jun 2024
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
Jikai Wang
Yi Su
Juntao Li
Qingrong Xia
Zi Ye
Xinyu Duan
Zhefeng Wang
Min Zhang
106
18
0
25 Jun 2024
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary
Vikas Yadav
Hoang Nguyen
Khyati Mahajan
Sathwik Tejaswi Madhusudhan
97
3
0
24 Jun 2024
PORT: Preference Optimization on Reasoning Traces
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
94
5
0
23 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLMELMLM&MA
143
35
0
23 Jun 2024
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Orevaoghene Ahia
Shuyue Stella Li
Vidhisha Balachandran
Sunayana Sitaram
Yulia Tsvetkov
128
7
0
22 Jun 2024
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Roman Vashurin
Ekaterina Fadeeva
Artem Vazhentsev
Akim Tsvigun
Daniil Vasilev
...
Timothy Baldwin
Timothy Baldwin
Maxim Panov
Artem Shelmanov
Artem Shelmanov
HILM
130
28
0
21 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELMALM
139
65
0
18 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini
Somnath Basu Roy Chowdhury
Snigdha Chaturvedi
133
9
0
17 Jun 2024
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
D. Yaldiz
Yavuz Faruk Bakman
Baturalp Buyukates
Chenyang Tao
Anil Ramakrishna
Dimitrios Dimitriadis
Jieyu Zhao
Salman Avestimehr
111
8
0
17 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
MUKELM
106
7
0
13 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
119
14
0
12 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
154
69
0
11 Jun 2024
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li
Ruofan Mao
Yusen Zhang
Renze Lou
Chen Wu
Jiaqi Wang
LRMAAML
72
14
0
10 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELMALMLM&MA
175
44
0
09 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
131
1
0
08 Jun 2024
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
Yibo Yang
Xiaojie Li
Zhongzhu Zhou
Shuaiwen Leon Song
Jianlong Wu
Liqiang Nie
Guohao Li
77
14
0
07 Jun 2024
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
Arash Gholami Davoodi
Seyed Pouyan Mousavi Davoudi
Pouya Pezeshkpour
ELMLRM
86
4
0
07 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAGLRMELMReLM
130
35
0
04 Jun 2024
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
Yuchi Liu
Jaskirat Singh
Gaowen Liu
Ali Payani
Liang Zheng
LLMAG
99
6
0
30 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
128
20
0
28 May 2024
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
Houxing Ren
Mingjie Zhan
Zhongyuan Wu
Aojun Zhou
Junting Pan
Hongsheng Li
SyDa
98
7
0
27 May 2024
Perturbation-Restrained Sequential Model Editing
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
125
11
0
27 May 2024
Sparse Matrix in Large Language Model Fine-tuning
Sparse Matrix in Large Language Model Fine-tuning
Haoze He
Juncheng Billy Li
Xuan Jiang
Heather Miller
MoE
80
5
0
24 May 2024
Can LLMs Solve longer Math Word Problems Better?
Can LLMs Solve longer Math Word Problems Better?
Xin Xu
Tong Xiao
Zitong Chao
Zhenya Huang
Can Yang
Yang Wang
113
14
0
23 May 2024
OAC: Output-adaptive Calibration for Accurate Post-training Quantization
OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati
Alireza Ghaffari
M. Asgharian
Lu Hou
Boxing Chen
Vahid Partovi Nia
V. Nia
MQ
126
0
0
23 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
191
333
0
16 May 2024
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
Zhuoxuan Jiang
Haoyuan Peng
Shanshan Feng
Fan Li
Dongsheng Li
KELMLRM
77
16
0
09 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRMLM&Ro
145
50
0
08 May 2024
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems
Qihuang Zhong
Kang Wang
Ziyang Xu
Juhua Liu
Liang Ding
Bo Du
LRMAIMat
118
4
0
23 Apr 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Amir Saeidi
Shivanshu Verma
Chitta Baral
Chitta Baral
ALM
87
26
0
23 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
94
3
0
18 Apr 2024
Learn Your Reference Model for Real Good Alignment
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Yaroslav Aksenov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov
OffRL
106
35
0
15 Apr 2024
MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education
MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education
Murong Yue
Wijdane Mifdal
Yixuan Zhang
Jennifer Suh
Yixuan Zhang
Ziyu Yao
LLMAG
135
21
0
10 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
83
12
0
10 Apr 2024
Evaluating Mathematical Reasoning Beyond Accuracy
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia
Xuefeng Li
Yixin Liu
Tongshuang Wu
Pengfei Liu
LRMReLM
126
28
0
08 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
145
9
0
05 Apr 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Fanxu Meng
Zhaohui Wang
Muhan Zhang
VLM
130
103
0
03 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
126
103
0
26 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
197
12
0
25 Mar 2024
Previous
123...1089
Next