ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.14168
  4. Cited By
Training Verifiers to Solve Math Word Problems

Training Verifiers to Solve Math Word Problems

27 October 2021
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
Lukasz Kaiser
Matthias Plappert
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
    ReLM
    OffRL
    LRM
ArXivPDFHTML

Papers citing "Training Verifiers to Solve Math Word Problems"

50 / 3,033 papers shown
Title
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete
  Information from Lateral Thinking Puzzles
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
Shulin Huang
Shirong Ma
Hai-Tao Zheng
Mengzuo Huang
Wuhe Zou
Weidong Zhang
Haitao Zheng
LLMAG
LRM
33
27
0
21 Aug 2023
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring
  Emergent Behaviors
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Weize Chen
Yusheng Su
Jingwei Zuo
Cheng Yang
Chenfei Yuan
...
Xin Cong
Ruobing Xie
Zhiyuan Liu
Maosong Sun
Jie Zhou
AI4CE
LLMAG
LM&Ro
38
265
0
21 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
24
546
0
21 Aug 2023
Exploring Equation as a Better Intermediate Meaning Representation for
  Numerical Reasoning
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
Dingzirui Wang
Longxu Dou
Wenbin Zhang
Junyu Zeng
Wanxiang Che
AIMat
38
6
0
21 Aug 2023
A Human-on-the-Loop Optimization Autoformalism Approach for
  Sustainability
A Human-on-the-Loop Optimization Autoformalism Approach for Sustainability
Ming Jin
Bilgehan Sel
Fnu Hardeep
W. Yin
AI4CE
36
2
0
20 Aug 2023
PACE: Improving Prompt with Actor-Critic Editing for Large Language
  Model
PACE: Improving Prompt with Actor-Critic Editing for Large Language Model
Yihong Dong
Kangcheng Luo
Xue Jiang
Zhi Jin
Ge Li
LRM
KELM
36
9
0
19 Aug 2023
GraphReason: Enhancing Reasoning Capabilities of Large Language Models
  through A Graph-Based Verification Approach
GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach
Lang Cao
LRM
39
11
0
18 Aug 2023
MaScQA: A Question Answering Dataset for Investigating Materials Science
  Knowledge of Large Language Models
MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models
Mohd Zaki
J. Jayadeva
Mausam
N. M. A. Krishnan
ELM
27
4
0
17 Aug 2023
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code
  Generation
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
Dong Huang
Qi Bu
Yuhao Qing
Heming Cui
LRM
32
16
0
17 Aug 2023
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Shahriar Golchin
Mihai Surdeanu
35
93
0
16 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via
  Parameter-Efficient Module Operation
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM
MU
35
26
0
16 Aug 2023
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with
  Code-based Self-Verification
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Aojun Zhou
Ke Wang
Zimu Lu
Weikang Shi
Sichun Luo
...
Shaoqing Lu
Anya Jia
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM
LRM
36
146
0
15 Aug 2023
Through the Lens of Core Competency: Survey on Evaluation of Large
  Language Models
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
26
9
0
15 Aug 2023
Forward-Backward Reasoning in Large Language Models for Mathematical
  Verification
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
Weisen Jiang
Han Shi
L. Yu
Zheng Liu
Yu Zhang
Zhenguo Li
James T. Kwok
LRM
45
26
0
15 Aug 2023
Better Zero-Shot Reasoning with Role-Play Prompting
Better Zero-Shot Reasoning with Role-Play Prompting
Aobo Kong
Shiwan Zhao
Hao Chen
Qicheng Li
Yong Qin
Ruiqi Sun
Xiaoxia Zhou
Enzhi Wang
Xiaohang Dong
ReLM
LLMAG
LRM
33
149
0
15 Aug 2023
A Survey on Model Compression for Large Language Models
A Survey on Model Compression for Large Language Models
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
36
193
0
15 Aug 2023
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of
  Large Language Models
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu
Hongyi Yuan
Zheng Yuan
Runji Lin
Junyang Lin
Chuanqi Tan
Chang Zhou
Jingren Zhou
ALM
LRM
35
65
0
14 Aug 2023
Token-Scaled Logit Distillation for Ternary Weight Generative Language
  Models
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim
Sihwa Lee
Jangwhan Lee
S. Hong
Duhyeuk Chang
Wonyong Sung
Jungwook Choi
MQ
24
14
0
13 Aug 2023
TorchQL: A Programming Framework for Integrity Constraints in Machine
  Learning
TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
Aaditya Naik
Adam Stein
Yinjun Wu
Mayur Naik
Eric Wong
35
3
0
13 Aug 2023
Enhancing Network Management Using Code Generated by Large Language
  Models
Enhancing Network Management Using Code Generated by Large Language Models
Sathiya Kumaran Mani
Yajie Zhou
Kevin Hsieh
Santiago Segarra
Ranveer Chandra
Srikanth Kandula
44
22
0
11 Aug 2023
Shepherd: A Critic for Language Model Generation
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
42
79
0
08 Aug 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
30
19
0
08 Aug 2023
TPTU: Large Language Model-based AI Agents for Task Planning and Tool
  Usage
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
Jingqing Ruan
Yihong Chen
Bin Zhang
Zhiwei Xu
Tianpeng Bao
...
Shiwei Shi
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
LM&Ro
44
32
0
07 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape
  of diverse self-correction strategies
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM
LRM
47
201
0
06 Aug 2023
Scaling Relationship on Learning Mathematical Reasoning with Large
  Language Models
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Zheng Yuan
Hongyi Yuan
Cheng Li
Guanting Dong
Keming Lu
Chuanqi Tan
Chang Zhou
Jingren Zhou
LRM
ALM
33
167
0
03 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language
  Models
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
46
41
0
01 Aug 2023
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step
  Reasoning
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Ning Miao
Yee Whye Teh
Tom Rainforth
ReLM
LRM
25
112
0
01 Aug 2023
Skills-in-Context Prompting: Unlocking Compositionality in Large
  Language Models
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Jiaao Chen
Xiaoman Pan
Dian Yu
Kaiqiang Song
Xiaoyang Wang
Dong Yu
Jianshu Chen
ReLM
LRM
21
24
0
01 Aug 2023
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent
  Cognitive Bias
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
Itay Itzhak
Gabriel Stanovsky
Nir Rosenfeld
Yonatan Belinkov
29
20
0
01 Aug 2023
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt
  Injection
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan
Vikas Yadav
Shiyang Li
Lichang Chen
Zheng Tang
Hai Wang
Vijay Srinivasan
Xiang Ren
Hongxia Jin
SILM
28
82
0
31 Jul 2023
Scaling Sentence Embeddings with Large Language Models
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
44
40
0
31 Jul 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic
  Control
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
30
1,110
0
28 Jul 2023
Three Bricks to Consolidate Watermarks for Large Language Models
Three Bricks to Consolidate Watermarks for Large Language Models
Pierre Fernandez
Antoine Chaffin
Karim Tit
Vivien Chappelier
Teddy Furon
WaLM
21
47
0
26 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
45
37
0
25 Jul 2023
FacTool: Factuality Detection in Generative AI -- A Tool Augmented
  Framework for Multi-Task and Multi-Domain Scenarios
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern
Steffi Chern
Shiqi Chen
Weizhe Yuan
Kehua Feng
Chunting Zhou
Junxian He
Graham Neubig
Pengfei Liu
HILM
27
193
0
25 Jul 2023
Analyzing Chain-of-Thought Prompting in Large Language Models via
  Gradient-based Feature Attributions
Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions
Skyler Wu
Eric Meng Shen
Charumathi Badrinath
Jiaqi Ma
Himabindu Lakkaraju
LRM
38
26
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and
  Program Synthesis
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
39
201
0
24 Jul 2023
CohortGPT: An Enhanced GPT for Participant Recruitment in Clinical Study
CohortGPT: An Enhanced GPT for Participant Recruitment in Clinical Study
Zihan Guan
Zihao Wu
Zheng Liu
Dufan Wu
Hui Ren
Quanzheng Li
Xiang Li
Ninghao Liu
LM&MA
33
25
0
21 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language
  Models
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
40
132
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill
  Sets
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
46
99
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities
  of Large Language Models
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
30
86
0
20 Jul 2023
Multi-Method Self-Training: Improving Code Generation With Text, And
  Vice Versa
Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
Shriyash Upadhyay
Etan Ginsberg
SyDa
LRM
26
0
0
20 Jul 2023
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in
  Language Model Prompting
Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Rylan Schaeffer
Kateryna Pistunova
Samarth Khanna
Sarthak Consul
Oluwasanmi Koyejo
ReLM
LRM
39
10
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
36
25
0
20 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
126
11,099
0
18 Jul 2023
REX: Rapid Exploration and eXploitation for AI Agents
REX: Rapid Exploration and eXploitation for AI Agents
Rithesh Murthy
Shelby Heinecke
Juan Carlos Niebles
Zhiwei Liu
Le Xue
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
OffRL
31
8
0
18 Jul 2023
GEAR: Augmenting Language Models with Generalizable and Efficient Tool
  Resolution
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution
Yining Lu
Haoping Yu
Daniel Khashabi
LLMAG
39
9
0
17 Jul 2023
A mixed policy to improve performance of language models on math
  problems
A mixed policy to improve performance of language models on math problems
Gang Chen
ReLM
MoE
LRM
22
0
0
17 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An
  Empirical Study
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ
LRM
35
33
0
16 Jul 2023
Previous
123...535455...596061
Next