ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.11610
  4. Cited By
Large Language Models Can Self-Improve

Large Language Models Can Self-Improve

20 October 2022
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
    ReLM
    AI4MH
    LRM
ArXivPDFHTML

Papers citing "Large Language Models Can Self-Improve"

50 / 410 papers shown
Title
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution
X. Huang
Weiwen Liu
Xingshan Zeng
Y. Huang
Xinlong Hao
...
Yirong Zeng
Chuhan Wu
Y. Wang
R. Tang
Defu Lian
KELM
31
0
0
12 May 2025
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Aniruddha Roy
Pretam Ray
Abhilash Nandy
Somak Aditya
Pawan Goyal
ALM
29
0
0
10 May 2025
Reasoning Models Don't Always Say What They Think
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
67
12
1
08 May 2025
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Qianchu Liu
Sheng Zhang
Guanghui Qin
Timothy Ossowski
Yu Gu
...
Sam Preston
Mu-Hsin Wei
Paul Vozila
Tristan Naumann
Hoifung Poon
OOD
LRM
VLM
54
0
0
06 May 2025
On the generalization of language models from in-context learning and finetuning: a controlled study
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
46
0
0
01 May 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
J. Zhang
Flood Sung
Z. Yang
Yang Gao
Chongjie Zhang
LLMAG
40
0
0
28 Apr 2025
Automatically Generating Rules of Malicious Software Packages via Large Language Model
Automatically Generating Rules of Malicious Software Packages via Large Language Model
XiangRui Zhang
HaoYu Chen
YongZhong He
Wenjia Niu
Qiang Li
35
0
0
24 Apr 2025
a1: Steep Test-time Scaling Law via Environment Augmented Generation
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Yuyao Ge
Jun Wan
Yurong Wu
Xueqi Cheng
LRM
27
0
0
20 Apr 2025
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Wei Zou
Sen Yang
Yu Bao
Shujian Huang
Jiajun Chen
Shanbo Cheng
SyDa
29
0
0
20 Apr 2025
A Data-Centric Approach for Safe and Secure Large Language Models against Threatening and Toxic Content
A Data-Centric Approach for Safe and Secure Large Language Models against Threatening and Toxic Content
Chaima Njeh
Haïfa Nakouri
Fehmi Jaafar
22
0
0
19 Apr 2025
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
27
0
0
19 Apr 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
34
0
0
18 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
143
2
0
14 Apr 2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng
Michael Y. Fatemi
Xiaolong Jin
Z. Wang
Apurva Gandhi
...
Yu Gu
Jayanth Srinivasa
Gaowen Liu
Graham Neubig
Yu Su
CLL
36
0
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM
LRM
79
3
0
08 Apr 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Yufan Zhou
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
41
2
0
07 Apr 2025
Entropy-Based Adaptive Weighting for Self-Training
Entropy-Based Adaptive Weighting for Self-Training
Xiaoxuan Wang
Yihe Deng
Mingyu Derek Ma
Wei Wang
LRM
47
0
0
31 Mar 2025
TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance
TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance
Jingxian Xu
Mengyu Zhou
W. Liu
Hanbing Liu
Shi Han
Dongmei Zhang
LRM
45
1
0
31 Mar 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
42
0
0
30 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
52
0
0
29 Mar 2025
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei
Tarun Suresh
Jiannan Cao
Naveen Kannan
Yuheng Wu
Kai Yan
Thiago S. F. X. Teixeira
Ke Wang
Alex Aiken
ELM
LRM
41
0
0
29 Mar 2025
Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Qi Tao
Yin Jinhua
Cai Dongqi
Xie Yueqi
Wang Huili
...
Zhou Zhili
Wang Shangguang
Lyu Lingjuan
Huang Yongfeng
Lane Nicholas
40
0
0
24 Mar 2025
AgentRxiv: Towards Collaborative Autonomous Research
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
61
3
0
23 Mar 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu-Hu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Yu Wu
Chenlin Ming
H. V. Zhao
Conghui He
Lijun Wu
LRM
59
0
0
21 Mar 2025
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang
Fanghua Ye
Jian Li
Siyu Yuan
Yikai Zhang
Zhaopeng Tu
Xiaolong Li
Deqing Yang
LLMAG
68
3
0
20 Mar 2025
Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models
Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models
Mario Sanz-Guerrero
Katharina von der Wense
LRM
43
0
0
20 Mar 2025
Efficient Federated Fine-Tuning of Large Language Models with Layer Dropout
Shilong Wang
Jianchun Liu
Hongli Xu
Jiaming Yan
Xianjun Gao
53
1
0
13 Mar 2025
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
Jiaxin Zhang
Z. Li
Wendi Cui
Kamalika Das
Bradley Malin
Sricharan Kumar
41
0
0
13 Mar 2025
Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting
Yufei Li
John Nham
Ganesh Jawahar
Lei Shu
David C. Uthus
Yun-hsuan Sung
Chengrun Yang
Itai Rolnick
Yi Qiao
Cong Liu
OffRL
60
0
0
09 Mar 2025
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Wenjia Jiang
Yangyang Zhuang
Chenxi Song
Xu Yang
Chi Zhang
Chi Zhang
LLMAG
89
1
0
04 Mar 2025
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes
Alan Ritter
LRM
60
3
0
04 Mar 2025
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
Wei Sun
Qianlong Du
Fuwei Cui
Jiajun Zhang
OffRL
LRM
37
0
0
04 Mar 2025
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang
Yongliang Shen
Zhe Zheng
Linjuan Wu
Wenqi Zhang
Yuchen Yan
Qiuying Peng
J. Wang
Weiming Lu
KELM
77
1
0
03 Mar 2025
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
K. Li
Tianhua Zhang
Yunxiang Li
Hongyin Luo
Abdalla Moustafa
Xixin Wu
James Glass
H. Meng
61
0
0
03 Mar 2025
A Survey of Uncertainty Estimation Methods on Large Language Models
Zhiqiu Xia
Jinxuan Xu
Yuqian Zhang
Hang Liu
38
1
0
28 Feb 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
46
5
0
27 Feb 2025
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Shi Fu
Yingjie Wang
Yuzhu Chen
Xinmei Tian
Dacheng Tao
53
1
0
26 Feb 2025
AgentRM: Enhancing Agent Generalization with Reward Modeling
AgentRM: Enhancing Agent Generalization with Reward Modeling
Yu Xia
Jingru Fan
Weize Chen
Siyu Yan
Xin Cong
Zhong Zhang
Y. Lu
Yankai Lin
Zhiyuan Liu
Maosong Sun
51
1
0
25 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
69
3
0
24 Feb 2025
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores
Jamshid Mozafari
Abdelrahman Abdallah
Bhawna Piryani
Adam Jatowt
42
0
0
22 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
128
1
0
22 Feb 2025
CLIPPER: Compression enables long-context synthetic data generation
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham
Yapei Chang
Mohit Iyyer
SyDa
77
1
0
21 Feb 2025
GiFT: Gibbs Fine-Tuning for Code Generation
GiFT: Gibbs Fine-Tuning for Code Generation
Haochen Li
Wanjin Feng
Xin Zhou
Zhiqi Shen
SyDa
75
1
0
17 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq R. Joty
Furu Wei
LRM
99
9
0
17 Feb 2025
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
Xin Zhou
Yiwen Guo
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
84
2
0
13 Feb 2025
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
C. Zhou
Xinyu Zhang
Dandan Song
Xiancai Chen
Wanli Gu
Huipeng Ma
Yuhang Tian
M. Zhang
Linmei Hu
63
1
0
13 Feb 2025
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Kareem Amin
Sara Babakniya
Alex Bie
Weiwei Kong
Umar Syed
Sergei Vassilvitskii
70
1
0
13 Feb 2025
Rationalization Models for Text-to-SQL
Rationalization Models for Text-to-SQL
Gaetano Rossiello
Nhan Pham
Michael R. Glass
Junkyu Lee
Shankar Subramanian
ReLM
LRM
50
0
0
10 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren
Zhaocheng Du
Zhihao Wen
Qinglin Jia
Sunhao Dai
Chuhan Wu
Zhenhua Dong
SyDa
77
0
0
09 Feb 2025
OntoTune: Ontology-Driven Self-training for Aligning Large Language Models
OntoTune: Ontology-Driven Self-training for Aligning Large Language Models
Zhiqiang Liu
Chengtao Gan
Junjie Wang
Y. Zhang
Zhongpu Bo
Mengshu Sun
H. Chen
Wen Zhang
65
0
0
08 Feb 2025
123456789
Next