ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.16927
  4. Cited By
Latent Principle Discovery for Language Model Self-Improvement

Latent Principle Discovery for Language Model Self-Improvement

22 May 2025
Keshav Ramji
Tahira Naseem
Ramón Fernandez Astudillo
    LRM
ArXivPDFHTML

Papers citing "Latent Principle Discovery for Language Model Self-Improvement"

13 / 13 papers shown
Title
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
137
40
0
03 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
333
1,641
0
22 Jan 2025
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Yannis Katsis
Sara Rosenthal
Kshitij P. Fadnis
Chulaka Gunasekara
Young-Suk Lee
Lucian Popa
Vraj Shah
Huaiyu Zhu
Danish Contractor
Marina Danilevsky
RALM
LRM
44
11
0
08 Jan 2025
Inverse Constitutional AI: Compressing Preferences into Principles
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
78
12
0
02 Jun 2024
Large Language Models Can Self-Improve At Web Agent Tasks
Large Language Models Can Self-Improve At Web Agent Tasks
Ajay Patel
M. Hofmarcher
Claudiu Leoveanu-Condrei
Marius-Constantin Dinu
Chris Callison-Burch
Sepp Hochreiter
LLMAG
79
31
0
30 May 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
95
379
0
06 Apr 2024
SALMON: Self-Alignment with Instructable Reward Models
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM
SyDa
79
39
0
09 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
320
4,298
0
09 Jun 2023
Principle-Driven Self-Alignment of Language Models from Scratch with
  Minimal Human Supervision
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
80
331
0
04 May 2023
Large Language Models Can Self-Improve
Large Language Models Can Self-Improve
Jiaxin Huang
S. Gu
Le Hou
Yuexin Wu
Xuezhi Wang
Hongkun Yu
Jiawei Han
ReLM
AI4MH
LRM
158
608
0
20 Oct 2022
STaR: Bootstrapping Reasoning With Reasoning
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
107
486
0
28 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
768
9,351
0
28 Jan 2022
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question
  Answering
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
156
2,647
0
25 Sep 2018
1