ResearchTrend.AI
Self-critiquing models for assisting human evaluators

12 June 2022
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM, ELM

Papers citing "Self-critiquing models for assisting human evaluators"

50 / 238 papers shown
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt
Buck Shlegeris
Kshitij Sachan
Fabien Roger
120
55
0
12 Dec 2023
Evaluating and Mitigating Discrimination in Language Model Decisions
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
97
76
0
06 Dec 2023
Eliciting Latent Knowledge from Quirky Language Models
Alex Troy Mallen
Madeline Brumley
Julia Kharchenko
Nora Belrose
HILM, RALM, KELM
114
33
0
02 Dec 2023
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Di Jin
Shikib Mehri
Devamanyu Hazarika
Aishwarya Padmakumar
Sungjin Lee
Yang Liu
Mahdi Namazifar
ALM
83
17
0
24 Nov 2023
Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen
Geoffrey Irving
Georgios Piliouras
73
18
0
23 Nov 2023
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu
Oyvind Tafjord
Peter Clark
ELM, LRM
89
2
0
16 Nov 2023
Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez
Robert Long
ELM
72
12
0
14 Nov 2023
LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen
Hassan Mansoor
Victor Carbune
Peter Chen
Tony Mak
LRM
150
79
0
14 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM, HILM
145
939
0
09 Nov 2023
Clover: Closed-Loop Verifiable Code Generation
Chuyue Sun
Ying Sheng
Oded Padon
Clark W. Barrett
OffRL, ALM
148
31
0
26 Oct 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
82
13
0
26 Oct 2023
PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering
Wookje Han
Jinsol Park
Kyungjae Lee
82
4
0
24 Oct 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRM, ReLM
104
22
0
20 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
371
247
0
20 Oct 2023
Pseudointelligence: A Unifying Framework for Language Model Evaluation
Shikhar Murty
Orr Paradise
Pratyusha Sharma
40
0
0
18 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
90
12
0
17 Oct 2023
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers
Charlie George
Andreas Stuhlmüller
HILM
30
5
0
16 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Kai Chen
Chunwei Wang
Kuo Yang
Jianhua Han
Lanqing Hong
...
Zhenguo Li
Dit-Yan Yeung
Lifeng Shang
Xin Jiang
Qun Liu
183
36
0
16 Oct 2023
Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models
Vishaal Udandarao
Max F. Burg
Samuel Albanie
Matthias Bethge
VLM
67
9
0
12 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
108
9
0
10 Oct 2023
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Chau Pham
Boyi Liu
Yingxiang Yang
Zhengyu Chen
Tianyi Liu
Jianbo Yuan
Bryan A. Plummer
Zhaoran Wang
Hongxia Yang
LLMAG
111
19
0
10 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM, ALM
119
91
0
09 Oct 2023
Critique Ability of Large Language Models
Liangchen Luo
Zi Lin
Yinxiao Liu
Lei Shu
Yun Zhu
Jingbo Shang
Lei Meng
AI4MH, LRM, ELM
71
16
0
07 Oct 2023
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models
Junchi Yu
Ran He
Rex Ying
LRM
134
31
0
06 Oct 2023
Assessing Large Language Models on Climate Information
Jannis Bulian
Mike S. Schäfer
Afra Amini
Heidi Lam
Massimiliano Ciaramita
...
Michelle Chen Huebscher
Christian Buck
Niels G. Mede
Markus Leippold
Nadine Strauss
ELM
89
22
0
04 Oct 2023
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste
Usman Anwar
Robert Kirk
David M. Krueger
NoLa, ALM
118
139
0
04 Oct 2023
ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models
Zhaowei Zhang
Fengshuo Bai
Jun Gao
Yaodong Yang
PILM, ELM
78
3
0
30 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG, HILM
84
54
0
26 Sep 2023
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Xufeng Zhao
Mengdi Li
Wenhao Lu
C. Weber
Jae Hee Lee
Kun-Mo Chu
S. Wermter
LRM, AI4CE, ReLM
102
37
0
23 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
225
162
0
19 Sep 2023
SYNDICOM: Improving Conversational Commonsense with Error-Injection and Natural Language Feedback
Christopher Richardson
Anirudh S. Sundar
Larry Heck
LRM
123
4
0
18 Sep 2023
Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
Huachuan Qiu
Shuai Zhang
Hongliang He
Anqi Li
Zhenzhong Lan
29
2
0
18 Sep 2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Simeng Sun
Dhawal Gupta
Mohit Iyyer
89
20
0
16 Sep 2023
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Arkadiy Saakyan
Smaranda Muresan
93
4
0
15 Sep 2023
Large Language Model for Science: A Study on P vs. NP
Qingxiu Dong
Li Dong
Ke Xu
Guangyan Zhou
Y. Hao
Zhifang Sui
Furu Wei
LRM
45
17
0
11 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG, LM&Ro
166
182
0
05 Sep 2023
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
Víctor Gallego
SyDa
88
4
0
11 Aug 2023
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
122
135
0
11 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Bowen Yu
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
136
25
0
10 Aug 2023
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
84
87
0
08 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
117
74
0
07 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM, LRM
116
216
0
06 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Bowen Yu
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
Ehsan Kamalloo
A. Jafari
Xinyu Crystina Zhang
Nandan Thakur
Jimmy J. Lin
70
44
0
31 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM, OffRL
162
535
0
27 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
103
115
0
18 Jul 2023
RLTF: Reinforcement Learning from Unit Test Feedback
Jiate Liu
Yiqin Zhu
Kaiwen Xiao
Qiang Fu
Xiao Han
Wei Yang
Deheng Ye
OffRL
103
62
0
10 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
158
4
0
01 Jul 2023
System-Level Natural Language Feedback
Weizhe Yuan
Kyunghyun Cho
Jason Weston
119
5
0
23 Jun 2023
Improving Open Language Models by Learning from Organic Interactions
Jing Xu
Da Ju
Joshua Lane
M. Komeili
Eric Michael Smith
...
Rashel Moritz
Sainbayar Sukhbaatar
Y-Lan Boureau
Jason Weston
Kurt Shuster
79
9
0
07 Jun 2023