Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.07723
Cited By
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
13 November 2023
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains"
10 / 10 papers shown
Title
Non-maximizing policies that fulfill multi-criterion aspirations in expectation
Simon Dima
Simon Fischer
J. Heitzig
Joss Oliver
23
1
0
08 Aug 2024
A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
Will LeVine
Benjamin Pikus
Tony Chen
Sean Hendryx
34
8
0
21 Nov 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
213
178
0
20 Oct 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
159
579
0
06 Apr 2023
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts
Joel Jang
Seonghyeon Ye
Minjoon Seo
ELM
LRM
95
64
0
26 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,915
0
04 Mar 2022
Measure and Improve Robustness in NLP Models: A Survey
Xuezhi Wang
Haohan Wang
Diyi Yang
139
130
0
15 Dec 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
199
624
0
20 May 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
211
179
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,844
0
18 Apr 2021
1