arXiv:2308.15605
Benchmarks for Detecting Measurement Tampering
29 August 2023
Fabien Roger
Ryan Greenblatt
Max Nadeau
Buck Shlegeris
Nate Thomas
Papers citing "Benchmarks for Detecting Measurement Tampering"
15 / 15 papers shown
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
68
115
0
06 Jun 2023
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
122
368
0
07 Dec 2022
Formalizing the presumption of independence
Paul Christiano
Eric Neyman
Mark Xu
LRM
24
7
0
12 Nov 2022
BackdoorBench: A Comprehensive Benchmark of Backdoor Learning
Baoyuan Wu
Hongrui Chen
Ruotong Wang
Zihao Zhu
Shaokui Wei
Danni Yuan
Chaoxiao Shen
ELM
AAML
60
144
0
25 Jun 2022
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
Polina Kirichenko
Pavel Izmailov
A. Wilson
OOD
76
333
0
06 Apr 2022
Formal Mathematics Statement Curriculum Learning
Stanislas Polu
Jesse Michael Han
Kunhao Zheng
Mantas Baksys
Igor Babuschkin
Ilya Sutskever
AIMat
113
124
0
03 Feb 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
186
1,948
0
16 Aug 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
243
679
0
20 May 2021
Consequences of Misaligned AI
Simon Zhuang
Dylan Hadfield-Menell
59
75
0
07 Feb 2021
Backdoor Learning: A Survey
Yiming Li
Yong Jiang
Zhifeng Li
Shutao Xia
AAML
89
602
0
17 Jul 2020
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu
Garima Lalwani
Spandana Gella
He He
LRM
85
186
0
14 Jul 2020
Neural Unsupervised Domain Adaptation in NLP---A Survey
Alan Ramponi
Barbara Plank
OOD
82
258
0
31 May 2020
Deep Learning for Anomaly Detection: A Survey
Raghavendra Chalapathy
Sanjay Chawla
AI4TS
154
1,494
0
10 Jan 2019
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
119
12,223
0
19 Dec 2013
A Survey on Multi-view Learning
Chang Xu
Dacheng Tao
Chao Xu
AI4TS
103
1,128
0
20 Apr 2013