arXiv:2505.18893
Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
24 May 2025
Reva Schwartz
Rumman Chowdhury
Akash Kundu
Heather Frase
Marzieh Fadaee
Tom David
Gabriella Waters
Afaf Taik
Morgan Briggs
Patrick Hall
Shomik Jain
Kyra Yee
Spencer Thomas
Sundeep Bhandari
Paul Duncan
Andrew Thompson
Maya Carlyle
Qinghua Lu
Matthew Holmes
Theodora Skeadas
Papers citing "Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects"
48 / 48 papers shown
Real-World Gaps in AI Governance Research
Ilan Strauss
Isobel Moure
Tim O'Reilly
Sruly Rosenblat
132
1
0
30 Apr 2025
Decentralized Collective World Model for Emergent Communication and Coordination
Kentaro Nomura
Tatsuya Aoki
Tadahiro Taniguchi
Takato Horii
164
1
0
04 Apr 2025
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li
Ziwen Han
Ian Steneker
Willow Primack
Riley Goodside
Hugh Zhang
Zifan Wang
Cristina Menghini
Summer Yue
AAML
MU
82
55
0
27 Aug 2024
A Collaborative, Human-Centred Taxonomy of AI, Algorithmic, and Automation Harms
Gavin Abercrombie
Djalel Benbouzid
Paolo Giudici
Delaram Golpayegani
Julio Hernandez
...
Ushnish Sengupta
Arthit Suriyawongkul
Ruby Thelot
Sofia Vei
Laura Waltersdorfer
55
12
0
01 Jul 2024
Challenging the Machine: Contestability in Government AI Systems
Susan Landau
James X. Dempsey
Ece Kamar
S. Bellovin
Robert Pool
ELM
SILM
44
3
0
14 Jun 2024
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias
Rebecca Dorn
Lee Kezar
Fred Morstatter
Kristina Lerman
81
11
0
23 May 2024
Analytical results for uncertainty propagation through trained machine learning regression models
Andrew Thompson
47
4
0
17 Apr 2024
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Khaoula Chehbouni
Megha Roshan
Emmanuel Ma
Futian Andrew Wei
Afaf Taik
Jackie CK Cheung
G. Farnadi
64
8
0
20 Mar 2024
Stealing Part of a Production Language Model
Nicholas Carlini
Daniel Paleka
Krishnamurthy Dvijotham
Thomas Steinke
Jonathan Hayase
...
Arthur Conmy
Itay Yona
Eric Wallace
David Rolnick
Florian Tramèr
MLAU
AAML
58
83
0
11 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
98
185
0
05 Mar 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktäschel
Roberta Raileanu
SyDa
101
85
0
26 Feb 2024
Do Membership Inference Attacks Work on Large Language Models?
Michael Duan
Anshuman Suri
Niloofar Mireshghallah
Sewon Min
Weijia Shi
Luke Zettlemoyer
Yulia Tsvetkov
Yejin Choi
David Evans
Hanna Hajishirzi
MIALM
104
91
0
12 Feb 2024
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li
Jeffrey Flanigan
156
102
0
26 Dec 2023
Concrete Problems in AI Safety, Revisited
Inioluwa Deborah Raji
Roel Dobbe
55
16
0
18 Dec 2023
TaskBench: Benchmarking Large Language Models for Task Automation
Yongliang Shen
Kaitao Song
Xu Tan
Wenqi Zhang
Kan Ren
Siyu Yuan
Weiming Lu
Dongsheng Li
Yueting Zhuang
90
65
0
30 Nov 2023
Sociotechnical Safety Evaluation of Generative AI Systems
Laura Weidinger
Maribeth Rauh
Nahema Marchal
Arianna Manzini
Lisa Anne Hendricks
...
Conor Griffin
Ben Bariach
Iason Gabriel
Verena Rieser
William S. Isaac
EGVM
47
139
0
18 Oct 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
110
690
0
12 Oct 2023
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab
Mark Vero
Mislav Balunović
Martin Vechev
PILM
57
90
0
11 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
97
39
0
01 Oct 2023
Calibration in Deep Learning: A Survey of the State-of-the-Art
Cheng Wang
UQCV
89
43
0
02 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
129
517
0
27 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
128
1,686
0
06 Jul 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Wei Ping
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yuk-Kit Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
98
416
0
20 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
361
4,388
0
09 Jun 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
126
488
0
23 Feb 2023
Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
Renee Shelby
Shalaleh Rismani
Kathryn Henne
AJung Moon
Negar Rostamzadeh
...
N'Mah Yilla-Akbari
Jess Gallegos
A. Smart
Emilio Garcia
Gurleen Virk
79
200
0
11 Oct 2022
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
87
33
0
30 Sep 2022
Leakage and the Reproducibility Crisis in ML-based Science
Sayash Kapoor
Arvind Narayanan
55
181
0
14 Jul 2022
The Fallacy of AI Functionality
Inioluwa Deborah Raji
Indra Elizabeth Kumar
Aaron Horowitz
Andrew D. Selbst
68
187
0
20 Jun 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
192
1,768
0
09 Jun 2022
REAL ML: Recognizing, Exploring, and Articulating Limitations of Machine Learning Research
Jessie J. Smith
Saleema Amershi
Solon Barocas
Hanna M. Wallach
J. W. Vaughan
30
32
0
05 May 2022
A Deeper Look into Aleatoric and Epistemic Uncertainty Disentanglement
Matias Valdenegro-Toro
Daniel Saromo
UD
PER
BDL
UQCV
56
83
0
20 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
254
2,561
0
12 Apr 2022
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
121
628
0
15 Feb 2022
Survey of Hallucination in Natural Language Generation
Ziwei Ji
Nayeon Lee
Rita Frieske
Tiezheng Yu
D. Su
...
Delong Chen
Wenliang Dai
Ho Shu Chan
Andrea Madotto
Pascale Fung
HILM
LRM
210
2,394
0
08 Feb 2022
'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers
Timothy Baldwin
Kobi Leins
116
66
0
14 Sep 2021
Hard Choices in Artificial Intelligence
Roel Dobbe
T. Gilbert
Yonatan Dov Mintz
53
56
0
10 Jun 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
203
407
0
07 Apr 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
Basel Alomair
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
492
1,923
0
14 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
117
687
0
06 Nov 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
158
1,209
0
24 Sep 2020
Participation is not a Design Fix for Machine Learning
Mona Sloane
Emanuel Moss
O. Awomolo
Laura Forlano
HAI
72
212
0
05 Jul 2020
Sponge Examples: Energy-Latency Attacks on Neural Networks
Ilia Shumailov
Yiren Zhao
Daniel Bates
Nicolas Papernot
Robert D. Mullins
Ross J. Anderson
SILM
61
135
0
05 Jun 2020
The Problem with Metrics is a Fundamental Problem for AI
Rachel L. Thomas
D. Uminsky
111
68
0
20 Feb 2020
Show Your Work: Improved Reporting of Experimental Results
Jesse Dodge
Suchin Gururangan
Dallas Card
Roy Schwartz
Noah A. Smith
72
255
0
06 Sep 2019
Do ImageNet Classifiers Generalize to ImageNet?
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD
SSeg
VLM
113
1,715
0
13 Feb 2019
Fairwashing: the risk of rationalization
Ulrich Aïvodji
Hiromi Arai
O. Fortineau
Sébastien Gambs
Satoshi Hara
Alain Tapp
FaML
52
147
0
28 Jan 2019
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
829
9,318
0
06 Jun 2015