ResearchTrend.AI

An Auditing Test To Detect Behavioral Shift in Language Models
arXiv:2410.19406 · v2 (latest) · 25 October 2024
Leo Richter, Xuanli He, Pasquale Minervini, Matt J. Kusner

Papers citing "An Auditing Test To Detect Behavioral Shift in Language Models"
30 / 80 papers shown
Red Teaming Language Models with Language Models
Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, G. Irving
AAML · 07 Feb 2022 · 668 citations

pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing
Jiasi Weng, Jian Weng, Gui Tang, Anjia Yang, Ming Li, Jia-Nan Liu
23 Jan 2022 · 32 citations

Nonparametric Two-Sample Testing by Betting
S. Shekhar, Aaditya Ramdas
16 Dec 2021 · 30 citations

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
Xuanli He, Xingliang Yuan, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang
WaLM · 05 Dec 2021 · 98 citations

Unsolved Problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
28 Sep 2021 · 294 citations

MPC-Friendly Commitments for Publicly Verifiable Covert Security
Nitin Agrawal, James Bell, Adria Gascon, Matt J. Kusner
15 Sep 2021 · 6 citations

Large-Scale Differentially Private BERT
Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, Pasin Manurangsi
03 Aug 2021 · 138 citations

Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
SyDa · 14 Jul 2021 · 636 citations

VeriDL: Integrity Verification of Outsourced Deep Learning Services (Extended Version)
Boxiang Dong, Bo Zhang, Hui (Wendy) Wang
01 Jul 2021 · 8 citations

Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi
LRM · 18 Apr 2021 · 753 citations

There are natural scores: Full comment on Shafer, "Testing by betting: A strategy for statistical and scientific communication"
S. Greenland
FAtt · 10 Feb 2021 · 126 citations

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen, Tristan Thrush, Zeerak Talat, Douwe Kiela
31 Dec 2020 · 273 citations

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman
30 Sep 2020 · 689 citations

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 Sep 2020 · 1,221 citations

Black Box to White Box: Discover Model Characteristics Based on Strategic Probing
Josh Kalin, Matthew Ciolino, David Noever, Gerry V. Dozier
AAML · 07 Sep 2020 · 9 citations

Fairness in the Eyes of the Data: Certifying Machine-Learning Models
Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, C. Ganesh, Joseph Keshet
FedML · 03 Sep 2020 · 37 citations

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov
29 Apr 2020 · 165 citations

Weight Poisoning Attacks on Pre-trained Models
Keita Kurita, Paul Michel, Graham Neubig
AAML, SILM · 14 Apr 2020 · 454 citations

On the Ethics of Building AI in a Responsible Manner
Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
30 Mar 2020 · 6 citations

Predicting the Type and Target of Offensive Posts in Social Media
Marcos Zampieri, S. Malmasi, Preslav Nakov, Sara Rosenthal, N. Farra, Ritesh Kumar
25 Feb 2019 · 780 citations

Time-uniform, nonparametric, nonasymptotic confidence sequences
Steven R. Howard, Aaditya Ramdas, Jon D. McAuliffe, Jasjeet Sekhon
18 Oct 2018 · 245 citations

VerIDeep: Verifying Integrity of Deep Neural Networks through Sensitive-Sample Fingerprinting
Zecheng He, Tianwei Zhang, R. Lee
FedML, AAML, MLAU · 09 Aug 2018 · 19 citations

HiDDeN: Hiding Data With Deep Networks
Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei
WIGM · 26 Jul 2018 · 755 citations

Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar, Robin Jia, Percy Liang
RALM, ELM · 11 Jun 2018 · 2,854 citations

Blind Justice: Fairness with Encrypted Sensitive Attributes
Niki Kilbertus, Adria Gascon, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, Adrian Weller
08 Jun 2018 · 152 citations

Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang
18 Apr 2018 · 944 citations

SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud
Zahra Ghodsi, Tianyu Gu, S. Garg
30 Jun 2017 · 161 citations

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai
CVBM, FaML · 21 Jul 2016 · 3,156 citations

Deep Learning with Differential Privacy
Martín Abadi, Andy Chu, Ian Goodfellow, H. B. McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
FedML, SyDa · 01 Jul 2016 · 6,172 citations

Optimal Algorithms for Testing Closeness of Discrete Distributions
S. Chan, Ilias Diakonikolas, Paul Valiant, Gregory Valiant
19 Aug 2013 · 225 citations