2407.04108
Cited By
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
4 July 2024
Sara Price
Arjun Panickssery
Sam Bowman
Asa Cooper Stickland
LLMSV
Papers citing
"Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs"
3 / 3 papers shown
Title
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
45
23
0
11 Jun 2024
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
92
124
0
01 May 2023
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
226
405
0
24 Feb 2021