
Assessing Language Model Deployment with Risk Cards
arXiv:2303.18190 · 31 March 2023
Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. Leiser, Saif Mohammad

Papers citing "Assessing Language Model Deployment with Risk Cards"

28 / 28 papers shown
Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer, Kevin Paeth, Andreas Wespi, Kathrin Grosse, Alexandre Alahi (19 Dec 2024)

Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Bjorn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch (17 Oct 2024)

BenchmarkCards: Large Language Model and Risk Reporting
Anna Sokol, Nuno Moniz, Elizabeth M. Daly, Michael Hind, Nitesh V. Chawla (16 Oct 2024)

The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, ..., Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi (02 Jul 2024)

Documentation Practices of Artificial Intelligence
Stefan Arnold, Dilara Yesilbas, Rene Gröbner, Dominik Riedelbauch, Maik Horn, Sven Weinzierl (26 Jun 2024)

garak: A Framework for Security Probing Large Language Models
Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie (16 Jun 2024)

Hacc-Man: An Arcade Game for Jailbreaking LLMs
Matheus Valentim, Jeanette Falk, Nanna Inie (24 May 2024)

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster (14 May 2024)

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster (25 Apr 2024)

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, ..., Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, S. Pyysalo (30 Mar 2024)

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen (28 Mar 2024)

What Motivates People to Trust 'AI' Systems?
Nanna Inie (09 Mar 2024)

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, ..., Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo (14 Feb 2024)

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff (13 Feb 2024)

Temporal Blind Spots in Large Language Models
Jonas Wallat, Adam Jatowt, Avishek Anand (22 Jan 2024)

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik, Karol Dobiczek, Michal Teodor Okoń, Konrad Skublicki, Philip Lippmann, Jie-jin Yang (30 Dec 2023)

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale (11 Oct 2023)

Regulation and NLP (RegNLP): Taming Large Language Models
Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, S. Ranchordas, Gerasimos Spanakis (09 Oct 2023)

Can LLM-Generated Misinformation Be Detected?
Canyu Chen, Kai Shu (25 Sep 2023)

Unlocking Model Insights: A Dataset for Automated Model Card Generation
Shruti Singh, Hitesh Lodwal, Husain Malwat, Rakesh Thakur, Mayank Singh (22 Sep 2023)

Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, ..., Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert F. Trager, Kevin J. Wolf (06 Jul 2023)

Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses
Logan Stapleton, Jordan Taylor, Sarah E Fox, Tongshuang Wu, Haiyi Zhu (30 May 2023)

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, ..., Addison Howard, William J. Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo (22 May 2023)

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov (14 Oct 2022)

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark (23 Aug 2022)

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale (12 Aug 2021)

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, W. Dolan (18 Apr 2021)

Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin, A. Zubiaga (17 Feb 2021)
