
Assessing Language Model Deployment with Risk Cards
arXiv:2303.18190 · 31 March 2023
Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. Leiser, Saif Mohammad

Papers citing "Assessing Language Model Deployment with Risk Cards"

28 / 28 papers shown
Position: A taxonomy for reporting and describing AI security incidents
L. Bieringer, Kevin Paeth, Andreas Wespi, Kathrin Grosse, Alexandre Alahi (19 Dec 2024)

Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Bjorn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch (17 Oct 2024)

BenchmarkCards: Large Language Model and Risk Reporting
Anna Sokol, Nuno Moniz, Elizabeth M. Daly, Michael Hind, Nitesh V. Chawla (16 Oct 2024)

The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, ..., Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi (02 Jul 2024)

Documentation Practices of Artificial Intelligence
Stefan Arnold, Dilara Yesilbas, Rene Gröbner, Dominik Riedelbauch, Maik Horn, Sven Weinzierl (26 Jun 2024)

garak: A Framework for Security Probing Large Language Models
Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie (16 Jun 2024)

Hacc-Man: An Arcade Game for Jailbreaking LLMs
Matheus Valentim, Jeanette Falk, Nanna Inie (24 May 2024)

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster (14 May 2024)

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster (25 Apr 2024)

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, ..., Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, S. Pyysalo (30 Mar 2024)

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen (28 Mar 2024)

What Motivates People to Trust 'AI' Systems?
Nanna Inie (09 Mar 2024)

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, ..., Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo (14 Feb 2024)

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff (13 Feb 2024)

Temporal Blind Spots in Large Language Models
Jonas Wallat, Adam Jatowt, Avishek Anand (22 Jan 2024)

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik, Karol Dobiczek, Michal Teodor Okoń, Konrad Skublicki, Philip Lippmann, Jie-jin Yang (30 Dec 2023)

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale (11 Oct 2023)

Regulation and NLP (RegNLP): Taming Large Language Models
Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, S. Ranchordas, Gerasimos Spanakis (09 Oct 2023)

Can LLM-Generated Misinformation Be Detected?
Canyu Chen, Kai Shu (25 Sep 2023)

Unlocking Model Insights: A Dataset for Automated Model Card Generation
Shruti Singh, Hitesh Lodwal, Husain Malwat, Rakesh Thakur, Mayank Singh (22 Sep 2023)

Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, ..., Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert F. Trager, Kevin J. Wolf (06 Jul 2023)

Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses
Logan Stapleton, Jordan Taylor, Sarah E Fox, Tongshuang Wu, Haiyi Zhu (30 May 2023)

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, ..., Addison Howard, William J. Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo (22 May 2023)

Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov (14 Oct 2022)

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark (23 Aug 2022)

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale (12 Aug 2021)

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, W. Dolan (18 Apr 2021)

Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin, A. Zubiaga (17 Feb 2021)
