2112.04359
Cited By
Ethical and social risks of harm from Language Models
8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
Papers citing
"Ethical and social risks of harm from Language Models"
50 / 634 papers shown
Title
Enhancing user experience in large language models through human-centered design: Integrating theoretical insights with an experimental study to meet diverse software learning needs with a single document knowledge base
Yuchen Wang
Yin-Shan Lin
Ruixin Huang
Jinyin Wang
Sensen Liu
60
7
0
19 May 2024
Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees
Yu Gui
Ying Jin
Zhimei Ren
MedIm
237
24
0
16 May 2024
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
Julia Barnett
Kimon Kieslich
Nicholas Diakopoulos
57
4
0
15 May 2024
Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Junfeng Jiao
S. Afroogh
Yiming Xu
Connor Phillips
AILaw
134
23
0
14 May 2024
LLM Theory of Mind and Alignment: Opportunities and Risks
Winnie Street
83
10
0
13 May 2024
Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre
Boyd Branch
Piotr Wojciech Mirowski
Kory W. Mathewson
Sophia Ppali
A. Covaci
113
2
0
11 May 2024
Believing Anthropomorphism: Examining the Role of Anthropomorphic Cues on Trust in Large Language Models
Michelle Cohn
Mahima Pushkarna
Gbolahan O. Olanubi
Joseph M. Moran
Daniel Padgett
Zion Mengesha
Courtney Heldreth
63
24
0
09 May 2024
People cannot distinguish GPT-4 from a human in a Turing test
Cameron R. Jones
Benjamin K. Bergen
ELM
DeLMO
90
34
0
09 May 2024
The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring
Lena Armstrong
Abbey Liu
Stephen MacNeil
D. Metaxa
71
21
0
07 May 2024
FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models
Yanhong Bai
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xingjiao Wu
Liang He
59
3
0
06 May 2024
Social Life Simulation for Non-Cognitive Skills Learning
Zihan Yan
Yaohong Xiang
Yun Huang
53
1
0
01 May 2024
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li
Satyapriya Krishna
Himabindu Lakkaraju
54
4
0
29 Apr 2024
LangBiTe: A Platform for Testing Bias in Large Language Models
Sergio Morales
Robert Clarisó
Jordi Cabot
43
2
0
29 Apr 2024
KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction
Jack Boylan
Shashank Mangla
Dominic Thorn
D. Ghalandari
Parsa Ghaffari
Chris Hokamp
SLR
94
2
0
24 Apr 2024
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
Pablo Biedma
Xiaoyuan Yi
Linus Huang
Maosong Sun
Xing Xie
PILM
101
6
0
19 Apr 2024
Token-level Direct Preference Optimization
Yongcheng Zeng
Guoqing Liu
Weiyu Ma
Ning Yang
Haifeng Zhang
Jun Wang
116
64
0
18 Apr 2024
Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act
Sinan Arda
38
3
0
17 Apr 2024
Fewer Truncations Improve Language Modeling
Hantian Ding
Zijian Wang
Giovanni Paolini
Varun Kumar
Anoop Deoras
Dan Roth
Stefano Soatto
111
14
0
16 Apr 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe
Mark Vero
Robin Staab
Martin Vechev
VLM
PILM
90
10
0
16 Apr 2024
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li
Zhihua Wei
Han Jiang
Chuanyang Gong
LLMSV
75
3
0
16 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu
Xiao-Hua Zhou
ELM
81
18
0
14 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa
EgoV
126
96
0
11 Apr 2024
Automatic Authorities: Power and AI
Seth Lazar
52
2
0
09 Apr 2024
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi
Felix Friedrich
P. Schramowski
Kristian Kersting
Roberto Navigli
Huu Nguyen
Bo Li
ELM
112
52
0
06 Apr 2024
Uncertainty in Language Models: Assessment through Rank-Calibration
Xinmeng Huang
Shuo Li
Mengxin Yu
Matteo Sesia
Hamed Hassani
Insup Lee
Osbert Bastani
Yan Sun
97
19
0
04 Apr 2024
GPT-DETOX: An In-Context Learning-Based Paraphraser for Text Detoxification
Ali Pesaranghader
Nikhil Verma
Manasa Bharadwaj
89
5
0
03 Apr 2024
Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows
G. Guo
Dustin L. Arendt
Alex Endert
71
1
0
02 Apr 2024
GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs
Sundaraparipurnan Narayanan
Sandeep Vishwakarma
110
3
0
02 Apr 2024
Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance
G. Nam
Byeongho Heo
Juho Lee
VLM
73
7
0
01 Apr 2024
Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu
Zichong Wang
Wenbin Zhang
AILaw
127
42
0
31 Mar 2024
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania
Ruiyi Wang
Toby Jia-Jun Li
Tianshi Li
Hong Shen
87
11
0
28 Mar 2024
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Yejin Bang
Delong Chen
Nayeon Lee
Pascale Fung
85
41
0
27 Mar 2024
Large Language Models for Education: A Survey and Outlook
Shen Wang
Tianlong Xu
Hang Li
Chaoli Zhang
Joleen Liang
Jiliang Tang
Philip S. Yu
Qingsong Wen
AI4Ed
114
121
0
26 Mar 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi
Zhipin Wang
Hongxing Fan
Zaibin Zhang
Lijun Li
Yongting Zhang
Zhen-fei Yin
Lu Sheng
Yu Qiao
Jing Shao
77
22
0
26 Mar 2024
Targeted Visualization of the Backbone of Encoder LLMs
Isaac Roberts
Alexander Schulz
L. Hermes
Barbara Hammer
49
0
0
26 Mar 2024
Decoding the Digital Fine Print: Navigating the potholes in Terms of service/ use of GenAI tools against the emerging need for Transparent and Trustworthy Tech Futures
Sundaraparipurnan Narayanan
41
0
0
26 Mar 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
91
53
0
26 Mar 2024
Risk and Response in Large Language Models: Evaluating Key Threat Categories
Bahareh Harandizadeh
A. Salinas
Fred Morstatter
98
4
0
22 Mar 2024
The opportunities and risks of large language models in mental health
Hannah R. Lawrence
Renee A. Schneider
Susan B. Rubin
Maja J. Mataric
Daniel McDuff
Megan Jones Bell
AI4MH
81
45
0
21 Mar 2024
Testing the Limits of Jailbreaking Defenses with the Purple Problem
Taeyoun Kim
Suhas Kotha
Aditi Raghunathan
AAML
84
6
0
20 Mar 2024
Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection
Zhixin Lai
Xuesheng Zhang
Suiyao Chen
DeLMO
75
36
0
20 Mar 2024
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Khaoula Chehbouni
Megha Roshan
Emmanuel Ma
Futian Andrew Wei
Afaf Taik
Jackie CK Cheung
G. Farnadi
66
9
0
20 Mar 2024
Can AI Outperform Human Experts in Creating Social Media Creatives?
Eunkyung Park
Raymond K. Wong
Junbum Kwon
62
0
0
19 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
Erfan Shayegani
PILM
137
31
0
19 Mar 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
100
7
0
18 Mar 2024
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li
Xiaoye Qu
Kaihe Xu
Wenfeng Xie
Dangyang Chen
Yu Cheng
90
7
0
18 Mar 2024
Safeguarding Marketing Research: The Generation, Identification, and Mitigation of AI-Fabricated Disinformation
Anirban Mukherjee
62
4
0
17 Mar 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
111
44
0
13 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
180
48
0
13 Mar 2024
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu
Zehan Qi
Zhijiang Guo
Cunxiang Wang
Hongru Wang
Yue Zhang
Wei Xu
296
122
0
13 Mar 2024