2106.10328
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
18 June 2021
Irene Solaiman
Christy Dennison
Papers citing "Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets"
22 / 22 papers shown
Title
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
Ling Hu
Yuemei Xu
Xiaoyang Gu
Letao Han
79
0
0
07 Apr 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
146
20
0
30 Jan 2025
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Jiahao Xu
Tian Liang
Pinjia He
Zhaopeng Tu
56
24
0
12 Jul 2024
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim
Yiming Yang
81
9
0
26 Jun 2024
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Hwiyeol Jo
Taiwoo Park
Nayoung Choi
Changbong Kim
Ohjoon Kwon
...
Kyoungho Shin
Sun Suk Lim
Kyungmi Kim
Jihye Lee
Sun Kim
70
0
0
05 Apr 2024
ADEPT: A DEbiasing PrompT Framework
Ke Yang
Charles Yu
Yi R. Fung
Manling Li
Heng Ji
40
24
0
10 Nov 2022
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
31
125
0
13 Apr 2021
Alignment of Language Agents
Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
G. Irving
32
163
0
26 Mar 2021
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
Jwala Dhamala
Tony Sun
Varun Kumar
Satyapriya Krishna
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
50
384
0
27 Jan 2021
Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid
Maheen Farooqi
James Zou
AILaw
60
545
0
14 Jan 2021
Data and its (dis)contents: A survey of dataset development and use in machine learning research
Amandalynne Paullada
Inioluwa Deborah Raji
Emily M. Bender
Emily L. Denton
A. Hanna
73
518
0
09 Dec 2020
Learning from others' mistakes: Avoiding dataset biases without modeling them
Victor Sanh
Thomas Wolf
Yonatan Belinkov
Alexander M. Rush
30
116
0
02 Dec 2020
Recipes for Safety in Open-domain Chatbots
Jing Xu
Da Ju
Margaret Li
Y-Lan Boureau
Jason Weston
Emily Dinan
35
232
0
14 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
104
1,168
0
24 Sep 2020
Towards Debiasing Sentence Representations
Paul Pu Liang
Irene Li
Emily Zheng
Y. Lim
Ruslan Salakhutdinov
Louis-Philippe Morency
44
236
0
16 Jul 2020
Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence
Shakir Mohamed
Marie-Therese Png
William S. Isaac
49
402
0
08 Jul 2020
Extending the Machine Learning Abstraction Boundary: A Complex Systems Approach to Incorporate Societal Context
Donald Martin
Vinodkumar Prabhakaran
Jill A. Kuhlberg
A. Smart
William S. Isaac
FaML
33
40
0
17 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
364
41,106
0
28 May 2020
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Su Lin Blodgett
Solon Barocas
Hal Daumé
Hanna M. Wallach
77
1,211
0
28 May 2020
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
VLM
AI4CE
CLL
90
2,380
0
23 Apr 2020
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
133
3,133
0
22 Apr 2019
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi
Kai-Wei Chang
James Zou
Venkatesh Saligrama
Adam Kalai
CVBM
FaML
38
3,115
0
21 Jul 2016