Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie
26 October 2023 · arXiv 2310.17551
Papers citing "Unpacking the Ethical Value Alignment in Big Models" (38 papers)
GPT for Games: An Updated Scoping Review (2020-2024)
Daijin Yang, Erica Kleinman, Casper Harteveld
LLMAG, AI4TS, AI4CE
133 · 3 · 0 · 01 Nov 2024
Neuron to Graph: Interpreting Language Model Neurons at Scale
Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay B. Cohen, Fazl Barez
MILM
68 · 26 · 0 · 31 May 2023
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo
ALM, SyDa
110 · 70 · 0 · 23 May 2023
Explaining black box text modules in natural language with language models
Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao
MILM
58 · 56 · 0 · 17 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun, Songlin Yang, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan
SyDa, ALM
90 · 332 · 0 · 04 May 2023
Boosting Theory-of-Mind Performance in Large Language Models via Prompting
Shima Rahimi Moghaddam, C. Honey
LLMAG, LRM, AI4CE
70 · 82 · 0 · 22 Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Feiran Huang
ALM
159 · 374 · 0 · 11 Apr 2023
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, ..., Shaoguang Mao, Yuntao Wang, Linjun Shou, Ming Gong, Nan Duan
LLMAG, CLL
73 · 201 · 0 · 29 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
72 · 106 · 0 · 09 Mar 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, ..., Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan
SyDa, MoMe
196 · 1,618 · 0 · 15 Dec 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, ..., Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason W. Wei
ReLM, LRM
189 · 3,128 · 0 · 20 Oct 2022
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Zonghan Yang, Xiaoyuan Yi, Peng Li, Yang Liu, Xing Xie
81 · 34 · 0 · 10 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
BDL, LRM
346 · 1,091 · 0 · 05 Oct 2022
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Gabriel Simmons
150 · 63 · 0 · 24 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
105 · 192 · 0 · 30 Aug 2022
Self-critiquing models for assisting human evaluators
William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Ouyang Long, Jonathan Ward, Jan Leike
ALM, ELM
103 · 302 · 0 · 12 Jun 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
VLM, DiffM
401 · 6,866 · 0 · 13 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
PILM, LRM
489 · 6,240 · 0 · 05 Apr 2022
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
Hugo Elias Berg, S. Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
VLM
81 · 101 · 0 · 22 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
MoE, AI4CE
126 · 162 · 0 · 01 Mar 2022
Controllable Natural Language Generation with Contrastive Prefixes
Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
67 · 98 · 0 · 27 Feb 2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Wei Ping, Ming-Yu Liu, Chaowei Xiao, Peng Xu, M. Patwary, Mohammad Shoeybi, Yue Liu, Anima Anandkumar, Bryan Catanzaro
98 · 70 · 0 · 08 Feb 2022
Ethical and social risks of harm from Language Models
Laura Weidinger, John F. J. Mellor, Maribeth Rauh, Conor Griffin, J. Uesato, ..., Lisa Anne Hendricks, William S. Isaac, Sean Legassick, G. Irving, Iason Gabriel
PILM
111 · 1,036 · 0 · 08 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, ..., Tom B. Brown, Jack Clark, Sam McCandlish, C. Olah, Jared Kaplan
ALM
118 · 779 · 0 · 01 Dec 2021
An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
ReLM, BDL, VPVLM, LRM
198 · 751 · 0 · 03 Nov 2021
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny T Liang, ..., Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina A. Rini, Yejin Choi
FaML
179 · 121 · 0 · 14 Oct 2021
Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
93 · 389 · 0 · 24 Jun 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
MU
107 · 372 · 0 · 07 May 2021
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Kaichuang Yang, Dan Klein
103 · 333 · 0 · 12 Apr 2021
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, J. Qiu, Zhilin Yang, Jie Tang
BDL, AI4CE
137 · 1,545 · 0 · 18 Mar 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick, Sahana Udupa, Hinrich Schütze
306 · 385 · 0 · 28 Feb 2021
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, Basel Alomair, Ulfar Erlingsson, Alina Oprea, Colin Raffel
MLAU, SILM
489 · 1,923 · 0 · 14 Dec 2020
Value Alignment Verification
Daniel S. Brown, Jordan Jack Schneider, Anca D. Dragan, S. Niekum
62 · 31 · 0 · 02 Dec 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
158 · 1,199 · 0 · 24 Sep 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
605 · 4,822 · 0 · 23 Jan 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, J. Yosinski, Rosanne Liu
KELM
136 · 976 · 0 · 04 Dec 2019
Gender Bias in Neural Natural Language Processing
Kaiji Lu, Piotr (Peter) Mardziel, Fangjing Wu, Preetam Amancharla, Anupam Datta
117 · 355 · 0 · 31 Jul 2018
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Kyunghyun Cho, B. V. Merrienboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
AIMat
1.0K · 23,354 · 0 · 03 Jun 2014