A General Language Assistant as a Laboratory for Alignment

1 December 2021
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
T. Henighan
Andy Jones
Nicholas Joseph
Benjamin Mann
Nova DasSarma
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
John Kernion
Kamal Ndousse
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
    ALM

Papers citing "A General Language Assistant as a Laboratory for Alignment"

32 / 182 papers shown
Large Language Models Are Human-Level Prompt Engineers
Yongchao Zhou
Andrei Ioan Muresanu
Ziwen Han
Keiran Paster
Silviu Pitis
Harris Chan
Jimmy Ba
ALM
LLMAG
21
835
0
03 Nov 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Weiyan Shi
Emily Dinan
Kurt Shuster
Jason Weston
Jing Xu
52
19
0
28 Oct 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
30
74
0
26 Oct 2022
Continued Pretraining for Better Zero- and Few-Shot Promptability
Zhaofeng Wu
Robert L. Logan IV
Pete Walsh
Akshita Bhagia
Dirk Groeneveld
Sameer Singh
Iz Beltagy
VLM
44
12
0
19 Oct 2022
Mitigating Covertly Unsafe Text within Natural Language Systems
Alex Mei
Anisha Kabir
Sharon Levy
Melanie Subbiah
Emily Allaway
J. Judge
D. Patton
Bruce Bimber
Kathleen McKeown
William Yang Wang
53
13
0
17 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Rada Mihalcea
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
34
90
0
04 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
239
506
0
28 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
24
98
0
01 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
234
447
0
23 Aug 2022
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Jing Xu
Megan Ung
M. Komeili
Kushal Arora
Y-Lan Boureau
Jason Weston
30
37
0
05 Aug 2022
A Hazard Analysis Framework for Code Synthesis Large Language Models
Heidy Khlaaf
Pamela Mishkin
Joshua Achiam
Gretchen Krueger
Miles Brundage
ELM
22
28
0
25 Jul 2022
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
59
722
0
11 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
33
58
0
05 Jul 2022
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Kushal Arora
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
VLM
32
40
0
15 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
90
2,364
0
15 Jun 2022
Teaching Models to Express Their Uncertainty in Words
Stephanie C. Lin
Jacob Hilton
Owain Evans
OOD
35
368
0
28 May 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
24
8
0
25 May 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
102
803
0
14 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
95
2,352
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
372
12,081
0
04 Mar 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
611
0
07 Feb 2022
Datasheet for the Pile
Stella Biderman
Kieran Bicheno
Leo Gao
52
35
0
13 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
218
1,663
0
15 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
186
276
0
28 Sep 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
253
193
0
15 Sep 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
57
1,750
0
08 Sep 2021
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
244
281
0
15 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,872
0
18 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,505
0
23 Jan 2020
AI safety via debate
G. Irving
Paul Christiano
Dario Amodei
204
203
0
02 May 2018