ResearchTrend.AI

The Poison of Alignment

25 August 2023
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
arXiv: 2308.13449

Papers citing "The Poison of Alignment"

Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives
Xinliang Frederick Zhang
Nick Beauchamp
Lu Wang
LRM
AI4CE
27
3
0
07 Oct 2024
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
Blazej Manczak
Eliott Zemour
Eric Lin
Vaikkunth Mugunthan
26
2
0
23 Jul 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Avelina Asada Hadji-Kyriacou
Ognjen Arandjelović
20
1
0
30 May 2024
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi
Jonathan Herzig
Idan Szpektor
Yonatan Belinkov
HILM
48
10
0
15 Apr 2024
Nevermind: Instruction Override and Moderation in Large Language Models
Edward Kim
ALM
18
0
0
05 Feb 2024
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models
Xianjun Yang
Xiao Wang
Qi Zhang
Linda R. Petzold
William Yang Wang
Xun Zhao
Dahua Lin
23
161
0
04 Oct 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
92
124
0
01 May 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
29
20
0
18 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
242
593
0
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
256
1,996
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
246
4,489
0
23 Jan 2020