Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.10150
Cited By
Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting
14 October 2024
Yifan Luo
Zhennan Zhou
Meitan Wang
Bin Dong
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting"
7 / 7 papers shown
Title
Removing RLHF Protections in GPT-4 via Fine-Tuning
Qiusi Zhan
Richard Fang
R. Bindu
Akul Gupta
Tatsunori Hashimoto
Daniel Kang
MU
AAML
53
101
0
09 Nov 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
187
3,128
0
20 Oct 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
183
368
0
21 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
874
12,973
0
04 Mar 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
285
4,408
0
27 Oct 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
198
3,750
0
03 Sep 2021
1