ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.05954
  4. Cited By
Aligning Large Language Models with Representation Editing: A Control
  Perspective

Aligning Large Language Models with Representation Editing: A Control Perspective

10 June 2024
Lingkai Kong
Haorui Wang
Wenhao Mu
Yuanqi Du
Yuchen Zhuang
Yifei Zhou
Yue Song
Rongzhi Zhang
Kai Wang
Chao Zhang
ArXivPDFHTML

Papers citing "Aligning Large Language Models with Representation Editing: A Control Perspective"

12 / 12 papers shown
Title
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures
Heng-Sheng Chang
P. Mehta
39
0
0
01 May 2025
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Andy Zhou
MoMe
94
0
0
13 Mar 2025
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
Qiyuan Deng
X. Bai
Kehai Chen
Yaowei Wang
Liqiang Nie
Min Zhang
OffRL
66
0
0
13 Mar 2025
Personalize Your LLM: Fake it then Align it
Yijing Zhang
Dyah Adila
Changho Shin
Frederic Sala
88
0
0
02 Mar 2025
Is Free Self-Alignment Possible?
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
118
2
0
24 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
102
0
0
24 Feb 2025
Mixture of Attentions For Speculative Decoding
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
76
4
0
04 Oct 2024
Programming Refusal with Conditional Activation Steering
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
105
14
0
06 Sep 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank
  Modifications
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
60
79
0
07 Feb 2024
The Falcon Series of Open Language Models
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
121
404
0
28 Nov 2023
Finding Alignments Between Interpretable Causal Variables and
  Distributed Neural Representations
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas Icard
Noah D. Goodman
CML
75
99
0
05 Mar 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
1