2403.10949
Cited By
SelfIE: Self-Interpretation of Large Language Model Embeddings
16 March 2024
Haozhe Chen
Carl Vondrick
Chengzhi Mao
Papers citing "SelfIE: Self-Interpretation of Large Language Model Embeddings"
18 / 18 papers shown
Title
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
Ran Li
Hao Wang
Chengzhi Mao
AAML
23
0
0
16 May 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
54
0
0
17 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
67
1
0
07 Feb 2025
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
S. Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
189
2
0
22 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
55
3
0
11 Nov 2024
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov
Chris Wendler
Mikhail Terekhov
Justin Deschenaux
Robert West
Çağlar Gülçehre
VLM
40
13
0
28 Oct 2024
Causal Abstraction in Model Interpretability: A Compact Survey
Yihao Zhang
33
0
0
26 Oct 2024
Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language
Anthony Costarelli
Mat Allen
Severin Field
27
1
0
03 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
85
1
0
02 Oct 2024
Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach
Yukun Zhang
DiffM
32
0
0
17 Aug 2024
Unveiling LLM Mechanisms Through Neural ODEs and Control Theory
Yukun Zhang
Qi Dong
38
0
0
23 Jun 2024
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning
Yuval Shalev
Amir Feder
Ariel Goldstein
LRM
42
4
0
19 Jun 2024
Who's asking? User personas and the mechanics of latent misalignment
Asma Ghandeharioun
Ann Yuan
Marius Guerard
Emily Reif
Michael A. Lepori
Lucas Dixon
LLMSV
44
7
0
17 Jun 2024
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Min Cai
Yuchen Zhang
Shichang Zhang
Fan Yin
Difan Zou
Yisong Yue
Ziniu Hu
30
0
0
04 Jun 2024
FaithLM: Towards Faithful Explanations for Large Language Models
Yu-Neng Chuang
Guanchu Wang
Chia-Yuan Chang
Ruixiang Tang
Shaochen Zhong
Fan Yang
Mengnan Du
Xuanting Cai
Xia Hu
LRM
77
0
0
07 Feb 2024
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
395
8,495
0
28 Jan 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
204
117
0
26 Jan 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
347
5,785
0
29 Apr 2021