ResearchTrend.AI
SelfIE: Self-Interpretation of Large Language Model Embeddings

16 March 2024
Haozhe Chen
Carl Vondrick
Chengzhi Mao

Papers citing "SelfIE: Self-Interpretation of Large Language Model Embeddings"

18 / 18 papers shown
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
Ran Li
Hao Wang
Chengzhi Mao
AAML
21
0
0
16 May 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
54
0
0
17 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
67
1
0
07 Feb 2025
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
S. Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
180
2
0
22 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
55
3
0
11 Nov 2024
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov
Chris Wendler
Mikhail Terekhov
Justin Deschenaux
Robert West
Çağlar Gülçehre
VLM
40
13
0
28 Oct 2024
Causal Abstraction in Model Interpretability: A Compact Survey
Yihao Zhang
33
0
0
26 Oct 2024
Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language
Anthony Costarelli
Mat Allen
Severin Field
27
1
0
03 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
85
1
0
02 Oct 2024
Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach
Yukun Zhang
DiffM
27
0
0
17 Aug 2024
Unveiling LLM Mechanisms Through Neural ODEs and Control Theory
Yukun Zhang
Qi Dong
38
0
0
23 Jun 2024
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning
Yuval Shalev
Amir Feder
Ariel Goldstein
LRM
42
4
0
19 Jun 2024
Who's asking? User personas and the mechanics of latent misalignment
Asma Ghandeharioun
Ann Yuan
Marius Guerard
Emily Reif
Michael A. Lepori
Lucas Dixon
LLMSV
44
7
0
17 Jun 2024
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Min Cai
Yuchen Zhang
Shichang Zhang
Fan Yin
Difan Zou
Yisong Yue
Ziniu Hu
30
0
0
04 Jun 2024
FaithLM: Towards Faithful Explanations for Large Language Models
Yu-Neng Chuang
Guanchu Wang
Chia-Yuan Chang
Ruixiang Tang
Shaochen Zhong
Fan Yang
Mengnan Du
Xuanting Cai
Xia Hu
LRM
74
0
0
07 Feb 2024
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
373
8,495
0
28 Jan 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
204
117
0
26 Jan 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
320
5,785
0
29 Apr 2021