2203.14680
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
28 March 2022
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
Papers citing
"Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space"
50 / 272 papers shown
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
Jiakuan Xie
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
1
0
0
19 May 2025
NAMET: Robust Massive Model Editing via Noise-Aware Memory Optimization
Yanbo Dai
Zhenlan Ji
Zongjie Li
Shuai Wang
KELM
0
0
0
17 May 2025
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Eitan Wagner
Omri Abend
39
0
0
04 May 2025
Demystifying optimized prompts in language models
Rimon Melamed
Lucas H. McCabe
H. H. Huang
39
0
0
04 May 2025
SetKE: Knowledge Editing for Knowledge Elements Overlap
Yifan Wei
Xiaoyan Yu
Ran Song
Hao Peng
Angsheng Li
KELM
67
0
0
29 Apr 2025
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
132
0
0
26 Apr 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia
Camille Couturier
Daniel Madrigal Diaz
Ankur Mallick
Anastasios Kyrillidis
Robert Sim
Victor Rühle
Saravan Rajmohan
30
0
0
23 Apr 2025
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang
Benjamin Bergen
50
0
0
21 Apr 2025
Signatures of human-like processing in Transformer forward passes
Jennifer Hu
Michael A. Lepori
Michael Franke
AI4CE
156
0
0
18 Apr 2025
One Jump Is All You Need: Short-Cutting Transformers for Early Exit Prediction with One Jump to Fit All Exit Levels
Amrit Diggavi Seshadri
BDL
28
0
0
18 Apr 2025
GraphAttack: Exploiting Representational Blindspots in LLM Safety Mechanisms
Sinan He
An Wang
35
0
0
17 Apr 2025
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Yixin Cao
Jiahao Ying
Yibo Wang
Xipeng Qiu
Xuanjing Huang
Yugang Jiang
ELM
44
2
0
10 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
29
1
0
06 Apr 2025
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
Boshi Wang
Huan Sun
34
2
0
02 Apr 2025
From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP
Fabio Yáñez-Romero
Andrés Montoyo
Armando Suárez
Yoan Gutiérrez
Ruslan Mitkov
46
0
0
02 Apr 2025
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee
Melanie Weber
F. Viégas
Martin Wattenberg
FedML
79
3
0
27 Mar 2025
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
Ying Shen
Lifu Huang
52
1
0
20 Mar 2025
DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification
Cho Hyeonsu
Dooyoung Kim
Youngjoong Ko
MoMe
46
0
0
17 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
52
0
0
15 Mar 2025
Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
Xin Wei Chia
Jonathan Pan
AAML
46
0
0
12 Mar 2025
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
Chaocan Xue
Bineng Zhong
Qihua Liang
Yaozong Zheng
Ning Li
Yuanliang Xue
Shuxiang Song
38
0
0
09 Mar 2025
Exploiting Edited Large Language Models as General Scientific Optimizers
Qitan Lv
T. Liu
Haoyu Wang
41
0
0
08 Mar 2025
Efficient Jailbreaking of Large Models by Freeze Training: Lower Layers Exhibit Greater Sensitivity to Harmful Content
Hongyuan Shen
Min Zheng
Jincheng Wang
Yang Zhao
47
0
0
28 Feb 2025
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang
Wenyu Du
Dongming Jin
Jie Fu
Zhi Jin
LRM
53
0
0
27 Feb 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELM
MU
46
0
0
27 Feb 2025
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning
Pengcheng Huang
Zhenghao Liu
Yukun Yan
Xiaoyuan Yi
Hao Chen
Zhiyuan Liu
Maosong Sun
Tong Xiao
Ge Yu
Chenyan Xiong
104
1
0
24 Feb 2025
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Sarah Ball
Simeon Allmendinger
Frauke Kreuter
Niklas Kühl
57
0
0
22 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge Editing
Pinzheng Wang
Zecheng Tang
Keyan Zhou
J. Li
Qiaoming Zhu
M. Zhang
KELM
120
2
0
21 Feb 2025
A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
L. Arras
Bruno Puri
Patrick Kahardipraja
Sebastian Lapuschkin
Wojciech Samek
46
0
0
21 Feb 2025
Mechanistic Understanding of Language Models in Syntactic Code Completion
Samuel Miller
Daking Rai
Ziyu Yao
LRM
49
0
0
20 Feb 2025
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
Yi Wang
Fenghua Weng
Songlin Yang
Zhan Qin
Minlie Huang
Wenjie Wang
KELM
AAML
53
0
0
17 Feb 2025
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
67
1
0
17 Feb 2025
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xuben Wang
Yan Hu
Wenyu Du
Reynold Cheng
Benyou Wang
Difan Zou
61
1
0
17 Feb 2025
LLMs as a synthesis between symbolic and continuous approaches to language
Gemma Boleda
SyDa
74
0
0
17 Feb 2025
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
Zikang Liu
K. Zhou
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Zhicheng Dou
MLLM
VLM
LRM
94
0
0
17 Feb 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu
Ningyuan Zhao
Liming Yang
Sendong Zhao
Shumin Deng
Mengru Wang
Bryan Hooi
Nay Oo
H. Chen
N. Zhang
KELM
CLL
MU
165
0
0
16 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zheng Yang
Mike Zheng Shou
MoE
78
0
0
10 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
54
0
0
08 Feb 2025
Scaling Embedding Layers in Language Models
Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang
82
0
0
03 Feb 2025
Discovering Chunks in Neural Embeddings for Interpretability
Shuchen Wu
Stephan Alaniz
Eric Schulz
Zeynep Akata
47
0
0
03 Feb 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
116
100
0
28 Jan 2025
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
Zeping Yu
Sophia Ananiadou
KELM
43
1
0
24 Jan 2025
The Geometry of Tokens in Internal Representations of Large Language Models
Karthik Viswanathan
Yuri Gardinazzi
Giada Panerai
Alberto Cazzaniga
Matteo Biagetti
AIFin
94
4
0
17 Jan 2025
Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis
Lanling Xu
Junjie Zhang
Bingqian Li
Jinpeng Wang
Sheng Chen
Wayne Xin Zhao
Zhicheng Dou
82
18
0
17 Jan 2025
Joint Knowledge Editing for Information Enrichment and Probability Promotion
Wenhang Shi
Yiren Chen
Shuqing Bian
Xinyi Zhang
Zhe Zhao
Pengfei Hu
Wei Lu
Xiaoyong Du
KELM
48
0
0
22 Dec 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
106
4
0
23 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
136
0
0
17 Nov 2024
Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang
LRM
AI4CE
48
0
0
14 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
55
3
0
11 Nov 2024
Beyond Toxic Neurons: A Mechanistic Analysis of DPO for Toxicity Reduction
Yushi Yang
Filip Sondej
Harry Mayne
Adam Mahdi
26
0
0
10 Nov 2024