ResearchTrend.AI

Closed-Form Feedback-Free Learning with Forward Projection
arXiv:2501.16476 · 27 January 2025
Robert O'Shea, Bipin Rajendran

Papers citing "Closed-Form Feedback-Free Learning with Forward Projection"

17 / 17 papers shown
1. Sparsification and Reconstruction from the Perspective of Representation Geometry — Wenjie Sun, Bingzhe Wu, Zhile Yang, Chengke Wu (28 May 2025)
2. Zero-Shot Vision Encoder Grafting via LLM Surrogates — Kaiyu Yue, Vasu Singla, Menglin Jia, John Kirchenbauer, Rifaa Qadri, Zikui Cai, A. Bhatele, Furong Huang, Tom Goldstein [VLM] (28 May 2025)
3. Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling — Hovhannes Tamoyan, Subhabrata Dutta, Iryna Gurevych [HILM, KELM] (27 May 2025)
4. Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders — James Oldfield, Shawn Im, Yixuan Li, M. Nicolaou, Ioannis Patras, Grigorios G. Chrysos [MoE] (27 May 2025)
5. Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms — Mengru Wang, Ziwen Xu, Shengyu Mao, Shumin Deng, Zhaopeng Tu, Ningyu Zhang, N. Zhang [LLMSV] (23 May 2025)
6. Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models — Patrick Leask, Neel Nanda, Noura Al Moubayed (23 May 2025)
7. ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs — Landon Butler, Abhineet Agarwal, Justin Singh Kang, Yigit Efe Erginbas, Bin Yu, Kannan Ramchandran (23 May 2025)
8. Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations — Aaron Jiaxun Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju [AAML] (21 May 2025)
9. On the creation of narrow AI: hierarchy and nonlocality of neural network skills — Eric J. Michaud, Asher Parker-Sartori, Max Tegmark (21 May 2025)
10. Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models — Woody Haosheng Gan, Deqing Fu, Julian Asilis, Ollie Liu, Dani Yogatama, Vatsal Sharan, Robin Jia, Willie Neiswanger [LLMSV] (20 May 2025)
11. Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis — Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley [OOD] (16 May 2025)
12. Are Sparse Autoencoders Useful for Java Function Bug Detection? — Rui Melo, Claudia Mamede, Andre Catarino, Rui Abreu, Henrique Lopes Cardoso (15 May 2025)
13. MIB: A Mechanistic Interpretability Benchmark — Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, ..., Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov (17 Apr 2025)
14. On Language Models' Sensitivity to Suspicious Coincidences — Sriram Padmanabhan, Kanishka Misra, Kyle Mahowald, Eunsol Choi [ReLM, LRM] (13 Apr 2025)
15. Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning — Julian Minder, Clement Dumas, Caden Juang, Bilal Chugtai, Neel Nanda (03 Apr 2025)
16. How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training — Yixin Ou, Yunzhi Yao, N. Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zechao Li, Ningyu Zhang [KELM, CLL] (16 Feb 2025)
17. The Geometry of Concepts: Sparse Autoencoder Feature Structure — Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark (10 Oct 2024)