Interpreting Pretrained Language Models via Concept Bottlenecks

8 November 2023

Huan Liu

Papers citing "Interpreting Pretrained Language Models via Concept Bottlenecks"

Intrinsic Barriers to Explaining Deep Foundation Models. Zhen Tan, Huan Liu. AI4CE. 22 0 0. 21 Apr 2025
Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization. Or Raphael Bidusa, Shaul Markovitch. 65 0 0. 20 Feb 2025
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance. Divyansh Srivastava, Beatriz Cabrero-Daniel, Christian Berger. VLM. 67 8 0. 17 Jan 2025
Concept Bottleneck Language Models For protein design. Aya Abdelsalam Ismail, Tuomas Oikarinen, Amy Wang, Julius Adebayo, Samuel Stanton, ..., J. Kleinhenz, Allen Goodman, H. C. Bravo, Kyunghyun Cho, Nathan C. Frey. 45 4 0. 09 Nov 2024
Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework. Angela van Sprang, Erman Acar, Willem Zuidema. AI4TS. 51 1 0. 08 Oct 2024
Model Attribution in LLM-Generated Disinformation: A Domain Generalization Approach with Supervised Contrastive Learning. Alimohammad Beigi, Zhen Tan, Nivedh Mudiam, Canyu Chen, Kai Shu, Huan Liu. DeLMO. 41 2 0. 31 Jul 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs. Nitay Calderon, Roi Reichart. 42 10 0. 27 Jul 2024
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models. Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li. CoGe. 56 7 0. 02 Jul 2024
Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency. Maor Dikter, Tsachi Blau, Chaim Baskin. 43 0 0. 13 Jun 2024
Facial Affective Behavior Analysis with Instruction Tuning. Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong. CVBM. 65 15 0. 07 Apr 2024
Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach. Zhen Tan, Jie Peng, Tianlong Chen, Huan Liu. 37 6 0. 08 Mar 2024
Large Language Models for Data Annotation: A Survey. Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Wenlin Yao, Lu Cheng, Huan Liu. SyDa. 56 50 0. 21 Feb 2024
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention. Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu. 52 14 0. 22 Dec 2023
Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles. Zhiwei Tang, Dmitry Rybin, Tsung-Hui Chang. ALM, DiffM. 39 26 0. 07 Mar 2023
Causal Proxy Models for Concept-Based Model Explanations. Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts. MILM. 83 35 0. 28 Sep 2022
Text Summarization with Pretrained Encoders. Yang Liu, Mirella Lapata. MILM. 258 1,433 0. 22 Aug 2019
Efficient Estimation of Word Representations in Vector Space. Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean. 3DV. 296 31,267 0. 16 Jan 2013