Explaining black box text modules in natural language with language models

17 May 2023

Papers citing "Explaining black box text modules in natural language with language models"

Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection Haoming Wang Boyuan Yang Xiangyu Yin Wei Gao 28 0 0 15 Apr 2025
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems Simon Lermen Mateusz Dziemian Natalia Pérez-Campanero Antolín 31 0 0 10 Apr 2025
Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction Michal Bravansky Vaclav Kubon Suhas Hariharan Robert Kirk 69 0 0 24 Feb 2025
LaVCa: LLM-assisted Visual Cortex Captioning Takuya Matsuyama Shinji Nishimoto Yu Takagi 58 0 0 20 Feb 2025
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards Xinyi Yang Liang Zeng Heng Dong C. Yu X. Wu H. Yang Yu Wang Milind Tambe Tonghan Wang 76 2 0 18 Feb 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey) S. Oota Zijiao Chen Manish Gupta R. Bapi G. Jobard F. Alexandre X. Hinaut 3DV AI4CE 49 11 0 31 Dec 2024
Interpretable Language Modeling via Induction-head Ngram Models Eunji Kim Sriya Mantena Weiwei Yang Chandan Singh Sungroh Yoon Jianfeng Gao 49 0 0 31 Oct 2024
DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers Rakesh R Menon Shashank Srivastava 26 1 0 29 Oct 2024
Brain-like Functional Organization within Large Language Models Haiyang Sun Lin Zhao Zihao Wu Xiaohui Gao Yutao Hu Mengfei Zuo W. Zhang Junwei Han Tianming Liu X. Hu 29 0 0 25 Oct 2024
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience Richard Antonello Chandan Singh Shailee Jain Aliyah R. Hsu Jianfeng Gao Alexander G. Huth 21 1 0 01 Oct 2024
Localizing Memorization in SSL Vision Encoders Wenhao Wang Adam Dziedzic Michael Backes Franziska Boenisch 34 2 0 27 Sep 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters Ruiqi Zhong Heng Wang Dan Klein Jacob Steinhardt 35 6 0 13 Sep 2024
XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models Erik Cambria Lorenzo Malandri Fabio Mercorio Navid Nobani Andrea Seveso 50 11 0 21 Jul 2024
LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions N. Hoang-Xuan Minh Nhat Vu My T. Thai 28 3 0 12 Jun 2024
Crafting Interpretable Embeddings by Asking LLMs Questions Vinamra Benara Chandan Singh John X. Morris Richard Antonello Ion Stoica Alexander G. Huth Jianfeng Gao 24 5 0 26 May 2024
Explainable Automatic Grading with Neural Additive Models Aubrey Condor Z. Pardos ELM 27 2 0 01 May 2024
A Multimodal Automated Interpretability Agent Tamar Rott Shaham Sarah Schwettmann Franklin Wang Achyuta Rajaram Evan Hernandez Jacob Andreas Antonio Torralba 31 17 0 22 Apr 2024
Latent Concept-based Explanation of NLP Models Xuemin Yu Fahim Dalvi Nadir Durrani Marzia Nouri Hassan Sajjad LRM FAtt 24 1 0 18 Apr 2024
Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda Johannes Schneider 83 26 0 15 Apr 2024
Computational Models to Study Language Processing in the Human Brain: A Survey Shaonan Wang Jingyuan Sun Yunhao Zhang Nan Lin Marie-Francine Moens Chengqing Zong 29 5 0 20 Mar 2024
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations Lirui Luo Guoxi Zhang Hongming Xu Yaodong Yang Cong Fang Qing Li 37 11 0 19 Mar 2024
Rethinking Interpretability in the Era of Large Language Models Chandan Singh J. Inala Michel Galley Rich Caruana Jianfeng Gao LRM AI4CE 77 61 0 30 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Asma Ghandeharioun Avi Caciularu Adam Pearce Lucas Dixon Mor Geva 34 87 0 11 Jan 2024
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey Haotian Zhang S. D. Semujju Zhicheng Wang Xianwei Lv Kang Xu ... Jing Wu Zhuo Long Wensheng Liang Xiaoguang Ma Ruiyan Zhuang UQCV AI4TS AI4CE 27 4 0 11 Dec 2023
Survey on AI Ethics: A Socio-technical Perspective Dave Mbiazi Meghana Bhange Maryam Babaei Ivaxi Sheth Patrik Joslin Kenfack 17 4 0 28 Nov 2023
An Interdisciplinary Outlook on Large Language Models for Scientific Research James Boyko Joseph Cohen Nathan Fox Maria Han Veiga Jennifer I-Hsiu Li ... Andreas H. Rauch Kenneth N. Reid Soumi Tribedi Anastasia Visheratina Xin Xie 36 17 0 03 Nov 2023
Unpacking the Ethical Value Alignment in Big Models Xiaoyuan Yi Jing Yao Xiting Wang Xing Xie 24 11 0 26 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning Xuansheng Wu Wenlin Yao Jianshu Chen Xiaoman Pan Xiaoyang Wang Ninghao Liu Dong Yu LRM 20 26 0 30 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons Jing-ling Huang Atticus Geiger Karel D'Oosterlinck Zhengxuan Wu Christopher Potts MILM 26 25 0 19 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods Sarah Schwettmann Tamar Rott Shaham Joanna Materzyńska Neil Chowdhury Shuang Li Jacob Andreas David Bau Antonio Torralba 18 19 0 07 Sep 2023
Explainability for Large Language Models: A Survey Haiyan Zhao Hanjie Chen Fan Yang Ninghao Liu Huiqi Deng Hengyi Cai Shuaiqiang Wang Dawei Yin Mengnan Du LRM 23 408 0 02 Sep 2023
Self-Verification Improves Few-Shot Clinical Information Extraction Zelalem Gero Chandan Singh Hao Cheng Tristan Naumann Michel Galley Jianfeng Gao Hoifung Poon 40 52 0 30 May 2023
Goal-Driven Explainable Clustering via Language Descriptions Zihan Wang Jingbo Shang Ruiqi Zhong 30 35 0 23 May 2023
Explaining Language Models' Predictions with High-Impact Concepts Ruochen Zhao Shafiq R. Joty Yongjie Wang Tan Wang LRM 63 8 0 03 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 158 186 0 02 May 2023
Describing Differences between Text Distributions with Natural Language Ruiqi Zhong Charles Burton Snell Dan Klein Jacob Steinhardt VLM 124 42 0 28 Jan 2022
Natural Language Descriptions of Deep Visual Features Evan Hernandez Sarah Schwettmann David Bau Teona Bagashvili Antonio Torralba Jacob Andreas MILM 198 117 0 26 Jan 2022
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration Lei Sha Oana-Maria Camburu Thomas Lukasiewicz 124 35 0 16 Dec 2020
e-SNLI: Natural Language Inference with Natural Language Explanations Oana-Maria Camburu Tim Rocktäschel Thomas Lukasiewicz Phil Blunsom LRM 255 620 0 04 Dec 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 201 882 0 03 May 2018