2305.09863
Explaining black box text modules in natural language with language models
17 May 2023
Chandan Singh
Aliyah R. Hsu
Richard Antonello
Shailee Jain
Alexander G. Huth
Bin Yu
Jianfeng Gao
MILM
Papers citing
"Explaining black box text modules in natural language with language models"
40 / 40 papers shown
Title
Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
Haoming Wang
Boyuan Yang
Xiangyu Yin
Wei Gao
28
0
0
15 Apr 2025
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
Simon Lermen
Mateusz Dziemian
Natalia Pérez-Campanero Antolín
31
0
0
10 Apr 2025
Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction
Michal Bravansky
Vaclav Kubon
Suhas Hariharan
Robert Kirk
69
0
0
24 Feb 2025
LaVCa: LLM-assisted Visual Cortex Captioning
Takuya Matsuyama
Shinji Nishimoto
Yu Takagi
58
0
0
20 Feb 2025
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Xinyi Yang
Liang Zeng
Heng Dong
C. Yu
X. Wu
H. Yang
Yu Wang
Milind Tambe
Tonghan Wang
76
2
0
18 Feb 2025
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
S. Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
49
11
0
31 Dec 2024
Interpretable Language Modeling via Induction-head Ngram Models
Eunji Kim
Sriya Mantena
Weiwei Yang
Chandan Singh
Sungroh Yoon
Jianfeng Gao
49
0
0
31 Oct 2024
DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
Rakesh R Menon
Shashank Srivastava
26
1
0
29 Oct 2024
Brain-like Functional Organization within Large Language Models
Haiyang Sun
Lin Zhao
Zihao Wu
Xiaohui Gao
Yutao Hu
Mengfei Zuo
W. Zhang
Junwei Han
Tianming Liu
X. Hu
29
0
0
25 Oct 2024
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience
Richard Antonello
Chandan Singh
Shailee Jain
Aliyah R. Hsu
Jianfeng Gao
Alexander G. Huth
21
1
0
01 Oct 2024
Localizing Memorization in SSL Vision Encoders
Wenhao Wang
Adam Dziedzic
Michael Backes
Franziska Boenisch
34
2
0
27 Sep 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Ruiqi Zhong
Heng Wang
Dan Klein
Jacob Steinhardt
35
6
0
13 Sep 2024
XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models
Erik Cambria
Lorenzo Malandri
Fabio Mercorio
Navid Nobani
Andrea Seveso
50
11
0
21 Jul 2024
LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions
N. Hoang-Xuan
Minh Nhat Vu
My T. Thai
28
3
0
12 Jun 2024
Crafting Interpretable Embeddings by Asking LLMs Questions
Vinamra Benara
Chandan Singh
John X. Morris
Richard Antonello
Ion Stoica
Alexander G. Huth
Jianfeng Gao
24
5
0
26 May 2024
Explainable Automatic Grading with Neural Additive Models
Aubrey Condor
Z. Pardos
ELM
27
2
0
01 May 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
31
17
0
22 Apr 2024
Latent Concept-based Explanation of NLP Models
Xuemin Yu
Fahim Dalvi
Nadir Durrani
Marzia Nouri
Hassan Sajjad
LRM
FAtt
24
1
0
18 Apr 2024
Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda
Johannes Schneider
83
26
0
15 Apr 2024
Computational Models to Study Language Processing in the Human Brain: A Survey
Shaonan Wang
Jingyuan Sun
Yunhao Zhang
Nan Lin
Marie-Francine Moens
Chengqing Zong
29
5
0
20 Mar 2024
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
Lirui Luo
Guoxi Zhang
Hongming Xu
Yaodong Yang
Cong Fang
Qing Li
37
11
0
19 Mar 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM
AI4CE
77
61
0
30 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
34
87
0
11 Jan 2024
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV
AI4TS
AI4CE
27
4
0
11 Dec 2023
Survey on AI Ethics: A Socio-technical Perspective
Dave Mbiazi
Meghana Bhange
Maryam Babaei
Ivaxi Sheth
Patrik Joslin Kenfack
17
4
0
28 Nov 2023
An Interdisciplinary Outlook on Large Language Models for Scientific Research
James Boyko
Joseph Cohen
Nathan Fox
Maria Han Veiga
Jennifer I-Hsiu Li
...
Andreas H. Rauch
Kenneth N. Reid
Soumi Tribedi
Anastasia Visheratina
Xin Xie
36
17
0
03 Nov 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
24
11
0
26 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
20
26
0
30 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
26
25
0
19 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyńska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
18
19
0
07 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Mengnan Du
LRM
23
408
0
02 Sep 2023
Self-Verification Improves Few-Shot Clinical Information Extraction
Zelalem Gero
Chandan Singh
Hao Cheng
Tristan Naumann
Michel Galley
Jianfeng Gao
Hoifung Poon
40
52
0
30 May 2023
Goal-Driven Explainable Clustering via Language Descriptions
Zihan Wang
Jingbo Shang
Ruiqi Zhong
30
35
0
23 May 2023
Explaining Language Models' Predictions with High-Impact Concepts
Ruochen Zhao
Shafiq R. Joty
Yongjie Wang
Tan Wang
LRM
63
8
0
03 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
158
186
0
02 May 2023
Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jacob Steinhardt
VLM
124
42
0
28 Jan 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
198
117
0
26 Jan 2022
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration
Lei Sha
Oana-Maria Camburu
Thomas Lukasiewicz
124
35
0
16 Dec 2020
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktäschel
Thomas Lukasiewicz
Phil Blunsom
LRM
255
620
0
04 Dec 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
201
882
0
03 May 2018