Explainability for Large Language Models: A Survey

2 September 2023

Haiyan Zhao

Hanjie Chen

Fan Yang

Ninghao Liu

Papers citing "Explainability for Large Language Models: A Survey"

50 / 67 papers shown

Title
Retrieval Augmented Generation Evaluation for Health Documents Mario Ceresa Lorenzo Bertolini Valentin Comte Nicholas Spadaro Barbara Raffael ... Sergio Consoli Amalia Muñoz Piñeiro Alex Patak Maddalena Querci Tobias Wiesenthal RALM 3DV 39 0 1 07 May 2025
Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review Sonal Allana Mohan Kankanhalli Rozita Dara 32 0 0 05 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures Francisco Aguilera-Martínez Fernando Berzal PILM 52 0 0 02 May 2025
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods Mahdi Dhaini Ege Erdogan Nils Feldhus Gjergji Kasneci 46 0 0 02 May 2025
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs Marco Arazzi Vignesh Kumar Kembu Antonino Nocera V. P. 82 0 0 30 Apr 2025
Bi-directional Model Cascading with Proxy Confidence David Warren Mark Dras 44 0 0 27 Apr 2025
Beyond Public Access in LLM Pre-Training Data Sruly Rosenblat Tim O'Reilly Ilan Strauss MLAU 57 0 0 24 Apr 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses Rohitash Chandra Aryan Chaudhary Yeshwanth Rayavarapu 44 0 0 27 Mar 2025
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Jinhao Duan Fei Kong Hao-Ran Cheng James Diffenderfer B. Kailkhura Lichao Sun Xiaofeng Zhu Xiaoshuang Shi Kaidi Xu 141 0 0 13 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? Yuhang Liu Dong Gong Erdun Gao Zhen Zhang Biwei Huang Mingming Gong Anton van den Hengel Javen Qinfeng Shi J. Shi 154 0 0 12 Mar 2025
Statistical Deficiency for Task Inclusion Estimation Loïc Fosse Frédéric Béchet Benoit Favre Géraldine Damnati Gwénolé Lecorvé Maxime Darrin Philippe Formont Pablo Piantanida 136 0 0 07 Mar 2025
Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring Xuansheng Wu Padmaja Pravin Saraf Gyeong-Geon Lee Ehsan Latif Ninghao Liu Xiaoming Zhai 55 4 0 24 Feb 2025
Exploring Translation Mechanism of Large Language Models Hongbin Zhang Kehai Chen Xuefeng Bai Xiucheng Li Yang Xiang Min Zhang 59 1 0 17 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM Yueying Zou Peipei Li Zekun Li Huaibo Huang Xing Cui Xuannan Liu Chenghanyu Zhang Ran He DeLMO 120 2 0 07 Feb 2025
CueTip: An Interactive and Explainable Physics-aware Pool Assistant Sean Memery Kevin Denamganai Jiaxin Zhang Zehai Tu Yiwen Guo Kartic Subr LRM 42 0 0 30 Jan 2025
Citations and Trust in LLM Generated Responses Yifan Ding Matthew Facciani Amrit Poudel Ellen Joyce Salvador Aguiñaga Balaji Veeramani Sanmitra Bhattacharya Tim Weninger HILM 41 3 0 03 Jan 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models Yanwen Huang Yong Zhang Ning Cheng Zhitao Li Shaojun Wang Jing Xiao 86 0 0 02 Jan 2025
How Do Artificial Intelligences Think? The Three Mathematico-Cognitive Factors of Categorical Segmentation Operated by Synthetic Neurons Michael Pichat William Pogrund Armanush Gasparian Paloma Pichat Samuel Demarchi Michael Veillet-Guillem 42 3 0 26 Dec 2024
When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations Huaizhi Ge Yiming Li Qifan Wang Yongfeng Zhang Ruixiang Tang AAML SILM 81 0 0 19 Nov 2024
Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung Ching-Yun Ko Ambrish Rawat I-Hsin Chung Winston H. Hsu Pin-Yu Chen 49 7 0 01 Nov 2024
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers Lam Nguyen Tung Steven Cho Xiaoning Du Neelofar Neelofar Valerio Terragni Stefano Ruberto Aldeida Aleti 136 2 0 30 Oct 2024
Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality Junlong Chen Jens Grubert Per Ola Kristensson 21 0 0 28 Oct 2024
On the Role of Attention Heads in Large Language Model Safety Z. Zhou Haiyang Yu Xinghua Zhang Rongwu Xu Fei Huang Kun Wang Yang Liu Junfeng Fang Yongbin Li 59 5 0 17 Oct 2024
Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models Kushal Tatariya Vladimir Araujo Thomas Bauwens Miryam de Lhoneux VLM 33 0 0 15 Oct 2024
Output Scouting: Auditing Large Language Models for Catastrophic Responses Andrew Bell João Fonseca KELM 51 1 0 04 Oct 2024
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI Xu Zheng Farhad Shirani Zhuomin Chen Chaohao Lin Wei Cheng Wenbo Guo Dongsheng Luo AAML 28 0 0 03 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution Haiyan Zhao Heng Zhao Bo Shen Ali Payani Fan Yang Mengnan Du 59 2 0 30 Sep 2024
SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records Paloma Rabaey Henri Arno Stefan Heytens Thomas Demeester 31 1 0 13 Sep 2024
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning Wei Chen Zhen Huang Liang Xie Binbin Lin Houqiang Li ... Deng Cai Yonggang Zhang Wenxiao Wang Xu Shen Jieping Ye 51 6 0 03 Sep 2024
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models Xiyu Liu Zhengxiao Liu Naibin Gu Zheng-Shen Lin Wanli Ma Ji Xiang Weiping Wang KELM 44 0 0 27 Aug 2024
Visual Agents as Fast and Slow Thinkers Guangyan Sun Mingyu Jin Zhenting Wang Cheng-Long Wang Siqi Ma Qifan Wang Ying Nian Wu Dongfang Liu LLMAG LRM 77 13 0 16 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs Nitay Calderon Roi Reichart 38 10 0 27 Jul 2024
MAVEN-Fact: A Large-scale Event Factuality Detection Dataset Chunyang Li Hao Peng Xiaozhi Wang Y. Qi Lei Hou Bin Xu Juanzi Li HILM 35 1 0 22 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models Jinliang Lu Ziliang Pang Min Xiao Yaochen Zhu Rui Xia Jiajun Zhang MoMe 38 18 0 08 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 82 19 0 02 Jul 2024
When Search Engine Services meet Large Language Models: Visions and Challenges Haoyi Xiong Jiang Bian Yuchen Li Xuhong Li Mengnan Du Shuaiqiang Wang Dawei Yin Sumi Helal 53 28 0 28 Jun 2024
Applications of Generative AI in Healthcare: algorithmic, ethical, legal and societal considerations Onyekachukwu R. Okonji Kamol Yunusov Bonnie Gordon MedIm 41 3 0 15 Jun 2024
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States Zhenhong Zhou Haiyang Yu Xinghua Zhang Rongwu Xu Fei Huang Yongbin Li 24 26 0 09 Jun 2024
Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies Srijoni Majumdar Edith Elkind Evangelos Pournaras SyDa 49 1 0 31 May 2024
Large Language Models Cannot Explain Themselves Advait Sarkar LRM 37 7 0 07 May 2024
Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns Constantinos Patsakis Fran Casino Nikolaos Lykousas 39 12 0 30 Apr 2024
Transformers for molecular property prediction: Lessons learned from the past five years Afnan Sultan Jochen Sieg M. Mathea Andrea Volkamer AI4CE 29 10 0 05 Apr 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency Akila Wickramasekara F. Breitinger Mark Scanlon 49 8 0 29 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models M. Pternea Prerna Singh Abir Chakraborty Y. Oruganti M. Milletarí Sayli Bapat Kebei Jiang OffRL 18 7 0 02 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 34 78 0 25 Jan 2024
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains Chia-Chien Hung Wiem Ben-Rim Lindsay Frost Lars Bruckner Carolin (Haas) Lawrence AILaw ALM ELM 25 9 0 25 Nov 2023
A Survey of Graph Meets Large Language Model: Progress and Future Directions Yuhan Li Zhixun Li Peisong Wang Jia Li Xiangguo Sun Hongtao Cheng Jeffrey Xu Yu 38 55 0 21 Nov 2023
Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations Zilu Tang Mayank Agarwal Alex Shypula Bailin Wang Derry Wijaya Jie Chen Yoon Kim LRM 37 15 0 13 Nov 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks Max Tegmark HILM 102 168 0 10 Oct 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond Jingfeng Yang Hongye Jin Ruixiang Tang Xiaotian Han Qizhang Feng Haoming Jiang Bing Yin Xia Hu LM&MA 131 619 0 26 Apr 2023