Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
arXiv:1703.03717, 10 March 2017
A. Ross, M. C. Hughes, Finale Doshi-Velez
Tags: FAtt
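For context on the cited paper: Ross et al. train differentiable models to be "right for the right reasons" by adding a term to the loss that penalizes the model's input gradients (its explanation) on features a human annotator has marked as irrelevant. Below is a minimal PyTorch sketch of this style of loss; the function name, the mask convention (1 = feature should not matter), and the penalty weight are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, irrelevant_mask, lam=10.0):
    """Cross-entropy plus a 'right reasons' penalty on input gradients.

    irrelevant_mask: same shape as x, 1 where the annotator says the
    feature should NOT influence the prediction (this sketch's stand-in
    for the paper's annotation matrix A).
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    right_answers = F.cross_entropy(logits, y)

    # Gradient of the summed log-probabilities w.r.t. the input,
    # kept in the graph (create_graph=True) so the penalty is trainable.
    log_probs = F.log_softmax(logits, dim=-1)
    (input_grads,) = torch.autograd.grad(
        log_probs.sum(), x, create_graph=True
    )

    # Penalize explanation mass on features marked irrelevant.
    right_reasons = ((irrelevant_mask * input_grads) ** 2).sum()
    return right_answers + lam * right_reasons
```

Many of the citing papers below extend this idea of supervising or constraining model explanations during training.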
Papers citing "Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations" (50 of 108 shown):
- Large Language Models as Attribution Regularizers for Efficient Model Training. Davor Vukadin, Marin Šilić, Goran Delač. 27 Feb 2025.
- Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures. Luis Lara, Lucia Eve Berger, Rajesh Raju. [ViT] 23 Feb 2025.
- B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable. Shreyash Arya, Sukrut Rao, Moritz Böhle, Bernt Schiele. 28 Jan 2025.
- Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers. Lam Nguyen Tung, Steven Cho, Xiaoning Du, Neelofar Neelofar, Valerio Terragni, Stefano Ruberto, Aldeida Aleti. 30 Oct 2024.
- Problem Solving Through Human-AI Preference-Based Cooperation. Subhabrata Dutta, Timo Kaufmann, Goran Glavas, Ivan Habernal, Kristian Kersting, Frauke Kreuter, Mira Mezini, Iryna Gurevych, Eyke Hüllermeier, Hinrich Schuetze. 14 Aug 2024.
- Explanation Regularisation through the Lens of Attributions. Pedro Ferreira, Wilker Aziz, Ivan Titov. 23 Jul 2024.
- Language-guided Detection and Mitigation of Unknown Dataset Bias. Zaiying Zhao, Soichiro Kumano, Toshihiko Yamasaki. 05 Jun 2024.
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales. Lucas Resck, Marcos M. Raimundo, Jorge Poco. 03 Apr 2024.
- AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems. Clara Punzi, Roberto Pellungrini, Mattia Setzu, F. Giannotti, D. Pedreschi. 09 Feb 2024.
- Identifying Spurious Correlations using Counterfactual Alignment. Joseph Paul Cohen, Louis Blankemeier, Akshay S. Chaudhari. [CML] 01 Dec 2023.
- Improving Interpretation Faithfulness for Vision Transformers. Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang. 29 Nov 2023.
- Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement. Avani Gupta, Saurabh Saini, P. J. Narayanan. 26 Nov 2023.
- Explaining high-dimensional text classifiers. Odelia Melamed, Rich Caruana. 22 Nov 2023.
- SCAAT: Improving Neural Network Interpretability via Saliency Constrained Adaptive Adversarial Training. Rui Xu, Wenkang Qin, Peixiang Huang, Hao Wang, Lin Luo. [FAtt, AAML] 09 Nov 2023.
- Interpretability-Aware Vision Transformer. Yao Qiang, Chengyin Li, Prashant Khanduri, D. Zhu. [ViT] 14 Sep 2023.
- FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods. Robin Hesse, Simone Schaub-Meyer, Stefan Roth. [AAML] 11 Aug 2023.
- Unlearning Spurious Correlations in Chest X-ray Classification. Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee. [CML, OOD] 02 Aug 2023.
- Mitigating Bias: Enhancing Image Classification by Improving Model Explanations. Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooei, Mohammad Sabokrou. 04 Jul 2023.
- Learning Differentiable Logic Programs for Abstract Visual Reasoning. Hikaru Shindo, Viktor Pfanschilling, D. Dhami, Kristian Kersting. [NAI] 03 Jul 2023.
- One Explanation Does Not Fit XIL. Felix Friedrich, David Steinmann, Kristian Kersting. [LRM] 14 Apr 2023.
- Preemptively Pruning Clever-Hans Strategies in Deep Neural Networks. Lorenz Linhardt, Klaus-Robert Müller, G. Montavon. [AAML] 12 Apr 2023.
- Are Data-driven Explanations Robust against Out-of-distribution Data? Tang Li, Fengchun Qiao, Mengmeng Ma, Xiangkai Peng. [OODD, OOD] 29 Mar 2023.
- Learning with Explanation Constraints. Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar. [FAtt] 25 Mar 2023.
- Towards Learning and Explaining Indirect Causal Effects in Neural Networks. Abbavaram Gowtham Reddy, Saketh Bachu, Harsh Nilesh Pathak, Ben Godfrey, V. Balasubramanian, V. Varshaneya, Satya Narayanan Kar. [CML] 24 Mar 2023.
- Towards Explaining Subjective Ground of Individuals on Social Media. Younghun Lee, Dan Goldwasser. 18 Nov 2022.
- Identifying Spurious Correlations and Correcting them with an Explanation-based Learning. Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee. 15 Nov 2022.
- Towards Human-Centred Explainability Benchmarks For Text Classification. Viktor Schlegel, Erick Mendez Guzman, R. Batista-Navarro. 10 Nov 2022.
- XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models. Dong-Ho Lee, Akshen Kadakia, Brihi Joshi, Aaron Chan, Ziyi Liu, ..., Takashi Shibuya, Ryosuke Mitani, Toshiyuki Sekiya, Jay Pujara, Xiang Ren. [LRM] 30 Oct 2022.
- Sparsity in Continuous-Depth Neural Networks. H. Aliee, Till Richter, Mikhail Solonin, I. Ibarra, Fabian J. Theis, Niki Kilbertus. 26 Oct 2022.
- Revision Transformers: Instructing Language Models to Change their Values. Felix Friedrich, Wolfgang Stammer, P. Schramowski, Kristian Kersting. [KELM] 19 Oct 2022.
- Equivariant and Invariant Grounding for Video Question Answering. Yicong Li, Xiang Wang, Junbin Xiao, Tat-Seng Chua. 26 Jul 2022.
- RES: A Robust Framework for Guiding Visual Explanation. Yuyang Gao, Tong Sun, Guangji Bai, Siyi Gu, S. Hong, Liang Zhao. [FAtt, AAML, XAI] 27 Jun 2022.
- The Importance of Background Information for Out of Distribution Generalization. Jupinder Parmar, Khaled Kamal Saab, Brian Pogatchnik, D. Rubin, Christopher Ré. [OOD] 17 Jun 2022.
- Optimizing Relevance Maps of Vision Transformers Improves Robustness. Hila Chefer, Idan Schwartz, Lior Wolf. [ViT] 02 Jun 2022.
- Learning to Ignore Adversarial Attacks. Yiming Zhang, Yan Zhou, Samuel Carton, Chenhao Tan. 23 May 2022.
- Perspectives on Incorporating Expert Feedback into Model Updates. Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar. 13 May 2022.
- The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations. Aparna Balagopalan, Haoran Zhang, Kimia Hamidieh, Thomas Hartvigsen, Frank Rudzicz, Marzyeh Ghassemi. 06 May 2022.
- Unsupervised Learning of Unbiased Visual Representations. C. Barbano, Enzo Tartaglione, Marco Grangetto. [SSL, CML, OOD] 26 Apr 2022.
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection. Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr. 23 Mar 2022.
- Aligning Eyes between Humans and Deep Neural Network through Interactive Attention Alignment. Yuyang Gao, Tong Sun, Liang Zhao, Sungsoo Ray Hong. [HAI] 06 Feb 2022.
- Right for the Right Latent Factors: Debiasing Generative Models via Disentanglement. Xiaoting Shao, Karl Stelzner, Kristian Kersting. [CML, DRL] 01 Feb 2022.
- Debiased-CAM to mitigate systematic error with faithful visual explanations of machine learning. Wencan Zhang, Mariella Dimiccoli, Brian Y. Lim. [FAtt] 30 Jan 2022.
- Controlling Directions Orthogonal to a Classifier. Yilun Xu, Hao He, T. Shen, Tommi Jaakkola. 27 Jan 2022.
- Making a (Counterfactual) Difference One Rationale at a Time. Michael J. Plyler, Michal Green, Min Chi. 13 Jan 2022.
- Towards Relatable Explainable AI with the Perceptual Process. Wencan Zhang, Brian Y. Lim. [AAML, XAI] 28 Dec 2021.
- What to Learn, and How: Toward Effective Learning from Rationales. Samuel Carton, Surya Kanoria, Chenhao Tan. 30 Nov 2021.
- Improving Deep Learning Interpretability by Saliency Guided Training. Aya Abdelsalam Ismail, H. C. Bravo, S. Feizi. [FAtt] 29 Nov 2021.
- Matching Learned Causal Effects of Neural Networks with Domain Priors. Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, V. Balasubramanian, Amit Sharma. [CML] 24 Nov 2021.
- Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features. Haohan Wang, Zeyi Huang, Hanlin Zhang, Yong Jae Lee, Eric P. Xing. [OOD] 05 Nov 2021.
- Modeling Techniques for Machine Learning Fairness: A Survey. Mingyang Wan, Daochen Zha, Ninghao Liu, Na Zou. [SyDa, FaML] 04 Nov 2021.