Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld
arXiv:2101.00288 · 1 January 2021
Papers citing "Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models" (50 of 182 shown)
Towards detecting unanticipated bias in Large Language Models · Anna Kruspe · 03 Apr 2024
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution · Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang · 02 Apr 2024
RORA: Robust Free-Text Rationale Evaluation · Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu · 28 Feb 2024
LLMs with Chain-of-Thought Are Non-Causal Reasoners · Guangsheng Bao, Hongbo Zhang, Linyi Yang, Cunxiang Wang, Yue Zhang · 25 Feb 2024
Clarify: Improving Model Robustness With Natural Language Corrections · Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn · 06 Feb 2024
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations · Qianli Wang, Tatiana Anikina, Nils Feldhus, Josef van Genabith, Leonhard Hennig, Sebastian Möller · 23 Jan 2024
Towards a Non-Ideal Methodological Framework for Responsible ML · Ramaravind Kommiya Mothilal, Shion Guha, Syed Ishtiaque Ahmed · 20 Jan 2024
An Empirical Study of Counterfactual Visualization to Support Visual Causal Inference · Arran Zeyu Wang, D. Borland, David Gotz · 16 Jan 2024
Are self-explanations from Large Language Models faithful? · Andreas Madsen, Sarath Chandar, Siva Reddy · 15 Jan 2024
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention · Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu · 22 Dec 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs · Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, …, A. Olwal, David Kim, Ram Iyengar, Na Li, Andrea Colaço · 15 Dec 2023
Using Captum to Explain Generative Language Models · Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan · 09 Dec 2023
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models · Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, K. Hosanagar, Leonid Sigal, Matthew Turk · 03 Dec 2023
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples · Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal · 30 Nov 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue · Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella J. Sinclair · 21 Nov 2023
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals · Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith · 16 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning · Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp · 13 Nov 2023
Interpreting Pretrained Language Models via Concept Bottlenecks · Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Wenlin Yao, Huan Liu · 08 Nov 2023
Quantifying Uncertainty in Natural Language Explanations of Large Language Models · Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju · 06 Nov 2023
"Honey, Tell Me What's Wrong": Global Explanation of Textual Discriminative Models through Cooperative Generation · Antoine Chaffin, Julien Delaunay · 27 Oct 2023
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks · Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel · 25 Oct 2023
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups · Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong · 25 Oct 2023
Towards Conceptualization of "Fair Explanation": Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators · Tin Trung Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé, Marine Carpuat · 23 Oct 2023
EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification · Yingjie Zhu, Jiasheng Si, Yibo Zhao, Haiyang Zhu, Deyu Zhou, Yulan He · 23 Oct 2023
Faithfulness Measurable Masked Language Models · Andreas Madsen, Siva Reddy, Sarath Chandar · 11 Oct 2023
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations · Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller · 09 Oct 2023
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences · Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz · 06 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning · Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu · 30 Sep 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria · Tae Soo Kim, Yoonjoo Lee, Jamin Shin, Young-Ho Kim, Juho Kim · 24 Sep 2023
Towards LLM-guided Causal Explainability for Black-box Text Classifiers · Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu · 23 Sep 2023
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs · Tiep Le, Vasudev Lal, Phillip Howard · 23 Sep 2023
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration · Rachneet Sachdeva, Martin Tutek, Iryna Gurevych · 14 Sep 2023
Explainability for Large Language Models: A Survey · Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, D. Yin, Jundong Li · 02 Sep 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data · Zheng Zhang, Zheng Ning, Chenliang Xu, Yapeng Tian, Toby Jia-Jun Li · 27 Jul 2023
CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models · Xingbo Wang, Renfei Huang, Zhihua Jin, Tianqing Fang, Huamin Qu · 23 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations · Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown · 17 Jul 2023
Power-up! What Can Generative Models Do for Human Computation Workflows? · Garrett Allen, Gaole He, U. Gadiraju · 05 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers · I. Nejadgholi, S. Kiritchenko, Kathleen C. Fraser, Esma Balkir · 04 Jul 2023
On Evaluating and Mitigating Gender Biases in Multilingual Settings · Aniket Vashishtha, Kabir Ahuja, Sunayana Sitaram · 04 Jul 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models · Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein · 23 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation · Christoph Leiter, Piyawat Lertvittayakumjorn, M. Fomicheva, Wei Zhao, Yang Gao, Steffen Eger · 22 Jun 2023
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities · Xudong Shen, H. Brown, Jiashu Tao, Martin Strobel, Yao Tong, Akshay Narayan, Harold Soh, Finale Doshi-Velez · 22 Jun 2023
Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals · Robin Shing Moon Chan, Afra Amini, Mennatallah El-Assady · 21 Jun 2023
Causal Effect Regularization: Automated Detection and Removal of Spurious Attributes · Abhinav Kumar, Amit Deshpande, Ajay Sharma · 19 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning · Shivaen Ramshetty, Gaurav Verma, Srijan Kumar · 19 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations · Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun · 07 Jun 2023
Reason to explain: Interactive contrastive explanations (REASONX) · Laura State, Salvatore Ruggieri, Franco Turini · 29 May 2023
Faithfulness Tests for Natural Language Explanations · Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, J. Simonsen, Isabelle Augenstein · 29 May 2023
CREST: A Joint Framework for Rationalization and Counterfactual Text Generation · Marcos Vinícius Treviso, Alexis Ross, Nuno M. Guerreiro, André F.T. Martins · 26 May 2023
Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors · Giorgos Filandrianos, Edmund Dervakos, Orfeas Menis Mastromichalakis, Chrysoula Zerva, Giorgos Stamou · 26 May 2023