Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld
arXiv:2101.00288 · 1 January 2021
Papers citing "Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models" (50 of 182 shown)
Towards detecting unanticipated bias in Large Language Models · Anna Kruspe · 03 Apr 2024
A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution · Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang · 02 Apr 2024
RORA: Robust Free-Text Rationale Evaluation · Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu · 28 Feb 2024
LLMs with Chain-of-Thought Are Non-Causal Reasoners · Guangsheng Bao, Hongbo Zhang, Linyi Yang, Cunxiang Wang, Yue Zhang · 25 Feb 2024
Clarify: Improving Model Robustness With Natural Language Corrections · Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn · 06 Feb 2024
LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-Explanations · Qianli Wang, Tatiana Anikina, Nils Feldhus, Josef van Genabith, Leonhard Hennig, Sebastian Möller · 23 Jan 2024
Towards a Non-Ideal Methodological Framework for Responsible ML · Ramaravind Kommiya Mothilal, Shion Guha, Syed Ishtiaque Ahmed · 20 Jan 2024
An Empirical Study of Counterfactual Visualization to Support Visual Causal Inference · Arran Zeyu Wang, D. Borland, David Gotz · 16 Jan 2024
Are self-explanations from Large Language Models faithful? · Andreas Madsen, Sarath Chandar, Siva Reddy · 15 Jan 2024
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention · Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu · 22 Dec 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs · Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, …, A. Olwal, David Kim, Ram Iyengar, Na Li, Andrea Colaço · 15 Dec 2023
Using Captum to Explain Generative Language Models · Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan · 09 Dec 2023
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models · Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, K. Hosanagar, Leonid Sigal, Matthew Turk · 03 Dec 2023
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples · Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal · 30 Nov 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue · Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella J. Sinclair · 21 Nov 2023
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals · Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith · 16 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning · Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp · 13 Nov 2023
Interpreting Pretrained Language Models via Concept Bottlenecks · Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Wenlin Yao, Huan Liu · 08 Nov 2023
Quantifying Uncertainty in Natural Language Explanations of Large Language Models · Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju · 06 Nov 2023
"Honey, Tell Me What's Wrong": Global Explanation of Textual Discriminative Models through Cooperative Generation · Antoine Chaffin, Julien Delaunay · 27 Oct 2023
Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks · Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel · 25 Oct 2023
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups · Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong · 25 Oct 2023
Towards Conceptualization of "Fair Explanation": Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators · Tin Trung Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé, Marine Carpuat · 23 Oct 2023
EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification · Yingjie Zhu, Jiasheng Si, Yibo Zhao, Haiyang Zhu, Deyu Zhou, Yulan He · 23 Oct 2023
Faithfulness Measurable Masked Language Models · Andreas Madsen, Siva Reddy, Sarath Chandar · 11 Oct 2023
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations · Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller · 09 Oct 2023
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences · Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz · 06 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning · Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu · 30 Sep 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria · Tae Soo Kim, Yoonjoo Lee, Jamin Shin, Young-Ho Kim, Juho Kim · 24 Sep 2023
Towards LLM-guided Causal Explainability for Black-box Text Classifiers · Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu · 23 Sep 2023
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs · Tiep Le, Vasudev Lal, Phillip Howard · 23 Sep 2023
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration · Rachneet Sachdeva, Martin Tutek, Iryna Gurevych · 14 Sep 2023
Explainability for Large Language Models: A Survey · Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, D. Yin, Jundong Li · 02 Sep 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data · Zheng Zhang, Zheng Ning, Chenliang Xu, Yapeng Tian, Toby Jia-Jun Li · 27 Jul 2023
CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models · Xingbo Wang, Renfei Huang, Zhihua Jin, Tianqing Fang, Huamin Qu · 23 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations · Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown · 17 Jul 2023
Power-up! What Can Generative Models Do for Human Computation Workflows? · Garrett Allen, Gaole He, U. Gadiraju · 05 Jul 2023
Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers · I. Nejadgholi, S. Kiritchenko, Kathleen C. Fraser, Esma Balkir · 04 Jul 2023
On Evaluating and Mitigating Gender Biases in Multilingual Settings · Aniket Vashishtha, Kabir Ahuja, Sunayana Sitaram · 04 Jul 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models · Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein · 23 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation · Christoph Leiter, Piyawat Lertvittayakumjorn, M. Fomicheva, Wei Zhao, Yang Gao, Steffen Eger · 22 Jun 2023
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities · Xudong Shen, H. Brown, Jiashu Tao, Martin Strobel, Yao Tong, Akshay Narayan, Harold Soh, Finale Doshi-Velez · 22 Jun 2023
Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals · Robin Shing Moon Chan, Afra Amini, Mennatallah El-Assady · 21 Jun 2023
Causal Effect Regularization: Automated Detection and Removal of Spurious Attributes · Abhinav Kumar, Amit Deshpande, Ajay Sharma · 19 Jun 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning · Shivaen Ramshetty, Gaurav Verma, Srijan Kumar · 19 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations · Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun · 07 Jun 2023
Reason to explain: Interactive contrastive explanations (REASONX) · Laura State, Salvatore Ruggieri, Franco Turini · 29 May 2023
Faithfulness Tests for Natural Language Explanations · Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, J. Simonsen, Isabelle Augenstein · 29 May 2023
CREST: A Joint Framework for Rationalization and Counterfactual Text Generation · Marcos Vinícius Treviso, Alexis Ross, Nuno M. Guerreiro, André F.T. Martins · 26 May 2023
Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors · Giorgos Filandrianos, Edmund Dervakos, Orfeas Menis Mastromichalakis, Chrysoula Zerva, Giorgos Stamou · 26 May 2023