arXiv: 2405.00708
Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis

23 April 2024
Furui Cheng
Vilém Zouhar
Robin Shing Moon Chan
Daniel Fürst
Hendrik Strobelt
Mennatallah El-Assady

Papers citing "Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis"

8 / 8 papers shown
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
555
13
0
24 Feb 2025
Interpreting Language Reward Models via Contrastive Explanations
International Conference on Learning Representations (ICLR), 2024
Junqi Jiang
Tom Bewley
Saumitra Mishra
Freddy Lecue
Manuela Veloso
591
8
0
25 Nov 2024
Bias in Large Language Models: Origin, Evaluation, and Mitigation
Yufei Guo
Muzhe Guo
Juntao Su
Zhou Yang
Mengqiu Zhu
Hongfei Li
Mengyang Qiu
Shuo Shuo Liu
AILaw
405
101
0
16 Nov 2024
OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework
Wei Zhou
Hong Huang
Guowen Zhang
Ruize Shi
Kehan Yin
Yuanyuan Lin
Bang Liu
CML
285
1
0
07 Jun 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Haoyu Tian
Wei Zhang
Minfeng Zhu
Wei Chen
421
10
0
12 Apr 2024
The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Ian Tenney
James Wexler
Jasmijn Bastings
Tolga Bolukbasi
Andy Coenen
...
Ellen Jiang
Mahima Pushkarna
Carey Radebaugh
Emily Reif
Ann Yuan
VLM
444
213
0
12 Aug 2020
A Unified Approach to Interpreting Model Predictions
Scott M. Lundberg
Su-In Lee
FAtt
5.2K
32,979
0
22 May 2017
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt, FaML
2.7K
21,359
0
16 Feb 2016