Interactive Analysis of LLMs using Meaningful Counterfactuals
arXiv: 2405.00708

23 April 2024
Furui Cheng
Vilém Zouhar
Robin Shing Moon Chan
Daniel Fürst
Hendrik Strobelt
Mennatallah El-Assady

Papers citing "Interactive Analysis of LLMs using Meaningful Counterfactuals"

7 / 7 papers shown
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
102
0
0
24 Feb 2025
Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang
Tom Bewley
Saumitra Mishra
Freddy Lecue
Manuela Veloso
76
0
0
25 Nov 2024
Bias in Large Language Models: Origin, Evaluation, and Mitigation
Yufei Guo
Muzhe Guo
Juntao Su
Zhou Yang
Mengqiu Zhu
Hongfei Li
Mengyang Qiu
Shuo Shuo Liu
AILaw
33
10
0
16 Nov 2024
OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework
Wei Zhou
Hong Huang
Guowen Zhang
Ruize Shi
Kehan Yin
Yuanyuan Lin
Bang Liu
CML
50
1
0
07 Jun 2024
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Minsuk Kahng
Ian Tenney
Mahima Pushkarna
Michael Xieyang Liu
James Wexler
Emily Reif
Krystal Kallarackal
Minsuk Chang
Michael Terry
Lucas Dixon
56
21
0
16 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
ViCE: Visual Counterfactual Explanations for Machine Learning Models
Oscar Gomez
Steffen Holter
Jun Yuan
E. Bertini
AAML
57
93
0
05 Mar 2020