arXiv:2302.04732
Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning
9 February 2023
Ángel Alexander Cabrera
Erica Fu
Donald Bertucci
Kenneth Holstein
Ameet Talwalkar
Jason I. Hong
Adam Perer
Papers citing "Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning"
11 / 11 papers shown
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Michael A. Hedderich
Anyi Wang
Raoyuan Zhao
Florian Eichin
Barbara Plank
37
0
0
22 Apr 2025
Orbit: A Framework for Designing and Evaluating Multi-objective Rankers
Chenyang Yang
Tesi Xiao
Michael Shavlovsky
Christian Kastner
Tongshuang Wu
42
0
0
07 Nov 2024
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
Angie Boggust
Venkatesh Sivaraman
Yannick Assogba
Donghao Ren
Dominik Moritz
Fred Hohman
VLM
60
3
0
06 Aug 2024
Canvil: Designerly Adaptation for LLM-Powered User Experiences
K. J. Kevin Feng
Q. V. Liao
Ziang Xiao
Jennifer Wortman Vaughan
Amy X. Zhang
David W. McDonald
51
17
0
17 Jan 2024
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs
Chenyang Yang
Rishabh Rustogi
Rachel A. Brower-Sinning
Grace A. Lewis
Christian Kastner
Tongshuang Wu
KELM
38
12
0
14 Oct 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim
34
69
0
24 Sep 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
30
19
0
08 Aug 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao
J. Vaughan
58
159
0
02 Jun 2023
Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits
Wesley Hanwen Deng
Manish Nagireddy
M. S. Lee
Jatinder Singh
Zhiwei Steven Wu
Kenneth Holstein
Haiyi Zhu
45
88
0
13 May 2022
Discovering and Validating AI Errors With Crowdsourced Failure Reports
Ángel Alexander Cabrera
Abraham J. Druck
Jason I. Hong
Adam Perer
HAI
68
54
0
23 Sep 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
154
137
0
13 Jan 2021