ResearchTrend.AI
SPHERE: An Evaluation Card for Human-AI Systems

24 March 2025
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
    ALM

Papers citing "SPHERE: An Evaluation Card for Human-AI Systems"

15 / 15 papers shown
A Survey on Large Language Model based Human-Agent Systems
Henry Peng Zou
Wei-Chieh Huang
Yaozu Wu
Yankai Chen
Chunyu Miao
...
Yongbin Li
Dongyuan Li
Xue Liu
Philip S. Yu
LLMAG
LM&Ro
LM&MA
119
0
0
01 May 2025
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
68
69
0
26 Jun 2024
Proofread: Fixes All Errors with One Tap
Renjie Liu
Yanxiang Zhang
Yun Zhu
Haicheng Sun
Yuanbo Zhang
Michael Xuelin Huang
Shanqing Cai
Lei Meng
Shumin Zhai
ALM
48
3
0
06 Jun 2024
DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues
Xiang Luo
Zhiwen Tang
Jin Wang
Xuejie Zhang
70
5
0
16 May 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
ELM
62
18
0
03 Apr 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
80
536
0
07 Mar 2024
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
217
4,085
0
09 Jun 2023
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing
Tuhin Chakrabarty
Vishakh Padmakumar
Hengxing He
30
73
0
25 Oct 2022
Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata
A. Heger
Elizabeth B. Marquis
Mihaela Vorvoreanu
Hanna M. Wallach
J. W. Vaughan
49
61
0
06 Jun 2022
User-Driven Research of Medical Note Generation Software
Tom Knoll
Francesco Moramarco
Alex Papadopoulos Korfiatis
Rachel D. Young
C. Ruffini
Mark Perera
Christian Perstl
Ehud Reiter
Anya Belz
Aleksandar Savkov
38
21
0
05 May 2022
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Tongshuang Wu
Michael Terry
Carrie J. Cai
LLMAG
AI4CE
LRM
58
450
0
04 Oct 2021
A Comparative Analysis of Industry Human-AI Interaction Guidelines
Austin P. Wright
Zijie J. Wang
Haekyu Park
G. Guo
F. Sperrle
Mennatallah El-Assady
Alex Endert
Daniel A. Keim
Duen Horng Chau
24
30
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
440
41,106
0
28 May 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
187
5,668
0
21 Apr 2019
Datasheets for Datasets
Timnit Gebru
Jamie Morgenstern
Briana Vecchione
Jennifer Wortman Vaughan
Hanna M. Wallach
Hal Daumé
Kate Crawford
209
2,158
0
23 Mar 2018