ResearchTrend.AI
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap

1 June 2023
Q. V. Liao
Ziang Xiao
    ALM
    ELM

Papers citing "Rethinking Model Evaluation as Narrowing the Socio-Technical Gap"

23 / 23 papers shown
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
Elisa Bassignana
Amanda Cercas Curry
Dirk Hovy
13
0
0
17 May 2025
Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling
William Xie
Max Conway
Yutong Zhang
N. Correll
LM&Ro
LRM
40
0
0
14 May 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
38
0
0
14 Apr 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Antonia Karamolegkou
Malvina Nikandrou
Georgios Pantazopoulos
Danae Sanchez Villegas
Phillip Rust
Ruchira Dhar
Daniel Hershcovich
Anders Søgaard
44
0
0
28 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
68
1
0
24 Mar 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
70
1
0
24 Mar 2025
VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
Yoo Yeon Sung
H. Kim
Dan Zhang
68
1
0
16 Mar 2025
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang
Fengshuo Bai
Qizhi Chen
Chengdong Ma
Mingzhi Wang
Haoran Sun
Zilong Zheng
Yaodong Yang
78
3
0
26 Feb 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Rock Yuren Pang
Hope Schroeder
Kynnedy Simone Smith
Solon Barocas
Ziang Xiao
Emily Tseng
Danielle Bragg
92
3
0
22 Jan 2025
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
Huiqi Zou
Pengda Wang
Zihan Yan
Tianjun Sun
Ziang Xiao
101
1
0
29 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
95
18
0
20 Nov 2024
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
Beatrice Savoldi
Sara Papi
Matteo Negri
Ana Guerberof
L. Bentivogli
52
7
0
01 Oct 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Y. Wang
Naomi Saphra
53
10
0
22 Jul 2024
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Nikhil Sharma
Kenton Murray
Ziang Xiao
58
1
0
07 Jul 2024
ECBD: Evidence-Centered Benchmark Design for NLP
Yu Lu Liu
Su Lin Blodgett
Jackie Chi Kit Cheung
Q. Vera Liao
Alexandra Olteanu
Ziang Xiao
47
10
0
13 Jun 2024
(Beyond) Reasonable Doubt: Challenges that Public Defenders Face in Scrutinizing AI in Court
Angela Jin
Niloufar Salehi
ELM
41
2
0
13 Mar 2024
Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking
Nikhil Sharma
Q. V. Liao
Ziang Xiao
45
19
0
08 Feb 2024
Does Writing with Language Models Reduce Content Diversity?
Vishakh Padmakumar
He He
50
83
0
11 Sep 2023
Identifying and Mitigating the Security Risks of Generative AI
Clark W. Barrett
Bradley L Boyd
Elie Bursztein
Nicholas Carlini
Brad Chen
...
Zulfikar Ramzan
Khawaja Shams
D. Song
Ankur Taly
Diyi Yang
SILM
46
93
0
28 Aug 2023
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
99
26
0
13 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
384
12,081
0
04 Mar 2022
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
257
3,696
0
28 Feb 2017
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomáš Kočiský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
211
3,515
0
10 Jun 2015