EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria

24 September 2023
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim

Papers citing "EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria"

16 / 16 papers shown
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu
S. Petridis
Vivian Tsai
Alexander J. Fiannaca
Alex Olwal
Michael Terry
Carrie J. Cai
LRM
42
1
0
28 Jan 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Rock Yuren Pang
Hope Schroeder
Kynnedy Simone Smith
Solon Barocas
Ziang Xiao
Emily Tseng
Danielle Bragg
77
3
0
22 Jan 2025
Orbit: A Framework for Designing and Evaluating Multi-objective Rankers
Chenyang Yang
Tesi Xiao
Michael Shavlovsky
Christian Kastner
Tongshuang Wu
37
0
0
07 Nov 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
29
27
0
04 Jul 2024
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Bryan Wang
Yuliang Li
Zhaoyang Lv
Haijun Xia
Yan Xu
Raj Sodhi
32
39
0
15 Feb 2024
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
Catherine Yeh
Gonzalo A. Ramos
Rachel Ng
Andy Huntington
Richard Banks
LLMAG
49
20
0
13 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
57
29
0
02 Feb 2024
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
78
67
0
23 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
224
572
0
03 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
117
56
0
01 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
364
8,495
0
28 Jan 2022
Discovering and Validating AI Errors With Crowdsourced Failure Reports
Ángel Alexander Cabrera
Abraham J. Druck
Jason I. Hong
Adam Perer
HAI
60
54
0
23 Sep 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,121
0
18 Apr 2021
What Makes Good In-Context Examples for GPT-3?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,312
0
17 Jan 2021