ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.06561
  4. Cited By
GENIE: Toward Reproducible and Standardized Human Evaluation for Text
  Generation

GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation

17 January 2021
Daniel Khashabi
Gabriel Stanovsky
Jonathan Bragg
Nicholas Lourie
Jungo Kasai
Yejin Choi
Noah A. Smith
Daniel S. Weld
ArXivPDFHTML

Papers citing "GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation"

8 / 8 papers shown
Title
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
Simeng Han
Frank Palma Gomez
Tu Vu
Zefei Li
Daniel Cer
Hansi Zeng
Chris Tar
Arman Cohan
Gustavo Hernández Ábrego
59
1
0
24 Feb 2025
Tailoring Vaccine Messaging with Common-Ground Opinions
Tailoring Vaccine Messaging with Common-Ground Opinions
Rickard Stureborg
Sanxing Chen
Ruoyu Xie
Aayushi Patel
Christopher Li
Chloe Qinyu Zhu
Tingnan Hu
Jun Yang
Bhuwan Dhingra
42
0
0
17 May 2024
EvalLM: Interactive Evaluation of Large Language Model Prompts on
  User-Defined Criteria
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim
34
69
0
24 Sep 2023
Stop Uploading Test Data in Plain Text: Practical Strategies for
  Mitigating Data Contamination by Evaluation Benchmarks
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Alon Jacovi
Avi Caciularu
Omer Goldman
Yoav Goldberg
17
95
0
17 May 2023
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su
Weijia Shi
Jungo Kasai
Yizhong Wang
Yushi Hu
Mari Ostendorf
Wen-tau Yih
Noah A. Smith
Luke Zettlemoyer
Tao Yu
27
282
0
19 Dec 2022
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
54
39
0
08 Dec 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text
  Generation
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
223
107
0
14 Sep 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
260
285
0
02 Feb 2021
1