ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.09022
  4. Cited By
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and
  Measurements of Performance

It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance

15 May 2023
Arjun Subramonian
Xingdi Yuan
Hal Daumé
Su Lin Blodgett
ArXivPDFHTML

Papers citing "It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance"

13 / 13 papers shown
Title
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
139
1
0
10 Feb 2025
WinoPron: Revisiting English Winogender Schemas for Consistency,
  Coverage, and Grammatical Case
WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case
Vagrant Gautam
Julius Steuer
Eileen Bingert
Ray Johns
Anne Lauscher
Dietrich Klakow
48
3
0
09 Sep 2024
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large
  Language Models with SocKET Benchmark
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Minje Choi
Jiaxin Pei
Sagar Kumar
Chang Shu
David Jurgens
ALM
LLMAG
29
69
0
24 May 2023
Stop Measuring Calibration When Humans Disagree
Stop Measuring Calibration When Humans Disagree
Joris Baan
Wilker Aziz
Barbara Plank
Raquel Fernández
24
53
0
28 Oct 2022
ACES: Translation Accuracy Challenge Sets for Evaluating Machine
  Translation Metrics
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
Chantal Amrhein
Nikita Moghe
Liane Guillou
ELM
34
22
0
27 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
114
93
0
06 Oct 2022
Extractive is not Faithful: An Investigation of Broad Unfaithfulness
  Problems in Extractive Summarization
Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang
David Wan
Joey Tianyi Zhou
HILM
52
27
0
08 Sep 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and
  Their Implications
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
99
26
0
13 May 2022
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
254
285
0
02 Feb 2021
Hypothesis Only Baselines in Natural Language Inference
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Text Summarization Techniques: A Brief Survey
Text Summarization Techniques: A Brief Survey
M. Allahyari
Seyedamin Pouriyeh
Mehdi Assefi
S. Safaei
Elizabeth D. Trippe
Juan B. Gutierrez
K. Kochut
CVBM
52
513
0
07 Jul 2017
1