arXiv:2106.09069
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
16 June 2021
Simon Mille
Kaustubh D. Dhole
Saad Mahamood
Laura Perez-Beltrachini
Varun Gangal
Mihir Kale
Emiel van Miltenburg
Sebastian Gehrmann
ELM
Papers citing "Automatic Construction of Evaluation Suites for Natural Language Generation Datasets" (20 papers)
Leveraging Entailment Judgements in Cross-Lingual Summarisation
Huajian Zhang
Laura Perez-Beltrachini
HILM
38
0
0
01 Aug 2024
KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants
Kaustubh D. Dhole
26
3
0
29 Jan 2024
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
Ankita Gupta
Chulaka Gunasekara
H. Wan
Jatin Ganhotra
Sachindra Joshi
Marina Danilevsky
13
0
0
15 Nov 2023
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM
LM&MA
36
28
0
29 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM
ELM
25
23
0
23 Jun 2023
Improving User Controlled Table-To-Text Generation Robustness
Hanxu Hu
Yunqing Liu
Zhongyi Yu
Laura Perez-Beltrachini
14
5
0
20 Feb 2023
ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang
Zheng Li
Haifeng Qian
Cheng Yang
Zijian Wang
...
Parminder Bhatia
Ramesh Nallapati
M. K. Ramanathan
Dan Roth
Bing Xiang
19
80
0
20 Dec 2022
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
George Kour
Samuel Ackerman
Orna Raz
E. Farchi
Boaz Carmeli
Ateret Anaby-Tavor
41
10
0
29 Nov 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
74
85
0
14 Oct 2022
Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models
Zdeněk Kasner
Ioannis Konstas
Ondrej Dusek
29
6
0
13 Oct 2022
Quantifying Social Biases Using Templates is Unreliable
P. Seshadri
Pouya Pezeshkpour
Sameer Singh
51
33
0
09 Oct 2022
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann
Abhik Bhattacharjee
Abinaya Mahendiran
Alex Jinpeng Wang
Alexandros Papangelis
...
Yacine Jernite
Yi Xu
Yisi Sang
Yixin Liu
Yufang Hou
47
38
0
22 Jun 2022
Why only Micro-F1? Class Weighting of Measures for Relation Classification
David Harbecke
Yuxuan Chen
Leonhard Hennig
Christoph Alt
26
19
0
19 May 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
58
183
0
14 Feb 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole
Varun Gangal
Sebastian Gehrmann
Aadesh Gupta
Zhenhao Li
...
Tianbao Xie
Usama Yaseen
Michael A. Yee
Jing Zhang
Yue Zhang
174
86
0
06 Dec 2021
Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition
Mor Geva
Tomer Wolfson
Jonathan Berant
ReLM
LRM
20
21
0
29 Jul 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin P. Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei-ping Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
251
285
0
02 Feb 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
151
136
0
13 Jan 2021
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
220
188
0
03 May 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018