TuringAdvice: A Generative and Dynamic Evaluation of Language Use

7 April 2020

Yejin Choi

Papers citing "TuringAdvice: A Generative and Dynamic Evaluation of Language Use"

Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation

Chaithanya Bandi

Abir Harrasse

Hari Bandi

LLMAG ELM

426

07 Oct 2024

Towards Human-Centred Explainability Benchmarks For Text Classification

Viktor Schlegel

Erick Mendez Guzman

Riza Batista-Navarro

295

10 Nov 2022

AI and the Everything in the Whole Wide World Benchmark

Inioluwa Deborah Raji

305

431

26 Nov 2021

TellMeWhy: A Dataset for Answering Why-Questions in Narratives

Findings (Findings), 2021

Yash Kumar Lal

Nathanael Chambers

Raymond J. Mooney

Niranjan Balasubramanian

377

11 Jun 2021

What Will it Take to Fix Benchmarking in Natural Language Understanding?

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Samuel R. Bowman

George E. Dahl

ELM ALM

347

207

05 Apr 2021

Help! Need Advice on Identifying Advice

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Venkata S Govindarajan

184

06 Oct 2020

Measuring Massive Multitask Language Understanding

International Conference on Learning Representations (ICLR), 2020

4.1K

7,570

07 Sep 2020

Forecasting AI Progress: A Research Agenda

Ross Gruetzemacher

Florian E. Dorner

Niko Bernaola-Alvarez

Charlie Giattino

D. Manheim

AI4TS

229

04 Aug 2020

Evaluation of Text Generation: A Survey

425

447

26 Jun 2020

Experience Grounds Language

...

672

424

21 Apr 2020

Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat

Sociological Methods & Research (SMR), 2020

Alina Arseniev-Koehler

J. Foster

339

24 Mar 2020