AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

13 April 2023

Papers citing "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models"

4 / 54 papers shown

Title
FEVER: a large-scale dataset for Fact Extraction and VERification | James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal | HILM | 145 | 1,652 | 0 | 14 Mar 2018
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems | Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom | AIMat | 79 | 729 | 0 | 11 May 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text | Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang | RALM | 280 | 8,127 | 0 | 16 Jun 2016
A large annotated corpus for learning natural language inference | Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning | | 313 | 4,284 | 0 | 21 Aug 2015