2310.01448
Cited By
Meta Semantic Template for Evaluation of Large Language Models
1 October 2023
Yachuan Liu
Liang Chen
Jindong Wang
Qiaozhu Mei
Xing Xie
Papers citing
"Meta Semantic Template for Evaluation of Large Language Models"
10 / 10 papers shown
Title
Evaluating Language Models for Mathematics through Interactions
Katherine M. Collins
Albert Q. Jiang
Simon Frieder
L. Wong
Miri Zilka
...
William Hart
T. Gowers
Wen-Ding Li
Adrian Weller
M. Jamnik
70
60
0
02 Jun 2023
Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization
Liang Chen
Hongru Wang
Yang Deng
Wai-Chung Kwan
Zezhong Wang
Kam-Fai Wong
46
15
0
22 May 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Wei Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xing Xie
AI4MH
101
233
0
22 Feb 2023
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
Linyi Yang
Shuibai Zhang
Libo Qin
Yafu Li
Yidong Wang
Hanmeng Liu
Jindong Wang
Xing Xie
Yue Zhang
ELM
91
81
0
15 Nov 2022
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
100
614
0
15 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
131
645
0
07 Feb 2022
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
116
3,723
0
03 Sep 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Zhiyi Ma
Kawin Ethayarajh
Tristan Thrush
Somya Jain
Ledell Yu Wu
Robin Jia
Christopher Potts
Adina Williams
Douwe Kiela
ELM
72
57
0
21 May 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
180
405
0
07 Apr 2021
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
190
1,100
0
08 May 2020