Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.07902
Cited By
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
15 August 2023
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Through the Lens of Core Competency: Survey on Evaluation of Large Language Models"
49 / 49 papers shown
Title
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
76
258
0
15 Jun 2023
Do Large Language Models Know What They Don't Know?
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Xuanjing Huang
ELM
AI4MH
65
160
0
29 May 2023
StructGPT: A General Framework for Large Language Model to Reason over Structured Data
Jinhao Jiang
Kun Zhou
Zican Dong
Keming Ye
Wayne Xin Zhao
Ji-Rong Wen
LRM
LMTD
RALM
81
289
0
16 May 2023
A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions
Dingzirui Wang
Longxu Dou
Wanxiang Che
58
5
0
27 Dec 2022
Language Models as Inductive Reasoners
Zonglin Yang
Li Dong
Xinya Du
Hao Cheng
Min Zhang
Xiaodong Liu
Jianfeng Gao
Furu Wei
ReLM
LRM
40
36
0
21 Dec 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
211
362
0
06 Oct 2022
Binding Language Models in Symbolic Languages
Zhoujun Cheng
Tianbao Xie
Peng Shi
Chengzu Li
Rahul Nadkarni
...
Dragomir R. Radev
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Tao Yu
LMTD
160
208
0
06 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
129
289
0
29 Sep 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
303
509
0
24 Sep 2022
APPDIA: A Discourse-aware Transformer-based Style Transfer Model for Offensive Social Media Conversations
Katherine Atwell
Sabit Hassan
Malihe Alikhani
70
30
0
17 Sep 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
191
2,461
0
15 Jun 2022
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang
Yaozong Shen
Shu-ping Peng
Shuai Zhang
Xinyan Xiao
Hao Liu
Hongxuan Tang
Ying-Cong Chen
Hua Wu
Haifeng Wang
ELM
68
21
0
23 May 2022
e-CARE: a New Dataset for Exploring Explainable Causal Reasoning
Li Du
Xiao Ding
Kai Xiong
Ting Liu
Bing Qin
CML
55
65
0
12 May 2022
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Swaroop Mishra
Arindam Mitra
Neeraj Varshney
Bhavdeep Singh Sachdeva
Peter Clark
Chitta Baral
Ashwin Kalyan
AIMat
ReLM
ELM
LRM
68
107
0
12 Apr 2022
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Caleb Ziems
Jane A. Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
57
95
0
06 Apr 2022
LinkBERT: Pretraining Language Models with Document Links
Michihiro Yasunaga
J. Leskovec
Percy Liang
KELM
65
359
0
29 Mar 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
83
481
0
28 Mar 2022
A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges
Wenxuan Zhang
Xin Li
Yang Deng
Lidong Bing
W. Lam
58
249
0
02 Mar 2022
Commonsense Knowledge Reasoning and Generation with Pre-trained Language Models: A Survey
Prajjwal Bhargava
Vincent Ng
ReLM
LRM
110
63
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
624
9,267
0
28 Jan 2022
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie
Chen Henry Wu
Peng Shi
Ruiqi Zhong
Torsten Scholak
...
Lingpeng Kong
Rui Zhang
Noah A. Smith
Luke Zettlemoyer
Tao Yu
LMTD
85
301
0
16 Jan 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
73
1,025
0
08 Dec 2021
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
272
116
0
13 Oct 2021
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica
Shirley Anugrah Hayati
Dongyeop Kang
Lyle Ungar
51
33
0
06 Sep 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
185
5,440
0
07 Jul 2021
FeTaQA: Free-form Table Question Answering
Linyong Nan
Chia-Hsuan Hsieh
Ziming Mao
Xi Lin
Neha Verma
...
Isabel Trindade
Renusree Bandaru
Jacob Cunningham
Caiming Xiong
Dragomir R. Radev
LMTD
106
155
0
01 Apr 2021
D'ya like DAGs? A Survey on Structure Learning and Causal Discovery
M. Vowels
Necati Cihan Camgöz
Richard Bowden
CML
76
300
0
03 Mar 2021
Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke
Arthur Szlam
Emily Dinan
Y-Lan Boureau
237
166
0
30 Dec 2020
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
Binny Mathew
Punyajoy Saha
Seid Muhie Yimam
Chris Biemann
Pawan Goyal
Animesh Mukherjee
112
569
0
18 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
427
1,902
0
14 Dec 2020
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jeff Da
Keisuke Sakaguchi
Antoine Bosselut
Yejin Choi
67
409
0
12 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
116
1,191
0
24 Sep 2020
GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause
Akhilesh Deepak Gotmare
Bryan McCann
N. Keskar
Shafiq Joty
R. Socher
Nazneen Rajani
87
403
0
14 Sep 2020
Fact or Fiction: Verifying Scientific Claims
David Wadden
Shanchuan Lin
Kyle Lo
Lucy Lu Wang
Madeleine van Zuylen
Arman Cohan
Hannaneh Hajishirzi
HAI
94
450
0
30 Apr 2020
ToTTo: A Controlled Table-To-Text Generation Dataset
Ankur P. Parikh
Xuezhi Wang
Sebastian Gehrmann
Manaal Faruqui
Bhuwan Dhingra
Diyi Yang
Dipanjan Das
LMTD
55
364
0
29 Apr 2020
Transformers as Soft Reasoners over Language
Peter Clark
Oyvind Tafjord
Kyle Richardson
ReLM
OffRL
LRM
77
356
0
14 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
501
4,763
0
23 Jan 2020
SemEval-2017 Task 4: Sentiment Analysis in Twitter
Sara Rosenthal
N. Farra
Preslav Nakov
VLM
73
798
0
02 Dec 2019
TabFact: A Large-scale Dataset for Table-based Fact Verification
Wenhu Chen
Hongmin Wang
Jianshu Chen
Yunkai Zhang
Hong Wang
Shiyang Li
Xiyou Zhou
William Yang Wang
LMTD
84
499
0
05 Sep 2019
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Emily Dinan
Samuel Humeau
Bharath Chintagunta
Jason Weston
54
244
0
17 Aug 2019
Dynamically Fused Graph Network for Multi-hop Reasoning
Yunxuan Xiao
Yanru Qu
Lin Qiu
Hao Zhou
Lei Li
Weinan Zhang
Yong Yu
69
191
0
16 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
218
2,305
0
02 May 2019
Evaluating model calibration in classification
Juozas Vaicenavicius
David Widmann
Carl R. Andersson
Fredrik Lindsten
Jacob Roll
Thomas B. Schon
UQCV
123
198
0
19 Feb 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
129
1,714
0
02 Nov 2018
MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Paweł Budzianowski
Tsung-Hsien Wen
Bo-Hsiang Tseng
I. Casanueva
Stefan Ultes
Osman Ramadan
Milica Gasic
147
1,310
0
29 Sep 2018
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
121
1,645
0
14 Mar 2018
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
195
2,636
0
09 May 2017
Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory
Hao Zhou
Minlie Huang
Tianyang Zhang
Xiaoyan Zhu
Bing-Qian Liu
69
738
0
04 Apr 2017
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
222
14,893
1
21 Dec 2013
1