Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.10928
Cited By
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
20 July 2023
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets"
41 / 91 papers shown
Title
Harmonic LLMs are Trustworthy
Nicholas S. Kersting
Mohammad Rahman
Suchismitha Vedala
Yang Wang
45
0
0
30 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
40
1
0
18 Apr 2024
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
Hyeonbin Hwang
Doyoung Kim
Seungone Kim
Seonghyeon Ye
Minjoon Seo
LRM
ReLM
40
7
0
16 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
39
6
0
04 Apr 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
46
8
0
21 Mar 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Eunsu Kim
Juyoung Suk
Philhoon Oh
Haneul Yoo
James Thorne
Alice H. Oh
ELM
75
15
0
11 Mar 2024
FAC
2
^2
2
E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
Xiaoqiang Wang
Bang Liu
Lingfei Wu
35
0
0
29 Feb 2024
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
Hanseok Oh
Hyunji Lee
Seonghyeon Ye
Haebin Shin
Hansol Jang
Changwook Jun
Minjoon Seo
46
19
0
22 Feb 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
Alexander Arno Weber
Klaudia Thellmann
Jan Ebert
Nicolas Flores-Herr
Jens Lehmann
Michael Fromm
Mehdi Ali
38
4
0
21 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
65
29
0
02 Feb 2024
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye
Mingming Yang
Jianhui Pang
Longyue Wang
Derek F. Wong
Emine Yilmaz
Shuming Shi
Zhaopeng Tu
ELM
15
47
0
23 Jan 2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
39
10
0
09 Jan 2024
Data-Centric Foundation Models in Computational Healthcare: A Survey
Yunkun Zhang
Jin Gao
Zheling Tan
Lingfeng Zhou
Kexin Ding
Mu Zhou
Shaoting Zhang
Dequan Wang
AI4CE
37
22
0
04 Jan 2024
Demystifying Instruction Mixing for Fine-tuning Large Language Models
Renxi Wang
Haonan Li
Minghao Wu
Yuxia Wang
Xudong Han
Chiyu Zhang
Timothy Baldwin
28
0
0
17 Dec 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Bill Yuchen Lin
Abhilasha Ravichander
Ximing Lu
Nouha Dziri
Melanie Sclar
Khyathi Raghavi Chandu
Chandra Bhagavatula
Yejin Choi
22
164
0
04 Dec 2023
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Keming Lu
Hongyi Yuan
Runji Lin
Junyang Lin
Zheng Yuan
Chang Zhou
Jingren Zhou
MoE
LRM
42
52
0
15 Nov 2023
Post Turing: Mapping the landscape of LLM Evaluation
Alexey Tikhonov
Ivan P. Yamshchikov
ELM
51
4
0
03 Nov 2023
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Yongjin Yang
Joonkee Kim
Yujin Kim
Namgyu Ho
James Thorne
Se-Young Yun
24
21
0
01 Nov 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
37
209
0
12 Oct 2023
Human Feedback is not Gold Standard
Tom Hosking
Phil Blunsom
Max Bartolo
ALM
32
49
0
28 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
19
176
0
26 Sep 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim
34
69
0
24 Sep 2023
LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation
Jennifer A Bishop
Qianqian Xie
Sophia Ananiadou
HILM
19
9
0
21 Sep 2023
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Yuxia Wang
Haonan Li
Xudong Han
Preslav Nakov
Timothy Baldwin
52
102
0
25 Aug 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
59
606
0
23 May 2023
Can Large Language Models Capture Dissenting Human Voices?
Noah Lee
Na Min An
James Thorne
ALM
41
30
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
81
67
0
23 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
572
0
03 May 2023
DIFFQG: Generating Questions to Summarize Factual Changes
Jeremy R. Cole
Palak Jain
Julian Martin Eisenschlos
Michael J.Q. Zhang
Eunsol Choi
Bhuwan Dhingra
KELM
24
3
0
01 Mar 2023
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Seonghyeon Ye
Joel Jang
Doyoung Kim
Yongrae Jo
Minjoon Seo
VLM
36
2
0
06 Oct 2022
INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions
Zeqiu Wu
Ryu Parish
Hao Cheng
Sewon Min
Prithviraj Ammanabrolu
Mari Ostendorf
Hannaneh Hajishirzi
65
14
0
02 Jul 2022
Fine-tuned Language Models are Continual Learners
Thomas Scialom
Tuhin Chakrabarty
Smaranda Muresan
CLL
LRM
145
117
0
24 May 2022
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
120
136
0
22 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
330
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
395
8,495
0
28 Jan 2022
Understanding Dataset Difficulty with
V
\mathcal{V}
V
-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
167
157
0
16 Oct 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
223
367
0
15 Oct 2021
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
Haitian Sun
William W. Cohen
Ruslan Salakhutdinov
67
33
0
13 Oct 2021
ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts
Yuta Koreeda
Christopher D. Manning
AILaw
94
96
0
05 Oct 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
228
143
0
18 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
677
0
06 Jan 2021
Previous
1
2