Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.17136
Cited By
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
21 March 2025
Brihi Joshi
Sriram Venkatapathy
Mohit Bansal
Nanyun Peng
Haw-Shiuan Chang
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization"
23 / 23 papers shown
Title
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Pat Verga
Sebastian Hofstatter
Sophia Althammer
Yixuan Su
Aleksandra Piktus
Arkady Arkhangorodsky
Minjie Xu
Naomi White
Patrick Lewis
ALM
ELM
104
104
0
29 Apr 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
182
149
0
10 Oct 2023
AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing
Abelardo Carlos Martínez Lorenzo
Pere-Lluís Huguet Cabot
Roberto Navigli
72
7
0
19 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
149
334
0
02 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
195
1,240
0
31 May 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Seungone Kim
Se June Joo
Doyoung Kim
Joel Jang
Seonghyeon Ye
Jamin Shin
Minjoon Seo
ALM
RALM
LRM
87
107
0
23 May 2023
ZARA: Improving Few-Shot Self-Rationalization for Small Language Models
Wei-Lin Chen
An-Zi Yen
Cheng-Kuang Wu
Hen-Hsen Huang
Hsin-Hsi Chen
ReLM
LRM
52
11
0
12 May 2023
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun
Luke Melas-Kyriazi
Dan Jurafsky
67
45
0
14 Nov 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
85
276
0
13 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
101
248
0
03 Oct 2022
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
232
444
0
03 Oct 2022
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Jaehun Jung
Lianhui Qin
Sean Welleck
Faeze Brahman
Chandra Bhagavatula
Ronan Le Bras
Yejin Choi
ReLM
LRM
273
196
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
886
13,176
0
04 Mar 2022
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
81
283
0
15 Nov 2021
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
126
849
0
22 Jun 2021
Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe
Ana Marasović
Noah A. Smith
313
182
0
24 Oct 2020
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
Jian Guan
Minlie Huang
61
70
0
16 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
880
42,379
0
28 May 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
354
5,868
0
21 Apr 2019
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktaschel
Thomas Lukasiewicz
Phil Blunsom
LRM
427
640
0
04 Dec 2018
Hierarchical Neural Story Generation
Angela Fan
M. Lewis
Yann N. Dauphin
DiffM
183
1,626
0
13 May 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
541
19,265
0
20 Jul 2017
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
297
4,508
0
20 Nov 2014
1