Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.14371
Cited By
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests
22 September 2024
Lior Madmoni
Amir Zait
Ilia Labzovsky
Danny Karmon
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests"
14 / 14 papers shown
Title
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
164
1,123
0
08 Jan 2024
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
Haojie Pan
Zepeng Zhai
Hao Yuan
Yaojia Lv
Ruiji Fu
Ming Liu
Zhongyuan Wang
Bing Qin
LLMAG
RALM
79
12
0
08 Dec 2023
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Marah Abdin
Suriya Gunasekar
Varun Chandrasekaran
Jerry Li
Mert Yuksekgonul
Rahee Peshawaria
Ranjita Naik
Besmira Nushi
86
12
0
24 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
80
3
0
17 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
76
69
0
01 Oct 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
52
49
0
26 Sep 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
452
4,444
0
09 Jun 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
146
703
0
23 May 2023
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
93
276
0
13 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
853
9,714
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
359
4,598
0
27 Oct 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
425
1,057
0
17 Apr 2021
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
237
2,692
0
09 May 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,177
0
16 Jun 2016
1