arXiv:2303.16634
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 754 papers shown
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
47
0
0
25 Mar 2025
Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
70
0
0
24 Mar 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya
Yinhong Liu
Ramit Debnath
Anna Korhonen
42
0
0
22 Mar 2025
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
Reem Gody
Mahmoud Goudy
Ahmed Tawfik
SyDa
199
0
0
21 Mar 2025
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
Brihi Joshi
Sriram Venkatapathy
Mohit Bansal
Nanyun Peng
Haw-Shiuan Chang
LRM
51
0
0
21 Mar 2025
MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
David Wan
Justin Chih-Yao Chen
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMAG
LRM
65
1
0
19 Mar 2025
A Framework to Assess Multilingual Vulnerabilities of LLMs
Likai Tang
Niruth Bogahawatta
Yasod Ginige
Jiarui Xu
Shixuan Sun
Surangika Ranathunga
Suranga Seneviratne
42
0
0
17 Mar 2025
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
Palakorn Achananuparp
Ee-Peng Lim
46
0
0
17 Mar 2025
Not All Personas Are Worth It: Culture-Reflective Persona Data Augmentation
Ji-Eun Han
Yoonseok Heo
49
0
0
17 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
74
0
0
17 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
29
0
0
16 Mar 2025
Interpretation Gaps in LLM-Assisted Comprehension of Privacy Documents
Rinku Dewri
55
0
0
15 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
54
1
0
14 Mar 2025
Bridging Language Models and Financial Analysis
Alejandro Lopez-Lira
Jihoon Kwon
Sangwoon Yoon
Jy-yong Sohn
Chanyeol Choi
AIFin
44
0
0
14 Mar 2025
MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
Jia Xu
Tianyi Wei
Bojian Hou
Patryk Orzechowski
Shu Yang
Ruochen Jin
Rachael Paulbeck
Joost B. Wagenaar
George Demiris
Li Shen
AI4MH
49
0
0
13 Mar 2025
Take Off the Training Wheels! Progressive In-Context Learning for Effective Alignment
Zhenyu Liu
Dongfang Li
Xinshuo Hu
X. Zhao
Yibin Chen
Baotian Hu
Min-Ling Zhang
49
1
0
13 Mar 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo
Wei-ping Xu
Alan Ritter
44
1
0
12 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
68
1
0
11 Mar 2025
Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data
Swati Rallapalli
Shannon Gallagher
Andrew O. Mellinger
Jasmine Ratchford
Anusha Sinha
Tyler Brooks
William R. Nichols
Nick Winski
Bryan Brown
48
0
0
10 Mar 2025
Bot Wars Evolved: Orchestrating Competing LLMs in a Counterstrike Against Phone Scams
Nardine Basta
Conor Atkins
Dali Kaafar
LLMAG
51
0
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
71
0
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yuqing Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
172
0
0
08 Mar 2025
Learning and generalization of robotic dual-arm manipulation of boxes from demonstrations via Gaussian Mixture Models (GMMs)
Qian Ying Lee
Suhas Raghavendra Kulkarni
Kenzhi Iskandar Wong
Lin Yang
Bernardo Noronha
Yongjun Wee
Tzu-Yi Hung
Domenico Campolo
53
0
0
07 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALM
ELM
52
1
0
07 Mar 2025
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Bang Nguyen
Tingting Du
Mengxia Yu
Lawrence Angrave
Meng Jiang
AI4Ed
71
0
0
07 Mar 2025
No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding
Michael Krumdick
Charles Lovering
Varshini Reddy
Seth Ebner
Chris Tanner
ALM
ELM
58
2
0
07 Mar 2025
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Tingyu Song
Guo Gan
Mingsheng Shang
Yilun Zhao
VLM
70
0
0
06 Mar 2025
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Jeanette Falk
Yiyi Chen
Janet Rafner
Mike Zhang
Johannes Bjerva
Alexander Nolte
66
1
0
06 Mar 2025
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang
Fangxin Wang
Philip S. Yu
Kaize Ding
Shixiang Zhu
AI4TS
41
0
0
06 Mar 2025
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks
Javier Yong
Haokai Ma
Yunshan Ma
Anis Yusof
Zhenkai Liang
E. Chang
57
0
0
05 Mar 2025
LexGenie: Automated Generation of Structured Reports for European Court of Human Rights Case Law
T. Y. S. S. Santosh
Mahmoud Aly
O. Ichim
Matthias Grabmair
AILaw
ELM
90
0
0
05 Mar 2025
Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang
Michael J.Q. Zhang
Eunsol Choi
58
1
0
04 Mar 2025
MedHEval: Benchmarking Hallucinations and Mitigation Strategies in Medical Large Vision-Language Models
Aofei Chang
Le Huang
Parminder Bhatia
Taha A. Kass-Hout
Fenglong Ma
Cao Xiao
VLM
82
0
0
04 Mar 2025
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
Yilun Qiu
Xiaoyan Zhao
Yang Zhang
Yimeng Bai
Luu Anh Tuan
Hong Cheng
Fuli Feng
Tat-Seng Chua
61
1
0
04 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Luu Anh Tuan
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
79
2
0
04 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
86
2
0
03 Mar 2025
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer
Steffen Eger
Johannes Daxenberger
Tim Altendorf
Philipp Cimiano
Benjamin Schiller
LM&MA
ELM
LRM
70
0
0
02 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
39
0
0
02 Mar 2025
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
Terry Tong
Fei Wang
Zhe Zhao
Mengzhao Chen
AAML
ELM
37
1
0
01 Mar 2025
Teach-to-Reason with Scoring: Self-Explainable Rationale-Driven Multi-Trait Essay Scoring
Heejin Do
Sangwon Ryu
Gary Geunbae Lee
LRM
55
0
0
28 Feb 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Y. Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
36
0
0
28 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu
Tiezheng Yu
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Qiang Liu
Wenjie Li
ELM
66
0
0
26 Feb 2025
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Jungsoo Park
Junmo Kang
Gabriel Stanovsky
Alan Ritter
57
0
0
26 Feb 2025
Independent Mobility GPT (IDM-GPT): A Self-Supervised Multi-Agent Large Language Model Framework for Customized Traffic Mobility Analysis Using Machine Learning Models
Fengze Yang
Xiaoyue Cathy Liu
Lingjiu Lu
Bingzhang Wang
Chenxi
40
0
0
25 Feb 2025
BRIDO: Bringing Democratic Order to Abstractive Summarization
Junhyun Lee
Harshith Goka
Hyeonmok Ko
HILM
54
0
0
25 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
71
8
0
24 Feb 2025
REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
Omar Sharif
Joseph Gatto
Madhusudan Basak
S. Preum
47
0
0
24 Feb 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Qing Guo
VLM
LRM
82
2
0
22 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
60
3
0
21 Feb 2025
An LLM-Based Approach for Insight Generation in Data Analysis
Alberto Sánchez Pérez
Alaa Boukhary
Paolo Papotti
Luis Castejón Lozano
Adam Elwood
44
0
0
20 Feb 2025