Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.15000
Cited By
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
28 January 2025
Zhongpu Chen
Yixiao Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models"
17 / 17 papers shown
Title
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting
Christian Braun
Alexander Lilienbeck
Daniel Mentjukov
AILaw
71
0
0
19 May 2025
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
159
559
0
20 Mar 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
184
603
0
07 Mar 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
154
147
0
31 Dec 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
613
4,459
0
09 Jun 2023
Evaluating Language Models for Mathematics through Interactions
Katherine M. Collins
Albert Q. Jiang
Simon Frieder
L. Wong
Miri Zilka
...
William Hart
T. Gowers
Wen-Ding Li
Adrian Weller
M. Jamnik
92
61
0
02 Jun 2023
Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
Shicheng Xu
Liang Pang
Huawei Shen
Xueqi Cheng
Tat-Seng Chua
RALM
KELM
LRM
194
48
0
28 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
249
1,216
0
29 Mar 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILM
LRM
237
448
0
15 Mar 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
194
292
0
08 Feb 2023
Large Language Models Meet NL2Code: A Survey
Daoguang Zan
B. Chen
Fengji Zhang
Di Lu
Bingchao Wu
Bei Guan
Yongji Wang
Jian-Guang Lou
ELM
ALM
95
183
0
19 Dec 2022
Confident AI
Jim Davis
160
2
0
12 Feb 2022
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kry'sciñski
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
148
725
0
24 Jul 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
139
1,509
0
09 Apr 2020
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
622
5,892
0
21 Apr 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.3K
7,212
0
20 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
249
2,679
0
14 Mar 2018
1