ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.15000
  4. Cited By

MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models

28 January 2025
Zhongpu Chen
Yixiao Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
ArXiv (abs)PDFHTML

Papers citing "MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models"

17 / 17 papers shown
Title
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting
Christian Braun
Alexander Lilienbeck
Daniel Mentjukov
AILaw
71
0
0
19 May 2025
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
159
559
0
20 Mar 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
184
603
0
07 Mar 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
154
147
0
31 Dec 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALMOSLMELM
613
4,459
0
09 Jun 2023
Evaluating Language Models for Mathematics through Interactions
Evaluating Language Models for Mathematics through Interactions
Katherine M. Collins
Albert Q. Jiang
Simon Frieder
L. Wong
Miri Zilka
...
William Hart
T. Gowers
Wen-Ding Li
Adrian Weller
M. Jamnik
92
61
0
02 Jun 2023
Search-in-the-Chain: Interactively Enhancing Large Language Models with
  Search for Knowledge-intensive Tasks
Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
Shicheng Xu
Liang Pang
Huawei Shen
Xueqi Cheng
Tat-Seng Chua
RALMKELMLRM
194
48
0
28 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELMALMLM&MA
249
1,216
0
29 Mar 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
  Generative Large Language Models
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILMLRM
237
448
0
15 Mar 2023
GPTScore: Evaluate as You Desire
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MAALMELM
194
292
0
08 Feb 2023
Large Language Models Meet NL2Code: A Survey
Large Language Models Meet NL2Code: A Survey
Daoguang Zan
B. Chen
Fengji Zhang
Di Lu
Bingchao Wu
Bei Guan
Yongji Wang
Jian-Guang Lou
ELMALM
95
183
0
19 Dec 2022
Confident AI
Confident AI
Jim Davis
160
2
0
12 Feb 2022
SummEval: Re-evaluating Summarization Evaluation
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kry'sciñski
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
148
725
0
24 Jul 2020
BLEURT: Learning Robust Metrics for Text Generation
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
139
1,509
0
09 Apr 2020
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
622
5,892
0
21 Apr 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.3K
7,212
0
20 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning
  Challenge
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELMRALMLRM
249
2,679
0
14 Mar 2018
1