FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

16 June 2025
Hongda Zhu
Yiwen Zhang
Bing Zhao
Jingzhe Ding
Siyao Liu
Tong Liu
Dandan Wang
Yanan Liu
Zhaojian Li
arXiv (abs) · PDF · HTML
Main: 7 pages, 5 figures, 5 tables; Bibliography: 2 pages
Abstract

Large Language Models (LLMs) have made significant strides in front-end code generation. However, existing benchmarks exhibit several critical limitations: many tasks are overly simplistic, test cases often lack rigor, and end-to-end validation is absent. These issues hinder the accurate assessment of model performance. To address these challenges, we present FrontendBench, a benchmark co-developed by humans and LLMs. FrontendBench categorizes tasks based on code functionality and incorporates interactive test scenarios, enabling a more comprehensive and practical evaluation of front-end code generation capabilities. The benchmark comprises 148 meticulously crafted prompt-test case pairs spanning five levels of web components, from basic UI elements to complex interactive features. Each task reflects realistic front-end development challenges. Furthermore, we introduce an automatic evaluation framework that executes generated code within a sandbox environment and assesses outcomes using predefined test scripts. This framework achieves a 90.54% agreement rate with expert human evaluations, demonstrating high reliability. We benchmark several state-of-the-art LLMs on FrontendBench and observe substantial performance disparities in handling real-world front-end tasks. These results highlight FrontendBench as a reliable and scalable benchmark, supporting consistent multimodal evaluation and providing a robust foundation for future research in front-end code generation. Our data and code will be released soon.
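
The abstract does not describe how the evaluation framework is implemented, but the workflow it outlines (execute the generated front-end code in a sandbox, then score it with predefined test scripts) can be illustrated with a minimal sketch. The sketch below assumes a headless-browser setup such as Playwright; the function `runTestCase` and the shape of the test scripts are hypothetical, chosen only to make the described pipeline concrete, and are not the authors' implementation.

```typescript
// Minimal sketch (not the paper's implementation): render LLM-generated
// front-end code in a headless browser sandbox and score it against
// predefined test scripts, as described in the abstract.
import { chromium, Page } from 'playwright';

// Hypothetical shape of a test script: an async check that interacts with
// the rendered page and reports pass/fail.
type TestScript = (page: Page) => Promise<boolean>;

async function runTestCase(generatedHtml: string, tests: TestScript[]): Promise<number> {
  const browser = await chromium.launch();   // isolated headless browser as the sandbox
  const page = await browser.newPage();
  try {
    await page.setContent(generatedHtml);    // load the generated code
    let passed = 0;
    for (const test of tests) {
      try {
        if (await test(page)) passed++;      // each script checks one interactive behavior
      } catch {
        // a crashing test script counts as a failure
      }
    }
    return passed / tests.length;            // fraction of test cases passed
  } finally {
    await browser.close();
  }
}

// Example usage with a hypothetical "counter button" task:
const counterTests: TestScript[] = [
  async (page) => {
    await page.click('#increment');          // simulate a user interaction
    return (await page.textContent('#count'))?.trim() === '1';
  },
];
```

Automating the interaction in a sandbox like this is what allows the framework to cover the interactive test scenarios the benchmark emphasizes, rather than only checking static output.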

@article{zhu2025_2506.13832,
  title={FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation},
  author={Hongda Zhu and Yiwen Zhang and Bing Zhao and Jingzhe Ding and Siyao Liu and Tong Liu and Dandan Wang and Yanan Liu and Zhaojian Li},
  journal={arXiv preprint arXiv:2506.13832},
  year={2025}
}