Black-box scanners have played a significant role in detecting vulnerabilities in web applications. A key focus of current black-box scanning is increasing test coverage (i.e., accessing more web pages). However, because many web applications are user-oriented, some deep pages can only be reached through complex user interactions, which existing black-box scanners struggle to perform. To fill this gap, our key insight is that web pages contain a wealth of semantic information that can help in understanding potential user intention. Based on this insight, we propose Hoyen, a black-box scanner that uses a large language model (LLM) to predict user intention and guide the expansion of the scanning scope. We evaluate Hoyen on 12 popular open-source web applications and compare it with 6 representative tools. The results show that Hoyen explores web applications comprehensively, expanding the attack surface and achieving about 2x the coverage of other scanners on average, with high request accuracy. Furthermore, Hoyen directed over 90% of its requests towards the core functionality of the application and detected more vulnerabilities than other scanners, including unique vulnerabilities in well-known web applications. Our data/code is available at this https URL.
@article{wang2025_2504.20801,
  title={Unlocking User-oriented Pages: Intention-driven Black-box Scanner for Real-world Web Applications},
  author={Weizhe Wang and Yao Zhang and Kaitai Liang and Guangquan Xu and Hongpeng Bai and Qingyang Yan and Xi Zheng and Bin Wu},
  journal={arXiv preprint arXiv:2504.20801},
  year={2025}
}