ML-based Phishing URL (MLPU) detectors serve as the first line of defence to
protect users and organisations from falling victim to phishing attacks.
Recently, a few studies have launched successful adversarial attacks against
specific MLPU detectors, raising questions about their practical reliability
and usage. Nevertheless, the robustness of these systems has not been
extensively investigated; their security vulnerabilities therefore remain
largely unknown, which calls for systematic robustness testing. In this
article, we propose a methodology to investigate the reliability and
robustness of 50 representative state-of-the-art MLPU models.
First, we developed URLBUG, a cost-effective adversarial URL generator, and
used it to create an adversarial URL dataset. Subsequently, we reproduced 50
MLPU systems (traditional ML and deep learning) and recorded their baseline
performance. Lastly, we tested the considered MLPU systems on the adversarial
dataset and analyzed their robustness and reliability using box plots and heat
maps. Our results showed that the generated adversarial URLs have valid syntax
and can be registered at a median annual price of \$11.99. Of the 13% of
adversarial URLs that were already registered, 63.94% were used for malicious
purposes. Moreover, the Matthews Correlation Coefficient (MCC) of the
considered MLPU models dropped from a median of 0.92 to 0.02 when tested
against $Adv_{\mathrm{data}}$, indicating that the baseline MLPU models are
unreliable in their current form. Further, our findings identified several
security vulnerabilities of these systems and provide future directions for
researchers to design dependable and secure MLPU systems.