Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

3 February 2025

Abstract

Smishing is a social engineering attack using SMS containing malicious content to deceive individuals into disclosing sensitive information or transferring money to cybercriminals. Smishing attacks have surged by 328%, posing a major threat to mobile users, with losses exceeding $54.2 million in 2019. Despite its growing prevalence, the issue remains significantly under-addressed. This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts, combining Bidirectional Encoder Representations from Transformers (BERT) with Convolutional Neural Networks (CNNs) for enhanced character-level analysis. Our model addresses multi-class classification by distinguishing between Normal, Promotional, and Smishing SMS. Unlike traditional binary classification methods, our approach integrates BERT's contextual embeddings with CNN's character-level features, improving detection accuracy. Enhanced by an attention mechanism, the model effectively prioritizes crucial text segments. Our model achieves 98.47% accuracy, outperforming traditional classifiers, with high precision and recall in Smishing detection, and strong performance across all categories.
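The fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not specify layer sizes, where the attention is applied, or how characters are encoded, so everything below is an assumption made for illustration. In particular, `token_vectors` is a random stand-in for BERT's contextual embeddings, the attention is a simple softmax pooling over those vectors, and the function names (`char_cnn_features`, `attention_pool`, `classify`, `random_params`), the dimensions, and the sample message are hypothetical. The weights are random and untrained, so the predicted label carries no meaning; only the data flow is of interest.

```python
import math
import random

LABELS = ["Normal", "Promotional", "Smishing"]  # the three classes named in the abstract


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def char_cnn_features(text, filters, width=3):
    """Character-level 1D convolution with ReLU and max-over-time pooling."""
    # Assumed encoding: each character becomes one scalar derived from its code point.
    codes = [(ord(ch) % 4096) / 4096.0 for ch in text]
    codes += [0.0] * max(0, width - len(codes))  # pad so at least one window exists
    pooled = []
    for kernel in filters:
        best = 0.0  # starting at 0 acts as the ReLU floor
        for i in range(len(codes) - width + 1):
            act = sum(w * x for w, x in zip(kernel, codes[i:i + width]))
            best = max(best, act)
        pooled.append(best)
    return pooled


def attention_pool(token_vectors, query):
    """Sum token vectors weighted by softmax(dot(token, query))."""
    weights = softmax([sum(t * q for t, q in zip(vec, query)) for vec in token_vectors])
    return [sum(w * vec[d] for w, vec in zip(weights, token_vectors))
            for d in range(len(query))]


def classify(text, token_vectors, params):
    """Concatenate pooled contextual and character features, then apply a linear softmax layer."""
    fused = attention_pool(token_vectors, params["query"]) \
        + char_cnn_features(text, params["filters"])
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(params["weights"], params["bias"])]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


def random_params(ctx_dim=8, n_filters=4, width=3, seed=0):
    """Untrained random weights; a real model would learn these from labeled SMS data."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-1, 1) for _ in range(n)]

    return {
        "query": rand(ctx_dim),
        "filters": [rand(width) for _ in range(n_filters)],
        "weights": [rand(ctx_dim + n_filters) for _ in LABELS],
        "bias": [0.0] * len(LABELS),
    }


params = random_params()
rng = random.Random(1)
text = "আপনার একাউন্ট বন্ধ হয়ে যাবে"  # made-up example message, not from the paper's dataset
# Stand-in for BERT output: one random 8-dimensional vector per whitespace token.
token_vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in text.split()]
label, probs = classify(text, token_vectors, params)
print(label, [round(p, 3) for p in probs])
```

In a trained system, the random `token_vectors` would be replaced by the hidden states of a BERT encoder and all weights would be learned jointly; the sketch only shows how contextual and character-level features can be concatenated ahead of a three-way classifier.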


@article{tanbhir2025_2502.01518,
  title={Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN},
  author={Gazi Tanbhir and Md. Farhan Shahriyar and Khandker Shahed and Abdullah Md Raihan Chy and Md Al Adnan},
  journal={arXiv preprint arXiv:2502.01518},
  year={2025}
}
