Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

Abstract

Smishing is a social engineering attack using SMS containing malicious content to deceive individuals into disclosing sensitive information or transferring money to cybercriminals. Smishing attacks have surged by 328%, posing a major threat to mobile users, with losses exceeding $54.2 million in 2019. Despite its growing prevalence, the issue remains significantly under-addressed. This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts, combining Bidirectional Encoder Representations from Transformers (BERT) with Convolutional Neural Networks (CNNs) for enhanced character-level analysis. Our model addresses multi-class classification by distinguishing between Normal, Promotional, and Smishing SMS. Unlike traditional binary classification methods, our approach integrates BERT's contextual embeddings with CNN's character-level features, improving detection accuracy. Enhanced by an attention mechanism, the model effectively prioritizes crucial text segments. Our model achieves 98.47% accuracy, outperforming traditional classifiers, with high precision and recall in Smishing detection, and strong performance across all categories.
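
The abstract reports overall accuracy alongside per-class precision and recall for the three SMS categories. As a minimal sketch of how such figures are typically computed in a three-class setting, the Python below uses one-vs-rest counts. The helper names (`per_class_metrics`, `accuracy`) and the toy labels are assumptions made for illustration; they are not the authors' evaluation code or data, and the numbers they produce are unrelated to the reported 98.47%.

```python
from collections import Counter

# The three classes named in the abstract.
LABELS = ("Normal", "Promotional", "Smishing")


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall)} using one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted class matches the true class."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels, invented purely for illustration.
y_true = ["Normal", "Normal", "Promotional", "Promotional",
          "Smishing", "Smishing", "Smishing", "Normal"]
y_pred = ["Normal", "Promotional", "Promotional", "Promotional",
          "Smishing", "Smishing", "Normal", "Normal"]

metrics = per_class_metrics(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Reporting precision and recall per class matters here because a model can reach high overall accuracy while still missing Smishing messages if that class is rare; the per-class view makes such a failure visible.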

@article{tanbhir2025_2502.01518,
  title={Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN},
  author={Gazi Tanbhir and Md. Farhan Shahriyar and Khandker Shahed and Abdullah Md Raihan Chy and Md Al Adnan},
  journal={arXiv preprint arXiv:2502.01518},
  year={2025}
}