Authors: Suyog Vilas Patil, Dr. Vijay Pal Singh
Paper ID: 25305
Abstract:
Phishing continues to be a major cyber threat that targets individuals and organisations in order to steal sensitive information such as passwords and financial details. Traditional signature-based approaches fail to detect new and obfuscated phishing techniques. This paper presents a lightweight hybrid machine learning model that combines supervised and heuristic components enhanced with semantic analysis. The system integrates lexical, content-based, and technical attributes to identify phishing websites and emails effectively. Natural Language Processing (NLP) techniques, including transformer-based embeddings, are used to extract textual semantics, while feature optimisation is achieved with a simplified clustering-based selection method. Experimental results on benchmark phishing datasets show an overall accuracy of 97.1%, precision of 96.8%, recall of 97.3%, and a False Positive Rate (FPR) of only 2.0%. The proposed framework offers improved adaptability, low computation time, and potential for real-time deployment in institutional and enterprise-level environments.
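The four reported figures (accuracy, precision, recall, FPR) follow the standard binary confusion-matrix definitions, with phishing treated as the positive class. The Python sketch below is illustrative only: the `ConfusionCounts` class, the `detection_metrics` function, and the example counts are hypothetical and are not taken from the paper's experiments. It also makes one useful property explicit. Accuracy is the class-weighted average of recall (TPR) and specificity (1 - FPR), so figures computed from one pooled confusion matrix always place accuracy between those two values. Figures averaged across several datasets or folds need not obey this bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper); positive = phishing."""
    tp: int  # phishing samples correctly flagged
    fp: int  # legitimate samples wrongly flagged as phishing
    tn: int  # legitimate samples correctly passed
    fn: int  # phishing samples missed


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall and FPR as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("empty confusion matrix")

    def ratio(num: int, den: int) -> float:
        # Guard against an empty row or column (e.g. no predicted positives).
        return num / den if den else 0.0

    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": ratio(c.tp, c.tp + c.fp),  # flagged items that are phishing
        "recall": ratio(c.tp, c.tp + c.fn),     # phishing items that are flagged
        "fpr": ratio(c.fp, c.fp + c.tn),        # legitimate items wrongly flagged
    }


if __name__ == "__main__":
    # Made-up counts for a balanced test set of 1000 phishing + 1000 legitimate samples.
    m = detection_metrics(ConfusionCounts(tp=973, fp=20, tn=980, fn=27))
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

With these made-up counts, recall is 97.3% and FPR is 2.0%. Accuracy then lands between 97.3% and 98.0%, as the weighted-average property requires.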


