A Machine Learning Approach for Phishing Attack Detection
Keywords: legitimate, machine learning, malicious, phishing, URL, website
Phishing is the easiest method of gathering sensitive information from unwary people. Phishers seek to obtain private data, including passwords, login information, and bank account details. Cyber security experts are actively seeking trustworthy and effective ways to identify phishing websites. In this research work, machine learning (ML) techniques were used to distinguish between legitimate and phishing URLs: features were extracted from both types of URLs and analyzed. Extreme Gradient Boosting, Decision Tree, Logistic Regression, Random Forest (RF), and Support Vector Machine classifiers were used to identify phishing websites. The goal was to identify phishing URLs and to determine the most effective ML technique by comparing the accuracy of each algorithm. The proposed methodology used two datasets, PhishTank and UCI. Model accuracy was calculated on both datasets using K-fold cross-validation, feature selection, and hyperparameter tuning. The performance measures precision, recall, F1-score, and the receiver operating characteristic (ROC) curve were also calculated. RF provided an accuracy of 98.80% on the PhishTank dataset and 97.87% on the UCI dataset. The highest precision, recall, and F1-score values were 99% each, and the highest AUC-ROC value was 99.89%, all obtained on the PhishTank dataset. A comparison with results reported by other researchers showed that the proposed methodology performed better. Therefore, this methodology can help to identify phishing websites.
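The performance measures named above can be illustrated with a short sketch. It is not the authors' code: the function name `evaluate`, the `Metrics` container, the label encoding (1 = phishing, 0 = legitimate), and the toy labels are all assumptions made for illustration, and the numbers it prints are unrelated to the results reported in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    `positive` marks the phishing class; this encoding is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Toy labels, invented for illustration only (1 = phishing, 0 = legitimate).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
m = evaluate(y_true, y_pred)
print(m)
```

In a K-fold setting, a function like this would typically be applied to the held-out fold of each split and the per-fold values averaged; that workflow is a common convention and is assumed here, not taken from the paper.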
Copyright (c) 2023 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.