ML and DL-based Phishing Website Detection: The Effects of Varied Size Datasets and Informative Feature Selection Techniques
DOI:
https://doi.org/10.37965/jait.2023.0269
Keywords:
phishing website detection, machine learning, deep learning, feature selection technique, phishing website datasets, ANOVA-F-test, mutual information
Abstract
Using the Internet for communication, teamwork, and other productive activities requires interacting with webpages and websites. Because phishing websites look benign, and because not all visitors have the knowledge and skills to judge the trustworthiness of the sites they visit, users are tricked into disclosing sensitive information and become vulnerable to malicious software attacks such as ransomware. One of the core challenges in combating phishing is that attackers cannot be prevented from creating phishing websites. The threat can nevertheless be alleviated by detecting that a specific website is phishing and alerting online users to take the necessary precautions before handing over sensitive information. In this study, five machine learning (ML) and deep learning (DL) algorithms, namely cat-boost (CATB), gradient boost (GB), random forest (RF), multilayer perceptron (MLP), and deep neural network (DNN), were tested with three reputable datasets and two feature selection techniques to assess the scalability and consistency of each classifier's performance on datasets of varied size. The experimental findings reveal that the CATB classifier achieved the best accuracy on all three datasets (DS-1, DS-2, and DS-3), with respective values of 97.9%, 95.73%, and 98.83%. The GB classifier achieved the second-best accuracy on all three datasets, with respective values of 97.16%, 95.18%, and 98.58%. MLP had the shortest computational time on all three datasets, with respective values of 2, 7, and 3 seconds, despite scoring the lowest accuracy on each of them.
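The keywords list the ANOVA F-test as one of the feature selection techniques. As a rough illustration of how such a filter ranks features before a classifier is trained, the following minimal sketch computes the one-way ANOVA F-statistic per feature in plain Python. It is not the study's implementation: the feature names (`url_length`, `has_ip`, `num_dots`), the toy rows, and the helper functions `anova_f_score` and `select_top_k` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, names, k):
    """Rank features by F-score and return the k highest-scoring names."""
    scores = {
        name: anova_f_score([row[i] for row in X], y)
        for i, name in enumerate(names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: label 1 = phishing, 0 = legitimate.
names = ["url_length", "has_ip", "num_dots"]
X = [
    [75, 1, 5], [82, 1, 3], [68, 0, 4], [90, 1, 2],
    [30, 0, 3], [25, 0, 4], [35, 0, 2], [28, 0, 5],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

top, scores = select_top_k(X, y, names, k=2)
print(top, scores)
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-score and is kept; a feature with identical class means scores zero and is dropped. The retained columns would then be passed to a classifier of choice.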
License
Copyright (c) 2023 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.