A URL-based Phishing Attack Detection and Data Protection Model
Date
2021-09-10
Publisher
Hawassa University
Abstract
The number of internet users is growing rapidly and steadily, and this growth is changing the way people live.
Every day, billions of websites are accessed across the globe for a wide range of purposes.
This growth has also led hackers to abuse the internet for their own benefit.
Most often, this abuse is attempted through mobile phones or email. Users
fall victim to it without even knowing that they are being exploited by hackers.
Social engineering has become the hacker's tool for psychologically manipulating users into
revealing secret information.
Phishing is a kind of social engineering attack with the potential to harm an individual or
an entire organization. Cybercriminals, called phishers, constantly approach
individuals with creative ways to compromise their secret assets. Phishers use malicious
URLs that are embedded in webpages, pose a severe threat, and appear legitimate. When a user
clicks one of these links, the user is redirected to a malicious webpage where attackers mislead
the user into disclosing secret information. Such attacks must be properly addressed.
This thesis focuses on URL-based phishing detection and on data protection against such
attacks. Its contribution is divided into two phases: (1) URL-based
phishing attack detection, and (2) protection of individual and organizational assets. For the first phase,
this thesis explored and implemented four machine learning algorithms: Decision Tree,
Random Forest, Naive Bayes, and Logistic Regression. The performance of these algorithms
was evaluated and compared on the training and testing datasets. Based on the results
obtained, the best algorithm is recommended. For the second phase, the thesis proposed a data
protection model using a hybrid encryption method that combines the AES and RSA algorithms.
This model ensures the confidentiality of information assets and protects them against
various kinds of attacks. The proposed work is implemented in the Python programming
language.
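
As an illustration of what URL-based detection works from, the sketch below derives a few lexical features from a URL string using only the Python standard library. The abstract does not list the features the thesis actually used, so the function name `extract_url_features` and every feature here are assumptions chosen for illustration, not the thesis's feature set. A feature dictionary like this could then be fed to any of the four classifiers named above.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    Hypothetical feature set: the thesis abstract does not specify
    which URL features were used.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Phishing URLs sometimes use a raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }
```

For example, `extract_url_features("http://192.168.0.1/login@secure")` flags both an IP-address host and an `@` symbol, two traits that are often treated as suspicious.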
The phishing detection phase concluded that Random Forest outperformed the other algorithms,
giving the highest detection accuracy after selection of the important features. It achieved
96.89% and 99.06% detection accuracy on the testing and training datasets,
respectively. Similarly, the data protection phase encrypted and decrypted
data files within a few milliseconds and ensured the confidentiality of data in transit.
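
The hybrid AES and RSA approach mentioned in the abstract can be sketched as follows: a fresh AES key encrypts the data, and RSA encrypts only that short key. This is a minimal sketch, not the thesis's implementation. It assumes the third-party `cryptography` package, and the choices of AES-256 in GCM mode, RSA with OAEP padding, and the helper names `hybrid_encrypt` and `hybrid_decrypt` are all assumptions, since the abstract does not state the modes or key sizes used.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def _oaep() -> padding.OAEP:
    # Assumed padding choice: OAEP with SHA-256.
    return padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )


def hybrid_encrypt(public_key, plaintext: bytes):
    """Encrypt data with a one-time AES key, then wrap that key with RSA."""
    aes_key = AESGCM.generate_key(bit_length=256)  # assumed key size
    nonce = os.urandom(12)  # 96-bit nonce, the usual size for GCM
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, _oaep())
    return wrapped_key, nonce, ciphertext


def hybrid_decrypt(private_key, wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the AES key with RSA, then decrypt and authenticate the data."""
    aes_key = private_key.decrypt(wrapped_key, _oaep())
    return AESGCM(aes_key).decrypt(nonce, ciphertext, None)
```

A caller would generate an RSA key pair (for instance with `rsa.generate_private_key(public_exponent=65537, key_size=2048)` from the same package), pass the public key to `hybrid_encrypt`, and keep the private key for `hybrid_decrypt`. Splitting the work this way keeps the bulk encryption symmetric and fast, which is consistent with the millisecond-scale timings reported above.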
Keywords
Social engineering, Phishing attack, Attack detection, Machine learning algorithm, Data protection, Encryption, Decryption
