Phishing URL detection generalisation using Unsupervised Domain Adaptation

Research output: Contribution to journal › Article › peer-review

Abstract

Phishing attacks are a prevailing problem in cybersecurity. In many data breaches, the initial entry can be traced back to phishing. URL-based phishing detection is one of many approaches to detecting phishing attempts; it relies solely on the properties of a URL to decide whether that URL is phishing or not. While multiple existing works use machine learning and deep learning to detect phishing URLs, in this paper we show that such methods lack generalisation (i.e., they work effectively only when the test set is split from the same dataset as the training set). This is a significant issue since the vast majority of phishing attempts are short-lived and use freshly created domain names. In addition, many network vantage points and middleboxes record URLs in slightly different formats, so URL data collected at different companies may differ. To address this, we propose an Unsupervised Domain Adaptation-based framework to increase model transferability between datasets. We evaluate our approach using three datasets and show that the cross-dataset F1 score increases by 0.06 on average and, in some cases, by approximately 0.2.
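The gap between in-dataset and cross-dataset performance described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the two tiny datasets are contrived, the URL-length threshold "detector" is a toy stand-in for a real machine learning model, and the names `f1_score`, `train_length_threshold`, and `cross_dataset_f1` are hypothetical. It is not the paper's framework and does not implement Unsupervised Domain Adaptation; it only shows how a train-on-one, test-on-another F1 matrix might be computed.

```python
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[str, int]]  # (url, label), where 1 = phishing and 0 = benign
Model = Callable[[str], int]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (phishing) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_length_threshold(data: Dataset) -> Model:
    """Toy detector (assumption): threshold URL length at the midpoint of the class means."""
    phish = [len(u) for u, y in data if y == 1]
    benign = [len(u) for u, y in data if y == 0]
    phish_mean = sum(phish) / len(phish)
    benign_mean = sum(benign) / len(benign)
    threshold = (phish_mean + benign_mean) / 2
    if phish_mean > benign_mean:
        return lambda url: 1 if len(url) > threshold else 0
    return lambda url: 1 if len(url) < threshold else 0


def cross_dataset_f1(
    datasets: Dict[str, Dataset], train: Callable[[Dataset], Model]
) -> Dict[Tuple[str, str], float]:
    """Train on each source dataset and score on every dataset, keyed by (source, target).

    For brevity the diagonal is scored on the training data itself; a real
    evaluation would use a held-out split of the source dataset instead.
    """
    scores = {}
    for src, src_data in datasets.items():
        model = train(src_data)
        for tgt, tgt_data in datasets.items():
            y_true = [y for _, y in tgt_data]
            y_pred = [model(u) for u, _ in tgt_data]
            scores[(src, tgt)] = f1_score(y_true, y_pred)
    return scores


# Contrived data: in "A" phishing URLs are long, in "B" they are short.
datasets = {
    "A": [
        ("http://a.com", 0),
        ("http://bb.org", 0),
        ("http://login-verify-account.example.com/x", 1),
        ("http://secure-update-billing.example.net/y", 1),
    ],
    "B": [
        ("https://www.example.com/products/catalogue/index.html", 0),
        ("https://www.example.org/about/team/contact-us.html", 0),
        ("http://x.co/a", 1),
        ("http://y.io/b", 1),
    ],
}

scores = cross_dataset_f1(datasets, train_length_threshold)
for (src, tgt), score in sorted(scores.items()):
    print(f"train={src} test={tgt} F1={score:.2f}")
```

In this deliberately extreme toy setting the in-dataset scores are perfect while the cross-dataset scores collapse, because the rule learned on one dataset encodes a regularity that does not hold in the other. Real datasets differ far less starkly, but the off-diagonal entries of such a matrix are the quantity a domain adaptation method would aim to raise.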

Original language: English
Article number: 110398
Number of pages: 14
Journal: Computer Networks
Volume: 245
Early online date: 12 Apr 2024
Publication status: Published - May 2024
