Abstract
Phishing attacks are a prevailing problem in cybersecurity. In many data breaches, the initial entry can be traced back to phishing. URL-based phishing detection is one of the many ways of phishing attempt detection where solely the properties of the URLs are used to decide whether a given URL is phishing or not. While there are multiple existing works that use machine learning and deep learning to detect phishing URLs, in this paper, we show that such methods lack generalisation (i.e., they work effectively only when the test sets are split from the same training dataset). This is a significant issue since the vast majority of phishing attempts are short-lived and use freshly created domain names. Also, many network vantage points and middleboxes record URLs in slightly different formats and as such, URL data collected at various companies may be different. To address this, we propose an Unsupervised Domain Adaptation-based framework to increase the model transferability between datasets. We evaluate our approach using three datasets and show that the increase in cross-dataset F1 score performance is 0.06 on average and in some cases approximately as high as 0.2.
| Original language | English |
|---|---|
| Article number | 110398 |
| Number of pages | 14 |
| Journal | Computer Networks |
| Volume | 245 |
| Early online date | 12 Apr 2024 |
| DOIs | |
| Publication status | Published - May 2024 |
Fingerprint
Dive into the research topics of 'Phishing URL detection generalisation using Unsupervised Domain Adaptation'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver