Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks

Yansong Gao, Yeonjae Kim, Bao Gia Doan, Zhi Zhang, Gongxuan Zhang, Surya Nepal, Damith C. Ranasinghe, Hyoungshick Kim

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)

Abstract

Trojan attacks on deep neural networks (DNNs) exploit a backdoor embedded in a DNN model that can hijack any input carrying the attacker's chosen signature trigger. Emerging defence mechanisms are mainly designed and validated on vision-domain tasks (e.g., image classification) with 2D Convolutional Neural Network (CNN) architectures; a defence mechanism that generalises across vision, text, and audio domain tasks is needed. This work designs and evaluates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs that serves as a multi-domain, input-agnostic Trojan detection defence across the Vision, Text and Audio domains, and is thus termed STRIP-ViTA. Specifically, STRIP-ViTA is demonstrably independent of not only the task domain but also the model architecture. Most importantly, unlike other detection mechanisms, it requires neither machine learning expertise nor expensive computational resources, which are the reasons behind the DNN model outsourcing scenario, one main attack surface of Trojan attacks. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and iii) a speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on more than 30 tested Trojaned models (including a publicly available Trojaned model) corroborate that STRIP-ViTA performs well across all nine architectures and five datasets. Overall, STRIP-ViTA can effectively detect trigger inputs with a small false acceptance rate (FAR) given an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve 0 percent FRR and FAR given the strong attack success rates preferred by the attacker. By setting the FRR to 3 percent, average FARs of 1.1 and 3.55 percent are achieved for text and audio tasks, respectively. Moreover, we have evaluated STRIP-ViTA against a number of advanced backdoor attacks and compared its effectiveness with other recent state-of-the-art defences.
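The detection idea behind STRIP-style defences is to superimpose strong perturbations on an incoming input and inspect the entropy of the model's predictions: a Trojaned input keeps predicting the attacker's target class under perturbation, so its prediction entropy stays low, whereas a benign input's predictions become uncertain. The sketch below illustrates this entropy test under simplifying assumptions; it is not the paper's reference implementation, and `model_predict`, `held_out_inputs`, `alpha`, `n_perturb`, and the linear-blend perturbation (appropriate for image-like inputs; text and audio would need domain-specific perturbations) are illustrative choices.

```python
import numpy as np


def strip_entropy(model_predict, x, held_out_inputs, n_perturb=100, alpha=0.5):
    """Estimate the perturbation entropy of a single input x.

    model_predict: callable mapping a batch of inputs to softmax probabilities.
    held_out_inputs: clean samples superimposed on x to form perturbed copies.
    A Trojaned input tends to keep its (attacker-chosen) prediction under
    perturbation, yielding low entropy; benign inputs yield high entropy.
    """
    idx = np.random.choice(len(held_out_inputs), size=n_perturb, replace=True)
    # Simple linear blend as the perturbation for image-like inputs (assumption).
    perturbed = np.stack(
        [alpha * x + (1 - alpha) * held_out_inputs[i] for i in idx]
    )
    probs = model_predict(perturbed)            # shape: (n_perturb, n_classes)
    probs = np.clip(probs, 1e-12, 1.0)          # avoid log(0)
    per_sample_entropy = -np.sum(probs * np.log2(probs), axis=1)
    return per_sample_entropy.mean()


def is_trigger_input(entropy, threshold):
    """Flag the input as Trojaned if its entropy falls below the boundary.

    The threshold would be calibrated offline on clean inputs so that the
    false rejection rate (FRR) matches a preset value (e.g. 3 percent).
    """
    return entropy < threshold
```

In use, the defender runs `strip_entropy` on each incoming input at inference time and rejects it when `is_trigger_input` fires; the FRR/FAR trade-off reported in the abstract corresponds to where the entropy threshold is placed on the clean-input entropy distribution.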

Original language: English
Pages (from-to): 2349-2364
Number of pages: 16
Journal: IEEE Transactions on Dependable and Secure Computing
Volume: 19
Issue number: 4
DOIs
Publication status: Published - 2022
Externally published: Yes
