TY - JOUR
T1 - Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks
AU - Gao, Yansong
AU - Kim, Yeonjae
AU - Doan, Bao Gia
AU - Zhang, Zhi
AU - Zhang, Gongxuan
AU - Nepal, Surya
AU - Ranasinghe, Damith C.
AU - Kim, Hyoungshick
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Trojan attacks on deep neural networks (DNNs) exploit a backdoor embedded in a DNN model that can hijack any input with an attacker's chosen signature trigger. Emerging defence mechanisms are mainly designed and validated on vision domain tasks (e.g., image classification) on 2D Convolutional Neural Network (CNN) model architectures; a defence mechanism that is general across vision, text, and audio domain tasks is demanded. This work designs and evaluates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs that is a multi-domain input-agnostic Trojan detection defence across Vision, Text and Audio domains - thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is demonstratively independent of not only task domain but also model architectures. Most importantly, unlike other detection mechanisms, it requires neither machine learning expertise nor expensive computational resource, which are the reason behind DNN model outsourcing scenario - one main attack surface of Trojan attack. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and iii) speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on more than 30 tested Trojaned models (including publicly Trojaned model) corroborate that STRIP-ViTA performs well across all nine architectures and five datasets. Overall, STRIP-ViTA can effectively detect trigger inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve a 0 percent FRR and FAR given strong attack success rate always preferred by the attacker. By setting FRR to be 3 percent, average FAR of 1.1 and 3.55 percent are achieved for text and audio tasks, respectively. Moreover, we have evaluated STRIP-ViTA against a number of advanced backdoor attacks and compare its effectiveness with other recent state-of-the-arts.
AB - Trojan attacks on deep neural networks (DNNs) exploit a backdoor embedded in a DNN model that can hijack any input with an attacker's chosen signature trigger. Emerging defence mechanisms are mainly designed and validated on vision domain tasks (e.g., image classification) on 2D Convolutional Neural Network (CNN) model architectures; a defence mechanism that is general across vision, text, and audio domain tasks is demanded. This work designs and evaluates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs that is a multi-domain input-agnostic Trojan detection defence across Vision, Text and Audio domains - thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is demonstratively independent of not only task domain but also model architectures. Most importantly, unlike other detection mechanisms, it requires neither machine learning expertise nor expensive computational resource, which are the reason behind DNN model outsourcing scenario - one main attack surface of Trojan attack. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and iii) speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on more than 30 tested Trojaned models (including publicly Trojaned model) corroborate that STRIP-ViTA performs well across all nine architectures and five datasets. Overall, STRIP-ViTA can effectively detect trigger inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve a 0 percent FRR and FAR given strong attack success rate always preferred by the attacker. By setting FRR to be 3 percent, average FAR of 1.1 and 3.55 percent are achieved for text and audio tasks, respectively. Moreover, we have evaluated STRIP-ViTA against a number of advanced backdoor attacks and compare its effectiveness with other recent state-of-the-arts.
KW - AI security
KW - backdoor attack
KW - deep learning
KW - STRIP-ViTA
KW - trojan detection
UR - http://www.scopus.com/inward/record.url?scp=85100787852&partnerID=8YFLogxK
U2 - 10.1109/TDSC.2021.3055844
DO - 10.1109/TDSC.2021.3055844
M3 - Article
SN - 1545-5971
VL - 19
SP - 2349
EP - 2364
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 4
ER -