Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

Xiaoyu Zhang, Rohit Gupta, Ajmal Mian, Nazanin Rahnavard, Mubarak Shah

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Deep neural networks are being widely deployed for critical tasks. In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors. These malicious behaviors can be triggered at the adversary's will, posing a serious security threat. To verify the integrity of a deep model, we propose a method that captures its fingerprint with adversarial perturbations. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded by adversarial perturbations. Our proposed Trojan detection network learns features from adversarial patterns and their properties to encode the unknown trigger shape and the deviations in the decision boundaries caused by backdoors. Our method works with no clean samples at all, or with a limited number of clean samples for improved performance. It also performs anomaly detection to identify the target class of a Trojaned network, is invariant to the trigger type, trigger size, and network architecture, and does not require any triggered samples. Experiments are performed on the MNIST, NIST-TrojAI, and Odysseus datasets, with 5000 pre-trained models in total, making this the largest study to date on Trojan detection; our method achieves new state-of-the-art accuracy.
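To illustrate the core idea that adversarial perturbations encode a model's decision boundaries, the following minimal sketch computes a one-step gradient-sign (FGSM-style) perturbation for a toy linear softmax classifier. This is not the paper's detection pipeline; the model, dimensions, and `fgsm_perturbation` helper are illustrative assumptions. A Cassandra-style detector would featurize such perturbation patterns across many inputs and feed them to a learned classifier.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_perturbation(W, b, x, y, eps=0.1):
    """One FGSM step on a linear softmax classifier (illustrative).

    For cross-entropy loss, the gradient w.r.t. the input x is
    W^T (p - onehot(y)); the signed gradient scaled by eps gives
    the adversarial perturbation.
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)
    return eps * np.sign(grad_x)

# toy stand-in for a "pre-trained model": 3-class linear classifier on 4-D inputs
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)
y = 1

delta = fgsm_perturbation(W, b, x, y)
# the direction of delta reflects the local decision boundary around x;
# backdoors shift those boundaries, which such perturbations can reveal
print(np.round(delta, 3))
```

Because the toy loss is convex in the input, moving along the signed gradient is guaranteed to increase the loss, confirming that the perturbation points across the decision boundary.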

Original language: English
Pages (from-to): 135856-135867
Number of pages: 12
Journal: IEEE Access
Publication status: Published - 30 Jul 2021
