TY - GEN
T1 - Detecting Compromised Architecture/Weights of a Deep Model
AU - Beetham, James
AU - Kardan, Navid
AU - Mian, Ajmal
AU - Shah, Mubarak
N1 - Funding Information:
Professor Ajmal Mian is the recipient of an Australian Research Council Future Fellowship Award (project number FT210100268) funded by the Australian Government.
Funding Information:
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112090137, and is approved for public release; distribution is unlimited.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Adversarial attacks perturb data to modify a model's prediction. These perturbations can be crafted in a white-box or black-box setting, depending on whether the target model's architecture/weights are known or unknown. Compromised architecture and weights of a model make it vulnerable to the more powerful white-box attacks. In this work, we determine whether a deep model is compromised by distinguishing white-box from black-box adversarial attacks. The proposed method utilizes the internal representations of the target model and a proxy model to increase detector efficacy. Additionally, it employs a spatial smoothing module to control the strength of white-box attacks relative to black-box attacks, and a proxy module to aid in measuring the transferability of the attack. Both modules work in tandem to increase the contrast between the internal representations of white-box and black-box attacks for better discrimination. We perform a detailed ablation of our method to showcase the importance of the different modules, and show that the spatial smoothing and proxy defense techniques enable our framework to significantly outperform the simple classification baseline on common vision datasets.
AB - Adversarial attacks perturb data to modify a model's prediction. These perturbations can be crafted in a white-box or black-box setting, depending on whether the target model's architecture/weights are known or unknown. Compromised architecture and weights of a model make it vulnerable to the more powerful white-box attacks. In this work, we determine whether a deep model is compromised by distinguishing white-box from black-box adversarial attacks. The proposed method utilizes the internal representations of the target model and a proxy model to increase detector efficacy. Additionally, it employs a spatial smoothing module to control the strength of white-box attacks relative to black-box attacks, and a proxy module to aid in measuring the transferability of the attack. Both modules work in tandem to increase the contrast between the internal representations of white-box and black-box attacks for better discrimination. We perform a detailed ablation of our method to showcase the importance of the different modules, and show that the spatial smoothing and proxy defense techniques enable our framework to significantly outperform the simple classification baseline on common vision datasets.
UR - https://www.scopus.com/pages/publications/85143628210
U2 - 10.1109/ICPR56361.2022.9956280
DO - 10.1109/ICPR56361.2022.9956280
M3 - Conference paper
AN - SCOPUS:85143628210
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2843
EP - 2849
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - IEEE, Institute of Electrical and Electronics Engineers
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -