TY - JOUR
T1 - Jacobian norm with Selective Input Gradient Regularization for interpretable adversarial defense
AU - Liu, Deyin
AU - Wu, Lin Yuanbo
AU - Li, Bo
AU - Boussaid, Farid
AU - Bennamoun, Mohammed
AU - Xie, Xianghua
AU - Liang, Chengwu
PY - 2024/1
Y1 - 2024/1
AB - Deep neural networks (DNNs) can be easily deceived by imperceptible alterations known as adversarial examples, which cause misclassification and pose a significant threat to the reliability of deep learning systems in real-world applications. Adversarial training (AT), a popular defense, enhances robustness by training models on a mix of adversarially corrupted and clean data. However, existing AT-based methods often fail against transferred adversarial examples, which can fool multiple defense models, and thus fall short of the generalization required in real-world scenarios. Moreover, AT typically does not yield interpretable predictions, which domain experts need in order to understand the behavior of DNNs. To overcome these challenges, we present Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which leverages Jacobian normalization to improve robustness and regularizes perturbation-based saliency maps to enable interpretable predictions. J-SIGR thus strengthens the defense while keeping DNN predictions interpretable. We evaluate J-SIGR across several network architectures under powerful adversarial attacks; the experiments show that J-SIGR defends effectively against transferred adversarial attacks while preserving interpretability. The project code can be found at https://github.com/Lywu-github/jJ-SIGR.git.
KW - Adversarial robustness
KW - Jacobian normalization
KW - Selective input gradient regularization
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=uwapure5-25&SrcAuth=WosAPI&KeyUT=WOS:001095476800001&DestLinkType=FullRecord&DestApp=WOS
DO - 10.1016/j.patcog.2023.109902
M3 - Article
SN - 0031-3203
VL - 145
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 109902
ER -
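
The abstract above couples two regularizers: a Jacobian norm penalty for robustness and a selective penalty on perturbation-based input-gradient saliency maps for interpretability. Below is a minimal PyTorch sketch of how such a combined objective could be wired up, assuming the standard random-projection estimator of the Jacobian Frobenius norm and a guessed top-k saliency mask for the "selective" part; the function names, the masking rule, and the weights lam_jac / lam_sigr are hypothetical and not the authors' published formulation (see the repository linked above for that).

    import torch
    import torch.nn.functional as F

    def jacobian_norm_penalty(model, x):
        # One-sample random-projection estimate of ||dF/dx||_F^2
        # (the common estimator; assumed here, not taken from the paper).
        x = x.detach().clone().requires_grad_(True)
        logits = model(x)
        v = torch.randn_like(logits)
        v = v / v.norm(dim=1, keepdim=True)          # unit vector per example
        (g,) = torch.autograd.grad(logits, x, grad_outputs=v, create_graph=True)
        # scale by #classes so the projection is an unbiased Frobenius estimate
        return logits.size(1) * g.pow(2).flatten(1).sum(1).mean()

    def selective_input_grad_penalty(model, x, y, keep_frac=0.1):
        # Penalize loss gradients everywhere EXCEPT the top keep_frac most
        # salient inputs -- one plausible reading of "selective"; hypothetical.
        x = x.detach().clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        (g,) = torch.autograd.grad(loss, x, create_graph=True)
        sal = g.abs().flatten(1)
        k = max(1, int((1.0 - keep_frac) * sal.size(1)))
        thresh = sal.kthvalue(k, dim=1, keepdim=True).values
        mask = (sal <= thresh).float()               # 1 on low-saliency inputs
        return (g.flatten(1) * mask).pow(2).sum(1).mean()

    def jsigr_style_loss(model, x, y, lam_jac=0.01, lam_sigr=0.1):
        # Cross-entropy plus the two (hypothetical) regularizers.
        ce = F.cross_entropy(model(x), y)
        return (ce
                + lam_jac * jacobian_norm_penalty(model, x)
                + lam_sigr * selective_input_grad_penalty(model, x, y))

In a training loop, jsigr_style_loss(model, x, y).backward() would replace the plain cross-entropy loss; the create_graph=True flags are what let gradients flow through the two gradient-based penalties during the double backward pass.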