Adversarial Machine Learning in Image Classification: A Survey Toward the Defender’s Perspective
Summary of seminar based on Machado et al. paper; CSCE 689 601 ML-Based Cyber Defenses
The paper is a survey of adversarial machine learning in image classification from the defender's perspective. This blog was originally written for CSCE 689:601 and is the 16th post in the series "Machine Learning-Based CyberDefenses".
Takeaways
The paper explores adversarial machine learning in the context of images, diverging from the text-based studies we covered earlier and highlighting the growing importance of visual data analysis in cybersecurity. The authors' interdisciplinary approach, despite their military background, shows that machine learning and cybersecurity methodologies apply well beyond traditional academic boundaries. The seminar also discussed converting binary data to images for analysis, which offers advantages such as leveraging mature image processing techniques but also poses challenges: the conversion alters the feature space, which can complicate the identification of malicious patterns.
Studies by authors from military backgrounds, such as the US Army, underscore the seriousness of adversarial attacks, emphasizing the real-world implications of cyber warfare and the crucial role of defense strategies in countering such threats. Through innovative techniques and interdisciplinary collaborations, this line of work contributes to strengthening cybersecurity measures in today's intricate digital landscape.
Assessing the state of the art also involves evaluating practical implications. The surveyed methodology qualifies as state of the art because it is openly disclosed and accessible; however, it is important to acknowledge that publicly undisclosed techniques might already surpass it.
Distinguishing between adversarial attacks and adversarial examples is crucial in cybersecurity discussions, even though some perceive it as pedantic. An adversarial attack is the process used to compromise a machine learning model, while an adversarial example is the crafted input that triggers the misclassification. This nuanced difference informs tailored defense strategies and underscores the importance of understanding adversarial threats in cybersecurity.
Adversarial examples stem from the complexity of machine learning models and their susceptibility to crafted inputs that exploit vulnerabilities. Achieving perfect classification remains challenging due to the difficulty in defining maliciousness accurately. Image-based attacks, like those in autonomous vehicles, underscore the practical relevance and critical need for robust defense mechanisms in safeguarding against adversarial threats.
In autonomous vehicles, imperceptible noise added to a stop sign could cause machine learning models to fail in recognizing it, potentially leading to accidents. Physical attacks, like affixing tape onto a stop sign, can evade model detection while remaining visible to humans. In license plate recognition, injecting malicious commands, such as "DROP DATABASE," showcases the risks of exploiting vulnerabilities in image-based systems.
Facial recognition systems face challenges from spoofing attacks, where adversaries manipulate visual cues to deceive the model. Augmenting models with increased nonlinearity is necessary to enhance resilience against such attacks. Addressing adversarial examples requires a multifaceted approach focusing on robustness, interpretability, and adaptability to fortify machine learning models against unforeseen threats.
Mapping concepts to previously learned concepts
This paper briefly discusses many topics that we covered previously, such as convolutional neural networks (CNNs), autoencoders, generative adversarial networks (GANs), and adversarial images and attacks.
With respect to adversarial images, the paper defines formal terms like perturbation scope, perturbation visibility, and perturbation measurement. While we didn't go through this taxonomy before, we talked about perturbation and adversarial image generation in Mal-LSGAN: An Effective Adversarial Malware Example Generation Model. In this paper, the authors use p-norms as the metric for perturbation measurement, to control the size and amount of the perturbation. They also classify the algorithms used to compute perturbations into (i) one-step/one-shot and (ii) iterative.
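To make the perturbation measurement concrete, here is a minimal sketch (assuming NumPy arrays of the same shape for the clean and adversarial images) of how the commonly used p-norms quantify a perturbation:

```python
import numpy as np

def perturbation_norms(x, x_adv):
    """Measure the size of the perturbation delta = x_adv - x with common p-norms."""
    delta = (x_adv - x).ravel()
    return {
        "L0": np.count_nonzero(delta),        # number of pixels changed
        "L2": np.linalg.norm(delta, ord=2),   # Euclidean magnitude of the change
        "Linf": np.abs(delta).max(),          # largest single-pixel change
    }

# Example: a clean image and a version with small random noise added
x = np.random.rand(32, 32, 3)
x_adv = np.clip(x + np.random.uniform(-0.03, 0.03, x.shape), 0, 1)
print(perturbation_norms(x, x_adv))
```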
We talked about threat models in almost all of the previous papers, and this paper covers the same concept. It defines a threat model as the conditions under which a defense is designed to provide security guarantees against certain types of attacks and attackers. Going a step further, the authors classify threat models along six different axes: (i) attacker's influence, (ii) attacker's knowledge, (iii) security violation, (iv) attack specificity, (v) attack computation, and (vi) attack approach.
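As a rough illustration (the field names follow the six axes above; the example values are my own assumptions for illustration, not taken from the paper), a threat model can be recorded as a simple data structure:

```python
from dataclasses import dataclass

@dataclass
class ThreatModel:
    influence: str           # e.g. "causative" (training time) or "exploratory" (test time)
    knowledge: str           # "white-box", "grey-box", or "black-box"
    security_violation: str  # e.g. "integrity" or "availability"
    specificity: str         # "targeted" or "indiscriminate"
    computation: str         # "one-step" or "iterative"
    approach: str            # "gradient", "transferability/score", "decision", or "approximation"

# Hypothetical example: a classic white-box, one-step evasion setting
example_threat = ThreatModel("exploratory", "white-box", "integrity",
                             "indiscriminate", "one-step", "gradient")
print(example_threat)
```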
We also discussed various types of attacks previously. The authors likewise describe a variety of attacks, such as poisoning and exploratory attacks. They give the same example we discussed for exploratory attacks: the input/output attack, where the attacker feeds the targeted model adversarial images, observes the outputs given by the model, and tries to reproduce a similar substitute or surrogate model. They further divide attacks based on the attacker's knowledge into white-box, black-box, and grey-box attacks, which we covered in previous papers.
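A minimal sketch of the input/output idea, with toy scikit-learn models standing in for both the victim and the attacker's substitute (everything here is a placeholder, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Stand-in for the victim model: the attacker can only query it, not inspect it.
target_model = LogisticRegression(max_iter=1000).fit(
    np.random.rand(200, 10), np.random.randint(0, 2, 200))

# Input/output (exploratory) attack: query the target on attacker-chosen inputs...
queries = np.random.rand(1000, 10)
observed_labels = target_model.predict(queries)

# ...then fit a substitute/surrogate model on the observed input/output pairs.
surrogate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
surrogate.fit(queries, observed_labels)

# The surrogate approximates the target's decision boundary and can then be
# attacked in a white-box fashion, relying on transferability.
agreement = (surrogate.predict(queries) == observed_labels).mean()
print(f"Surrogate agrees with target on {agreement:.0%} of queried inputs")
```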
The authors also discuss attack approaches: adversarial attacks can be organized with respect to the approach used by the attack algorithm to craft the perturbation, which can be based on (i) gradient, (ii) transferability/score, (iii) decision, and (iv) approximation. While we didn't talk about them together, we covered the individual approaches in different papers.
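The gradient-based family is easiest to see with the Fast Gradient Sign Method (FGSM), a classic one-step attack. Below is a minimal PyTorch sketch; the tiny linear model and the random input are placeholders, not a real victim:

```python
import torch
import torch.nn as nn

# Toy classifier standing in for the victim (assumption: 28x28 grayscale inputs).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

def fgsm(x, y, epsilon=0.05):
    """One-step, gradient-based perturbation: move each pixel by epsilon
    in the direction that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return torch.clamp(x_adv + epsilon * x_adv.grad.sign(), 0, 1).detach()

x = torch.rand(1, 1, 28, 28)   # placeholder "legitimate" image
y = torch.tensor([3])          # its true label
x_adv = fgsm(x, y)
print("max pixel change:", (x_adv - x).abs().max().item())
```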
We have been discussing defense techniques since day one, including gradient descent, statistical methods, and ensembles of classifiers (DroidEvolver: Self-Evolving Android Malware Detection System; Transcending TRANSCEND: Revisiting Malware Classification in the Presence of Concept Drift). Here, the authors classify a defense according to its main objective: (i) proactive or (ii) reactive. Defense approaches use similar procedures, which can range from brute-force solutions to preprocessing techniques. Based on a systematic review of the literature, the paper also categorizes the most relevant proactive and reactive countermeasures according to their operational approach: (i) gradient masking, (ii) auxiliary detection models, (iii) statistical methods, (iv) preprocessing techniques, (v) ensemble of classifiers, and (vi) proximity measurements.
We also talked about stochastic gradients and vanishing gradients previously. Here, the authors classify gradient masking into (i) shattered gradients, (ii) stochastic gradients, and (iii) exploding/vanishing gradients.
We previously discussed adversarial training and distillation. The authors discuss them as two strategies for gradient masking: (i) Adversarial Training and (ii) Defensive Distillation. They highlight issues with adversarial training, including its strong coupling with the attack algorithm used during training, which leads to a lack of generality against different attack algorithms. To address this, a more extensive training dataset with adversarial images from various attacks is needed, making the process computationally inefficient due to the large number of images and the increased training time. A robust defense method should be independent of specific attack algorithms to enhance generalization.
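A minimal, self-contained sketch of an adversarial training loop (toy model, random placeholder data, and FGSM as the attack), which also makes the coupling problem visible: robustness is only trained in against the one attack used to craft the adversarial images:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def fgsm(x, y, eps=0.05):
    """Craft adversarial images with a single one-step attack (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return torch.clamp(x_adv + eps * x_adv.grad.sign(), 0, 1).detach()

# One epoch over a placeholder dataset: each batch is augmented with
# adversarial versions of itself before the usual gradient update.
dataset = [(torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))) for _ in range(10)]
for x, y in dataset:
    x_adv = fgsm(x, y)                 # adversarial copies of the clean batch
    batch = torch.cat([x, x_adv])      # mix clean and adversarial samples
    labels = torch.cat([y, y])
    optimizer.zero_grad()
    loss_fn(model(batch), labels).backward()
    optimizer.step()
```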
Statistical methods are also covered in this paper. We saw the p-value as one such statistical method in Transcend: Detecting Concept Drift in Malware Classification Models. Here, the authors further describe a reactive defense called Kernel Density Estimation (KDE). KDE makes use of Gaussian Mixture Models to analyze the outputs of the logits layer of a DNN and to verify whether the input images belong to the same distribution as the legitimate images.
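The flavor of this idea can be sketched with scikit-learn's KernelDensity on placeholder logits; this illustrates density-based detection in general, not the exact method from the paper:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Placeholder logits: in practice these come from the last layer of the DNN
# evaluated on known-legitimate training images.
legit_logits = np.random.normal(0, 1, size=(500, 10))

# Fit a Gaussian kernel density estimate on the legitimate distribution.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(legit_logits)

def looks_legitimate(logits, threshold=-20.0):
    """Reactive check: flag inputs whose logits fall in a low-density region."""
    score = kde.score_samples(logits.reshape(1, -1))[0]  # log-likelihood
    return score > threshold

print(looks_legitimate(np.random.normal(0, 1, size=10)))   # in-distribution
print(looks_legitimate(np.random.normal(5, 1, size=10)))   # far from training data
```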
When we said that ideas from different fields can be applied to cybersecurity, we also noted that preprocessing is required to convert data into the desired format. In this paper, the authors describe preprocessing techniques based on image transformations, GANs, noise layers, autoencoders, and dimensionality reduction.
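As one illustration of this category (my own choice of transformations, not the paper's specific methods), simple input transformations such as bit-depth reduction and median smoothing can wash out small perturbations before classification:

```python
import numpy as np

def reduce_bit_depth(image, bits=4):
    """Quantize pixel values to fewer levels; perturbations smaller than the
    quantization step are simply rounded away."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

def median_smooth(image, k=3):
    """Median filter over a k x k window (grayscale image with values in [0, 1])."""
    padded = np.pad(image, k // 2, mode="edge")
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

x_adv = np.random.rand(28, 28)   # placeholder (possibly adversarial) image
x_preprocessed = median_smooth(reduce_bit_depth(x_adv))
```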
It has been stressed in class all along that an ensemble of classifiers performs better. The authors of this paper state the same and define an ensemble of classifiers as a countermeasure formed by two or more classification models that can be chosen at runtime.
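A minimal sketch of the runtime-selection idea, with toy scikit-learn classifiers as placeholders: a random subset of the pool answers each query and the predictions are combined by majority vote, so the attacker cannot know in advance which models will be used:

```python
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Pool of independently trained classifiers.
pool = [LogisticRegression(max_iter=1000).fit(X, y),
        RandomForestClassifier(n_estimators=50).fit(X, y),
        SVC().fit(X, y)]

def predict_with_ensemble(x):
    """Pick a random subset of the pool at inference time and majority-vote."""
    chosen = random.sample(pool, k=2)
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in chosen]
    return max(set(votes), key=votes.count)

print(predict_with_ensemble(X[0]))
```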
We talked about nearest neighbors at the beginning of the semester. Here the authors take it a step further with deep k-nearest neighbors (DkNN) for proximity measurement, which uses a variation of the kNN algorithm to compute uncertainty and reliability metrics from the proximity among the hidden representations of training and input images, obtained from each layer of a DNN.
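A rough sketch of a DkNN-style credibility check, with random arrays standing in for the per-layer hidden representations (in practice these would come from forward hooks on the DNN):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder hidden representations: one array per DNN layer, rows are training images.
train_layers = [np.random.rand(1000, 64), np.random.rand(1000, 32)]
train_labels = np.random.randint(0, 10, 1000)

# One kNN index per layer.
indexes = [NearestNeighbors(n_neighbors=25).fit(feats) for feats in train_layers]

def dknn_credibility(input_layers, predicted_label):
    """Fraction of nearby training points (across all layers) that share the
    predicted label: a rough uncertainty/reliability measure in the DkNN spirit."""
    agree, total = 0, 0
    for feats, index in zip(input_layers, indexes):
        _, nbr_ids = index.kneighbors(feats.reshape(1, -1))
        agree += (train_labels[nbr_ids[0]] == predicted_label).sum()
        total += nbr_ids.shape[1]
    return agree / total

print(dknn_credibility([np.random.rand(64), np.random.rand(32)], predicted_label=3))
```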
In this paper, the authors also mention various hypotheses about why adversarial examples exist. We discussed the null hypothesis in a previous paper; the hypotheses mentioned here are the High Non-linearity Hypothesis, the Linearity Hypothesis, the Boundary Tilting Hypothesis, the High-dimensional Manifold Hypothesis, Lack of Enough Training Data, the Non-robust Features Hypothesis, and explanations for adversarial transferability.
While we discussed the principles for designing and evaluating defenses in different classes, the authors of this paper give a structured approach:
Define a Threat Model
Simulate Adaptive Adversaries
Develop Provable Lower Bounds of Robustness
Perform Basic Sanity Tests:
  Report model accuracy on legitimate samples
  Receiver Operating Characteristic (ROC) curves (see the sketch after this list)
  Iterative versus one-step attacks
  Increase the perturbation budget
  Try brute force attacks
  White-box versus black-box attacks
  Attack similar undefended models
Release of Source Code
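For the ROC item above, a minimal sketch of how a defense's detection quality would be reported (the detector scores below are random placeholders, not real results):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder detector scores: higher means "more likely adversarial".
# Labels: 0 = legitimate image, 1 = adversarial image.
labels = np.concatenate([np.zeros(500), np.ones(500)])
scores = np.concatenate([np.random.normal(0.3, 0.10, 500),
                         np.random.normal(0.6, 0.15, 500)])

fpr, tpr, thresholds = roc_curve(labels, scores)
print("AUC:", roc_auc_score(labels, scores))

# Report the true-positive rate at a fixed, operationally meaningful false-positive rate.
idx = np.searchsorted(fpr, 0.05)
print("TPR at 5% FPR:", tpr[idx])
```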
We also talked about developing pipelines for antivirus solutions. In this paper, the authors describe the same idea under the name "Hybrid Defenses", which involves assembling different countermeasures into modules within an architecture. Each module performs a security procedure, such as a reactive defense or preprocessing, randomly selected from a repository. For example, a hybrid defense might include a reactive module, a preprocessing module, and a proactive module, each randomly selecting an appropriate defense for detecting, processing, and classifying input images.
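Structurally, such a pipeline could look like the following sketch (the module names are hypothetical placeholders, each of which would dispatch to a real implementation):

```python
import random

# Hypothetical repositories of interchangeable defenses for each module.
reactive_defenses     = ["kernel_density_check", "dknn_credibility"]
preprocessing_steps   = ["bit_depth_reduction", "median_smoothing", "autoencoder_reform"]
proactive_classifiers = ["adversarially_trained_cnn", "distilled_cnn", "random_ensemble"]

def hybrid_pipeline(image_path):
    """Assemble one concrete pipeline per input by sampling a defense from each
    module's repository: detect, then transform, then classify."""
    detector   = random.choice(reactive_defenses)
    preprocess = random.choice(preprocessing_steps)
    classifier = random.choice(proactive_classifiers)
    return f"{detector} -> {preprocess} -> {classifier}"

print(hybrid_pipeline("stop_sign.png"))
```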