Machine learning applications are now everywhere, in fields such as object detection, natural language processing, and recommendation systems. Deep Neural Networks (DNNs) are also deployed in many safety-critical systems, such as medical diagnostics, autonomous driving, and malware detection. Consequently, deep learning model integrity is becoming more important over time. In autonomous driving, for example, a wrong decision can lead to fatal accidents.
Note that model integrity means maintaining and assuring the accuracy and completeness of the model over its entire lifecycle.
The important question is: how can attackers compromise a deep learning model?
There are two types of methods: attackers can feed malicious input to the deep learning model, or they can compromise it through an internal attack. Today we will focus on the internal attack, but first let me briefly summarize what malicious input is.
Assume a deep learning model predicts "It is a cat". The attacker can access the input and maliciously add some noise to it. After that, the model changes its decision and says, "It is a dog". To a human, there is no noticeable difference between the original picture and the noise-added input.
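To make the malicious-input path concrete, here is a minimal FGSM-style sketch in PyTorch. This is only an illustration of adversarial noise, not either paper's method; `model`, `image`, and `true_label` are placeholder names.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.01):
    """Add a small, nearly invisible perturbation that can flip the prediction."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```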
Adversarial inputs only target misclassification of a single input. In contrast, a hardware-based attack degrades the overall inference accuracy for all inputs.
Let's look at the HW-based attacks.
The machine learning hardware layer is vulnerable to attacks on compute logic, caches, and DRAM modules. Our focus is rowhammer attacks on DRAM modules.
For a deep learning model, the goal of the rowhammer attack is to flip bits of the quantized weights.
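As a rough illustration (assuming signed 8-bit two's-complement weights, a common quantization format), flipping a single high-order bit turns a small weight into a very different value:

```python
def flip_bit(weight, bit_position):
    """Flip one bit of a signed 8-bit weight (two's complement), as a DRAM fault would."""
    raw = weight & 0xFF                       # view the weight as its 8 raw bits
    raw ^= (1 << bit_position)                # flip the target bit
    return raw - 256 if raw >= 128 else raw   # back to the signed int8 range

w = 3                       # original quantized weight: 0b00000011
w_faulty = flip_bit(w, 7)   # flip the most significant bit
print(w, "->", w_faulty)    # 3 -> -125: a small positive weight becomes a large negative one
```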
The most interesting part is here: if attackers apply the rowhammer attack to the DNN model randomly, the model accuracy barely decreases, as you can see in [1, 2].
So, the attacker has to find the model's most vulnerable bits to decrease accuracy significantly. Furthermore, they need to find the smallest number of bits that must be flipped in the DRAM module to compromise the model, because performing rowhammer attacks takes a lot of time.
Many deep learning models are open-source, so attackers can access the model weights and identify which weights are most vulnerable with a gradient-based algorithm. After that, they can flip the most vulnerable bits using the DRAM attack technique.
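Below is a simplified sketch of such a gradient-guided search. It is only in the spirit of the progressive bit search in [1, 2], not the authors' exact algorithm; `model`, `data`, and `labels` are placeholders.

```python
import torch
import torch.nn.functional as F

def most_vulnerable_weights(model, data, labels, top_k=10):
    """Rank weights by the gradient of the loss: large-gradient weights are the
    ones whose corruption moves the loss (and thus the accuracy) the most."""
    model.zero_grad()
    loss = F.cross_entropy(model(data), labels)
    loss.backward()

    candidates = []
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        grads = param.grad.abs().flatten()
        values, indices = grads.topk(min(top_k, grads.numel()))
        candidates += [(v.item(), name, i.item()) for v, i in zip(values, indices)]

    # Keep the globally most sensitive weights; the attacker would then flip the
    # most significant bits of these weights in DRAM via rowhammer.
    return sorted(candidates, reverse=True)[:top_k]
```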
According to the experimental results of the DeepHammer paper [1]:
- The LeNet architecture on the Fashion dataset has 0.65M parameters; accuracy before the attack is 90.20%, and after flipping 3 bits it drops to 10%.
- The VGG-11 architecture on the Google Speech Command dataset has 132M parameters; accuracy before the attack is 95.35%, and after flipping 5 bits it drops to 3.43%.
Please check the DeepHammer [1] and Bit-Flip Attack [2] papers for more examples.
With rowhammer attacks, the models become random output generators after flipping only a few bits out of many millions.
It is time to think about how we can enhance the robustness of deep learning models against such attacks.
References
- [1] DeepHammer: Depleting the Intelligence of Deep Neural Networks through Targeted Chain of Bit Flips, USENIX Security '20
- [2] Bit-Flip Attack: Crushing Neural Network with Progressive Bit Search, International Conference on Computer Vision (ICCV) 2019
More articles about these topics (an update in 2022):
- Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks
- Security Analysis of Deep Neural Networks Operating in the Presence of Cache Side-Channel Attacks
- An Optimization Perspective on Realizing Backdoor Injection Attacks on Deep Neural Networks in Hardware