DroidEvolver: Self-Evolving Android Malware Detection System
Summary of seminar presented by Vasudeva Vallabhajosyula; based on Xu et al. paper; CSCE 689 601 ML-Based Cyber Defenses
The paper describes a self-evolving Android malware detection system known as DroidEvolver. This blog was originally written for CSCE 689:601 and is the eighth in the series "Machine Learning-Based Cyber Defenses".
Paper highlights
Problem: The dynamic nature of the Android framework poses a challenge: the features a detector is trained on quickly become outdated, leading to ineffective malware detection over time.
Solution: DroidEvolver
DroidEvolver tackles the "ageing problem" by introducing a self-updating mechanism, ensuring adaptability to evolving malware.
Comparative analysis with MAMADROID shows DroidEvolver to be significantly more accurate and efficient (a 28.58-fold efficiency improvement).
DroidEvolver demonstrates resilience against obfuscation techniques while maintaining lightweight updates.
The system uses a pool of models, online learning, weighted voting mechanism, and a juvenilization indicator to enhance detection capabilities.
Leveraging tools such as Soot and FlowDroid, DroidEvolver abstracts information to create Markov chains and derive 2D feature vectors.
The choice of a passive-aggressive detection model enhances performance.
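The highlights above can be sketched in code. The following is a minimal, hedged illustration of an evolve-on-the-fly loop in the spirit of DroidEvolver: a pool of online passive-aggressive classifiers, a simple voting rule, and a juvenilization-indicator check that updates "aged" models with the pool's own pseudo-label. The pool size, `C` values, and JI thresholds are illustrative assumptions, not the paper's exact settings (the paper also mixes several different online learning algorithms rather than a homogeneous PA pool).

```python
# Sketch of a DroidEvolver-style model pool (illustrative parameters only).
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.default_rng(0)

# Pool of online passive-aggressive models with different aggressiveness C.
pool = [PassiveAggressiveClassifier(C=c) for c in (0.01, 0.1, 1.0)]

# Synthetic binary API-call feature vectors standing in for real app features.
X0 = rng.integers(0, 2, size=(200, 50)).astype(float)
y0 = rng.integers(0, 2, size=200)
for m in pool:
    m.partial_fit(X0, y0, classes=[0, 1])

LOW, HIGH = 0.2, 0.6  # illustrative juvenilization thresholds

def detect_and_evolve(x):
    """Vote over the pool; update disagreeing (aged) models with a pseudo-label."""
    x = x.reshape(1, -1)
    votes = np.array([m.predict(x)[0] for m in pool])
    label = int(round(votes.mean()))      # majority vote as the pool's decision
    ji = float(np.mean(votes != label))   # juvenilization indicator
    if LOW <= ji <= HIGH:                 # app looks drifting: evolve the pool
        for m, v in zip(pool, votes):
            if v != label:                # only aged models are updated
                m.partial_fit(x, [label]) # pseudo-label, no true label needed
    return label, ji
```

Because every model supports `partial_fit`, each update touches only the incoming sample, which is what keeps the evolution step lightweight compared with full retraining.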
Takeaways
Drift retraining is essential for maintaining model effectiveness. While the paper assumes immediate retraining, in real-world deployments retraining may not happen instantly due to various operational factors.
The key takeaway is to maintain a pool of models. In practice, while the primary model remains operational, a secondary model runs in the background and is retrained to adapt to evolving circumstances. Rather than updating the primary model directly, this approach enables continuous improvement without disrupting ongoing operations.
Determining the optimal number of models remains a challenge and depends on available resources and operational constraints.
Delays in label acquisition, caused by factors such as drift detection or manual intervention, necessitate pseudo-labels to bridge the gap. These pseudo-labels come from a second classifier running in the background, although how far to trust this secondary classifier remains a concern. The pseudo-labels are later replaced by true labels obtained from manual or sandbox analysis.
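This delayed-label pattern can be sketched as follows. The sketch assumes a background ("shadow") classifier that supplies pseudo-labels immediately, with the true label arriving later from sandbox or manual analysis; the function names, pool of two models, and correction logic are all illustrative, not an API from the paper.

```python
# Minimal sketch of pseudo-labeling under delayed ground truth (assumed design).
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.default_rng(1)
primary = PassiveAggressiveClassifier(C=0.5)  # serves live detections
shadow = PassiveAggressiveClassifier(C=0.5)   # retrained in the background

X0 = rng.integers(0, 2, size=(100, 20)).astype(float)
y0 = rng.integers(0, 2, size=100)
primary.partial_fit(X0, y0, classes=[0, 1])
shadow.partial_fit(X0, y0, classes=[0, 1])

pending = {}  # sample_id -> (features, pseudo_label) awaiting a real label

def on_new_sample(sid, x):
    """Use the shadow model's pseudo-label until the real one arrives."""
    pseudo = int(shadow.predict(x.reshape(1, -1))[0])
    pending[sid] = (x, pseudo)
    primary.partial_fit(x.reshape(1, -1), [pseudo])  # provisional update
    return pseudo

def on_true_label(sid, y_true):
    """Replace the pseudo-label once sandbox/manual analysis returns."""
    x, pseudo = pending.pop(sid)
    if y_true != pseudo:  # correct the provisional update if it was wrong
        primary.partial_fit(x.reshape(1, -1), [y_true])
    shadow.partial_fit(x.reshape(1, -1), [y_true])
```

The open question flagged above, how much to trust the shadow classifier, shows up here as the risk that `primary` is nudged toward a wrong pseudo-label before the correction lands.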
Operationalizing pseudo-labeling and model updates requires manual intervention, typically handled by detection engineers in roles akin to DevOps or ML security operations.
| | MLSecOps | MLOps | DevSecOps |
|---|---|---|---|
| Focus | Security operations for machine learning | Operations for machine learning deployment | Integration of security into DevOps |
| Primary Goal | Securing machine learning systems | Efficient deployment and management of ML | Embedding security throughout the SDLC |
| Key Activities | Threat detection and response in ML models; model monitoring and anomaly detection; data privacy and compliance in ML | Continuous integration and deployment of ML models; infrastructure provisioning and scaling; performance monitoring and optimization | Secure code review and static analysis; continuous security testing; automation of security controls |
| Key Tools | ML-specific security tools and platforms | ML frameworks (e.g., TensorFlow, PyTorch) | DevOps tools (e.g., Jenkins, Docker) |
| Team Focus | Security analysts, data scientists | Data engineers, ML engineers, DevOps engineers | Developers, operations, security |
| Outcome | Secure and resilient ML systems | Efficient and scalable ML deployment | Secure and reliable software delivery |
- Red Team vs. Blue Team
| | Red Team | Blue Team |
|---|---|---|
| Objective | Identify vulnerabilities and simulate attacks on systems and processes | Defend against attacks and improve security posture |
| Role | Adversarial role, simulating attackers | Defensive role, protecting against attackers |
| Activities | Conduct penetration testing; exploit vulnerabilities to gain access; test security controls and protocols | Security monitoring and incident response; analyze security alerts and investigate incidents; implement security measures and patches |
| Focus | Exploiting weaknesses to identify security gaps | Detecting and mitigating security threats and breaches |
| Tools | Attack simulation tools, exploit frameworks | SIEM (Security Information and Event Management), IDS/IPS |
| Outcome | Identify weaknesses for improvement | Strengthen defenses and respond to threats |
| Collaboration | Limited collaboration with the blue team | Close collaboration with the red team for feedback and improvement |
| Frequency | Periodic assessments, typically on-demand | Continuous monitoring and response |
- AI/ML penetration testing can identify vulnerabilities and cyber-security threats that attackers could exploit to gain unauthorized access. With machine learning models in the loop, pentesters can scan networks and flag vulnerabilities quickly and efficiently, saving time and effort while improving the accuracy and effectiveness of their testing.