TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

This paper is a comprehensive exploration of the challenges, strategies, and ethical considerations surrounding LLM poisoning attacks in code repositories, cryptography, and software security. This blog is originally written for CSCE 689:601 and is the 18th blog of the series: "Machine Learning-Based CyberDefenses".

Paper Highlights

In this paper, attack goal is to introduce vulnerabilities into victim codebase for future exploitation and suggest desired payloads that fit victim code context. Attack assumption is that the victim trusts code suggestions from a language model (LLM) trained on untrusted sources like GitHub. Attack targets are static analysis tools, signature-based detection systems and code generation models like Copilot. Data poisoning is used as attack medium.
Simple attacks can be detected by static analysis. The paper proposes two attacks: COVERT attack and TROJANPUZZLE attack:
- COVERT attack: Malicious code is generated as docstrings during the inference stage, converting the entire poison sample to docstrings. Detectable by signature-based defenses such as using regular expressions.
- TROJANPUZZLE attack: Poison sample should not explicitly include any vulnerability in context. It should instead use a template poison sample with a placeholder token "<TEMPLATE_TOKEN>".
The analysis reveals that instead of learning the exact payload, LLM aims to understand a special association between trojan phases and payloads. Therefore, the procedure involves a sequence of steps: trojan payload creation, token concealment, trigger phrase identification, trojan phrase generation, poison sample generation, LLM pretraining, user input, and generation of the desired output.
In the experiments, the attack targets included CWE-79 (Cross-site scripting), 22 path traversal, 502 deserialization of unstructured data, and 89 SQL injection.

GPT-2 tokenizer splits 'yaml;' into two tokens: 'y' and 'aml'.

The success rate of all attacks peaks at epoch 2 and declines with further fine-tuning, contradicting assumptions about overfitting on poison data and results from similar tasks like image classification.

Defenses: dataset cleansing, filtering of near-duplicate poisoning files, and fine-pruning defense, which involves building a small dataset assumed to be practically clean (poison rate < 0.1%).

Takeaways

In previous papers, we talked about static analysis of binaries, here the authors talk about static analysis of source code! The focus of this paper is on poisoning in the context of training datasets, specifically at the level of source code. Poisoning entails the addition of vulnerabilities to the training dataset.
Addressing poisoning is crucial due to the risk of attackers introducing vulnerabilities into systems via Language Models (LLMs) trained on poisoned datasets. Though poisoning is not new, its importance has surged with the widespread adoption of LLMs, which rely on large datasets often sourced from platforms like GitHub.
Poisoning attacks occur during training, while adversarial attacks happen during inference. Poisoning attacks are challenging for attackers due to the need for significant data and complexity in ensuring the model learns from it. However, once poisoned, bypassing the model becomes easier, making poisoning advantageous. Thus, pursuing poisoning is recommended for its perceived benefits.
Before LLMs, poisoning could be done in traditional machine learning models, such as those used for image classification. For example, in a dataset used to train a model to recognize cats and dogs, an attacker might intentionally include images of cats mislabeled as dogs. This would potentially lead the model to misclassify similar images in the future, demonstrating how poisoning could manipulate the model's behavior.
In Python, HTTP requests can be made using the "requests" library. However, there is a concern about whether the library might send our queries to an attacker's server in addition to our intended server. One assurance mechanism is the fact that the "requests" library is open source. But open source is vulnerable!
The debate over the security of open source versus closed source software centers on transparency and scrutiny. Open source software theoretically allows for more rapid detection and resolution of vulnerabilities due to its transparent nature, but in practice, the level of scrutiny can vary widely. Closed source software may undergo rigorous internal testing, but lacks transparency for independent verification of security.
- More than 400 malicious packages were uploaded to PyPI (Python Package Index), the official code repository for the Python programming language, in the latest indication that the targeting of software developers using this form of attack isn’t a passing fad. Link
- A massively popular JavaScript library (npm package) was hacked in 2021 and modified with malicious code that downloaded and installed a password stealer and cryptocurrency miner on systems where the compromised versions were used.
- The Linux kernel is open-source, and as such, can be accessed by the wider community, and anyone can suggest changes to the code via submissions. A paper was released by the University of Minnesota "On the Feasibility of Stealthily Introducing Vulnerabilities in Open-Source Software via Hypocrite Commits". The paper described how the two researchers could generate code that claims to fix one bug in the Linux kernel while intentionally introducing other bugs. The wider community appears to missed the point of the research paper: open-source software is vulnerable. Link
Research into exploiting vulnerabilities raises ethical concerns and could lead to bans or restrictions on publication. There is a risk of papers being withdrawn or authors facing penalties, with severe cases potentially considered criminal if harm is caused.

In the future, Hugging Face and similar platforms may need specialized antivirus scanners to detect and prevent poisoning attacks. These scanners would be crucial for safeguarding the integrity of the models and datasets hosted on these platforms as their usage grows.

The initial attack method proposes directly adding a malicious file to a code repository, but it's deemed naive due to modern CI/CD pipelines with security tests. These tests, including static code analysis, mitigate vulnerabilities like buffer overflows. Without such pipelines, there's a heightened risk of producing exploitable vulnerabilities in generated binaries.
Instead of directly injecting malicious code, the paper suggests using LLMs to subtly introduce vulnerabilities. Legitimate users are likely to adopt code suggested by LLMs without scrutinizing the output. Currently, there is a lack of verification of LLM outputs by developers. To manipulate LLM suggestions, attackers could provide more examples of desired code patterns, embedding malicious intentions within comments. Comments are considered tokens by LLMs, blurring the line between code and data.
The proposed method involves committing to other repositories on GitHub and then incorporating malicious comments into the model's codebase, associating them with seemingly innocuous code snippets. Due to the absence of strict separation between code and data in LLMs, innocuous comments like "this is a function which ..." and prompts like "write a function which ..." could inadvertently result in exploitation!
Feasibility considerations include the amount of data required for successful attacks and the likelihood of attackers using legitimate versus fake accounts and repositories. The concept of multiple personalities or Sybil attacks, commonly observed in distributed systems, involves attackers creating multiple fake identities to gain control or influence. This tactic is akin to an election where the attacker aims to become the leader by having a majority of votes.
Federated learning is a technique where each participant trains their own model locally, but the challenge arises when these models need to be aggregated or shared, potentially exposing them to attacks or manipulation. Attackers can send more number of malicious models and thus final result will be malicious. In blockchain systems like Bitcoin, maintaining control over 50%+1 nodes could lead to catastrophic consequences, highlighting the importance of cryptographic keys for security.
Many challenges in computer science revolve around intrinsic trade-offs, such as balancing performance and storage requirements. In IoT systems, for instance, the need to minimize storage often leads to repetitive computations, presenting a trade-off between performance and storage efficiency. Issues such as redundancy and single points of failure further complicate security considerations, emphasizing the need for robust and resilient systems in the face of potential threats.
The evolution of computing systems has followed a trajectory from huge systems to personal computers (PCs) to cloud computing, and now to the Internet of Things (IoT). Alongside this progression, the use of LLMs on GPUs has become increasingly prevalent. This cyclical movement reflects the dynamic nature of trends in computer science, where technologies and paradigms rise and fall in popularity over time, often driven by advances in hardware, software, and the demands of users and industries.
Mitigating the risk of poisoning attacks, such as injecting vulnerabilities into code repositories, can be challenging and costly. Strategies may include maintaining duplicate copies of GitHub repositories, although this can incur significant expenses. The susceptibility of Google to poisoning attacks compared to individual users varies and depends on factors such as the level of validation and scrutiny applied by Google to its own platforms and services.
Google uses its own version of GitHub and other services like Google Maps, where rigorous validation processes are likely in place before deploying changes. For individual users, being cautious and vigilant is essential in safeguarding against various attacks, including path traversal attacks that exploit vulnerabilities like "../" to access root folders, and downgrade attacks that force software to run on lower versions with known vulnerabilities.
A downgrade attack forces a system or protocol to switch from a secure mode to a less secure one, often for compatibility with older systems. For instance, a flaw in OpenSSL enabled attackers to negotiate a lower TLS version, compromising security.
In cryptography, the mode of encryption is crucial, emphasizing proper application of the cipher to ensure security. Cryptography developers are highly skilled and compensated, focusing on identifying vulnerabilities and securing cryptographic implementations. One common mode, ECB, is simple but lacks diffusion, making it unsuitable for cryptographic protocols due to its inability to hide patterns in identical plaintext blocks.

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

Summary of seminar based on Aghakhani et al. paper; CSCE 689 601 ML-Based Cyber Defenses

Paper Highlights

Takeaways

References