Passphrase and keystroke dynamics authentication: Usable security

This post summarizes a paper on authentication methods, including passphrases and keystroke dynamics, covering their usability, their vulnerabilities, and the potential for continuous authentication. It was originally written for CSCE 689:601 and is the 22nd blog of the series: "Machine Learning-Based CyberDefenses".

Paper Highlights

The concept of usable security lies at the intersection of Human-Computer Interaction (HCI) and cybersecurity. Its aim is to make systems both user-friendly and secure. With stricter password policies in place, users often face challenges remembering passwords, leading to increased failed login attempts.
One proposed solution is a two-tier authentication system that combines passphrase and keystroke authentication. The methodology draws on two sources of data: user studies and expert reviews. The theoretical foundations for this approach include Shannon entropy, chunking theory, and the keystroke-level model.
The results of the study indicate that combining passphrase and keystroke authentication is more secure than relying on passwords alone. However, they do not strongly support using passphrases over passwords, and further investigation and data collection are needed to strengthen the findings. Two hypotheses were tested:
1. Using a greater number of fingers results in a higher probability of errors.
2. Passphrases lead to fewer errors than passwords.
While the results demonstrate that combining passphrase and keystroke authentication enhances security, the combination does not address every vulnerability in the system. Potential vulnerabilities include database attacks, phishing attacks, and the use of a compromised camera to observe a user's keystroke dynamics. Techniques based on bi-gram slicing, as in the 'money monkey' attack, also pose risks.
A two-tier authentication system that uses easy-to-remember passphrases and each user's unique keystroke patterns achieves higher entropy than passwords. However, its vulnerabilities still need further examination and mitigation.
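To make the two-tier idea concrete, here is a minimal sketch, not the paper's implementation. It assumes keystrokes arrive as (key, press time, release time) tuples and that a user's enrolled rhythm is stored as a template of dwell and flight times. The function names, the mean-absolute-difference distance, and the 0.05 second threshold are all illustrative assumptions.

```python
import hashlib
import hmac
from statistics import mean


def timing_features(events):
    """events: list of (key, press_time, release_time) in seconds, in typing order.
    Returns dwell times (how long each key is held) followed by flight times
    (gap between releasing one key and pressing the next)."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight


def timing_distance(sample, template):
    """Mean absolute difference between two equal-length feature vectors."""
    return mean([abs(a - b) for a, b in zip(sample, template)])


def authenticate(passphrase, events, stored_digest, template, threshold=0.05):
    # Tier 1: the passphrase must match (plain SHA-256 here; salting omitted for brevity).
    digest = hashlib.sha256(passphrase.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(digest, stored_digest):
        return False
    # Tier 2: the typing rhythm must be close to the enrolled template (assumed threshold).
    features = timing_features(events)
    if len(features) != len(template):
        return False
    return timing_distance(features, template) <= threshold
```

With this design a stolen passphrase alone is not enough: an impostor who types it with a different rhythm fails the second tier. A real system would need many enrollment samples and a better-calibrated distance measure.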

Takeaways

Biometrics could be an alternative, but they often require specific hardware, which might not be widely available or practical for all users. To address false positives in the proposed authentication system, collecting a sufficient amount of data is crucial. With enough data, we can be more confident in distinguishing genuine authentication attempts from false ones.
Since each solution has distinct advantages and disadvantages, a promising approach is a multi-factor authentication system. Such a system combines factors like biometrics and passwords in a pipeline, leveraging their respective strengths. However, it is important to address potential vulnerabilities, including adversarial examples. Both Microsoft and Apple are already exploring this direction in their authentication systems, indicating its relevance and potential for the future.
Autocomplete features are everywhere, and they often involve monitoring user input to provide suggestions. These suggestions are not the same for everyone: different users receive different outputs, which implies that each user's input is being observed. One possible method used behind the scenes is training a one-class classifier on a user's mistakes, helping the system better understand and predict that user's intentions.
With federated learning and a hybrid of local and global models, the system can adapt to both common and individual user mistakes while preserving user privacy:
1. Federated Learning: The system utilizes federated learning techniques where each user's device hosts a local model. These local models are trained on the specific user's data, including their unique typing patterns and mistakes.
2. Combination of Local and Global Models: The system maintains a combination of local models, which capture individual user behavior, and a global model, which captures common mistakes shared across users.
3. Hybrid Approach: Common mistakes can be identified and consolidated into the global model, while user-specific mistakes are learned and refined in the local models.
4. Privacy Preservation: Federated learning helps preserve user privacy by keeping user data local and sharing only model updates, rather than raw data, with a central server.
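The round-trip between local and global models can be sketched as follows. This is a toy illustration using federated averaging on a one-dimensional linear model, not the system discussed in the seminar. The function names, the learning rate, and the model itself are assumptions chosen to keep the example short. In a hybrid setup, each device would also keep its own locally tuned weights in addition to the shared global ones.

```python
def local_update(global_weights, user_samples, lr=0.1):
    """One pass of gradient descent on a single user's data for y = w * x + b.
    The raw samples stay inside this function; only updated weights are returned."""
    w, b = global_weights
    for x, y in user_samples:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return (w, b)


def federated_average(updates, sizes):
    """Combine client models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def training_round(global_weights, clients):
    """Server step: send weights out, collect local updates, average them."""
    updates = [local_update(global_weights, data) for data in clients]
    return federated_average(updates, [len(data) for data in clients])
```

Calling `training_round` repeatedly moves the global weights toward a model that fits all clients, while the server only ever sees weight tuples.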
Differential privacy is a rigorous mathematical definition of privacy. In the simplest setting, consider an algorithm that analyzes a dataset and computes statistics about it. Such an algorithm is said to be differentially private if by looking at the output, one cannot tell whether any individual's data was included in the original dataset or not. In other words, the guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset - anything the algorithm might output on a database containing some individual's information is almost as likely to have come from a database without that individual's information. Regardless of how eccentric any single individual's details are, and regardless of the details of anyone else in the database, the guarantee of differential privacy still holds. This gives a formal guarantee that individual-level information about participants in the database is not leaked.
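Formally, a randomized algorithm M is epsilon-differentially private if, for any two datasets D and D' that differ in one individual's record and any set of outputs S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. One standard way to achieve this for a counting query is the Laplace mechanism, sketched below. The function names and the use of a count query are illustrative choices, not something taken from the seminar.

```python
import random


def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.
    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)
```

A smaller epsilon means more noise and stronger privacy. The released value stays useful in aggregate because the noise averages out to zero.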
Entropy measures the amount of information contained in a system, or equivalently how unpredictable it is. For example, consider an image: its entropy is at its maximum when it consists of random noise, where every pixel value is equally likely and no pixel can be predicted from the others. When we introduce patterns or textures into the image, the amount of information decreases because certain areas become predictable. A flat, single-color image sits at the other extreme with essentially no information, and adding contrasting elements such as black and white lines raises its entropy above that floor. Similarly, when it comes to passwords, the less predictable each character or word is, the higher the entropy and the stronger the password. Entropy is therefore a useful metric for assessing password strength, provided it reflects how the password was actually chosen.
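As a small worked example, Shannon entropy can be computed from how often each value occurs: H = sum over values of p * log2(1 / p). The sketch below treats an image as a flat list of pixel values, which is a simplification that ignores spatial structure. The function name is an illustrative choice.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Entropy in bits per symbol of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * log2(total / c) for c in counts.values())
```

For 16 pixels, a flat image (all the same value) gives 0 bits, alternating black and white pixels give 1 bit, and 16 distinct values give 4 bits, the maximum for that size.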
The primary problem addressed in this paper is authentication, specifically whether the way an individual types can serve as an authentication method. Initial findings suggest that typing patterns may indeed be unique to individuals. A future idea proposed in the paper is continuous authentication based on typing, where authentication occurs continuously, perhaps every second, leveraging the uniqueness of typing patterns. The paper highlights that we inadvertently leak information through everyday actions such as walking and typing. Researchers are actively investigating these leakage channels to identify which ones are most effective. Ultimately, a solution may involve a pipeline approach, combining multiple authentication channels for enhanced security.
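Continuous authentication can be pictured as a check that reruns on every keystroke instead of once at login. The sketch below is a deliberately simple assumption: it uses a single feature (the average gap between key presses over a sliding window) and a fixed tolerance, whereas a practical system would use richer features and a trained model. The class and parameter names are hypothetical.

```python
from collections import deque
from statistics import mean


class ContinuousAuthenticator:
    """Keeps a sliding window of recent inter-key intervals and compares
    their average with the enrolled user's typical interval."""

    def __init__(self, enrolled_mean, tolerance, window=20):
        self.enrolled_mean = enrolled_mean
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)

    def observe(self, interval):
        """Record one interval (seconds); return True while the typist still looks genuine."""
        self.recent.append(interval)
        if len(self.recent) < self.recent.maxlen:
            return True  # not enough evidence yet to reject anyone
        return abs(mean(self.recent) - self.enrolled_mean) <= self.tolerance
```

The window size trades off reaction time against false alarms: a short window notices a change of typist quickly but is more sensitive to momentary pauses.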
Comparing passwords and passphrases by entropy yields an unexpected finding: according to the paper's analysis, passphrases tend to have higher entropy. Passphrases are also considered superior because passwords can be hard to remember, which pushes users toward weaker choices. Periodic password changes are recommended in theory because passwords may leak, but in practice they often lead to weaker passwords: users resort to simple modifications of the old password, which decreases entropy.
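A back-of-the-envelope comparison shows why a passphrase can come out ahead. If each symbol is chosen uniformly at random, entropy is length times log2 of the number of choices per symbol. The sizes below are assumptions for illustration, not figures from the paper: 94 printable ASCII characters for passwords and a 7,776-word list for passphrases.

```python
from math import log2


def random_choice_entropy(alphabet_size, length):
    """Bits of entropy when each of `length` symbols is picked uniformly at random."""
    return length * log2(alphabet_size)


# Assumed sizes, for illustration only.
password_bits = random_choice_entropy(94, 8)      # 8 random printable characters
passphrase_bits = random_choice_entropy(7776, 5)  # 5 random words from the list
```

Under these assumptions the 8-character password carries about 52 bits and the 5-word passphrase about 65 bits. The formula only holds for truly random choices; passwords and phrases picked by people are more predictable and carry less entropy.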
Common attacks on passwords include brute-force attacks, where attackers systematically try possible passwords, and dictionary attacks, where commonly used passwords or phrases are tried first. To limit the damage of a database leak, passwords are typically stored as hashes: a one-way hash function is applied to the password and only the resulting hash is stored. Hashing is not encryption, since a hash is not meant to be reversed. However, attackers can use tables of precomputed hashes, known as rainbow tables, to expedite password cracking.
A "salt" is a random string of data added to a password before hashing it. Salting enhances security by ensuring that even if two users have the same password, their hashed values differ because each password gets its own salt. It also blunts precomputed tables, since a table built without the salt no longer matches the stored hashes. Salts can be stored in the clear alongside the hash without compromising security.
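A minimal sketch of salted password storage using Python's standard library is shown below. The choice of PBKDF2 with SHA-256, the 16-byte salt, and the iteration count are assumptions for illustration; the seminar did not prescribe a specific algorithm.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest). A fresh random salt is generated for each new password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=100_000):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

Two users who pick the same password end up with different stored digests, and the server keeps the salt next to the digest so it can repeat the computation at login.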
In the context of authentication, typing patterns can be both a tool for good and a vulnerability. Attackers can exploit typing mistakes (typos) through a technique called "typosquatting": registering domain names that resemble popular websites but contain common typing mistakes, in order to lure users onto malicious websites. Mozilla, through Firefox, is actively combating typosquatting to protect users from such attacks.
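One simple way to flag candidate typosquats is to measure the edit distance between a domain and a list of well-known domains. The sketch below is an illustrative heuristic, not how any particular browser does it. The function names and the distance threshold of 2 are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def possible_typosquat(domain, known_domains, max_distance=2):
    """Return the known domain that `domain` closely imitates, or None."""
    if domain in known_domains:
        return None  # an exact match is the legitimate site itself
    for known in known_domains:
        if edit_distance(domain, known) <= max_distance:
            return known
    return None
```

For example, `possible_typosquat("gooogle.com", ["example.org", "google.com"])` returns `"google.com"`. A real deployment would need to handle legitimate domains that happen to be close to each other.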
Another related issue involves Unicode and ASCII characters. Some Unicode characters look nearly identical to ASCII ones, and attackers can exploit this resemblance to create URLs or QR codes that appear legitimate but point to malicious websites. For example, a QR code might seem to direct users to a legitimate website, but because one or more characters in the URL have been replaced with Unicode look-alikes, it actually leads to a malicious site, posing a significant threat to user security.
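A rough way to spot such look-alike hostnames is to check whether a single label mixes letters from more than one script. The sketch below infers the script from the first word of each character's Unicode name, which is a crude heuristic chosen for brevity. The function names and the spoofed example domain are hypothetical.

```python
import unicodedata


def scripts_in(label):
    """Rough script detection: first word of each letter's Unicode name (e.g. LATIN, CYRILLIC)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_suspicious_hostname(hostname):
    """Flag hostnames in which one label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in hostname.split("."))


spoofed = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A, not Latin "a"
punycode = spoofed.encode("idna").decode("ascii")  # ASCII (Punycode) form used for DNS lookups
```

Here `is_suspicious_hostname(spoofed)` is true while `is_suspicious_hostname("apple.com")` is false, even though the two strings look alike on screen. Displaying the Punycode form, which starts with `xn--`, is another way to make the difference visible.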

Summary of seminar based on Bhana et al. paper; CSCE 689 601 ML-Based Cyber Defenses
