Online Binary Models are Promising for Distinguishing Temporally Consistent Computer Usage Profiles

Summary of seminar based on Giovanini et al. paper; CSCE 689 601 ML-Based Cyber Defenses

This paper describes the challenges and solutions in continuous authentication, emphasizing the balance between security and usability. This blog was originally written for CSCE 689:601 and is the 21st blog of the series "Machine Learning-Based CyberDefenses".

Paper Highlights

  • A Computer Usage Profile is formed by observing various activities on the computer such as network traffic, process activity, mouse and keyboard usage, which helps to identify users based on their patterns of behavior over time, including both activity patterns (like process and network events) and temporal patterns (consistent repetition of events).

  • These profiles are valuable for implementing continuous authentication, as they allow for automatic recording and checking of data without requiring user intervention. For example, facial recognition during a Zoom call can confirm the user's identity. This study focuses on a corporate environment and aims to identify users based on their computer usage profiles using eight weeks of collected data.

  • In data processing, the first step compares structured data against random data, using metrics such as sample entropy and the Hurst exponent to gauge how random each series is. The second step determines the period of the usage patterns, which proves challenging even with standard methods such as the periodogram and the autocorrelation function.

  • Both online and offline models were used, initially trained on the first seven days of data to distinguish temporally consistent computer usage profiles.

  • The results indicate that users' behaviors are distinct from random series, with sample entropy showing notable differences when background data is included.

  • The Hurst exponent performs best when no background data is included. Most structured data exhibits daily usage patterns, with consistent 24-hour periods observed in the majority of cases.

  • Determining the window of data for identifying users' behaviors is crucial, with different window sizes showing varying effectiveness. Non-numerical data is converted using TF-IDF, allowing for scoring of events based on their occurrence within a time window.
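
To make the period-finding step from the highlights concrete, here is a minimal Python sketch of the autocorrelation approach. It is not the authors' code: the function names `autocorrelation` and `dominant_period`, the hourly binning, and the synthetic 14-day series are all assumptions made for illustration.

```python
def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return 0.0  # a constant series carries no periodic structure
    covariance = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return covariance / variance


def dominant_period(series, max_lag):
    """Return the lag with the strongest autocorrelation peak, or None.

    Small lags are always highly correlated, so the search starts only
    after the autocorrelation has first dropped below zero.
    """
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    first_negative = next((lag for lag, r in enumerate(acf) if r < 0), None)
    if first_negative is None:
        return None  # no oscillation detected within max_lag
    return max(range(first_negative, max_lag + 1), key=lambda lag: acf[lag])


# Synthetic (assumed) data: 14 days of hourly event counts,
# with activity only between 09:00 and 17:59 each day.
hourly_counts = [10 if 9 <= hour % 24 < 18 else 0 for hour in range(14 * 24)]

period = dominant_period(hourly_counts, max_lag=72)
print(period)  # 24 for this synthetic series
```

On this idealized series the strongest peak falls at a lag of 24 hours. Real usage logs are far noisier, which is part of why period detection is described as challenging.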

Takeaways

  • Online modeling, which processes data in real time, is often considered more challenging than offline modeling, where data is processed in batches after collection. Many studies claim to use real-time methods but actually run their experiments with offline techniques only. In this paper, the researchers do use online modeling, which allows a more dynamic and immediate analysis of computer usage profiles and gives insight into users' behaviors as they occur. This is a significant step forward for the field, and the authors deserve credit for tackling the complexities of real-time data processing.

  • Instead of using a single large multi-class classifier for everyone, the paper suggests using a single-class classifier approach. The rationale behind this decision is to avoid the need for retraining the classifier every time new employees join the system, which could be inefficient and cumbersome. Instead, the single-class classifier focuses on determining whether a user belongs to a specific class or not, which aligns better with the objective of distinguishing users based on their unique computer usage profiles. This approach is more practical and efficient, as it eliminates the complexities associated with managing a multi-class classifier for a diverse user population.

  • In this scenario, assuming one class is acceptable because the threat model assumes that computers are not shared among users. Therefore, each user's computer usage profile can be considered unique and distinct from others.

  • In general, binary classification is preferred over one-class classification because it not only identifies the user (Class A) but also learns what is "not the user" (Class B, representing all other users). In this study, the authors frame the per-user problem as a binary one: Class A is the user in question, and Class B is all other users. One-class SVM is a common choice for one-class classification; it is kernel-based and behaves as a linear classifier only when a linear kernel is chosen. For binary classification, Random Forest generally works well.

  • In a corporate setting, the challenge web domain alone may not suffice for authentication, because employees often work within internal systems as well, not just on web domains. Relying on it exclusively may overlook crucial aspects of user authentication, especially when considering participation in research conducted during the last class.

  • Although unique user profiles are identified, a significant obstacle emerges: individuals evolve over time, resulting in changes to their usage profiles. This discovery underscores the dynamic nature of user behavior and the inherent challenge of maintaining static authentication methods, such as passwords. As individuals' behaviors and habits shift over time, the effectiveness of relying solely on static passwords diminishes, highlighting the need for more adaptive and dynamic authentication mechanisms.

  • Even if we could consistently detect users based on their usage patterns, passwords still serve a crucial purpose. Unlike a usage pattern, which takes time to observe, a password can be collected immediately and with little effort. When a user has just booted their machine, or whenever authentication is needed before a usage pattern has been established, relying solely on usage patterns is impractical. Passwords provide a convenient and immediate way to authenticate users in such situations.

  • There are practical considerations like the need to fulfill a minimum number of windows before a usage pattern can be reliably established. In such cases, passwords serve as a reliable means of authentication before sufficient data is available to verify the user's identity based on their usage patterns.

  • Ultimately, passwords offer a dual function: they provide a means of authentication in scenarios where usage patterns may not be immediately available or reliable, and they serve as a cross-check or validation method to ensure the security of the authentication process.

  • The proposed approach holds potential as the future of authentication, though there are concerns such as privacy issues and the risk of data leaks. During training, the goal is to minimize the false positive rate (FPR) so that users are not unnecessarily logged out, which could disrupt workflow.

    • Using biometric authentication via a mouse introduces a new set of challenges. One potential attack vector is through tampering with the mouse hardware. An attacker could introduce a fake USB device that signals to the computer that it's the genuine user, effectively bypassing authentication.

    • Keyboards can inadvertently leak information, presenting a potential security risk. Therefore, it is essential to consider and mitigate these various security challenges when implementing novel authentication methods based on computer usage profiles and biometrics. In military settings, unconventional attacks on keyboards like using magnetic interference might be considered as potential threats. These types of attacks fall under the category of side-channel attacks, exploiting unintended channels of communication or information leakage.

  • This week's discussion focuses on the challenges and solutions in continuous authentication, aiming to identify users continuously rather than just at the time of password entry. The primary concern addressed is unauthorized access, either through password compromise or leaving a device unattended and accessible to others. Continuous authentication offers enhanced security by promptly logging out sessions when user identification fails, which can prevent unauthorized access, especially in cases where someone else gains physical access to the device.

  • The concept of usable security is highlighted, emphasizing the balance between security measures and user convenience. For example, the shift from traditional passwords to more user-friendly authentication methods like patterns on smartphones illustrates the concept of usable security.

  • Fingerprint authentication is convenient yet potentially insecure, because a fingerprint cannot be changed once it is compromised. Device fingerprinting is a different idea: companies like Google and Facebook fingerprint devices as a form of continuous authentication to verify logins from new devices.

  • Behavioral profiling for authentication purposes, such as analyzing purchase patterns or device usage behaviors, raises privacy concerns. Companies like Google have access to extensive user profiles, including home locations, sleep patterns, and accelerometer data, which can be used for authentication purposes, as demonstrated in experiments like Google's authentication based on device shaking.

  • The discussion also touches on the potential misuse of today's technology for malicious purposes, such as keylogging, mouse logging, and screen logging, highlighting the importance of distinguishing between legitimate and malicious activities.
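
The per-user binary framing and the online (test-then-train) setting discussed in the takeaways can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a small online logistic regression as a stand-in rather than the models evaluated in the paper, and the class name `OnlineLogisticRegression`, the process names, and the feature scores are all hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression updated one sample at a time with SGD."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight (sparse)
        self.bias = 0.0

    def predict_proba(self, features):
        z = self.bias + sum(self.weights.get(name, 0.0) * value
                            for name, value in features.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, features, label):
        # Gradient of the log loss with respect to the linear score z.
        error = self.predict_proba(features) - label
        self.bias -= self.learning_rate * error
        for name, value in features.items():
            self.weights[name] = (self.weights.get(name, 0.0)
                                  - self.learning_rate * error * value)


# Hypothetical per-window feature scores (for example, TF-IDF weights).
owner_window = {"outlook.exe": 0.8, "excel.exe": 0.6}
other_window = {"chrome.exe": 0.7, "code.exe": 0.7}

# Label 1 = the profile's owner (Class A), label 0 = any other user (Class B).
stream = [(owner_window, 1), (other_window, 0)] * 50

model = OnlineLogisticRegression()
false_alarms = 0   # owner windows that would have triggered a logout
owner_windows = 0
for features, label in stream:
    # Test-then-train: score the window first, then learn from it.
    accepted = model.predict_proba(features) >= 0.5
    if label == 1:
        owner_windows += 1
        if not accepted:
            false_alarms += 1
    model.learn_one(features, label)

print(false_alarms / owner_windows)
```

Training one such model per user means a new employee only requires one new model, not a retrained multi-class classifier. The false-alarm counter tracks the quantity the takeaways say should stay low, namely how often the legitimate user would be logged out.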

Aspect | Authentication | Authorization
Definition | Process of verifying the identity of a user or entity. | Process of determining what actions a user can take.
Purpose | Establishes whether the user is who they claim to be. | Determines the permissions and access rights of a user.
Goal | Ensures security by preventing unauthorized access. | Controls access to resources based on user privileges.
Verification | Verifies the user's credentials (e.g., password, biometrics). | Validates the user's permissions (e.g., role, group membership).
Outcome | Grants or denies access based on successful authentication. | Grants or denies access based on authorized permissions.
Examples | Logging in with a username and password. | Accessing specific files or databases based on user role.
Key Components | User credentials, authentication server. | Access control lists, permissions, roles, policies.

References