Photo: Irvin Homem
Irvin Homem

Digital Forensics is used to aid traditional preventive security mechanisms when they fail to curtail sophisticated and stealthy cybercrime events. The Digital Forensic Investigation process is largely manual in nature, or at best quasi-automated, requiring a highly skilled labour force and involving a sizeable time investment. Industry standard tools are evidence-centric, automate only a few precursory tasks (E.g. Parsing and Indexing) and have limited capabilities of integration from multiple evidence sources. Furthermore, these tools are always human-driven.

These challenges are exacerbated in the increasingly computerized and highly networked environment of today. Volumes of digital evidence to be collected and analyzed have increased, and so has the diversity of digital evidence sources involved in a typical case. This further handicaps digital forensics practitioners, labs and law enforcement agencies, causing delays in investigations and legal systems due to backlogs of cases. Improved efficiency of the digital investigation process is needed, in terms of increasing the speed and reducing the human effort expended. This study aims at achieving this time and effort reduction, by advancing automation within the digital forensic investigation process.

Using a Design Science research approach, artifacts are designed and developed to address these practical problems. Summarily, the requirements, and architecture of a system for automating digital investigations in highly networked environments are designed. The architecture initially focuses on automation of the identification and acquisition of digital evidence, while later versions focus on full automation and self-organization of devices for all phases of the digital investigation process. Part of the remote evidence acquisition capability of this system architecture is implemented as a proof of concept. The speed and reliability of capturing digital evidence from remote mobile devices over a client-server paradigm is evaluated. A method for the uniform representation and integration of multiple diverse evidence sources for enabling automated correlation, simple reasoning and querying is developed and tested. This method is aimed at automating the analysis phase of digital investigations. Machine Learning (ML)-based triage methods are developed and tested to evaluate the feasibility and performance of using such techniques to automate the identification of priority digital evidence fragments. Models from these ML methods are evaluated in identifying network protocols within DNS tunneled network traffic. A large dataset is also created for future research in ML-based triage for identifying suspicious processes for memory forensics.

From an ex ante evaluation, the designed system architecture enables individual devices to participate in the entire digital investigation process, contributing their processing power towards alleviating the burden on the human analyst. Experiments show that remote evidence acquisition of mobile devices over networks is feasible, however a single-TCP-connection paradigm scales poorly. A proof of concept experiment demonstrates the viability of the automated integration, correlation and reasoning over multiple diverse evidence sources using semantic web technologies. Experimentation also shows that ML-based triage methods can enable prioritization of certain digital evidence sources, for acquisition or analysis, with up to 95% accuracy.

The artifacts developed in this study provide concrete ways to enhance automation in the digital forensic investigation process to increase the investigation speed and reduce the amount of costly human intervention needed.

Click this link to read the thesis