A particular focus of our research within data science is on rich and complex data sources, with emphasis on sequential and temporal data, histogram data, text, and graphs. Specifically, we are interested in building indexing structures and techniques for data series, predictive models for time series and sequence classification, and methods for subgroup and rule discovery in transactional and sequential data.

Another focus of our research within data science is on ensemble methods, i.e., techniques for generating sets of models that collectively form predictions by voting, and on methods for generating interpretable models, e.g., rule learning. Interpretability has gained more attention in recent years, since data science methods and models have started to be used to a larger extent in both industry and society as a whole. It can be addressed at the model level, i.e., by providing a description of the whole model to the human, or at the instance level, i.e., by explaining the reasons and motivation behind each decision. Many aspects of a model relate to interpretability: the stability of the model, the size of the model, dimensionality reduction, and visualization, to name a few.
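
As a minimal illustration of what forming predictions by voting means, the sketch below combines three hand-written threshold rules by majority vote. The rules, the thresholds, and the function name majority_vote are hypothetical and chosen only for this example; they do not represent the methods developed in our research.

```python
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by most models for input x.

    Ties are resolved in favor of the label that was voted first,
    since Counter.most_common keeps first-encountered order for
    equal counts.
    """
    votes = [model(x) for model in models]
    label, _ = Counter(votes).most_common(1)[0]
    return label

# Hypothetical ensemble: three simple threshold rules on one numeric feature.
models = [
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 3 else "low",
    lambda x: "high" if x > 8 else "low",
]

# For x = 6 the individual votes are "high", "high", "low".
prediction = majority_vote(models, 6)
print(prediction)
```

Each individual rule is readable on its own, while the ensemble trades some of that readability for robustness, which is one reason interpretability and ensemble methods are studied together.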

Another focus area is clinical text mining, with particular emphasis on efficient and resource-lean methods. We apply language technology, such as semantic analysis, to extract accurate and relevant information from very large collections of clinical text.

We focus on two application areas of data science:

Learning from Electronic Health Records: the key aim is to develop and employ machine learning methods for providing efficient and effective decision support for healthcare and pharmaceutical research. The research group is currently focusing on two concrete problems: (1) learning temporal models for predicting and preventing adverse events in healthcare, such as Adverse Drug Events (ADEs), and (2) understanding heart failure and modeling the treatment trajectories of heart failure patients. For the purposes of these two projects, our group has established a strong collaboration with Stockholm County Council, Karolinska University Hospital, and Karolinska Institute.

Integrated Vehicle Health Management: the aim here is to facilitate integrated vehicle health monitoring (IVHM) of heavy trucks. The focus is on investigating how to predict a vehicle's health status by estimating the remaining life of its components, in order to support better decisions, for example: (1) using the health status to schedule maintenance so that unplanned downtime is minimized, (2) creating a system that optimizes maintenance plans based in part on the health status and customer preferences, and (3) creating a system that provides decision support to drivers and fleet planners on the utilization of vehicles. This project is carried out in strong collaboration with SCANIA.