The project aims to develop prediction models for the diagnosis and prognosis of COVID-19 and sepsis, with a specific focus on using NLP to incorporate information from clinical text. We hypothesize that such multimodal prediction models will outperform models that use only structured data. This hypothesis rests on the complementary nature of structured and unstructured data in electronic health records: some clinical data is recorded in structured form (e.g. drug administration and vital signs), while other information is available only in free text (e.g. symptoms and assessments). Prediction models that can detect a disease early (early prediction) or provide prognoses of disease progression (outcome prediction) are critical for improving patient management, since they facilitate early and appropriate interventions as well as the allocation of healthcare resources. Ultimately, this is expected to lead to improved outcomes, reduced suffering and lower healthcare costs.

COVID-19 is a new disease, and knowledge about its pathogenesis is limited. Outcomes range from mild or asymptomatic cases to severe cases that require intensive care and mechanical ventilation and that can end in death. In severe cases, the patient may develop sepsis; however, almost any infection can lead to sepsis, which is one of the leading causes of hospital morbidity and mortality. For sepsis, early detection and initiation of treatment drastically increase the chances of survival. Focusing on COVID-19 and sepsis allows us to investigate multimodal prediction models for two infectious diseases where the impact and urgency are high.

Developing clinical prediction models requires close interdisciplinary collaboration between data scientists and medical experts. This project will support a team that combines methodological and technical knowledge in machine learning and NLP with the domain expertise and clinical understanding needed to develop prediction models that can be deployed in a real clinical setting and thereby generate tangible societal value.

Project leader: Aron Henriksson
Participants: Anastasios Lamproudis, Yash Pawar, Pontus Nauclér, John Karlsson Valik
Partners: Karolinska University Hospital, Karolinska Institutet
Funding: Region Stockholm: 1.6 MSEK, 2021-2022