We are a creative research group of computer scientists, engineers, computational linguists and physicians conducting research in both language technology and health informatics. In particular, we interpret clinical texts using both domain experts and machines. We aim to create the future tools for clinicians: tools at the cutting edge of computer science and computational linguistics that use Artificial Intelligence. Our research is based on data from the Stockholm EPR Corpus, contained in Health Bank - Swedish Health Record Research Bank. The Stockholm EPR Corpus encompasses more than two million patient records in Swedish from the years 2006-2014. We have developed a number of tools and created a set of lexical resources. For possible applications please read here (in Swedish).
We are involved in the following research projects:



Ongoing projects

Improving Prediction Models for Diagnosis and Prognosis of COVID-19 and Sepsis with Natural Language Processing of Clinical Text

HB Use - Health Bank de-identification tool and its practical use

DataLEASH: LEarning And SHaring under Privacy Constraints

VRI-proaktiv - Nya arbetssätt och IT-stöd för att bekämpa vårdrelaterade infektioner (In English: New working methods and IT support to combat healthcare-associated infections)

Completed projects

NIASC - Nordic Center of Excellence in Health-Related eSciences, funded by Nordforsk, Nordic Council of Ministers.

MINECAN - Data and text mining of cancer symptoms and comorbidities in electronic patient records in the Nordic languages, funded by NIASC-Nordforsk.

"Artificial Intelligence analyses the patient records. Is this possible and can that improve healthcare?" Book project

AVID - Avidentifiering för sekundär användning av patientjournaler (In English: De-identification for secondary use of patient records)

High-Performance Data Mining for Drug Effect Detection (in Swedish: Dataanalys för detektion av läkemedelseffekter, DADEL)

Automated translation of radiology reports into general Swedish – part of the democratization process in health care

De-puzzling Time: Improving information access from Swedish medical records by modeling temporal expressions

Detect-HAI - Detection of Hospital Acquired Infections through language technology

HIPPA-Hospital Intelligence for better patient security
(In Swedish: HIPPA-Hospital Intelligence för bättre patientsäkerhet)

Interlock: Stockholm - San Diego - Inter-Language collaboration in clinical NLP

Stockholm EPR Open - Öppna stängda journaler genom aggregerad klinisk information, för bättre hälsa.
(In English: Releasing aggregated clinical information for better research and health)

Visualisation of comorbidity network with Comorbidity-View

HEXAnord - HEalth teXt Analysis network in the Nordic and Baltic countries

Group photo: Aron, Sara, Gunnar, Martin, Hideyuki, Mia, Maria, Sumithra and Hercules

Group members

Former participants:


Master theses and reports

Sonja Remmer. 2021. Automatic Diagnosis Code Assignment with KB-BERT - ICD Classification Using Swedish Discharge Summaries, Master Thesis, Stockholm University, pdf (1359 Kb).

Synnøve Bråten. 2020. Extending a Synthetic Norwegian Clinical Corpus for De-Identification, Master Thesis, Stockholm University/Karolinska Institutet, pdf.

NorSynthClinical-PHI, GitHub.

Niklas Isenius. 2012. Abbreviation detection in Swedish Medical Records, The Development of SCAN, a Swedish Clinical Abbreviation Normalizer, Master Thesis, Stockholm University, pdf.

Ludvig Falck and Omid Samadi. 2012. Compound splitting of Swedish medical words - An evaluation of the Compound Splitter software, Scientific course report, Stockholm University, pdf (647 Kb).