WP5 Testbed, demo, data (experiments) in DataLEASH

In WP5, we will start with a problem formulation drawn from existing projects in which medical, municipal and other data repositories have faced challenges with privacy, anonymization, pseudonymization and similar issues. We will perform a series of experiments on existing, very large data sets from, e.g., the Stockholm County Council (medical records), Elekta (medical imaging data) and the City of Stockholm (data from numerous systems linked to many areas of the city), to investigate the possibilities and challenges of the mechanisms developed within DataLEASH. WP5 will also create demo applications that demonstrate what DataLEASH mechanisms can do on different types of data. At Stockholm University we have access to the infrastructure HEALTH BANK, the Swedish Health Record Research Bank, which contains over two million electronic patient records from Karolinska University Hospital from the years 2006-2014, stored in a relational database with over 80 tables. There, experiments can be carried out to study when and where anonymity can be preserved, for example by using different privacy-preserving data record linkage methods that go beyond regular pseudonymization. Experiments can be run, and data securely shared between partners, on the RISE ICE compute cluster.

Participants from DSV:
Project managers: Hercules Dalianis and Uno Fors
Research assistant: Hanna Berg
Participants from Swedish Law and Informatics Research Institute, Stockholm University: Cecilia Magnusson Sjöberg

External partners/stakeholders: Christer Philip Forsberg (City of Stockholm), Sven-Åke Lööv (Region Stockholm) and Henrik Löf (Karolinska University Hospital).

Project duration: May 1, 2019 to April 30, 2020, probably extended by an additional four years.
Funding: KTH's Digitalization Initiative 2019 (KTHs Digitaliseringssatsning), IT and Mobile Communication (ICT TNG), funded through the Swedish government's strategic research areas (SFO) with the aim of creating world-leading research.

Publications

Berg, H., T. Chomutare and H. Dalianis. 2019. Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (Louhi 2019), in conjunction with the Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2019, Hong Kong, ACL, pp. 118-125.

Berg, H. and H. Dalianis. 2019. Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning. In Proceedings of the Workshop on NLP and Pseudonymisation, in conjunction with the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), Turku, Finland, September 30, 2019.

Dalianis, H. 2019. Pseudonymisation of Swedish Electronic Patient Records Using a Rule-based Approach. In Proceedings of the Workshop on NLP and Pseudonymisation, in conjunction with the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), Turku, Finland, September 30, 2019.