WP5: Testbed, demo and data (experiments) in DataLEASH

In WP5, we will start from problem formulations drawn from existing projects in which medical, municipal and other data repositories have faced challenges with privacy, anonymization, pseudonymization and related issues. We will perform a series of experiments on existing very large data sets from, e.g., Stockholm County Council (medical records), Elekta (medical imaging data) and the City of Stockholm (data from numerous systems linked to many areas of the city) to investigate the possibilities and challenges of the mechanisms developed within DataLEASH. WP5 will also create demo applications that show what DataLEASH mechanisms can do on different types of data. At Stockholm University we have access to the research infrastructure Health Bank, the Swedish Health Record Research Bank, which contains over two million electronic patient records from Karolinska University Hospital from the years 2006-2014, stored in a relational database with over 80 tables. On this data, experiments can be carried out to study when and where anonymity can be preserved, for example by using privacy-preserving record linkage methods that go beyond regular pseudonymization. Experiments can also be run, and data shared securely between partners, on the RISE ICE computer cluster.

Participants from DSV:
Project managers: Hercules Dalianis and Uno Fors
PhD student: Thomas Vakili
Research assistant: Anastasios Lamproudis
Previous research assistants: Hanna Berg and Mila Grancharova
Participant from the Swedish Law and Informatics Research Institute, Stockholm University: Cecilia Magnusson Sjöberg

External partners/stakeholders: Charlotte Dingertz, City of Stockholm; Sven-Åke Lööv, Region Stockholm; and Henrik Löf, Karolinska University Hospital.

Project duration: May 1, 2019 - April 1, 2024.
Funding: KTH's Digitalization Initiative 2019, IT and Mobile Communication (ICT TNG), through the Swedish Government's Strategic Research Areas (SFO), aimed at creating world-leading research.

Demonstrator

HB Deid: De-identification of texts

Publications

Lamproudis, A., Henriksson, A. and H. Dalianis. 2021. Developing a Clinical Language Model for Swedish: Continued Pretraining of Generic BERT with In-Domain Data. In Proceedings of RANLP 2021: Recent Advances in Natural Language Processing, September 1-3, 2021, Varna, Bulgaria.

Grancharova, M. and H. Dalianis. 2021. Applying and Sharing pre-trained BERT-models for Named Entity Recognition and Classification in Swedish Electronic Patient Records. In Proceedings of the 23rd Nordic Conference on Computational Linguistics, NoDaLiDa 2021, Iceland, May 31 - June 2, 2021.

Dalianis, H. and H. Berg. 2021. HB Deid - HB De-identification tool demonstrator. In Proceedings of the 23rd Nordic Conference on Computational Linguistics, NoDaLiDa 2021, Iceland, May 31 - June 2, 2021.

Berg, H., Henriksson, A., Fors, U. and H. Dalianis. 2021. De-identification of Clinical Text for Secondary Use: Research Issues. In Proceedings of HEALTHINF 2021, the 14th International Conference on Health Informatics, February 11-13, 2021.

Grancharova, M., Berg, H. and H. Dalianis. 2020. Improving Named Entity Recognition and Classification in Class Imbalanced Swedish Electronic Patient Records through Resampling. In the compilation of abstracts of the Eighth Swedish Language Technology Conference (SLTC-2020), Göteborg.

Berg, H., Henriksson, A. and H. Dalianis. 2020. The Impact of De-identification on Downstream Named Entity Recognition in Clinical Text. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, Louhi 2020, in conjunction with EMNLP 2020, pp. 1-11.

Berg, H., Henriksson, A., Fors, U. and H. Dalianis. 2020. De-identification of Clinical Text for Secondary Use: Research Issues. Presented at the Healthcare Text Analytics Conference, HealTAC 2020, April 23, London.

Berg, H. and H. Dalianis. 2020. A Semi-supervised Approach for De-identification of Swedish Clinical Text. In Proceedings of the 12th Conference on Language Resources and Evaluation, LREC 2020, May 13-15, Marseille, pp. 4444-4450.

Berg, H., Chomutare, T. and H. Dalianis. 2019. Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis, Louhi 2019, in conjunction with the Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2019, Hong Kong, ACL, pp. 118-125.

Berg, H. and H. Dalianis. 2019. Augmenting a De-identification System for Swedish Clinical Text Using Open Resources (and Deep learning). In Proceedings of the Workshop on NLP and Pseudonymisation, in conjunction with the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), Turku, Finland, September 30, 2019.

Dalianis, H. 2019. Pseudonymisation of Swedish Electronic Patient Records Using a Rule-based Approach. In Proceedings of the Workshop on NLP and Pseudonymisation, in conjunction with the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), Turku, Finland, September 30, 2019.