In Swedish: Dataanalys för detektion av läkemedelseffekter (DADEL)

Duration: 2012-2016
Funding: 19 MSEK, Swedish Foundation for Strategic Research


The main goal of the project is to develop techniques and tools to support decision making and discovery of drug effects by analyzing patient records, drug registries, case safety reports and chemical compound data in the form of both structured and unstructured (free text) data.

The project will contribute novel approaches to data mining and clinical text mining, and will develop a platform for large-scale analysis of massive, heterogeneous and continuously growing data sets.

Project team:

Henrik Boström (project leader), Stockholm University
Hercules Dalianis, Stockholm University
Lars Asker, Stockholm University
Ulf Johansson, University of Borås
Håkan Sundell, University of Borås
Martin Duneld, Stockholm University
Mia Kvist, Stockholm University
Aron Henriksson, Stockholm University
Karl Jansson, University of Borås
Isak Karlsson, Stockholm University
Henrik Linusson, University of Borås
Tuve Löfström, University of Borås
Jing Zhao, Stockholm University

DADEL team (photo):

Maria, Henrik L., Håkan, Tuve, Jing, Aron, Lars, Martin, Isak, Karl, Henrik B.
(and Hercules taking the photo)

Journal publications

Henelius, A., Puolamäki, K., Boström, H., Asker, L., and Papapetrou, P. A Peek into the Black Box: Exploring Classifiers by Randomization. Data Mining and Knowledge Discovery, In press.

Henriksson, A., Moen, H., Skeppstedt, M., Daudaravicius, V. and Duneld, M. Synonym Extraction and Abbreviation Expansion with Ensembles of Semantic Spaces. Journal of Biomedical Semantics, 5:6, 2014.

Johansson, U., Boström, H., Löfström, T. and Linusson, H. (2014), Regression Conformal Prediction with Random Forests, Machine Learning, In press.

Karunaratne, T., Boström, H. and Norinder, U. Comparative analysis of the use of chemoinformatics-based and substructure-based descriptors for quantitative structure-activity relationship (QSAR) modeling. Intelligent Data Analysis, Vol. 17, No. 2, IOS Press, 2013.

Norinder, U. and Boström, H. (2012). Introducing Uncertainty in Predictive Modeling - Friend or Foe? Journal of Chemical Information and Modeling, vol. 52, pp. 2815-2822.

Norinder, U. and Boström, H. Representing descriptors derived from multiple conformations as uncertain features for machine learning. Journal of Molecular Modeling, Vol. 19, No. 6, pp. 2679-2685, Springer, 2013.

Skeppstedt, M., M. Kvist, H. Dalianis and G.H. Nilsson. 2014. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study. Journal of Biomedical Informatics, DOI: 10.1016/j.jbi.2014.01.012.

Velupillai, S., M. Skeppstedt, M. Kvist, D. Mowery, B. Chapman, H. Dalianis and W. Chapman. 2014. Cue-based assertion classification for Swedish clinical text - developing a lexicon for pyConTextSwe. Special issue: Text Mining and Information Analysis. Artificial Intelligence In Medicine. DOI: 10.1016/j.artmed.2014.01.001.

Conference publications

Asker, L., Boström, H., Karlsson, I., Papapetrou, P. and Zhao, J. Mining Candidates for Adverse Drug Interactions in Electronic Patient Records. In Proceedings of the 7th International Conference on Pervasive Technologies Related to Assistive Environments, PETRA’14, May 27-30, 2014, Island of Rhodes, Greece.

Henriksson, A., Moen, H., Skeppstedt, M., Eklund, A-M., Daudaravicius, V. and Hassel, M. (2012). Synonym Extraction of Medical Terms from Clinical Text Using Combinations of Word Space Models. In Proceedings of Semantic Mining in Biomedicine, SMBM 2012, Zurich, Switzerland.

Henriksson, A., Conway, M., Duneld, M. and Chapman, W. Identifying Synonymy between SNOMED Clinical Terms of Varying Length Using Distributional Analysis of Electronic Health Records. In Proceedings of the Annual Symposium of the American Medical Informatics Association, AMIA 2013, pp. 600-609, American Medical Informatics Association, 2013, Washington DC, USA.

Henriksson, A., Skeppstedt, M., Kvist, M., Duneld, M. and Conway, M. Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records. In Proceedings of BioNLP, pp. 36-44, Association for Computational Linguistics, 2013, Sofia, Bulgaria.

Johansson, U., Boström, H. and Löfström, T. (2013), Conformal Prediction Using Decision Trees, IEEE International Conference on Data Mining (ICDM), pp. 330-339, Dallas, TX.

Johansson, U., Löfström, T. and Boström, H. (2013), Random Brains, The International Joint Conference on Neural Networks (IJCNN), Dallas, TX, IEEE.

Johansson, U., König, R., Löfström, T. and Boström, H. (2013), Evolved Decision Trees as Conformal Predictors, IEEE Congress on Evolutionary Computation (CEC), pp. 1794-1801, Cancun, Mexico.

Johansson, U., Löfström, T. and Boström, H. (2013), Overproduce-and-Select: The Grim Reality, Computational Intelligence and Ensemble Learning, IEEE Symposium Series on Computational Intelligence (SSCI), pp. 52-59, Singapore.

Johansson, U. and Löfström, T. (2012). Producing Implicit Diversity in ANN Ensembles, The International Joint Conference on Neural Networks, pp. 1-8, Brisbane, Australia.

Karlsson, I., J. Zhao, L. Asker and H. Boström. Predicting Adverse Drug Events by Analyzing Electronic Patient Records. Proc. of the 14th Conference on Artificial Intelligence in Medicine (AIME), Lecture Notes in Computer Science, Vol. 7885, pp. 125-129, Springer Publishing Company, 2013.

Karlsson, I. and Zhao, J. Dimensionality Reduction with Random Indexing: an Application on Adverse Drug Event Detection using Electronic Health Records. In Proceedings of the 27th International Symposium on Computer-Based Medical Systems (CBMS), May 27-29, 2014, New York, USA.

Karlsson, I. and Boström, H. Handling Sparsity with Random Forests when Predicting Adverse Drug Events from Electronic Health Records. In Proceedings of IEEE International Conference on Healthcare Informatics (ICHI), September 15-17, 2014 (to appear), Verona, Italy.    

Karunaratne, T. and Boström, H. (2012). Can frequent itemset mining be efficiently and effectively used for learning from graph data? In Proc. of 11th International Conference on Machine Learning and Applications, pp. 409-414.

Kvist, M. and S. Velupillai. 2013. Professional Language in Swedish Radiology Reports – Characterization for Patient-Adapted Text Simplification. In Proc. of Scandinavian Conference on Health Informatics 2013, pp. 55-59, Linköping University Electronic Press.

Kvist, M. and S. Velupillai. 2014. SCAN: A Swedish Clinical Abbreviation Normalizer. Further Development and Adaptation to Radiology. To appear in: Lecture Notes in Computer Science, Springer. Conference and Labs of the Evaluation Forum (CLEF 2014), Sheffield, UK, September 2014.

Linusson, H., Johansson, U., & Löfström, T. (2014). Signed-Error Conformal Regression. In Advances in Knowledge Discovery and Data Mining (pp. 224-236). Springer International Publishing.

T. Löfström, U. Johansson and H. Boström, Effective Utilization of Data in Inductive Conformal Prediction using Ensembles of Neural Networks. pp. 1-8, in IEEE conference proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013.

S. Meystre, H. Dalianis, J. Aberdeen and B. Malin. Automatic clinical text de-identification: is it worth it, and could it work for me? In Studies in Health Technology and Informatics, Vol. 192, pp. 1242-1242, IOS Press, 2013.

Skeppstedt, M., Kvist, M. and Dalianis, H. 2012. Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text. In Proceedings of International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey.

Skeppstedt, M., Ahltorp, M. and Henriksson, A. Vocabulary Expansion by Semantic Extraction of Medical Terms. In Proceedings of Languages in Biology and Medicine, LBM 2013, December 12-13, 2013, Tokyo, Japan.

Tanushi, H., H. Dalianis, M. Duneld, M. Kvist, M. Skeppstedt and S. Velupillai. Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg. 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). Linköping University Electronic Press, pp. 387-397, 2013.

ul Muntaha, S., Skeppstedt, M., Kvist, M. and H. Dalianis. 2012. Entity Recognition of Pharmaceutical Drugs in Swedish Clinical Text. In Proceedings of Swedish Language Technology Conference (SLTC 2012), Lund, Sweden.

Zhao, J., Henriksson, A. and Boström, H. Detecting Adverse Drug Events Using Concept Hierarchies of Clinical Codes. In Proceedings of IEEE International Conference on Healthcare Informatics (ICHI), September 15-17, 2014 (to appear), Verona, Italy.

Workshop publications

Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In Proc. of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012) held in conjunction with LREC 2012, May 26, Istanbul, pp. 49-54.

Boström, H. and H. Dalianis, 2012. De-identifying health records by means of active learning. In Proc. of ICML Workshop on Machine Learning for Clinical Data Analysis, Edinburgh, UK.

Dalianis, H. and Boström, H. 2012. Releasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests. In Proc. of Third LREC Workshop on Building and Evaluating Resources for Biomedical Text Mining.

Henriksson, A., Kvist, M., Hassel, M. and Dalianis, H. (2012). Exploration of Adverse Drug Reactions in Semantic Vector Space Models of Clinical Text. In Proceedings of the ICML Workshop on Machine Learning for Clinical Data Analysis, Edinburgh, UK.

Henriksson, A. and Duneld, M. Optimizing the Dimensionality of Clinical Term Spaces for Improved Diagnosis Coding Support. In Proceedings of Louhi Workshop on Health Document Text Mining and Information Analysis, NICTA, 2013, Sydney, Australia.

Jansson, K., Sundell, H., Boström, H. (2014), gpuRF and gpuERT: Efficient and Scalable GPU Algorithms for Decision Tree Ensembles. IEEE 28th International Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp. 1612-1621, Phoenix, USA.

Tengstrand, L., Megyesi, B., Henriksson, A., Duneld, M. and Kvist, M. EACL – Expansion of Abbreviations in CLinical text. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014, Association for Computational Linguistics, pp. 94-103, Göteborg, Sweden.

Zhao, J., Karlsson, I., Asker, L. and Boström, H. Applying Methods for Signal Detection in Spontaneous Reports to Electronic Patient Records. In 19th Knowledge Discovery and Data Mining (KDD) Conference’s Workshop on Data Mining for Healthcare (DMH), August 11-14, 2013, Chicago, USA.


Theses

Henriksson, A. Semantic Spaces of Clinical Text – Leveraging Distributional Semantics for Natural Language Processing of Electronic Health Records. Licentiate Thesis of Philosophy, Stockholm University, 2013.


Read more:

Press release (in Swedish)