With the advancement of modern medicine, a large number of medical encounters (doctor visits, exams, medications, imaging, molecular tests, etc.) take place in our healthcare system. In fact, the amount of new global health data generated in 2020 is expected to have reached 2,314 exabytes. Most of the content in our hospitals' electronic health records (EHRs) is recorded in sparse data tables with a high probability of missing values. Nevertheless, EHRs hold a large amount of heterogeneous data from different sources, most of which is not optimally exploited to characterize and predict disease behavior. For example, medical imaging (e.g., magnetic resonance imaging, MRI) is a rich potential source of data for decoding cancer phenotypes.

Physicians also record important patient characteristics and disease-related symptoms in free-form medical notes for almost every medical encounter. In recent years, significant advances in multi-omics technology (e.g., genomics, transcriptomics, etc.) have also created unprecedented opportunities to characterize biological processes correlated with disease. However, combining these heterogeneous data sources in a meaningful way for disease prediction is challenging.

For better precision medicine, physicians must therefore make increasingly complex therapeutic decisions with an unrealistic number of variables. Developments in artificial intelligence (AI) are consequently envisioned to create a data science revolution in medicine. In particular, graph neural networks (GNNs) have shown immense potential in learning meaningful and powerful data representations by combining the relational inference of graphical models with the power of deep learning. However, since the power of deep learning is strongly associated with data size and medical data cannot be easily shared among medical institutions due to patient privacy concerns, developing powerful GNN models for disease prediction in healthcare is a major challenge.
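To make the idea of relational inference concrete, here is a minimal sketch of a single GNN message-passing step. It is illustrative only and not the lab's implementation: the function name `mean_aggregate_layer`, the scalar weights `w_self` and `w_neigh` (real GNN layers use learned weight matrices), and the reading of nodes as patients linked by similarity are all assumptions made for this example.

```python
import math


def mean_aggregate_layer(features, edges, w_self, w_neigh):
    """One simplified message-passing step on an undirected graph.

    Each node combines its own features with the mean of its neighbours'
    features, then applies a tanh non-linearity. Scalar weights are a
    simplification chosen for this sketch.
    """
    n = len(features)
    dim = len(features[0])

    # Build an adjacency list from the undirected edge list.
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = []
    for i in range(n):
        if neighbours[i]:
            mean = [
                sum(features[j][k] for j in neighbours[i]) / len(neighbours[i])
                for k in range(dim)
            ]
        else:
            # Isolated nodes receive no message.
            mean = [0.0] * dim
        updated.append(
            [math.tanh(w_self * features[i][k] + w_neigh * mean[k]) for k in range(dim)]
        )
    return updated


# Hypothetical toy graph: three nodes, one edge between nodes 0 and 1.
toy_features = [[1.0], [0.0], [0.0]]
toy_edges = [(0, 1)]
toy_output = mean_aggregate_layer(toy_features, toy_edges, w_self=0.0, w_neigh=1.0)
```

In this toy graph, node 1 picks up information from node 0 through their shared edge, while the isolated node 2 is unchanged. Stacking several such steps lets information travel across longer paths in the graph.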


The main mission of the MEDomics UdeS lab is to develop a framework for privacy-preserving distributed GNN learning from a network of federated healthcare databases, which will be an important step in advancing AI in medicine. In this federated learning framework:
  • (i) GNN models can be developed from the databases of multiple healthcare facilities, thereby increasing the size of the analyzed data; and
  • (ii) data are always kept within the boundaries of each healthcare facility, thus avoiding data transfer.
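The two properties above can be sketched with federated averaging, a common federated learning scheme. This is a minimal sketch under stated assumptions, not the lab's framework: a two-parameter linear model stands in for a GNN, the three "sites" are synthetic, and all function names (`local_update`, `federated_average`, `train_federated`, `make_site`) are hypothetical.

```python
import random


def local_update(weights, data, lr=0.05, epochs=20):
    """Gradient descent for y ~ w*x + b using only one site's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Average model parameters, weighting each site by its sample count."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train_federated(sites, rounds=50):
    """Only model parameters leave a site; the raw records never do."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


def make_site(n, seed):
    """Synthetic data for one hypothetical site: y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


sites = [make_site(30, 1), make_site(50, 2), make_site(20, 3)]
w, b = train_federated(sites)
```

The shared model is fitted on the combined 100 samples even though each site only ever computes on its own records. Note that exchanging parameters alone does not by itself give a formal privacy guarantee; techniques such as secure aggregation or differential privacy are often layered on top.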

The figures below provide an overview of the main research framework of the MEDomics UdeS lab.

Research Area 1: Medical Imaging

  • The most aggressive tumors tend to be more heterogeneous

  • Neural networks can detect the most heterogeneous tumors

Research Area 2: Heterogeneous Data

  • Data from different sources can be combined to make better predictions

  • For example, medical imaging, physician text notes and clinical data are full of relevant information

Research Area 3: Federated Learning

  • To guarantee patient privacy while ensuring maximum use of relevant data

  • Decentralized learning to ensure medical center sovereignty


The MEDomics UdeS lab aims to promote the use of best practices in data science.

Check out MEDomicsTools


This laboratory is a sub-branch of the MEDomics consortium.

Visit the MEDomics website

Main publication of the consortium

At Université de Sherbrooke, this laboratory is part of the Interdisciplinary Research Group in Health Informatics (GRIIS).

Visit the GRIIS website