Data Lake

What is a data lake?

A data lake is a storage approach in which massive amounts of raw data are retained in their native format or with very light processing. The data are extracted directly from the CHUM’s source systems and deposited in the lake. Unlike a data warehouse, which typically stores data that have already been structured for a specific use, a data lake offers additional flexibility because it can hold and process more than one type of data.

CITADEL’s data lake allows for the integration of data from different clinical systems (laboratories, clinical records, imaging, vital signs, etc.).  

How does CITADEL’s data lake work?

CITADEL moves clinical and administrative data, research data, and data from the various computer systems of the Centre hospitalier de l’Université de Montréal (CHUM) into its data lake.

In addition to storage, CITADEL’s mandate includes data extraction and analysis services tailored to the projects submitted. For example, a user may submit a request for access to some of these data. The requested data are then extracted from the lake, processed into a relevant format and deposited in a secure space at the CHUM reserved for the project, where they can be analyzed as needed.

Users can access these data provided they meet strict, predetermined ethical, legal and regulatory criteria. They must also comply with CITADEL’s management framework (refer to the management framework).

Once the data have been extracted, what is the next step?

Once the data are extracted, research teams can analyze them in a secure space to answer their research questions. If further analyses are required, CITADEL also offers a consultation and analysis service provided by a team specializing in health data analysis.

What are CITADEL's objectives?

  • To bring together expertise in data wrangling and data analysis: epidemiology, methodology, biostatistics, mathematics, machine learning. 
  • To provide simple, safe and fast access to clinical and administrative data in order to improve and facilitate innovation and research.  

What data can be made available through CITADEL’s data lake?

Aggregated data, for example:  

  • Number of emergency room visits.  
  • Number of patients hospitalized for a specific diagnosis in a given period of time. 
  • Number of patients in the neonatal unit at a specific time.  
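As an illustration of what "aggregated data" means in the examples above, here is a minimal Python sketch that turns individual records into counts. The records, field names and helper functions are hypothetical and invented for this example; they do not describe CITADEL's actual schema or tooling.

```python
from collections import Counter
from datetime import date

# Hypothetical, already de-identified visit records (illustrative only;
# the field names are assumptions, not CITADEL's schema).
visits = [
    {"unit": "emergency", "diagnosis": "pneumonia", "admitted": date(2023, 1, 5)},
    {"unit": "emergency", "diagnosis": "fracture", "admitted": date(2023, 1, 9)},
    {"unit": "inpatient", "diagnosis": "pneumonia", "admitted": date(2023, 2, 2)},
    {"unit": "inpatient", "diagnosis": "pneumonia", "admitted": date(2023, 4, 20)},
]


def count_by_unit(records):
    """Return the number of visits per unit."""
    return Counter(r["unit"] for r in records)


def count_diagnosis_in_period(records, unit, diagnosis, start, end):
    """Count records for one unit and diagnosis with start <= admitted <= end."""
    return sum(
        1
        for r in records
        if r["unit"] == unit
        and r["diagnosis"] == diagnosis
        and start <= r["admitted"] <= end
    )


# Number of emergency room visits.
er_visits = count_by_unit(visits)["emergency"]

# Number of patients hospitalized for a given diagnosis in a given period.
pneumonia_q1 = count_diagnosis_in_period(
    visits, "inpatient", "pneumonia", date(2023, 1, 1), date(2023, 3, 31)
)

print(er_visits, pneumonia_q1)
```

Only the resulting counts, not the underlying records, would be shared in an aggregated request of this kind.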

Data sets available on request, for example:

  • Laboratory results, including liver function, and radiology results for all liver transplant patients.
  • Depersonalized MRI reports and lists of medications for all patients currently undergoing treatment for triple-negative breast cancer.
  • Extractions of clinical and laboratory data for all people who received a transfusion of O-negative blood.   


Where are the data kept?

The data are kept in the CHUM’s secure enclosure.   

Who has access to the data?  

Data can be accessed by individuals with the required regulatory and legal approvals. Access by the members of a research team must be endorsed by the researcher responsible for the project at the institutional level.  

How is access to data made possible? 

Access to data via CITADEL is made possible through a robust governance structure and strict regulatory monitoring. The CITADEL management framework describes the legal and regulatory basis for data access, the governance structure, and the terms of access.

How is data confidentiality ensured? 

In addition to restricting data access to individuals entitled to it (through the required regulatory, ethical and legal approvals), data are depersonalized or de-identified at the outset. Additional restrictions may also be implemented depending on the nature of the request. 
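The text above does not specify how the data are depersonalized or de-identified. As a rough illustration only, the Python sketch below shows one common approach: dropping direct identifiers and replacing the patient identifier with a keyed hash so that records belonging to the same person can still be linked. The field names, the key and both functions are assumptions made for this example and are not CITADEL's actual procedure.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers (an assumption for
# this sketch; a real project would define them with its approvals).
DIRECT_IDENTIFIERS = {"name", "health_card_number", "address"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same patient_id and key always give the same pseudonym, so records
    remain linkable without exposing the original identifier.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the patient ID for a pseudonym."""
    cleaned = {
        k: v
        for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Placeholder key for the example; a real key would be stored separately
# from the data and never hard-coded.
key = b"example-key-kept-outside-the-dataset"
record = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "health_card_number": "XXXX",
    "lab_result": 4.2,
}
safe = deidentify(record, key)
print(sorted(safe))
```

Removing direct identifiers is only one layer of protection, which is consistent with the statement above that additional restrictions may be applied depending on the request.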


What statistical services does CITADEL offer?

CITADEL’s team of experts in biostatistics can help you with various aspects of your research work:

  • Review of the methodology of a study
  • Review of research protocols
  • Writing the statistics section of a scientific article
  • Proofreading and review of the statistics section of a preprint and/or of responses to reviewers
  • Sample size calculation
  • Statistical analyses, in whole or in part, including but not limited to descriptive statistics, univariate and multivariate analyses, linear regressions, longitudinal analyses and meta-analyses
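To give a sense of what a sample size calculation involves, here is a minimal Python sketch for one textbook case: comparing the means of two equal-sized groups with a two-sided test, using the normal approximation n = 2 ((z_{1-α/2} + z_{1-β}) σ / δ)² per group. The function name and the example values are assumptions chosen for illustration; the appropriate method for a real study depends on its design and would be determined with the biostatistics team.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a difference `delta` between two means.

    Assumes equal group sizes, a common standard deviation `sigma`, a
    two-sided test at level `alpha`, and the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole participant


# Hypothetical example: detect a 5-unit difference when the standard
# deviation is 10 units, with 5% significance and 80% power.
n_per_group = sample_size_two_means(delta=5.0, sigma=10.0)
print(n_per_group)
```

Under these assumptions the sketch gives 63 participants per group. Halving the detectable difference roughly quadruples the required sample size, which is why the expected effect size is usually the first thing to settle.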


Are CITADEL's services free?

The services offered by CITADEL follow the fee schedule of the CRCHUM’s core scientific facilities. 

What are the costs? 

The costs of a project depend on its nature and complexity. During the initial assessment of a project, a quote is provided to the user. The project begins once CITADEL and the user agree on the costs involved. If the scope of CITADEL’s mandate increases or decreases during the project, the user will be notified. No expense will be incurred without the express authorization of the person in charge of the project.