
Encrypting and watermarking health data to protect it

As medicine and genetics make increasing use of data science and AI, the question of how to protect this sensitive information is becoming increasingly important to all those involved in health. A team from the LaTIM laboratory is working on these issues, with solutions such as encryption and watermarking. It has just been accredited by Inserm.

The original version of this article was published on the IMT Atlantique website.

Securing medical data

Securing medical data, and preventing it from being misused for commercial or malicious purposes, distorted or even destroyed, has become a major challenge for healthcare stakeholders and public authorities alike. The issue is all the more pressing at a time when progress in medicine and genetics increasingly relies on huge quantities of data, particularly with the rise of artificial intelligence. Several recent incidents (cyber-attacks, data leaks, etc.) have highlighted the urgent need to guard against this type of risk. The issue also concerns each and every one of us: no one wants their medical information to be accessible to everyone.

“Health data, which is particularly sensitive, can be sold at a higher price than bank data,” points out Gouenou Coatrieux, a teacher-researcher at LaTIM (the Medical Information Processing Laboratory, shared by IMT Atlantique, the University of Western Brittany (UBO) and Inserm), who is working on this subject in conjunction with Brest University Hospital. To enable this data to be shared while limiting the risks, LaTIM is using two techniques: secure computing and watermarking.

Secure computing, which combines cryptographic techniques for distributed computing with other approaches, ensures confidentiality: the outsourced data is encoded in such a way that calculations can still be performed on it. The research organisation that receives the data, whether a public laboratory or a private company, can study it but never has access to the original version, which it cannot reconstruct. The data therefore remains protected.
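To give a concrete sense of how calculations can be carried out without revealing the original values, here is a minimal sketch of additive secret sharing, one of the classic building blocks of secure distributed computing. It is a toy illustration in Python, not LaTIM's actual protocol, and the values and function names are invented for the example.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a secret integer into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the secret by summing the shares modulo PRIME."""
    return sum(shares) % PRIME

# Two hospitals each hold a sensitive value (e.g. a patient count).
a, b = 120, 87

# Each value is split into shares handed to three compute nodes;
# no single node ever sees an original value.
shares_a = share(a, 3)
shares_b = share(b, 3)

# Each node adds the shares it holds, producing a share of the sum.
sum_shares = [(sa + sb) % PRIME for sa, sb in zip(shares_a, shares_b)]

# Only the recombined result is revealed: the total, never the inputs.
print(reconstruct(sum_shares))  # -> 207
```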


Gouenou Coatrieux, teacher-researcher at LaTIM (the Medical Information Processing Laboratory, shared by IMT Atlantique, the University of Western Brittany (UBO) and Inserm)

Discreet but effective watermarking

Watermarking, known in French as tatouage (“tattooing”), involves introducing a minor, imperceptible modification into the medical images or data entrusted to a third party. “We simply modify a few pixels in an image, for example to change the colour slightly, a subtle change that makes it possible to encode a message,” explains Gouenou Coatrieux. The identifier of the last person to access the data can thus be embedded in the image. This method does not prevent the file from being used, but if a problem occurs, it makes it very easy to identify the person who leaked it. The watermark thus guarantees traceability. It also acts as a deterrent, because users are informed of the mechanism. The technique has long been used to combat digital video piracy. Encryption and watermarking can also be combined: this is known as crypto-watermarking.
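The principle can be pictured with a very simple least-significant-bit scheme: an identifier is hidden in the lowest bit of a few pixels, a change of at most one grey level. This is only a minimal sketch of the idea, not the robust crypto-watermarking methods developed at LaTIM; the image, identifier and function names are invented for the example.

```python
import numpy as np

def embed_watermark(image, user_id, n_bits=32):
    """Hide an integer identifier in the least-significant bits of the first pixels."""
    flat = image.flatten().astype(np.uint8)                 # work on a copy
    bits = [(user_id >> i) & 1 for i in range(n_bits)]      # identifier as a bit string
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | bit                    # overwrite only the lowest bit
    return flat.reshape(image.shape)

def extract_watermark(image, n_bits=32):
    """Read the identifier back from the least-significant bits."""
    flat = image.flatten().astype(np.uint8)
    return sum(int(flat[i] & 1) << i for i in range(n_bits))

# Toy 8-bit greyscale image and a hypothetical user identifier.
image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
marked = embed_watermark(image, user_id=0x1234ABCD)

print(extract_watermark(marked) == 0x1234ABCD)                  # True: the ID is recoverable
print(np.max(np.abs(marked.astype(int) - image.astype(int))))   # at most 1: imperceptible
```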

Initially, the LaTIM team focused on the protection of medical images. A joint laboratory was therefore created with Medecom, a Breton company specialising in this field, which produces software dedicated to radiology.

Multiple fields of application

Subsequently, LaTIM extended its research to the entire field of cyber-health. This work has led to the filing of several patents. A former doctoral student and engineer from the school has also founded a company, WaToo, specialising in data watermarking. A Cyber Health team at LaTIM, the first in this field, has just been accredited by Inserm. This multidisciplinary team brings together researchers, research engineers, doctoral students and post-docs, and covers several fields of application: the protection of medical images, genetic data and health ‘big data’. In particular, it works on the databases used for AI and deep learning, and on the security of processing that relies on AI. “For all these subjects, we need to be in constant contact with health and genetics specialists,” stresses Gouenou Coatrieux, head of the new entity. “We also take into account standards in the field, such as DICOM, the international standard for medical imaging, and legal issues such as privacy rights under the European GDPR regulation.”

The Cyber Health team recently contributed to a project called PrivGen, selected by the CominLabs Labex (laboratory of excellence). The ongoing work that began with PrivGen aims to identify markers of certain diseases in a secure manner, by comparing patients’ genomes with those of healthy people and analysing parts of the patients’ genomes. But the volumes of data and the computing power required to analyse them are so large that the data has to be shared, taken out of its original information systems and sent to supercomputers. “This data sharing creates an additional risk of leakage or disclosure,” warns the researcher. “PrivGen’s partners are currently working on a technical solution to secure this processing, in particular to prevent patient identification.”

Towards the launch of a research chair

An industrial research chair (chaire) called Cybaile, dedicated to cybersecurity for trusted artificial intelligence in health, will also be launched next autumn. LaTIM will partner with three other organisations: the Thales group, Sophia Genetics and the start-up Aiintense, a specialist in neuroscience data. With the support of Inserm and the backing of the Regional Council of Brittany, it will focus in particular on securing the training of AI models in health, in order to support decision-making: screening, diagnosis and treatment advice. “If we have a large amount of data, and therefore representations of the disease, we can use AI to detect signs of anomalies and set up decision support systems,” says Gouenou Coatrieux. “In ophthalmology, for example, we rely on a large quantity of images of the back of the eye to identify or detect pathologies and treat them better.”


Speaking the language of health data to improve its use

The world of healthcare has extensive databases that are just waiting to be used. This is one of the issues Benjamin Dalmas, a data science researcher at Mines Saint-Étienne, is exploring in his work. His main objective is to understand the origin of this data in order to use it more effectively. To that end, he is working with public- and private-sector stakeholders for analytical and predictive purposes, in order to improve the management of healthcare institutions and our understanding of care pathways.

Research has made great strides in processing methods using machine learning. But what do we really know about the information that such methods use? Benjamin Dalmas is a health data science researcher at Mines Saint-Étienne. The central focus of his work is understanding health data, from its creation to its storage. What does this data include? Information such as the time of a patient’s arrival and discharge, exams carried out, practitioners consulted etc. This data is typically used for administrative and financial purposes.

Benjamin Dalmas’s research involves identifying and finding a straightforward way to present relevant information to respond to the concrete needs of public and private healthcare stakeholders. How can the number of beds in a hospital ward be optimized? Is it possible to predict the flow of arrivals in an emergency room? The responses to these problems rely on the same information: the medical administrative data produced every day by hospitals to monitor their patient pathways.

However, depending on how it is examined, the same data can yield different information. Like a key witness called on in several investigations, it must be questioned in the right way to provide answers.

Understanding data in order to prevent bias

Since it is primarily generated by humans, health data may be incorrect or biased. By focusing on how it is created, researchers seek to identify potential bias as early as possible. Benjamin Dalmas is working with Saint-Étienne University Hospital Center to study the codes assigned when a patient is discharged. These codes summarize the reason the individual came to the hospital and the care they received. The doctors who specialize in this coding must choose from up to 16,000 different codes, a tedious task for which the hospital would like the support of a decision support tool to limit errors. “That means we must understand how humans code. By analyzing large quantities of data, we identify recurring errors and where they come from, and we can solve them,” explains Benjamin Dalmas. Greater accuracy means direct economic benefits for the institution.
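As a rough illustration of this kind of analysis, the sketch below counts the most frequent confusions between the code initially assigned and the code retained after review. The table layout (columns assigned_code and reviewed_code) and the sample values are assumptions made for the example, not the hospital's actual schema.

```python
import pandas as pd

# Hypothetical extract of discharge records: the code assigned by the coder
# and the code retained after expert review.
records = pd.DataFrame({
    "stay_id":       [1, 2, 3, 4, 5, 6],
    "assigned_code": ["I10", "E11.9", "I10", "J18.9", "E11.9", "I10"],
    "reviewed_code": ["I10", "E11.9", "I11.0", "J18.9", "E11.8", "I11.0"],
})

# Keep only the stays where the two codes differ, i.e. a coding error.
errors = records[records["assigned_code"] != records["reviewed_code"]]

# Count recurring (assigned, reviewed) pairs: the most frequent confusions
# are the first candidates for targeted feedback or a decision support rule.
recurring = (
    errors.groupby(["assigned_code", "reviewed_code"])
          .size()
          .sort_values(ascending=False)
)
print(recurring)
```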

However, this mass-produced data is increasingly used for other purposes than reimbursing hospitals. For the researcher, it is important to keep in mind that the data was not created for these new analyses. For example, he has noticed that such a straightforward notion as time may hide a number of different realities. When a consultation time is specified, it may mean one of three things: the actual time of consultation, the time at which the information was integrated in the file, or a time assigned by default. Since the primary objective of this information is administrative, the consultation time does not have a lot of importance. “If we don’t take the time to study this information, we run the risk of making biased recommendations that are not valid. Good tools cannot be created without understanding the data that fuels them,” says the researcher. Without this information, for example, a study focusing on whether or not social inequalities exist and taking into account how long a patient must wait before receiving care, could draw incorrect conclusions.

From reactive to proactive

So researchers must understand the data, but to what end? To predict, and therefore anticipate, rather than simply react. The development of predictive tools is the focus of a collaboration between Mines Saint-Étienne researchers and the company Move in Med. The goal is to anticipate the coordination of care pathways for breast cancer patients. In the case of chronic diseases such as cancer, the patient pathway is not limited to the hospital: it also depends on the patient’s family, associations etc. To this end, the researchers are cross-referencing medical data with other social information (age, marital status, socio-economic background, place of residence etc.). Their aim is to identify unexpected factors, in the same way that the weather, air quality and even the occurrence of cultural events affect periods of peak arrivals in emergency rooms. Predicting the complexity of a care pathway allows the company to allocate the appropriate resources and therefore ensure better care.
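A minimal sketch of what such cross-referencing might look like is given below: hypothetical medical and social features are combined to estimate the probability that a pathway will be complex. The column names, the tiny dataset and the choice of a logistic regression are all assumptions made for illustration, not Move in Med's actual model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

# Hypothetical training table: medical data cross-referenced with social
# information, and a label marking pathways that turned out to be complex.
data = pd.DataFrame({
    "age":             [52, 67, 45, 71, 38, 60],
    "lives_alone":     [0, 1, 0, 1, 0, 1],
    "residence_area":  ["urban", "rural", "urban", "rural", "urban", "urban"],
    "tumor_stage":     ["II", "III", "I", "III", "II", "I"],
    "complex_pathway": [0, 1, 0, 1, 0, 0],
})

features = ["age", "lives_alone", "residence_area", "tumor_stage"]
preprocess = ColumnTransformer(
    [("categories", OneHotEncoder(handle_unknown="ignore"),
      ["residence_area", "tumor_stage"])],
    remainder="passthrough",  # numeric columns are passed through unchanged
)

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(data[features], data["complex_pathway"])

# Estimated probability that a new patient's pathway will need extra coordination.
new_patient = pd.DataFrame([{"age": 58, "lives_alone": 1,
                             "residence_area": "rural", "tumor_stage": "III"}])
print(model.predict_proba(new_patient)[0, 1])
```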

At the same time, the Auvergne-Rhône-Alpes Regional Health Agency has been working with the researchers since May 2020 to predict hospital capacity strain caused by Covid arrivals. Using visual reporting based on systems of colors and arrows, the researchers provide information about changing dynamics and levels of hospital capacity strain in the region (Covid patient arrivals, positive PCR tests, number of available beds etc.). In this work, the researchers are tackling the monitoring of trends. How are these parameters evolving over time? At what threshold values should the authorities be alerted that the situation is getting worse? To answer these questions, the research team provides maps and projections that the health agency can use to anticipate saturation, and therefore prevent institutions from becoming overwhelmed, arrange for patients to be transferred etc.
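The color-and-arrow reporting can be pictured with a small sketch like the one below, which derives a trend arrow and a strain level from a daily series of admissions. The series and the threshold values are invented for the example and are not the health agency's real indicators.

```python
import pandas as pd

# Hypothetical daily series of Covid-related hospital admissions in one region.
admissions = pd.Series(
    [14, 16, 15, 19, 24, 28, 31],
    index=pd.date_range("2020-11-01", periods=7, freq="D"),
)

# Trend: compare the mean of the last 3 days with the mean of the 3 days before.
recent = admissions.iloc[-3:].mean()
previous = admissions.iloc[-6:-3].mean()
arrow = "↑" if recent > previous * 1.1 else ("↓" if recent < previous * 0.9 else "→")

# Strain level from illustrative threshold values.
level = "green"
if recent >= 20:
    level = "orange"
if recent >= 30:
    level = "red"

print(f"Admissions trend {arrow}, strain level: {level}")
```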

Finding the right balance between volume and representativeness

The study of data raises questions about volume and representativeness, which depend on the user’s request. A request aimed at proving a concept, rather than equipping practitioners, requires more data in order to fuel machine learning algorithms. “However, recovering public health data is quite an ordeal. We have to follow protocols that are highly regulated by the CNIL (the French Data Protection Authority) and ethics committees to justify the volume of data requested,” explains Benjamin Dalmas. On the other hand, a request for operational tools must adapt to the on-the-ground realities faced by practitioners. That means working with limited amounts of information. It is a matter of finding the right balance.

The Mines Saint-Étienne researchers are working on these aspects with the Saint-Étienne-based company MJ INNOV. The company offers an interactive facilitation tool to improve quality of life for individuals with cognitive impairments. Based on videos and sounds recorded during play sessions, this research seeks to identify the impact of the practice on various groups (nursing home residents, persons with Alzheimer’s disease etc.). In addition to using the information contained in residents’ files, this involves collecting a limited quantity of new information. “In an ideal world, we would have 360° images and perfect sound coverage. But in practice, to avoid disturbing the game, we have to plan on placing microphones under the table the patients are playing on, or fitting the camera directly inside the table. Working with these constraints makes our analysis even more interesting,” says Benjamin Dalmas.

Measuring the impact of healthcare decision support tools

In the best-case scenario, researchers successfully create a decision support tool that is accessible online. But is the tool always adopted by the interested parties? “There are very few studies on the ergonomics of tools delivered to users and therefore on their impact and actual use,” says Benjamin Dalmas. Yet, this is a crucial question in his opinion, if we seek to improve data science research in such a concrete area of application as healthcare.  

To this end, an appropriate solution often means simplicity. First of all, by being easy to read: color schemes, shapes, arrows etc. Visualization and interpretation of the data must be intuitive. Second, by promoting the explainability of results. One of the drawbacks of machine learning is that the information provided seems to come from a black box. “Research efforts must now focus on the presentation of results, by enhancing communication between researchers and users,” concludes Benjamin Dalmas.

By Anaïs Culot

Read more on I’MTech: When AI helps predict a patient’s care pathway