Speaking the language of health data to improve its use

The world of healthcare has extensive databases that are just waiting to be used. This is one of the issues that Benjamin Dalmas, a data science researcher at Mines Saint-Étienne, is exploring in his work. His main objective is to understand where this data comes from in order to use it more effectively. To this end, he works with public- and private-sector partners on analysis and prediction, with the aim of improving the management of healthcare institutions and our understanding of care pathways.

Research has made great strides in data-processing methods based on machine learning. But what do we really know about the information these methods use? Benjamin Dalmas is a health data science researcher at Mines Saint-Étienne. His work centers on understanding health data, from its creation to its storage. What does this data include? It includes information such as a patient’s arrival and discharge times, the exams carried out and the practitioners consulted. This data is typically used for administrative and financial purposes.

Benjamin Dalmas’s research involves identifying relevant information and finding straightforward ways to present it, in order to meet the concrete needs of public and private healthcare stakeholders. How can the number of beds in a hospital ward be optimized? Is it possible to predict the flow of arrivals in an emergency room? The answers to these questions rely on the same information: the medical administrative data that hospitals produce every day to monitor their patient pathways.

However, depending on how it is approached, the same data can yield different information. Like a key witness in several investigations, it must be questioned in the right way to get answers.

Understanding data in order to prevent bias

Since it is primarily generated by humans, health data may be incorrect or biased. By focusing on how the data is created, researchers seek to identify potential bias at the earliest stage. Benjamin Dalmas is working with Saint-Étienne University Hospital Center to study the codes the hospital assigns when a patient is discharged. These codes summarize why the patient came to the hospital and what care they received. The doctors who specialize in this coding work with up to 16,000 different codes. This is a tedious task, and the hospital would like a decision support tool to help limit errors. “That means we must understand how humans code. By analyzing large quantities of data, we identify recurring errors and where they come from, and we can solve them,” explains Benjamin Dalmas. Greater accuracy means direct economic benefits for the institution.
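The article does not describe how recurring coding errors are detected. The following is a minimal illustration that assumes a hypothetical data layout: for each hospital stay, the hospital keeps both the code first assigned and the code retained after review. Under that assumption, counting the most frequent substitutions is one simple way to surface recurring errors. The function name `recurring_corrections` and the codes used below are placeholders.

```python
from collections import Counter


def recurring_corrections(records, top_n=3):
    """Return the most frequent (assigned, reviewed) code pairs.

    `records` is a hypothetical list of (assigned_code, reviewed_code) tuples,
    one per hospital stay. Pairs where the reviewer kept the original code are
    ignored, so only actual corrections are counted.
    """
    corrections = Counter(
        (assigned, reviewed)
        for assigned, reviewed in records
        if assigned != reviewed
    )
    return corrections.most_common(top_n)


# Placeholder codes, not real diagnosis codes.
stays = [("X1", "X2")] * 3 + [("Y1", "Y1")] * 5 + [("Z1", "Z3")]
print(recurring_corrections(stays))
# [(('X1', 'X2'), 3), (('Z1', 'Z3'), 1)]
```

A frequent pair points to a specific confusion worth investigating. Working out where that confusion comes from would require more context, such as the ward or the coder involved.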

However, this mass-produced data is increasingly used for purposes other than reimbursing hospitals. For the researcher, it is important to keep in mind that the data was not created for these new analyses. For example, he has noticed that a notion as seemingly straightforward as time may conceal several different realities. A recorded consultation time may mean one of three things: the actual time of the consultation, the time at which the information was entered into the file, or a time assigned by default. Since this information serves primarily administrative purposes, the exact consultation time matters little. “If we don’t take the time to study this information, we run the risk of making biased recommendations that are not valid. Good tools cannot be created without understanding the data that fuels them,” says the researcher. Without this knowledge, a study examining whether social inequalities exist in how long patients wait before receiving care could, for example, draw incorrect conclusions.
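The paragraph above notes that some recorded times are simply defaults. The sketch below shows one simple way such values might be spotted. It assumes, hypothetically, that genuine consultation times are spread across the day, so a single clock minute that concentrates an unusually large share of records is suspicious. The function name `likely_default_times` and the 5% cutoff are illustrative choices, not the researchers’ method.

```python
from collections import Counter
from datetime import datetime


def likely_default_times(timestamps, min_share=0.05):
    """Return clock times (HH:MM) that hold an unusually large share of records.

    Real consultation times should be spread across the day, so a single minute
    that concentrates many records (for instance 00:00) is more likely a value
    filled in by default. The 5% cutoff is an arbitrary, hypothetical choice.
    """
    if not timestamps:
        return {}
    counts = Counter(ts.strftime("%H:%M") for ts in timestamps)
    total = len(timestamps)
    return {clock: n / total for clock, n in counts.items() if n / total >= min_share}


# Four records stamped at midnight plus thirty spread over one morning hour.
consultations = [datetime(2020, 3, 2, 0, 0)] * 4 + [
    datetime(2020, 3, 2, 9, minute) for minute in range(0, 60, 2)
]
print(likely_default_times(consultations))  # only '00:00' is flagged (4 of 34 records)
```

Records at a flagged time could then be excluded from, or handled separately in, any waiting-time calculation.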

From reactive to proactive

So researchers must understand the data, but for what purpose? To predict, in order to anticipate rather than just react. The development of predictive tools is the focus of a collaboration between Mines Saint-Étienne researchers and the company Move in Med. The goal is to anticipate the coordination of care pathways for breast cancer patients. For chronic diseases such as cancer, the patient pathway is not limited to the hospital; it also depends on the patient’s family, associations, etc. To this end, the researchers are cross-referencing medical data with social information (age, marital status, socio-economic background, place of residence, etc.). Their aim is to identify unexpected factors, in the same way that the weather, air quality and even cultural events affect peak arrival periods in emergency rooms. Predicting the complexity of a care pathway allows the company to allocate the appropriate resources and therefore ensure better care.

At the same time, the Auvergne Rhône-Alpes Regional Health Agency has been working with the researchers since May 2020 to predict levels of hospital strain due to Covid admissions. Using visual indicators based on colors and arrows, the researchers report on the changing dynamics and levels of hospital strain in the region (Covid patient arrivals, positive PCR tests in the region, number of available beds, etc.). In this work, the researchers focus on monitoring trends. How are these parameters evolving over time? At what threshold values should they alert the authorities that the situation is getting worse? To answer these questions, the research team provides maps and projections. The health agency can use them to anticipate saturation, prevent institutions from becoming overwhelmed and arrange patient transfers.
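The article does not say how the colors and arrows are computed. The following is a minimal sketch under stated assumptions:

- A color is derived from hypothetical bed-occupancy thresholds.
- An arrow is derived by comparing two consecutive seven-day averages.

The names `LEVELS`, `strain_color` and `trend_arrow` are illustrative, and so is every threshold.

```python
# Hypothetical cut-offs on bed occupancy (share of beds in use); the values
# actually used with the health agency are not given in the article.
LEVELS = [(0.0, "green"), (0.7, "orange"), (0.9, "red")]


def strain_color(occupancy_rate, levels=LEVELS):
    """Map an occupancy rate to the color of the highest threshold it reaches."""
    color = levels[0][1]
    for threshold, name in levels:
        if occupancy_rate >= threshold:
            color = name
    return color


def trend_arrow(daily_values, window=7, tolerance=0.05):
    """Compare the mean of the last `window` days with the window before it.

    Returns "up", "down" or "stable"; relative changes within +/- `tolerance`
    count as stable.
    """
    if len(daily_values) < 2 * window:
        raise ValueError("need at least two full windows of data")
    recent = sum(daily_values[-window:]) / window
    previous = sum(daily_values[-2 * window:-window]) / window
    if previous == 0:
        return "up" if recent > 0 else "stable"
    change = (recent - previous) / previous
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "stable"


# Daily Covid admissions rising from 10 to 14 per day, with 93% of beds occupied.
admissions = [10] * 7 + [14] * 7
print(strain_color(0.93), trend_arrow(admissions))  # red up
```

Separating the level (color) from the direction (arrow) lets a reader see at a glance whether a region is already under strain and whether the situation is improving or worsening.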

Finding the right balance between volume and representativeness

The study of data raises questions about volume and representativeness, which depend on the user’s request. A request that aims to demonstrate a hypothesis, rather than to deliver a tool, requires more data to fuel machine learning algorithms. “However, recovering public health data is quite an ordeal. We have to follow protocols that are highly regulated by the CNIL (the French Data Protection Authority) and ethics committees to justify the volume of data requested,” explains Benjamin Dalmas. By contrast, a request for operational tools must adapt to the on-the-ground realities practitioners face. That means working with limited amounts of information. It is a matter of finding the right balance.

The Mines Saint-Étienne researchers are working on these issues with MJ INNOV, a company based in Saint-Étienne. The company offers an interactive facilitation tool designed to improve quality of life for people with cognitive impairments. Using video and audio recorded during play sessions, this research seeks to identify the impact of the activity on different groups (nursing home residents, people with Alzheimer’s disease, etc.). In addition to the information contained in residents’ files, this involves collecting a limited quantity of new information. “In an ideal world, we would have 360° images and perfect sound coverage. But in practice, to avoid disturbing the game, we have to plan on placing microphones under the table the patients are playing on, or fitting the camera inside the table itself. Working with these constraints makes our analysis even more interesting,” says Benjamin Dalmas.

Measuring the impact of healthcare decision support tools

In the best-case scenario, researchers successfully create a decision support tool that is accessible online. But do its intended users actually adopt it? “There are very few studies on the ergonomics of tools delivered to users and therefore on their impact and actual use,” says Benjamin Dalmas. Yet in his view, this question is crucial if we are to improve data science research in an area of application as concrete as healthcare.

An appropriate solution often relies on simplicity. First, it must be easy to read, using color schemes, shapes, arrows, etc. Visualization and interpretation of data must be intuitive. Second, it must promote the explainability of results. One of the drawbacks of machine learning is that its results seem to come out of a black box. “Research efforts must now focus on the presentation of results, by enhancing communication between researchers and users,” concludes Benjamin Dalmas.

By Anaïs Culot