
Audio and machine learning: Gaël Richard’s award-winning project

Gaël Richard, a researcher in Information Processing at Télécom Paris, has been awarded an Advanced Grant from the European Research Council (ERC) for his project entitled HI-Audio. This initiative aims to develop hybrid approaches that combine signal processing with deep machine learning for the purpose of understanding and analyzing sound.

"Artificial intelligence now relies heavily on deep neural networks, which have a major shortcoming: they require very large databases for learning," says Gaël Richard, a researcher in Information Processing at Télécom Paris. He believes that "using signal models, or physical sound propagation models, in a deep learning algorithm would reduce the amount of data needed for learning while still allowing for high controllability of the algorithm." Gaël Richard plans to pursue this breakthrough via his HI-Audio* project, which won an ERC Advanced Grant on April 26, 2022.

For example, the integration of physical sound propagation models can improve the characterization and configuration of the types of sound analyzed and help to develop an automatic sound recognition system. “The applications for the methods developed in this project focus on the analysis of music signals and the recognition of sound scenes, which is the identification of the recording’s sound environment (outside, inside, airport) and all the sound sources present,” Gaël Richard explains.
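To make this concrete, here is a minimal sketch, in the spirit of differentiable digital signal processing, of what embedding a signal model in a deep learning pipeline can look like: a small network predicts only a few interpretable parameters (fundamental frequency, harmonic amplitudes), and a hard-coded oscillator bank renders the sound. The architecture and all names are illustrative assumptions, not the HI-Audio design.

```python
import torch
import torch.nn as nn

class HarmonicSynth(nn.Module):
    """Differentiable signal model: a bank of harmonic oscillators.

    The physics of the waveform is hard-coded here, not learned;
    the network only has to supply a few interpretable parameters.
    """
    def __init__(self, n_harmonics=8, sample_rate=16000):
        super().__init__()
        self.n_harmonics = n_harmonics
        self.sample_rate = sample_rate

    def forward(self, f0, amplitudes, n_samples):
        # f0: (batch,) fundamental frequency in Hz
        # amplitudes: (batch, n_harmonics) per-harmonic levels
        t = torch.arange(n_samples) / self.sample_rate
        k = torch.arange(1, self.n_harmonics + 1)
        # Phase of harmonic k at time t: 2*pi*k*f0*t
        phases = 2 * torch.pi * f0[:, None, None] * k[None, :, None] * t[None, None, :]
        return (amplitudes[:, :, None] * torch.sin(phases)).sum(dim=1)

class ParamNet(nn.Module):
    """Small network mapping input features to synthesizer parameters."""
    def __init__(self, n_features=64, n_harmonics=8):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.f0_head = nn.Linear(32, 1)
        self.amp_head = nn.Linear(32, n_harmonics)

    def forward(self, x):
        h = self.body(x)
        f0 = 100 + 400 * torch.sigmoid(self.f0_head(h)).squeeze(-1)  # 100-500 Hz
        amps = torch.softmax(self.amp_head(h), dim=-1)               # normalized
        return f0, amps

net, synth = ParamNet(), HarmonicSynth()
features = torch.randn(2, 64)             # placeholder input features
f0, amps = net(features)
audio = synth(f0, amps, n_samples=16000)  # (2, 16000): one second of audio
```

Because the signal model encodes the sound physics, the network has far fewer degrees of freedom to learn, which is precisely the data-efficiency and controllability argument made above.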

Industrial applications

Learning sound scenes could help autonomous cars identify their surroundings. Using microphones, the algorithm would be able to identify surrounding sounds, such as the sound of a siren and its variations in intensity. Autonomous cars could then change lanes to let an ambulance or fire engine pass without having to "see" it with their detection cameras. The processes developed in the HI-Audio project could be applied to many other areas. The algorithms could be used in predictive maintenance to check the quality of parts on a production line. A car part, such as a bumper, is typically checked based on the sound resonance generated when a non-destructive impact is applied.

The other key applications for the HI-Audio project are in the field of AI for music, particularly to assist musical creation by developing new interpretable methods for sound synthesis and transformation.

Machine learning and music

"One of the goals of this project is to build a database of music recordings from a wide variety of styles and different cultures," Gaël Richard explains. "This database, which will be automatically annotated (with precise semantic information), will expand the research to include less studied or less widely distributed music, especially from audio streaming platforms," he says. One of the challenges of the project is developing algorithms capable of recognizing the words and phrases sung by the performers, transcribing the music regardless of where it was recorded, and contributing new musical transformation capabilities (style transfer, rhythmic transformation, word changes).

"One important aspect of the project will also be the separation of sound sources," Gaël Richard says. In an audio file, the separation of sources, which in the case of music are each linked to a different instrument, is generally achieved via filtering or "masking". The idea is to hide all other sources until only the target source remains. A less common approach is to isolate the instrument via sound synthesis. This involves analyzing the music to characterize the sound source to be extracted in order to reproduce it. For Gaël Richard, "the advantage is that, in principle, artifacts from other sources are entirely absent. In addition, the synthesized source can be controlled by a few interpretable parameters, such as the fundamental frequency, which is directly related to the sound's perceived pitch. This type of approach opens up tremendous opportunities for sound manipulation and transformation, with real potential for developing new tools to assist music creation."
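As an illustration of the masking idea, here is a minimal sketch that isolates a target source by weighting each time-frequency bin of the mixture's spectrogram. The mask here is an "oracle" computed from a reference recording of the target, purely for demonstration; real separation systems have to estimate the mask, typically with a neural network.

```python
import numpy as np
from scipy.signal import stft, istft

def extract_with_mask(mixture, reference, fs=16000):
    """Isolate one source from a mixture by time-frequency masking."""
    _, _, mix_spec = stft(mixture, fs=fs, nperseg=1024)
    _, _, ref_spec = stft(reference, fs=fs, nperseg=1024)

    # Soft mask: the fraction of the mixture's energy owned by the
    # target in each time-frequency bin. Values near 1 keep the bin,
    # values near 0 suppress it ("hiding all other sources").
    mask = np.abs(ref_spec) ** 2 / (np.abs(mix_spec) ** 2 + 1e-10)
    mask = np.clip(mask, 0.0, 1.0)

    _, estimate = istft(mask * mix_spec, fs=fs, nperseg=1024)
    return estimate

# Toy demo: a 440 Hz "instrument" buried in noise.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
mixture = target + 0.5 * np.random.randn(fs)
separated = extract_with_mask(mixture, target, fs)
```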

*HI-Audio will start on October 1st, 2022 and will be funded by the ERC Advanced Grant for five years for a total amount of €2.48 million.

Rémy Fauvel


Speaking the language of health data to improve its use

The world of healthcare has extensive databases that are just waiting to be used. This is one of the issues Benjamin Dalmas, a data science researcher at Mines Saint-Étienne, is exploring in his work. His main objective is to understand the origin of this data to use it more effectively. As such, he is working with players from the public and private sectors for analysis and predictive purposes in order to improve management of health care institutions and our understanding of care pathways.

Research has made great strides in processing methods using machine learning. But what do we really know about the information that such methods use? Benjamin Dalmas is a health data science researcher at Mines Saint-Étienne. The central focus of his work is understanding health data, from its creation to its storage. What does this data include? Information such as the time of a patient’s arrival and discharge, exams carried out, practitioners consulted etc. This data is typically used for administrative and financial purposes.

Benjamin Dalmas’s research involves identifying and finding a straightforward way to present relevant information to respond to the concrete needs of public and private healthcare stakeholders. How can the number of beds in a hospital ward be optimized? Is it possible to predict the flow of arrivals in an emergency room? The responses to these problems rely on the same information: the medical administrative data produced every day by hospitals to monitor their patient pathways.

However, depending on how it is considered, the same data can provide different information. Like a key witness in several investigations, it must be approached in the right way to yield answers.

Understanding data in order to prevent bias

Since it is primarily generated by humans, health data may be incorrect or biased. By focusing on its creation, researchers seek to identify potential bias as early as possible. Benjamin Dalmas is working with the Saint-Étienne University Hospital Center to study the codes assigned by the hospital upon a patient's discharge. These codes summarize the reason the individual came to the hospital and the care they received. Doctors who specialize in this coding work with up to 16,000 different codes, a tedious task for which the hospital wants a decision support tool to limit errors. "That means we must understand how humans code. By analyzing large quantities of data, we identify recurring errors and where they come from, and we can solve them," explains Benjamin Dalmas. Greater accuracy means direct economic benefits for the institution.
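As a purely illustrative sketch of what such a decision support tool could look like, the snippet below ranks candidate discharge codes by textual similarity to a free-text note. The three-entry code table and the TF-IDF nearest-neighbor approach are assumptions made for the example; the real coding system involves some 16,000 codes and far richer clinical information.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical miniature code table (real systems involve ~16,000 codes).
codes = {
    "J18.9": "pneumonia unspecified organism",
    "I21.9": "acute myocardial infarction unspecified",
    "S52.5": "fracture of lower end of radius",
}

vectorizer = TfidfVectorizer()
code_matrix = vectorizer.fit_transform(codes.values())

def suggest_codes(discharge_note, top_k=2):
    """Rank candidate codes by textual similarity to the note."""
    note_vec = vectorizer.transform([discharge_note])
    scores = cosine_similarity(note_vec, code_matrix)[0]
    ranked = sorted(zip(codes, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]

print(suggest_codes("patient admitted with pneumonia, treated with antibiotics"))
```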

However, this mass-produced data is increasingly used for purposes other than hospital reimbursement. For the researcher, it is important to keep in mind that the data was not created for these new analyses. For example, he has noticed that a notion as seemingly straightforward as time may hide a number of different realities. When a consultation time is specified, it may mean one of three things: the actual time of the consultation, the time at which the information was entered in the file, or a time assigned by default. Since the primary purpose of this information is administrative, the consultation time is of little importance. "If we don't take the time to study this information, we run the risk of making biased recommendations that are not valid. Good tools cannot be created without understanding the data that fuels them," says the researcher. Without this knowledge, for example, a study examining whether social inequalities exist, taking into account how long a patient must wait before receiving care, could draw incorrect conclusions.
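One simple way to guard against the "time assigned by default" pitfall is to flag timestamp values that occur implausibly often before running any analysis that depends on them. The toy data and the 40% threshold below are invented for the example.

```python
import pandas as pd

# Toy consultation records (illustrative data).
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "consult_time": ["08:00", "09:37", "08:00", "08:00", "14:12", "08:00"],
})

# If a single exact time accounts for a large share of records, it is
# more likely a default value than a real measurement; flag it before
# any analysis of waiting times relies on it.
share = df["consult_time"].value_counts(normalize=True)
suspected_defaults = share[share > 0.4].index.tolist()

df["time_is_suspect"] = df["consult_time"].isin(suspected_defaults)
print(suspected_defaults)  # ['08:00']
```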

From reactive to proactive

So researchers must understand the data, but for what purpose? To predict, in order to anticipate rather than just react. The development of predictive tools is the focus of a collaboration between Mines Saint-Étienne researchers and the company Move in Med. The goal is to anticipate the coordination of care pathways for breast cancer patients. In the case of chronic diseases such as cancer, the patient pathway is not limited to the hospital but also depends on the patient's family, associations, etc. To this end, the researchers are cross-referencing medical data with other social information (age, marital status, socio-economic background, place of residence, etc.). Their aim is to identify unexpected factors, in the same way that the weather, air quality and even the occurrence of cultural events impact periods of peak arrival in emergency rooms. Predicting the complexity of a care pathway allows the company to allocate the appropriate resources and therefore ensure better care.

At the same time, the Auvergne-Rhône-Alpes Regional Health Agency has been working with the researchers since May 2020 to predict hospital capacity strain caused by Covid arrivals. By presenting data visually, using systems of colors and arrows, the researchers provide information about changing dynamics and levels of hospital capacity strain in the region (Covid patient arrivals, positive PCR tests, number of available beds, etc.). In this work, the researchers are tackling trend monitoring. How are these parameters evolving over time? At what threshold values should the authorities be alerted that the situation is getting worse? To answer these questions, the research team provides maps and projections that the health agency can use to anticipate saturation and therefore prevent institutions from becoming overwhelmed, arrange for patients to be transferred, etc.
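A minimal sketch of this kind of color-and-arrow indicator might look as follows; the occupancy thresholds and seven-day windows are invented for illustration and are not the agency's actual criteria.

```python
import numpy as np

def strain_indicator(daily_admissions, capacity, window=7):
    """Summarize hospital strain as a color (level) and an arrow (trend).

    Illustrative thresholds: occupancy below 60% of capacity is green,
    below 85% orange, above that red; the arrow compares the mean of
    the last `window` days with the preceding window.
    """
    recent = np.mean(daily_admissions[-window:])
    previous = np.mean(daily_admissions[-2 * window:-window])

    occupancy = recent / capacity
    color = "green" if occupancy < 0.60 else "orange" if occupancy < 0.85 else "red"
    arrow = "↑" if recent > 1.1 * previous else "↓" if recent < 0.9 * previous else "→"
    return color, arrow

admissions = [12, 14, 13, 15, 18, 21, 22, 25, 27, 30, 29, 32, 35, 38]
print(strain_indicator(admissions, capacity=40))  # ('orange', '↑')
```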

Finding the right balance between volume and representativeness

The study of data raises questions about volume and representativeness, which depend on the user's request. Proving something, as opposed to equipping practitioners with a tool, requires more data in order to fuel machine learning algorithms. "However, recovering public health data is quite an ordeal. We have to follow protocols that are highly regulated by the CNIL (the French Data Protection Authority) and ethics committees to justify the volume of data requested," explains Benjamin Dalmas. On the other hand, a request for an operational tool must adapt to the on-the-ground realities faced by practitioners, which means working with limited amounts of information. It is a matter of finding the right balance.

The Mines Saint-Étienne researchers are working on these aspects with the Saint-Étienne-based company MJ INNOV. The company offers an interactive facilitation tool to improve quality of life for individuals with cognitive impairments. Based on videos and sounds recorded during play sessions, this research seeks to identify the impact of the practice on various groups (nursing home residents, persons with Alzheimer's disease, etc.). In addition to using the information contained in residents' files, this involves collecting a limited quantity of new information. "In an ideal world, we would have 360° images and perfect sound coverage. But in practice, to avoid disturbing the game, we have to plan on placing microphones under the table the patients are playing on, or fitting the camera directly inside the table. Working with these constraints makes our analysis even more interesting," says Benjamin Dalmas.

Measuring the impact of healthcare decision support tools

In the best-case scenario, researchers successfully create a decision support tool that is accessible online. But is the tool always adopted by the interested parties? “There are very few studies on the ergonomics of tools delivered to users and therefore on their impact and actual use,” says Benjamin Dalmas. Yet, this is a crucial question in his opinion, if we seek to improve data science research in such a concrete area of application as healthcare.  

To this end, an appropriate solution often means simplicity. First, the tool must be easy to read: color schemes, shapes, arrows, etc. The visualization and interpretation of data must be intuitive. Second, it must promote the explainability of results. One of the drawbacks of machine learning is that the information provided seems to come from a black box. "Research efforts must now focus on the presentation of results, by enhancing communication between researchers and users," concludes Benjamin Dalmas.

By Anaïs Culot

Read more on I’MTech: When AI helps predict a patient’s care pathway


Making algorithms understand what we are talking about

Human language contains different types of information. We understand it all unconsciously, but explaining it systematically is much more difficult. The same is true for machines. The NoRDF Project Chair "Modeling and Extracting Complex Information from Natural Language Text" seeks to solve this problem: how can we teach algorithms to model and extract complex information from language? Fabian Suchanek and Chloé Clavel, both researchers at Télécom Paris, explain the approaches of this new project.

What aspects of language are involved in making machines understand?

Fabian Suchanek: We need to make them understand more complicated natural language texts. Current systems can understand simple statements. For example, the sentence "A vaccine against Covid-19 has been developed" is simple enough to be understood by algorithms. On the other hand, they cannot understand sentences that go beyond a single statement, such as: "If the vaccine is distributed, the Covid-19 epidemic will end in 2021." In this case, the machine does not understand that the condition required for the Covid-19 epidemic to end in 2021 is that the vaccine be distributed. We also need to make machines understand the emotions and feelings associated with language; this is Chloé Clavel's specialist area.

What are the preferred approaches in making algorithms understand natural language?

FS: We are developing "neurosymbolic" approaches, which seek to combine symbolic approaches with deep learning. Symbolic approaches use human-implemented logical rules that simulate human reasoning. For the type of data we process, it is fundamental to be able to interpret afterwards what the machine has understood. Deep learning is a type of machine learning in which the machine is able to learn by itself. It allows for greater flexibility in handling variable data and makes it possible to integrate more layers of reasoning.
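A toy sketch of the neurosymbolic idea, assuming a hand-written rule for conditional sentences and a placeholder neural confidence score (both invented for illustration): the symbolic part produces an interpretable structure, and the learned part decides whether to trust it.

```python
import re

def neural_score(sentence):
    """Stand-in for a trained model's confidence that the sentence
    expresses a conditional link (a fixed dummy value here)."""
    return 0.62  # a real system would run a trained network

def symbolic_rule(sentence):
    """Hand-written rule: an 'If ..., ...' pattern signals a
    conditional statement (condition, consequence)."""
    m = re.match(r"[Ii]f (.+?), (.+)", sentence)
    if m:
        return {"condition": m.group(1), "consequence": m.group(2)}
    return None

def extract(sentence, threshold=0.5):
    """Neurosymbolic combination: keep the rule's interpretable
    output only when the neural score is high enough."""
    structure = symbolic_rule(sentence)
    if structure and neural_score(sentence) >= threshold:
        return structure
    return None

print(extract("If the vaccine is distributed, the Covid-19 epidemic will end in 2021."))
# {'condition': 'the vaccine is distributed',
#  'consequence': 'the Covid-19 epidemic will end in 2021.'}
```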

Where does the data you analyze come from?

FS: We can collect data from humans' interactions with company chatbots, in particular those of the project's partner companies. We can also extract data from comments on web pages, forums and social networks.

Chloé Clavel: We can also extract information about feelings, emotions, social attitudes, especially in dialogues between humans or humans with machines.

Read on I’MTech: Robots teaching assistants

What are the main difficulties for the machine in learning to process language?

CC: We have to create models that are robust in changing contexts and situations. For example, there may be language variability in the expression of feelings from one individual to another, meaning that the same feelings may be expressed in very different words depending on the person. There is also a variability of contexts to be taken into account. For example, when humans interact with a virtual agent, they will not behave in the same way as with a human, so it is difficult to compare data from these different sources of interactions. Yet, if we want to move towards more fluid and natural human-agent interactions, we must draw inspiration from the interactions between humans.

How do you know whether the machine is correctly analyzing the emotions associated with a statement?

CC: The majority of the methods we use are supervised. The data entered into the models is annotated as objectively as possible by humans. Since the perception of an emotion can be very subjective, several annotators are asked to annotate the emotion they perceive in a text. The model is then trained on the data for which a consensus among the annotators could be found. To test the model's performance, we inject an annotated text into a model trained on similar texts and check whether the annotation it produces is close to those determined by humans.
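A minimal sketch of the consensus step just described, assuming three annotators per text and a majority-vote rule (the agreement threshold is an illustrative choice):

```python
from collections import Counter

def consensus_labels(annotations, min_agreement=2/3):
    """Keep only the examples on which annotators sufficiently agree.

    `annotations` maps each text to the labels given by the different
    annotators; the majority label is kept when the majority is large
    enough, otherwise the example is discarded as too subjective.
    """
    dataset = []
    for text, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            dataset.append((text, label))
    return dataset

annotations = {
    "I can't believe they did this again!": ["anger", "anger", "surprise"],
    "What a wonderful day": ["joy", "joy", "joy"],
    "Well, that's interesting": ["surprise", "neutral", "joy"],  # no consensus
}
print(consensus_labels(annotations))
# [("I can't believe they did this again!", 'anger'), ('What a wonderful day', 'joy')]
```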

Since the annotation of emotions is particularly subjective, it is important to determine how the model actually understood the emotions and feelings present in the text. There are many biases in the representativeness of the data that can interfere with the model and mislead us as to the interpretation made by the machine. For example, if in our data younger people are angrier than older people, and these two categories do not express themselves in the same way, the model may end up simply detecting the age of the individuals and not the anger associated with the comments.

Is it possible that the algorithms end up adapting their speech according to perceived emotions?

CC: Research is being conducted on this aspect. Chatbots’ algorithms must be relevant in solving the problems they are asked to solve, but they must also be able to provide a socially relevant response (e.g. to the user’s frustration or dissatisfaction). These developments will improve a range of applications, from customer relations to educational or support robots.

What contemporary social issues are associated with the understanding of human language by machines?

FS: This would notably allow a better understanding of the perception of news on social media by humans, the functioning of fake news, and therefore in general which social group is sensitive to which type of discourse and why. The underlying reasons why different individuals adhere to different types of discourse are still poorly understood today. In addition to the emotional aspect, there are different ways of thinking that are built in argumentative bubbles that do not communicate with each other.

In order to be able to automate the understanding of human language and exploit the numerous data associated with it, it is therefore important to take as many dimensions into account as possible, such as the purely logical aspect of what is said in sentences and the analysis of the emotions and feelings that accompany them.

By Antonin Counillon

How to better track cyber hate: AI to the rescue

The wide-scale use of social media, sometimes under cover of anonymity, has liberated speech and led to a proliferation of ideas, discussions and opinions on the internet. It has also led to a flood of hateful, sexist, racist and abusive speech. Confronted with this phenomenon, more and more platforms today are using automated solutions to combat cyber hate. These solutions are based on algorithms that can themselves introduce biases, sometimes discriminating against certain communities, and that remain largely perfectible. In this context, French researchers are developing ever more efficient models to detect hate speech and reduce bias.

On September 16 this year, internet users launched a movement calling for a one-day boycott of Instagram. Supported by many American celebrities, the "Stop Hate for Profit" day aimed to challenge Facebook, the parent company of the photo and video sharing app, over the proliferation of hate, propaganda and misinformation on its platforms. Back in May 2019, in its biannual report on the state of moderation on its network, Facebook had announced significant progress in the automated detection of hateful content. According to the company, between January and April 2019, more than 65% of these messages were detected and moderated before users even reported them, compared with 38% during the same period in 2018.

Strongly encouraged to combat online hate content, in particular by the "Avia law" (named after the member of parliament for Paris, Lætitia Avia), platforms use various techniques such as keyword detection, user reporting and solutions based on artificial intelligence (AI). Machine learning allows predictive models to be developed from corpora of data. This is where biases can be damaging. "We realized that the automated tools themselves had biases related to gender or user identity and, most importantly, had a disproportionately negative impact on certain minority groups such as African Americans," explains Marzieh Mozafari, a PhD student at Télécom SudParis. On Twitter, for example, it is difficult for AI-based programs to take into account the social context of a tweet, the identity and dialect of the speaker, and the immediate context all at once. Some content is thus removed despite being neither hateful nor offensive.

So how can we minimize these biases and erroneous detections without creating a form of censorship? Researchers at Télécom SudParis have been using a public dataset collected on Twitter, distinguishing between tweets written in African-American English (AAE) and Standard American English (SAE), as well as two reference databases annotated (sexist, racist, hateful and offensive) by experts and through crowdsourcing. "In this study, due to the lack of data, we mainly relied on cutting-edge language processing techniques such as transfer learning and the BERT language model, a pre-trained, unsupervised model," explain the researchers.

Developed by Google, the BERT (Bidirectional Encoder Representations from Transformers) model uses a vast corpus of textual content, including, among other things, the entire content of the English version of Wikipedia. "We were able to 'customize' BERT [1] to make it perform a specific task, to adjust it to our hateful and offensive corpus," explains Reza Farahbakhsh, a researcher in data science at Télécom SudParis. To begin with, the researchers tried to identify word sequences in their datasets that were strongly correlated with a hateful or offensive category. Their results showed that tweets written in AAE were almost 10 times more likely to be classed as racist, sexist, hateful or offensive than tweets written in SAE. "We therefore used a reweighting mechanism to mitigate biases in the data and algorithms," says Marzieh Mozafari. For example, the number of tweets containing "n*gga" and "b*tch" is 35 times higher among tweeters in AAE than in SAE, and these tweets are often wrongly identified as racist or sexist. Such words are common in AAE dialects and used in everyday conversation, whereas they would most likely be considered hateful or offensive if written in SAE by another group.
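As a hedged sketch of this kind of pipeline, the snippet below combines a pre-trained BERT classifier (via the Hugging Face transformers library) with per-sample loss reweighting. The texts, labels and weights are placeholders; the actual reweighting scheme used in the study is described in [1].

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained BERT with a fresh classification head (label set is
# illustrative: 0 = neither, 1 = offensive, 2 = hateful).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

texts = ["example tweet one", "example tweet two"]
labels = torch.tensor([0, 2])
# Hypothetical per-sample weights: down-weight samples whose dialect
# group is over-represented in a class, as a bias-mitigation step.
sample_weights = torch.tensor([1.0, 0.4])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch)

# Per-sample cross-entropy, reweighted before averaging.
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
per_sample_loss = loss_fn(outputs.logits, labels)
loss = (sample_weights * per_sample_loss).mean()
loss.backward()  # one illustrative training step (optimizer omitted)
```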

"In fact, these biases are also cultural: certain expressions considered hateful or offensive are not so within a certain community or in a certain context. In French, too, we use certain bird names to address our loved ones! Platforms are faced with a sort of dilemma: if the aim is to perfectly identify all hateful content, too great a number of false detections could have an impact on users' 'natural' ways of expressing themselves," explains Noël Crespi, a researcher at Télécom SudParis. After reducing the effect of the most frequently used words in the training data through the reweighting mechanism, the probability of false positives was greatly reduced. "Finally, we transmitted these results to the pre-trained BERT model to refine it even further using new datasets," says the researcher.

Can automatic detection be scaled up?

Despite these promising results, many problems still need to be solved in order to better detect hate speech. These include deploying such automated tools for all languages spoken on social networks. This issue is the subject of a data science challenge launched for the second consecutive year: HASOC (Hate Speech and Offensive Content Identification in Indo-European Languages), in which a team from IMT Mines Alès is participating. "The challenge involves three tasks: determining whether or not content is hateful or offensive, classifying this content into one of three categories (hateful, offensive or obscene), and identifying whether the insult is directed towards an individual or a specific group," explains Sébastien Harispe, a researcher at IMT Mines Alès.

"We are mainly focusing on these three tasks. Using our expertise in natural language processing, we have proposed a method of analysis based on supervised machine learning techniques that take advantage of examples and counter-examples of the classes to be distinguished." In this case, the researchers' work focuses on small datasets in English, German and Hindi. In particular, the team is studying the role of emojis, some of which have direct connotations with expressions of hate. The researchers have also studied the adaptation of various standard approaches in automatic language processing in order to obtain classifiers able to efficiently exploit such markers, as in the sketch below.
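To illustrate how emojis can be turned into markers a classifier can exploit alongside word features, here is a small sketch with an invented emoji-polarity lexicon (not the one used in the HASOC submission):

```python
# Illustrative lexicon: a few emojis with plausible connotations.
EMOJI_POLARITY = {"🤬": -1.0, "😡": -0.8, "🖕": -1.0, "😊": 0.5, "❤️": 0.7}

def emoji_features(text):
    """Turn the emojis in a message into simple numeric markers."""
    scores = [score for emoji, score in EMOJI_POLARITY.items() if emoji in text]
    return {
        "n_emoji": len(scores),
        "min_polarity": min(scores, default=0.0),
        "mean_polarity": sum(scores) / len(scores) if scores else 0.0,
    }

print(emoji_features("get lost 🤬🖕"))
# {'n_emoji': 2, 'min_polarity': -1.0, 'mean_polarity': -1.0}
```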

They have also measured their classifiers' ability to capture these markers, in particular through their performance. "In English, for example, our model was able to correctly classify content in 78% of cases, whereas only 77% of human annotators initially agreed on the annotation to be given to the content of the dataset used," explains Sébastien Harispe. Indeed, in 23% of cases the annotators expressed divergent opinions when confronted with ambiguous content that probably needed to be assessed in light of contextual elements.

What can we expect from AI? The researcher believes we are faced with a complex question: what are we willing to accept in the use of this type of technology? "Although remarkable progress has been made in almost a decade of data science, we have to admit that we are addressing a young discipline in which much remains to be developed from a theoretical point of view and, especially, whose applications we must support in order to allow ethical and informed uses. Nevertheless, I believe that in terms of the detection of hate speech, there is a sort of glass ceiling created by the difficulty of the task as reflected in our current datasets. With regard to this particular aspect, there can be no perfect or flawless system if we ourselves cannot be perfect."

Besides the multilingual challenge, the researchers are facing other obstacles, such as the availability of data for model training and the evaluation of results, or the difficulty of assessing the ambiguity of certain content, due for example to variations in writing style. Finally, the very characterization of hate speech, subjective as it is, is also a challenge. "Our work can provide material for the humanities and social sciences, which are beginning to address these questions: why, when, who, what content? What role does culture play in this phenomenon? The spread of cyber hate is, at the end of the day, less of a technical problem than a societal one," says Reza Farahbakhsh.

[1] M. Mozafari, R. Farahbakhsh, N. Crespi, "Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model," PLoS ONE 15(8): e0237861, 2020. https://doi.org/10.1371/journal.pone.0237861

Anne-Sophie Boutaud
