Gaël Richard

Gaël Richard, IMT-Académie des sciences Grand Prix

Speech synthesis, sound separation, automatic recognition of instruments or voices… Gaël Richard’s research at Télécom Paris has always focused on audio signal processing. The researcher has created numerous acoustic signal analysis methods, thanks to which he has made important contributions to his discipline. These methods are currently used in various applications for the automotive and music industries. His contributions to the academic community and to technology transfer have earned him the 2020 IMT-Académie des sciences Grand Prix.

Your early research work in the 1990s focused on speech synthesis: why did you choose this discipline?

Gaël Richard: I didn’t initially intend to become a researcher; I wanted to be a professional musician. After my baccalaureate I focused on classical music before finally returning to scientific study. I then oriented my studies toward applied mathematics, particularly audio signal processing. During my Master’s internship and then my PhD, I began to work on speech and singing voice synthesis. In the early 1990s, the first perfectly intelligible text-to-speech systems had just been developed. The aim at the time was to achieve a better sound quality and naturalness and to produce synthetic voices with more character and greater variability.

What research have you done on speech synthesis?

GR: To start with, I worked on synthesis based on signal processing approaches. The voice is considered as being produced by a source – the vocal cords – which passes through a filter – the throat and the nose. The aim is to represent the vocal signal using the parameters of this model to either modify a recorded signal or generate a new one by synthesis. I also explored physical modeling synthesis for a short while. This approach consists in representing voice production through a physical model: vocal cords are springs that the air pressure acts on. We then use fluid mechanics principles to model the air flow through the vocal tract to the lips.
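To make the source-filter idea concrete, here is a minimal, purely illustrative sketch in Python (not a system from this research): a periodic impulse train stands in for the vocal folds, and two resonant filters play the role of the vocal tract shaping an /a/-like vowel.

```python
# Toy source-filter synthesis: a glottal-like impulse train (the "source")
# filtered by two resonances approximating formants (the "filter").
# Purely illustrative; real speech synthesizers are far more sophisticated.
import numpy as np
from scipy.signal import lfilter

sr = 16000
duration, f0 = 1.0, 120.0                 # one second of a 120 Hz voice
n = int(sr * duration)

# Source: impulse train at the fundamental frequency (vocal-fold pulses).
source = np.zeros(n)
source[::int(sr / f0)] = 1.0

def resonator(freq_hz, bandwidth_hz, sr):
    """Second-order all-pole filter coefficients for one formant resonance."""
    r = np.exp(-np.pi * bandwidth_hz / sr)
    theta = 2 * np.pi * freq_hz / sr
    return [1.0], [1.0, -2 * r * np.cos(theta), r * r]

# Filter: cascade two formants roughly matching an /a/ vowel (~700 Hz, ~1200 Hz).
signal = source
for freq, bw in [(700, 110), (1200, 120)]:
    b, a = resonator(freq, bw, sr)
    signal = lfilter(b, a, signal)
signal /= np.abs(signal).max()            # normalize to [-1, 1]
```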

What challenges are you working on in speech synthesis research today?

GR: I have gradually extended the scope of my research to include subjects other than speech synthesis, although I continue to do some work on it. For example, I am currently supervising a PhD student who is trying to understand how to adapt a voice to make it more intelligible in a noisy environment. We are naturally able to adjust our voice in order to be better understood when surrounded by noise. The aim of his thesis, which he is carrying out with the PSA Group, is to change the voice of a radio, navigation assistant (GPS) or telephone, initially pronounced in a silent environment, so that it is more intelligible in a moving car, but without amplifying it.

As part of your work on audio signal analysis, you developed different approaches to signal decomposition, in particular those based on “non-negative matrix factorization”. This was one of the major achievements of your research career. Could you tell us what’s behind this complex term?

GR: The additive approach, which consists in gradually adding the elementary components of the audio signal, is a time-honored method. In the case of speech synthesis, it means adding simple waveforms – sinusoids – to create complex or rich signals. To decompose a signal that we want to study, such as a natural singing voice, we can logically proceed the opposite way, by taking the starting signal and describing it as a sum of elementary components. We then have to say which component is activated and at what moment to recreate the signal in time.

The method of non-negative matrix factorization allows us to obtain such a decomposition in the form of the multiplication of two matrices: one matrix represents a dictionary of the elementary components of the signal, and the other matrix represents the activation of the dictionary elements over time. When combined, these two matrices make it possible to describe the audio signal in mathematical form. “Non-negative” simply means that each element in these matrices is positive, or that each source or component contributes positively to the signal.
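As a toy illustration of these two matrices (a generic sketch, not the models developed by the researcher), the code below builds a small two-note signal, computes its magnitude spectrogram and factorizes it with scikit-learn’s NMF:

```python
# Toy illustration of non-negative matrix factorization on an audio spectrogram.
# This is a generic sketch, not the specific models developed in this research.
import numpy as np
from sklearn.decomposition import NMF

# Build a toy "recording": a 440 Hz note, then a 660 Hz note, then both together.
sr = 22050
t = np.arange(sr) / sr
note_a = np.sin(2 * np.pi * 440 * t)
note_b = np.sin(2 * np.pi * 660 * t)
signal = np.concatenate([note_a, note_b, note_a + note_b])

# Non-negative time-frequency representation: a magnitude spectrogram.
frame, hop = 1024, 512
frames = np.lib.stride_tricks.sliding_window_view(signal, frame)[::hop]
V = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)).T  # (freq bins, time frames)

# Factorize V ~= W @ H with two elementary components (one per note).
model = NMF(n_components=2, init="nndsvda", max_iter=500)
W = model.fit_transform(V)   # dictionary: one spectral "atom" per column
H = model.components_        # activations: when each atom is active over time
print(W.shape, H.shape)      # (513, 2) and (2, n_frames)
```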

Why is this signal decomposition approach so interesting?

GR: This decomposition is very effective for introducing prior knowledge. For example, if we know that there is a violin, we can introduce this knowledge into the dictionary by specifying that some of the elementary atoms of the signal will be characteristic of the violin. This makes it possible to refine the description of the rest of the signal. It is a clever description because it is simple to build and to handle, and it makes it possible to work efficiently on the decomposed signal.

This non-negative matrix factorization method has led you to subjects other than speech synthesis. What are its applications?

GR: One of the major applications of this technique is source separation. One of our first approaches was to extract the singing voice from polyphonic music recordings. The principle consists in saying that, for a given source, all the elementary components are activated at the same time, such as all the harmonics of a note played by an instrument, for example. To simplify, we can say that non-negative matrix factorization allows us to isolate each note played by a given instrument by representing them as a sum of elementary components (certain columns of the “dictionary” matrix) which are activated over time (certain rows of the “activation” matrix). At the end of the process, we obtain a mathematical description in which each source has its own dictionary of elementary sound atoms. We can then replay only the sequence of notes played by a specific instrument by reconstructing the signal by multiplying the non-negative matrices and setting to zero all note activations that do not correspond to the instrument we want to isolate.
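Continuing the toy sketch above, and again only as an illustration of the principle rather than of the actual separation systems discussed here, a single source can be “soloed” by zeroing the activations of the other components:

```python
# Continuing the toy NMF sketch above: keep only the first component (one "source")
# by zeroing the activations of every other component, then rebuild its spectrogram.
H_solo = H.copy()
H_solo[1:, :] = 0.0              # silence all components except the first
V_solo = W @ H_solo              # spectrogram of the isolated source

# To get audio back, a common (generic) recipe is Wiener-style soft masking of the
# original complex spectrogram followed by an inverse STFT (omitted here for brevity).
mask = V_solo / (W @ H + 1e-10)  # per-bin weights in [0, 1]
```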

What new prospects can be considered thanks to the precision of this description?

GR: Today, we are working on “informed” source separation, which incorporates additional prior knowledge about the sources into the separation process. I currently co-supervise a PhD student who is using knowledge of the lyrics to help isolate the singing voice. There are multiple applications, from automatic karaoke generation by removing the detected voice to remastering or transforming music and movie soundtracks. I have another PhD student whose thesis is on isolating a singing voice using a simultaneously recorded electroencephalogram (EEG) signal. The idea is to ask a person to wear an EEG cap and focus their attention on one of the sound sources. We can then obtain information via the recorded brain activity and use it to improve the source separation.

Your work allows you to identify specific sound sources through audio signal processing… to the point of automatic recognition?

GR: We have indeed worked on automatic sound classification, first of all through tests on recognizing emotion, particularly fear or panic. The project was carried out with Thales to anticipate crowd movements. Besides detecting emotion, we wanted to measure the rise or fall in panic. However, there are very few sound datasets on this subject, which turned out to be a real challenge for this work. On another subject, we are currently working with Deezer on the automatic detection of content that is offensive or unsuitable for children, in order to propose a sort of parental filter service, for example. In another project on advertising videos with Creaminal, we are detecting key or culminating elements in terms of emotion in videos in order to automatically propose the most appropriate music at the right time.

On the subject of music, is your work used for automatic song detection, like the Shazam application?

GR: Shazam uses an algorithm based on a fingerprinting principle. When you activate it, the app records the audio fingerprint over a certain time window. It then compares this fingerprint with the content of its database. Although very efficient, the system is limited to recognizing strictly identical recordings. Our aim is to go further, by recognizing different versions of a song, such as live recordings or covers by other singers, when only the studio version is stored in the database. We have filed a patent on a technology that goes beyond the initial fingerprint algorithm, which is too limited for this kind of application. In particular, we use a stage of automatic estimation of the harmonic content, or more precisely of the sequences of musical chords. This patent is at the center of a start-up project.
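As a rough illustration of the general idea of comparing harmonic content rather than exact fingerprints, here is a textbook chroma-plus-DTW baseline for cover-song detection. It is not the patented technology mentioned above, and the file names are hypothetical.

```python
# Rough illustration of comparing harmonic content rather than exact fingerprints.
# Classic chroma + dynamic time warping baseline, NOT the patented method above.
import librosa

def chroma_sequence(path):
    y, sr = librosa.load(path, mono=True)
    # 12-dimensional pitch-class profile per frame: a coarse proxy for chords.
    return librosa.feature.chroma_cqt(y=y, sr=sr)

def harmonic_similarity(path_a, path_b):
    A, B = chroma_sequence(path_a), chroma_sequence(path_b)
    # Dynamic time warping absorbs tempo differences between the two versions.
    D, _ = librosa.sequence.dtw(X=A, Y=B, metric="cosine")
    return -D[-1, -1] / (A.shape[1] + B.shape[1])  # higher means more similar

# Example with hypothetical file names:
# print(harmonic_similarity("studio_version.wav", "live_cover.wav"))
```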

Your research is closely linked to the industrial sector and has led to multiple technology transfers. But you have also made several free software contributions for the wider community.

GR: One of the team’s biggest contributions in this field is the audio feature extraction software YAAFE. It is one of my most cited articles and a tool that is still regularly downloaded, even though it dates from 2010. In general, I am in favor of the reproducibility of research and I publish the algorithms behind our work as often as possible. In any case, reproducibility is a major topic in AI and data science, disciplines that are clearly on the rise. We also make a point of publishing the databases created through our work. That is essential too, and it is always satisfying to see that our databases have an important impact on the community.

How to better track cyber hate: AI to the rescue

The widescale use of social media, sometimes under cover of anonymity, has liberated speech and led to a proliferation of ideas, discussions and opinions on the internet. It has also led to a flood of hateful, sexist, racist and abusive speech. Confronted with this phenomenon, more and more platforms today are using automated solutions to combat cyber hate. These solutions are based on algorithms that can also introduce biases, sometimes discriminating against certain communities, and are still largely perfectible. In this context, French researchers are developing ever more efficient new models to detect hate speech and reduce the bias.

On September 16 this year, internet users launched a movement calling for a one-day boycott of Instagram. Supported by many American celebrities, the “Stop Hate for Profit” day aimed to challenge Facebook, the parent company of the photo and video sharing app, on the proliferation of hate, propaganda and misinformation on its platforms. Back in May 2019, in its biannual report on the state of moderation on its network, Facebook announced significant progress in the automated detection of hate content. According to the company, between January and April 2019, more than 65% of these messages were detected and moderated before users even reported them, compared with 38% during the same period in 2018.

Strongly encouraged to combat online hate content, in particular by the “Avia law” (named after the member of parliament for Paris, Lætitia Avia), platforms use various techniques such as detection by keywords, reporting by users and solutions based on artificial intelligence (AI). Machine learning allows predictive models to be developed from corpora of data. This is where biases can be damaging. “We realized that the automated tools themselves had biases against gender or the user’s identity and, most importantly, had a disproportionately negative impact on certain minority groups such as African Americans,” explains Marzieh Mozafari, PhD student at Télécom SudParis. On Twitter, for example, it is difficult for AI-based programs to take into account the social context of tweets, the identity and dialect of the speaker and the immediate context of the tweet all at once. Some content is thus removed despite being neither hateful nor offensive.

So how can we minimize these biases and erroneous detections without creating a form of censorship? Researchers at Télécom SudParis have been using a public dataset collected on Twitter, distinguishing between tweets written in African-American English (AAE) and Standard American English (SAE), as well as two reference databases annotated (sexist, racist, hateful and offensive) by experts and through crowdsourcing. “In this study, due to the lack of data, we mainly relied on cutting-edge language processing techniques such as transfer learning and the BERT language model, a pre-trained, unsupervised model,” explain the researchers.

Developed by Google, the BERT (Bidirectional Encoder Representations from Transformers) model uses a vast corpus of textual content, containing, among other things, the entire content of the English version of Wikipedia. “We were able to ‘customize’ BERT [1] to make it do a specific task, to adjust it for our hateful and offensive corpus,” explains Reza Farahbakhsh, a researcher in data science at Télécom SudParis. To begin with, the team tried to identify word sequences in their datasets that were strongly correlated with a hateful or offensive category. Their results showed that tweets written in AAE were almost 10 times more likely to be classed as racist, sexist, hateful or offensive than tweets written in SAE. “We therefore used a reweighting mechanism to mitigate biases based on the data and algorithms,” says Marzieh Mozafari. For example, the number of tweets containing “n*gga” and “b*tch” is 35 times higher among tweeters in AAE than in SAE, and these tweets will often be wrongly identified as racist or sexist. However, this type of word is common in AAE dialects and is used in everyday conversation, whereas the same terms are likely to be considered hateful or offensive when written in SAE by another group.
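The sketch below gives a flavor of this general approach: fine-tuning a pre-trained BERT classifier with a class-weighted loss. It is a simplified stand-in, not the exact reweighting scheme of the cited paper, and the labels, weights and training data are placeholders.

```python
# Minimal sketch of fine-tuning BERT for hate-speech classification with a
# class-weighted loss, in the spirit of (but not identical to) the reweighting
# mechanism described in [1]. Labels, weights and data are placeholders.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

labels = ["normal", "sexist", "racist", "hateful_or_offensive"]
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(labels))

# Placeholder training data: (tweet text, label index).
train = [("example tweet ...", 0), ("another example ...", 3)]

# Down-weight over-represented classes to mitigate bias in the training data.
class_weights = torch.tensor([0.5, 1.0, 1.0, 1.5])
loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for text, label in train:          # one pass, batch size 1, for brevity
    enc = tokenizer(text, truncation=True, padding=True, return_tensors="pt")
    logits = model(**enc).logits
    loss = loss_fn(logits, torch.tensor([label]))
    loss.backward()
    optim.step()
    optim.zero_grad()
```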

In fact, these biases are also cultural: certain expressions considered hateful or offensive are not so within a certain community or in a certain context. In French, too, we use certain bird names to address our loved ones! “Platforms are faced with a sort of dilemma: if the aim is to perfectly identify all hateful content, too great a number of false detections could have an impact on users’ ‘natural’ ways of expressing themselves,” explains Noël Crespi, a researcher at Télécom SudParis. After reducing the effect of the most frequently used words in the training data through the reweighting mechanism, this probability of false positives was greatly reduced. “Finally, we transmitted these results to the pre-trained BERT model to refine it even further using new datasets,” says the researcher.

Can automatic detection be scaled up?

Despite these promising results, many problems still need to be solved in order to better detect hate speech. These include the possibility of deploying these automated tools for all the languages spoken on social networks. This issue is the subject of a data science challenge launched for the second consecutive year, HASOC (Hate Speech and Offensive Content Identification in Indo-European Languages), in which a team from IMT Mines Alès is participating. “The challenge covers three tasks: determining whether or not content is hateful or offensive, classifying this content as hateful, offensive or obscene, and identifying whether the insult is directed at an individual or at a specific group,” explains Sébastien Harispe, a researcher at IMT Mines Alès.

“We are mainly focusing on the first three tasks. Using our expertise in natural language processing, we have proposed a method of analysis based on supervised machine learning techniques that take advantage of examples and counter-examples of the classes to be distinguished.” In this case, the researchers’ work focuses on small datasets in English, German and Hindi. In particular, the team is studying the role of emojis, some of which can have direct connotations with hate speech. The researchers have also studied the adaptation of various standard approaches in automatic language processing in order to obtain classifiers able to efficiently exploit such markers.

They have also measured their classifiers’ ability to capture these markers, in particular through their performance. “In English, for example, our model was able to correctly classify content in 78% of cases, whereas only 77% of human annotators initially agreed on the annotation to be given to the content of the dataset used,” explains Sébastien Harispe. Indeed, in 23% of cases, the annotators expressed divergent opinions when confronted with ambiguous content that probably should have been assessed with its context taken into account.

What can we expect from AI? The researcher believes we are faced with a complex question: what are we willing to accept in the use of this type of technology? “Although remarkable progress has been made in almost a decade of data science, we have to admit that we are dealing with a young discipline in which much remains to be developed from a theoretical point of view and, especially, whose applications we must accompany in order to allow ethical and informed uses. Nevertheless, I believe that in terms of the detection of hate speech, there is a sort of glass ceiling created by the difficulty of the task as it is reflected in our current datasets. With regard to this particular aspect, there can be no perfect or flawless system if we ourselves cannot be perfect.”

Besides the multilingual challenge, the researchers are facing other obstacles, such as the availability of data for model training and the evaluation of results, or the difficulty in assessing the ambiguity of certain content, due for example to variations in writing style. Finally, the very characterization of hate speech, subjective as it is, is also a challenge. “Our work can provide material for the humanities and social sciences, which are beginning to address these questions: why, when, who, what content? What role does culture play in this phenomenon? The spread of cyber hate is, at the end of the day, less of a technical problem than a societal one,” says Reza Farahbakhsh.

[1] M. Mozafari, R. Farahbakhsh, N. Crespi, “Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model”, PLoS ONE 15(8): e0237861. https://doi.org/10.1371/journal.pone.0237861

Anne-Sophie Boutaud


AI

AI for interoperable and autonomous industrial systems

At Mines Saint-Étienne, researchers Olivier Boissier, Maxime Lefrançois and Antoine Zimmermann are using AI to tackle the issue of interoperability, which is essential to the industry of the future. The standardization of information in the form of knowledge graphs has allowed them to enable communication between machines that speak different languages. They then operate this system via a network of autonomous distributed agents on each machine to automate a production line.

Taking a train from France to Spain without interoperability means having to get off at the border since the rails are not the same in both countries. A train that hopes to cross over from one rail line to another is sure to derail. The same problem is posed on factory floors – which is why the interoperability of production lines is a key issue for the industry of the future. In an interoperable system, machines can communicate with one another in order to work together automatically, even if they don’t speak the same language. But this is not easy to implement. Factory floors are marked by a kind of cacophony of computer languages. And every machine has its own properties: a multitude of manufacturers, different applications, diverse ways of sending, measuring and collecting information etc. Such heterogeneity reduces the flexibility of production lines. During the Covid-19 crisis, for example, many companies had to reconfigure all of their machines by hand to set up new production operations, such as manufacturing masks. “As of now, on factory floors everything is coded according to an ideal world. Systems are incapable of adapting to change,” says Maxime Lefrançois, a specialist in web semantics. Interoperability also goes hand in hand with competition. Without it, ensuring that a factory runs smoothly would require investing in a single brand of equipment to be certain the various parts are compatible.  

There is no single method for making a system interoperable. Along with his colleagues at Mines Saint-Étienne, the researcher is addressing the issue of interoperability using an approach based on representing data about the machines (manufacturer, connection method, application, physical environment etc.) in a standardized way, meaning independent of the language inherent to a machine. This knowledge is then used by what is known as a multi-agent software system. The goal is to automate a production process based on the description of each machine.

Describing machines to automate decision-making

What does the automation of an industrial system imply? Service delegation, primarily. For example, allowing a machine to place an order for raw materials when it detects a low stock level, instead of going through a human operator. For this, the researchers are developing mechanisms for accessing and exchanging information between machines using the web of things. “On the web, we can set up a communication interface between the various devices via standardized protocols. These methods of interaction therefore reduce the heterogeneity of the language of connected devices,” explains Antoine Zimmermann, an expert in knowledge representation at Mines Saint-Étienne. All of the modeled data from the factory floor is therefore accessible to and understood by all the machines involved.

More importantly, these resources may then be used to allow the machines to cooperate with one another. To this end, the Mines Saint-Étienne team has opted for a flexible approach with local decision-making. In other words, an information system called an autonomous agent is deployed on each device and is able to interact with the agents on other machines. This results in a 4.0 word-of-mouth system without loss of information. “An autonomous agent decides what to do based on what the machines upstream and downstream of its position are doing. This reasoning software layer allows the connected device to adjust its behavior according to the current status of the system,” says Olivier Boissier, who specializes in autonomous agent systems at Mines Saint-Étienne. For example, a machine can stop a potentially dangerous process when it detects information indicating that a device’s temperature is too high. Likewise, it would no longer be necessary to redesign the entire system to add a component, since it is automatically detected by the other machines.
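The toy sketch below illustrates this kind of local decision logic, in which an agent reasons only over the status of its own machine and its immediate neighbors. It is an illustrative model, not the multi-agent platform actually used by the team.

```python
# Illustrative sketch of an autonomous agent's local decision loop: each agent
# only knows the status of the machines immediately upstream and downstream.
# Toy model only, not the researchers' multi-agent software.
from dataclasses import dataclass

@dataclass
class MachineStatus:
    temperature_c: float
    output_ready: bool      # did the upstream machine finish its task?
    accepting_input: bool   # can the downstream machine receive a part?

class Agent:
    def __init__(self, name, max_temp_c=80.0):
        self.name = name
        self.max_temp_c = max_temp_c

    def decide(self, own: MachineStatus, upstream: MachineStatus,
               downstream: MachineStatus) -> str:
        if own.temperature_c > self.max_temp_c:
            return "stop"              # safety first: halt a risky process
        if upstream.output_ready and downstream.accepting_input:
            return "process_part"      # contribute to the line's objective
        return "wait"                  # nothing useful to do right now

agent = Agent("filling_station")
print(agent.decide(MachineStatus(45.0, True, True),
                   MachineStatus(40.0, True, True),
                   MachineStatus(38.0, False, True)))   # -> "process_part"
```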

Read more on I’MTech: A dictionary for connected devices

Depending on the circumstances of the factory floor, a machine may also connect to different production lines to perform other tasks. “We no longer code a machine’s specific action, but the objective it must achieve. The actions are deduced by each agent using the data it collects. It therefore contributes to fulfilling a general mission,” adds the researcher. In this approach, no single agent can achieve this objective alone, as each one has a range of action limited to its machine and possesses only part of the knowledge about the overall line. The key to success is therefore cooperation. This makes it possible to transition from producing cups to bottles, simply by changing the objective of the line, without reprogramming it from A to Z.

Towards industrial experiments

Last summer, the IT’m Factory technological platform, a simulated industrial space at Mines Saint-Étienne, hosted a case study for an interoperable and cooperative distributed system. This production line starts out with a first machine responsible for retrieving a cup in a storage area and placing it on a conveyor. A filling system then fills the cup with a liquid. When this second machine has run out of product to pour, it places a remote order with a supplier. At every step, several methods of cooperation are possible. The first is to send a message from one agent to another in order to notify it of the task it has just performed. A second method uses machine perception to detect the action performed by the previous machine. A certain method may be preferable depending on the objectives (production speed etc.).

The researchers have also shown that a robot in the middle of the line may be replaced by another. Interoperability made it possible for the line to adapt to hardware changes without impacting its production. This issue of flexibility is extremely important with a view towards integrating a new generation of nomadic robots. “In September 2020, we started the SIRAM industry of the future project, which should make it possible to deploy interoperable, adaptable information systems to control mobile robotic assistants,” says Maxime Lefrançois. In the future, these devices could be positioned at strategic locations in companies to assist humans or retrieve components at different parts of the production line. But to do so, they must be able to interact with the other machines on the factory floor.

Anaïs Culot

digital intelligence

Digital transformation: how to avoid missing out on the new phase of work that has begun

Aurélie Dudézert, Institut Mines-Télécom Business School, and Florence Laval, IAE de Poitiers


After a lockdown that has helped reveal how far along companies are in their digital transformation, the easing of lockdown measures has ushered in a new phase marked by a desire to return to “normal” activities, which is impossible due to changing health restrictions.

Some organizations have therefore tried to use the health context as a pretext for regaining control over informal exchanges and adjustments that are impossible to control in a remote work environment (employees clocking in on site vs. remotely; identifying who is working with whom, at what time, etc.).

The organizational agility required for the goal of digital transformation and implemented in teams during the lockdown has been undermined by attempts to standardize work and return to uniform processes for the entire organization.

Mask-wearing has also become a point of tension. Besides being uncomfortable, masks conceal faces after what was often a period of remote work – in which it was difficult to perceive others’ emotions – and therefore complicate relationships. We must learn to read emotions differently and communicate with others differently.

These challenges are compounded by uncertainty over changing health restrictions. Highly adaptive ways of working must be put in place. Periods of site closures are followed by periods of hybrid work with employees taking turns working on site to comply with health restrictions.

Designing the transformation

After learning how to work together remotely, employees and managers must now constantly relearn how to work together.

To respond to this situation, three strategies, which we explain in the collective work L’impact de la crise sur le management (The Impact of the Crisis on Management, Éditions EMS) seem to be useful to help get through this second wave of the crisis and continue the digital transformation of working methods.

The first is to work with teams on emerging stress and tensions by seeing them not as symptoms of individuals’ inability/incompetence to cope with the situation, but as drivers for developing appropriate ways to coordinate work.

For instance, if mask-wearing is a source of tension, bringing teams together to discuss what is causing the tension could provide an opportunity to create a new working arrangement that is more effective and better suited to the new digital environment. This means that the manager must acknowledge employees’ experiences and perceptions and take them seriously, so that they can be expressed, whether as expectations, such as creativity, or as a rejection of the organization and its goals.

The second strategy is to develop reflexive management, which takes an objective look at the work methods put in place in the current adaptation phase. It is quite clear today that work practices are moving towards a hybridization of office and remote work, and of synchronous and asynchronous working.

Rather than seeing the successive changes in health regulations as constraints, which make it difficult to do business and seamlessly continue their digital transformation, organizations would benefit from taking advantage of these periodic adjustments to gain insight into the pros and cons of this hybrid system.  

This objective look could provide an opportunity to characterize which activities specific to each team are indisputably more productive in person than remotely, or to determine how to manage teams working both from home and on-site.

The third strategy is to “encourage digital intelligence”, meaning working with the team to determine the requirements and uses of digital technology, depending on working methods. For example, it may not be necessary to upgrade employees’ skills so that they master an entire collaborative work suite if the goal is simply to enable them to work together via web conference.

Overstretching employees at such an uncertain and strange time is an additional risk that could undermine the digital transformation process. Going back to the basic uses of digital technology in order to carry out tasks seems to be much more useful and effective.

Aurélie Dudézert, Full Professor, IMT BS, Institut Mines-Télécom Business School and Florence Laval, Lecturer at IAE de Poitiers

This article has been republished from The Conversation under a Creative Commons license. Read the original article (in French).

EuHubs4data

Data and AI: fostering innovation at European level

EuHubs4data, a project funded by the European Union, seeks to make a big contribution to the growth of European SMEs, start-ups and web-based companies in the global data economy. How? By providing them with a European catalogue of data-driven solutions and services in an effort to foster innovation in this field. The project was launched on 1 September 2020 and will run for three years, with a budget of €12.5 million. It brings together 12 Digital Innovation Hubs (DIH) across 9 European Union countries. One of these innovation hubs is TeraLab, IMT’s big data and artificial intelligence platform. An interview with Natalie Cernecka, Head of Business Development at TeraLab.

What are the goals of the EuHubs4data project, in which TeraLab is a partner?

Natalie Cernecka: The goal of the project is to bring together services provided by European big data hubs to take full advantage of the benefits offered by the various members of the network.

There are nearly 200 digital innovation hubs (DIH) in Europe. Some of them are specialized in data. Individually, these hubs are highly effective: they provide various services related to data and act as a link between SMEs in the digital sector and other companies and organizations. At the European level, however, interconnection between these hubs is sorely lacking, which is why the idea of a unified network is so important.

The  project will foster collaboration and exchange between existing hubs and provide a solid foundation for a European data economy to meet the growing data needs of SMEs in the digital sector and start-ups that work with technologies such as artificial intelligence (AI). The focus will be on system interoperability and data sharing. The project is therefore an important step towards implementing the European Commission plan to strengthen Europe’s data economy and digital sovereignty.

How will you achieve these goals?

NC: The project focuses on two areas: supply and demand. On the supply side, we’ll be working on the catalogue of services and datasets and on providing training opportunities in big data and AI. On the demand side, we’ll be carrying out experiments, with three open call sessions, along with an extensive awareness program aimed at getting hundreds of companies and organizations involved and encouraging them to innovate with data.

EuHubs4data offers a catalogue of services for SMEs, start-ups and web companies. Could you give us some concrete examples of such services?

NC: The goal is to propose a single catalogue presenting the various services offered by project partners and their respective ecosystems. For example, TeraLab could provide access to its data sharing platform, while a second partner could offer access to datasets, and a third could provide data analysis tools or training opportunities. The companies will benefit from a comprehensive catalogue and may in turn offer their customers innovative services.

12 digital innovation hubs located in 9 European countries are partners in this project. How will this federation be structured?

NC: The structuring of this federation will be specified over the course of the project. The consortium is headed by the Instituto Tecnológico de Informática in Valencia, Spain and includes DIHs and major European players in the field of big data – such as the Big Data Value Association, in which IMT is a member, and the International Data Spaces Association, which is at the center of GAIA-X and includes IMT as the French representative. A number of initiatives focus on structuring and expanding this ecosystem. The structure has to be flexible enough to incorporate new members, whether over the course of the project or in the distant future.

What will TeraLab’s specific role be?

NC: TeraLab’s role in the project is threefold. First, it is responsible for the work package in charge of establishing the catalogue of services, the central focus of the project. Second, TeraLab will provide its big data and AI platform along with its expertise in securing data. And third, as a DIH, TeraLab will accompany experiments and open calls, which will use the catalogue of services and datasets.  

Read more on I’MTech | Artificial Intelligence: TeraLab becomes a European digital innovation hub

What are some important steps coming up for the project?

NC: The open calls! The first will be launched in December 2020; that means that the first iteration of the catalogue should be ready at that time. The experiments will begin in spring 2021. TeraLab will follow them very closely and accompany several participating companies, to better understand their needs in terms of services, data and the use of the catalogue, in order to improve its use.

Learn more about the EuHubs4data project

Interview by Véronique Charlet

OSO-AI

When AI keeps an ear on nursing home residents

The OSO-AI start-up has recently completed a €4 million funding round. Its artificial intelligence solution that can detect incidents such as falls or cries for help has convinced investors, along with a number of nursing homes in which it has been installed. This technology was developed in part through the work of Claude Berrou, a researcher at IMT Atlantique, and the company’s co-founder and scientific advisor.

OSO-AI, a company incubated at IMT Atlantique, is the result of an encounter between Claude Berrou, a researcher at the engineering school, and Olivier Menut, an engineer at STMicroelectronics. Together, they started to develop artificial intelligence that can recognize specific sounds. After completing a €4 million funding round, the start-up now plans to fast-track the development of its product: ARI (French acronym for Smart Resident Assistant), a solution designed to alert staff in the event of an incident inside a resident’s room.

The device takes the form of an electronic unit equipped with high-precision microphones. ARI’s goal is to “listen” to the sound environment in which it is placed and send an alert whenever it picks up a worrying sound. Information is then transmitted via wi-fi and processed in the cloud.

“Normally, in nursing homes, there is only a single person on call at night,” says Claude Berrou. “They hear a cry for help at 2 am but don’t know which room it came from. So they have to go seek out the resident in distress, losing precious time before they can intervene – and waking up many residents in the process. With our system, the caregiver on duty receives a message such as, ‘Room 12, 1st floor, cry for help,’ directly on their mobile phone.” The technology therefore saves time that may be life-saving for an elderly person, and is less intrusive than a surveillance camera so it is better accepted. Especially since it is paused whenever someone else enters the room. Moreover, it helps relieve the workload and mental burden placed on the staff.

OSO-AI is inspired by how the brain works

But how can an information system hear and analyze sounds? The device developed by OSO-AI relies on machine learning, a branch of artificial intelligence, and artificial neural networks. In a certain way, this means that it tries to imitate how the brain works. “Any machine designed to reproduce basic properties of human intelligence must be based on two separate networks,” explains the IMT Atlantique researcher. “The first is sensory-based and innate: it allows living beings to react to external factors based on the five senses. The second is cognitive and varies depending on the individual: it supports long-term memory and leads to decision-making based on signals from diverse sources.”

How is this model applied to the ARI unit and the computers that receive the preprocessed signals? A first “sensory-based” layer is responsible for capturing the sounds, using microphones, and turning them into representative vectors. These are then compressed and sent to the second “cognitive” layer, which then analyzes the information, relying in particular on neural networks, in order to decide whether or not to issue an alert. It is by comparing new data to that already stored in its memory that the system is able to make a decision. For example, if a cognitively-impaired resident tends to call for help all the time, it must be able to decide not to warn the staff every time.
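A generic sketch of such a two-stage pipeline is shown below: a “sensory” function compresses a sound clip into a feature vector, and a small “cognitive” classifier decides whether to raise an alert. This illustrates the principle only, it is not OSO-AI’s system, and the training data here are random placeholders.

```python
# Generic two-stage audio-alert pipeline in the spirit of the description above:
# a "sensory" layer turns sound into compact feature vectors, a "cognitive" layer
# decides whether to raise an alert. This is NOT OSO-AI's actual system.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def sensory_layer(y, sr):
    """Compress a few seconds of audio into one representative vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder labeled clips: 0 = ordinary sound, 1 = cry for help / fall.
X = np.array([sensory_layer(np.random.randn(22050), 22050) for _ in range(40)])
y_labels = np.random.randint(0, 2, size=40)        # placeholder labels

cognitive_layer = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
cognitive_layer.fit(X, y_labels)

def should_alert(clip, sr=22050, threshold=0.8):
    proba = cognitive_layer.predict_proba([sensory_layer(clip, sr)])[0, 1]
    return proba > threshold
```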

The challenges of the learning phase

Like any self-learning system, ARI must go through a crucial initial training phase to enable it to form an initial memory, which will subsequently be increased. This step raises two main problems.

First of all, it must be able to interpret the words pronounced by residents using a speech-to-text tool that turns a speaker’s words into written text. But ARI’s environment also presents certain challenges. “Elderly individuals may express themselves with a strong accent or in a soft voice, which makes their diction harder to understand,” says Claude Berrou. As such, the company has tailored its algorithms to these factors.

Second, what about other sounds that occur less frequently, such as a fall? In these cases, the analysis is even more complex. “That’s a major challenge for artificial intelligence and neural networks: weakly-supervised learning, meaning learning from a limited number of examples or too few to be labeled,” explains the IMT Atlantique researcher. “What is informative is that it’s rare. And that which is rare is not a good fit for current artificial intelligence since it needs a lot of data.” OSO-AI is also innovative in this area of weakly-supervised learning.

Data is precisely a competitive advantage on which OSO-AI intends to rely. As it is installed in a greater number of nursing homes, the technology acquires increasingly detailed knowledge of sound environments. And little by little, it builds a common base of sounds (falls, footsteps, doors etc.) which can be reused in many nursing homes.

Read more on I’MTech: In French nursing homes, the Covid-19 crisis has revealed the detrimental effects of austerity policies

From nursing homes to home care

As of now, the product has completed its proof-of-concept phase, and approximately 300 devices have been installed in seven nursing homes, while the product has started to be marketed. The recent funding round will help fast-track the company’s technological and business development by tripling its number of employees to reach a staff of thirty by the end of 2021.

The start-up is already planning to deploy its system to help elderly people remain in their homes, another important societal issue. Lastly, according to Claude Berrou, one of OSO-AI’s most promising future applications is to monitor well-being, in particular in nursing home residents. In addition to situations of distress, the technology could detect unusual signs in residents, such as a more pronounced cough. In light of the current situation, there is no doubt that such a function would be highly valued.

Managing electronic waste: a global problem

Responsibilities surrounding digital waste are multi-faceted. On one side, it is governments’ responsibility to establish tighter border controls to better manage the flow of waste and make sure that it is not transferred to developing countries. On the other side, electronic device manufacturers must take accountability for their position by facilitating end-of-life management of their products. And consumers must be aware of the “invisible” consequences of their uses, since they are outsourced to other countries.

To understand how waste electric and electronic equipment (WEEE) is managed, we must look to the Basel Convention of 1989. This multilateral treaty was initially intended to manage the cross-border movement of hazardous waste, to which WEEE was later added. “The Basel Convention resulted in regional agreements and national legislation in a great number of countries, some of which prohibit the export or import of WEEE,” says Stéphanie Reiche de Vigan, a research professor in sustainable development law and new technologies at Mines ParisTech. “This is the case for the EU regulation on transfer of waste, which prohibits the export of WEEE to third countries.” Nevertheless, in 2015 the EFFACE European research project, devoted to combating environmental crime, estimated that approximately 2 million items of WEEE leave Europe illegally every year. How can so much electronic waste cross borders clandestinely? “A lack of international cooperation hinders efforts to detect, investigate and prosecute environmental crimes related to electronic waste trafficking,” says the researcher. And even if an international agreement on WEEE were to be introduced, it would have little impact without real determination on the part of the waste-producing countries to limit the transfer of this waste.

This is compounded by the fact that electronic waste trafficking is caught between two government objectives: punishing environmental crimes and promoting international commerce in order to recover market share in international shipping. To increase competitiveness, the London Convention of 1965, aimed at facilitating international shipping, allowed for better movement of vessels, merchandise and passengers through ports. “The results were a simplification of customs procedures to encourage more competitive transit through ports, and distortions of competition between ports of developed countries through minimum enforcement of regulations for cross-border transfer of electronic waste, in particular controls by customs and port authorities,” says Stéphanie Reiche de Vigan. The European Union observed that companies that export and import WEEE tend to use ports where the law is less enforced, and therefore less effective.

So how can this chain of international trafficking be broken? “The International Maritime Organization must address this issue in order to encourage the sharing of best practices and harmonize control procedures,” responds the research professor. It is the responsibility of governments to tighten controls at their ports to limit these crimes. And technology could play a major role in helping them do so. “Making it compulsory to install X-ray scanners in ports and use them to visualize the contents of containers could help reduce the problem,” says Stéphanie Reiche-de Vigan. At present, only 2% of all ocean containers worldwide are physically inspected by customs authorities.

What are the responsibilities of technology companies?

The digital technology chain is divided into separate links: mining, manufacturing, marketing and recycling. The various stages in the lifetime of an electronic device are therefore isolated and disconnected from one another. As such, producers are merely encouraged to collaborate with the recycling industry. “As long as the producers of electric and electronic equipment have no obligation to limit their production, cover recycling costs or improve the recyclability of their products, electronic waste flows cannot be managed,” she says. Solving this problem would involve reconnecting the various parts of the chain through a life cycle analysis of electric and electronic equipment and redefining corporate responsibilities.

Rethinking corporate responsibility would mean putting pressure on tech giants, but developed countries seem to be incapable of doing so. Yet, it is the governments that bear the cost of sorting and recycling. So far, awareness of this issue has not been enough to implement concrete measures that are anything more than guidelines. National Digital Councils in Germany and France have established roadmaps for designing responsible digital technology. They propose areas for future regulation such as extending the lifetime of devices. But there is no easy solution since a device that lasts twice as long means half as much production for manufacturers. “Investing in a few more companies that are responsible for reconditioning devices and extending their lifetime is not enough. We’re still a long way from viable proposals for the environment and the economy,” says Fabrice Flipo, a philosopher of science at Institut Mines-Télécom Business School.

Moreover, countries are not the only ones to come up against the power of big tech companies. “At Orange, starting in 2017, we tried to put a system in place to display environmental information in order to encourage customers to buy phones with the least impact,” says Samuli Vaija, an expert responsible for issues related to product life cycle analysis at Orange. Further upstream in the chain, this measure encouraged manufacturers to incorporate environmental sustainability into their product ranges. When it was presented to the International Telecommunication Union, Orange’s plan was quickly shut down by the American opposition (Apple, Intel), who did not wish to display information about the carbon footprint on their devices.

Still, civil society, and NGOs in particular, could build political will. The main obstacle: people living in developed countries have little or no awareness of the environmental impacts of their excessive consumption of digital tools, since they are not directly affected by them. “Too often, we forget that there are also violations of human rights behind the digital tools our Western societies rely on, from the extraction of the resources required to manufacture equipment, to the transfer of the waste they produce after just a few years. From the first link to the last, it is primarily people living in developing countries that suffer the impacts of the consumption of those in developed countries. The health impacts are not visible in Europe, since they are outsourced,” says Stéphanie Reiche-de Vigan. In rich countries, is digital technology effectively enclosed in an information bubble containing only the sum of its beneficial aspects? The importance attributed to digital technology must be balanced with its negative aspects.

As such, “it is also the responsibility of universities, engineering schools and business schools to teach students about environmental issues starting at the undergraduate level, while incorporating life cycle analysis and concern for environmental and human impacts in their programs,” says Stéphanie Reiche-de Vigan. Educating students about these issues means bringing these profiles to the companies who will develop the tools of tomorrow and the agencies meant to oversee them.

Airstream Alvie Cobbaï

Airstream, Alvie and Cobbaï supported by the IMT Digital honor loan scheme

The members of the IMT Digital Fund (IGEU, IMT and Fondation Mines-Télécom) held a meeting on 17 November. During the meeting, three start-ups from the Télécom Paris incubator were selected to receive support through seven honor loans for a total sum of €120,000.

 


Airstream is a new-generation project management platform that allows companies to better coordinate work packages and business teams during complex projects and programs. The start-up will receive a €40,000 honor loan. Find out more


Alvie proposes HYGO, a solution that turns any sprayer into a smart sprayer and helps farmers optimize the quantity of phytosanitary products used and increase the efficiency of bio-control for organic farming. Alvie will receive three honor loans for a total sum of €40,000. Find out more



Cobbaï proposes a SaaS for industrial actors to automate the analysis of their corporate textual data and boost their quality, maintenance and after-sales service performances. The start-up will receive three honor loans for a total sum of €40,000. Find out more


Europe's Green Deal

Digital technology, the gap in Europe’s Green Deal

Fabrice Flipo, Institut Mines-Télécom Business School


Despite the Paris Agreement, greenhouse gas emissions are currently at their highest. Further action must be taken in order to stay under the 1.5°C threshold of global warming. But thanks to the recent European Green Deal aimed at reaching carbon neutrality within 30 years, Europe now seems to be taking on its responsibilities and setting itself high goals to tackle contemporary and future environmental challenges.

The aim is to become “a fair and prosperous society, with a modern, resource-efficient and competitive economy”. This should make the European Union a global leader in the field of the “green economy”, with citizens placed at the heart of “sustainable and inclusive growth”.

The deal’s promise

How can such a feat be achieved?

The Green Deal is set within a long-term political framework for energy efficiency, waste, eco-design, the circular economy, public procurement and consumer education. Thanks to these objectives, the EU aims to achieve the long-awaited decoupling:

“A direct consequence of the regulations put in place between 1990 and 2016 is that energy consumption has decreased by almost 2% and greenhouse gas emissions by 22%, while GDP has increased by 54% […]. The percentage of renewable energy has gone from representing 9% of total energy consumption in 2005 to 17% today.”

With the Green Deal the aim is to continue this effort via ever-increasing renewable energies, energy efficiency and green products. The sectors of textiles, building and electronics are now the center of attention as part of a circular economy framework, with a strong focus on repair and reuse, driven by incentives for businesses and consumers.

Within this framework, energy efficiency measures should reduce our energy consumption by half, with the focus on energy labels and the savings they have made possible.

According to the Green Deal, the increased use of renewable energy sources should enable us to bring the share of fossil fuels down to just 20%. The use of electricity will be encouraged as an energy carrier, and 80% of it should be renewable by 2050. Energy consumption should be cut by 28% from its current levels. Hydrogen, carbon storage and varied processes for the chemical conversion of electricity into combustible materials will be used additionally, enabling an increase in the capacity and flexibility of storage.

In this new order, a number of roles have been identified: on one side, the producers of clean products, and on the other, the citizens who will buy them. In addition to this mobilization of producers and of consumers, national budgets, European funding and “green” (private) finance will commit to the cause; the framework of this commitment is expected to be put in place by the end of 2020.

Efficiency, renewable energy, a sharp decrease in energy consumption, promises of new jobs: if we remember that back in the 1970s, EDF was simply planning on building 200 nuclear power plants by the year 2000 – following a mindset which associated consumption and progress – everything now suggests that supporters of the Negawatt scenario (NGOs, ecologists, networks of committed local authorities, businesses and workers) have won a battle which is historic, cultural (in terms of values and realization of what is at stake) and political (backed by official texts).

The trajectory of GHG in a 1.5°C global warming scenario.

 

According to the deal, savings on fossil fuels could reach between €150 billion and €200 billion per year, to which would be added avoided health costs of around €200 billion a year and the prospect of exporting “green” products. Finally, millions of jobs may be created, with retraining mechanisms for the sectors that are most impacted and support for low-income households.

Putting the deal to the test

A final victory? On paper, everything points that way.

However, it is not as simple as it seems, and the EU itself recognizes that improvements in energy efficiency and the reduction of greenhouse gas emissions are currently stalling.

This is due to the following factors, in order of importance: economic growth; declining energy efficiency savings, especially in the airline industry; the sharp increase in the number of SUVs; and finally, the upward adjustment of real vehicle emissions (+30%) following the “dieselgate” scandal.

More seriously, the EU’s net emissions, which include those generated by imports and exports, have risen by 8% during the 1990-2010 period.

Efficiency therefore has its limits and savings are more likely to be made at the start than at the end.

The digital technology challenge

According to the Green Deal, “digital technologies are a critical enabler for attaining the sustainability goals of the Green Deal in many different sectors”: 5G, CCTV, the Internet of Things, cloud computing and AI. We have our doubts, however, as to whether that is true.

Several studies, including by the Shift Project, show that emissions from the digital sector have doubled between 2010 and 2020. They are now higher than those produced by the much-criticized civil aviation sector. The digital applications put forward by the European Green Deal are some of the most energy consuming, according to several case scenarios.

Can the increase in usage be offset by energy efficiency? The sector has seen tremendous progress, on a scale not seen in any other field. The first computer, the ENIAC, weighed 30 tons, consumed 150,000 watts and could not perform more than 5,000 operations per second. A modern PC consumes 200 to 300 W for the same computing power as a supercomputer of the early 2000s, which consumed 1.5 MW! Progress knows no bounds…

However, the absolute limit (the “Landauer limit”) was identified in 1961 and confirmed in 2012. According to the semiconductor industry itself, the limit is fast approaching in terms of the timeframe for the Green Deal, at a time when traffic and calculation power are increasing exponentially. Is it therefore reasonable to continue becoming increasingly dependent on digital technologies, in the hope that efficiency curves might reveal energy consumption “laws”?

Especially when we consider that the gains obtained in terms of energy efficiency have little to do with any shift towards more ecology-oriented lifestyles: the motivations have been cost, heat removal and the need to make sure our digital devices could be mobile so as to keep our attention at all times.
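For reference, the Landauer limit mentioned above sets the minimum energy required to erase one bit of information at temperature T; at room temperature it comes to roughly three zeptojoules per bit:

```latex
E_{\min} = k_B T \ln 2
\approx (1.38\times10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693
\approx 2.9\times10^{-21}\,\mathrm{J}\ \text{per erased bit}
```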

These limitations on efficiency explain the increased interest in more sparing use of digital technologies. The Conseil National du Numérique presented its roadmap shortly after Germany. However, the Green Deal is stubbornly following the same path: a path which consists in relying on an imaginary digital sector which has little in common with the realities of the sector.

Digital technologies, facilitating growth

Drawing on a recent article, the Shift Project issues a warning: “Up until now, rebound effects have turned out to exceed the gains brought by technological innovation.” This conclusion has recently been confirmed once more.

For example, the environmental benefits of remote working have in fact been much smaller than intuition suggested, especially when it is not combined with other changes in the social ecosystem. Another example: in its 2019 “current” scenario, the OECD predicted a threefold increase in passenger transport between 2015 and 2050, facilitated (not impeded) by autonomous vehicles.

Digital technologies are first and foremost a growth factor, as Pascal Lamy, then Director-General of the WTO, suggested when he stated that globalization rests on two innovations: the Internet and the shipping container. An increase in digital technologies will therefore lead to more emissions. And if it does not, it will be because our approach to ecology, digital technologies included, has changed.

We are therefore justified in asking what the Green Deal is really trying to protect: the climate, or big corporations’ digital markets?


Fabrice Flipo, Professor of social and political philosophy, epistemology and history of science and technology at Institut Mines-Télécom Business School

This article is republished from The Conversation under the Creative Commons license. Read the original article (in French) here.

IA TV

The automatic semantics of images

Recognizing faces, objects, patterns, music, architecture, or even camera movements: thanks to progress in artificial intelligence, every shot or sequence in a video can now be characterized. In the IA TV joint laboratory created last October by France Télévisions and Télécom SudParis, researchers are currently developing an algorithm capable of analyzing the range of fiction programs offered by the national broadcaster.


As the number of online video-on-demand platforms has increased, recommendation algorithms have been developed to go with them, and are now capable of identifying (amongst other things) viewers’ preferences in terms of genre, actors or themes, boosting the chances of picking the right program. Artificial intelligence now goes one step further by identifying the plot’s location, the type of shots and actions, or the sequence of scenes.
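To illustrate the kind of content-based matching such recommendation engines rely on, here is a minimal sketch; the feature names, programs and scores below are invented for the example and have nothing to do with France Télévisions’ actual metadata or algorithms.

```python
import math

# Hypothetical content descriptors: each program is a vector of
# feature weights (genre, theme...). Purely illustrative data.
PROGRAMS = {
    "period_drama": {"drama": 0.9, "history": 0.8, "comedy": 0.0},
    "crime_series": {"drama": 0.6, "thriller": 0.9, "comedy": 0.1},
    "sitcom":       {"comedy": 0.9, "drama": 0.1, "thriller": 0.0},
}

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse feature vectors."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A viewer profile built from previously liked programs
viewer = {"drama": 0.8, "history": 0.5, "thriller": 0.3}

# Rank programs by similarity to the viewer's profile
for name, feats in sorted(PROGRAMS.items(),
                          key=lambda kv: cosine(viewer, kv[1]),
                          reverse=True):
    print(f"{name}: {cosine(viewer, feats):.2f}")
```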

The teams of France Télévisions and Télécom SudParis have been working towards this goal since October 2019, when the IA TV joint laboratory was created. Their work focuses on automating the analysis of the video contents of fiction programs. “Today, our recommendation settings are very basic. If a viewer liked a type of content, program, film or documentary, we do not know much about the reasons why they liked it, nor about the characteristics of the actual content. There are so many different dimensions which might have appealed to them – the period, cast or plot,” points out Matthieu Parmentier, Head of the Data & AI Department at France Télévisions.

AI applied to fiction contents

The aim of the partnership is to explore these dimensions. Using deep learning, a neural network technique, researchers are applying algorithms to a massive quantity of videos. The successive layers of neurons extract and analyze increasingly complex features of visual scenes: the first layers operate on the image’s raw pixels, while the last attaches labels to them.
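As a minimal sketch of this layered idea (assuming PyTorch; the scene labels and the tiny network below are invented for illustration and are not the laboratory’s actual model):

```python
import torch
import torch.nn as nn

# Illustrative scene categories; not France Télévisions' real label set.
CLASSES = ["indoor", "outdoor", "car", "park", "bus"]

class TinySceneClassifier(nn.Module):
    """Toy CNN: early layers see raw pixels, later layers combine them into
    increasingly abstract features, and the last layer attaches labels."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # edges, colours
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # textures, parts
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)  # final layer emits label scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)

# One fake 224x224 RGB frame; the scores are turned into probabilities
model = TinySceneClassifier()
frame = torch.randn(1, 3, 224, 224)
probs = model(frame).softmax(dim=1)
print(dict(zip(CLASSES, probs[0].tolist())))
```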

“Thanks to this technology, we are now able to sort contents into categories, which means that we can classify each sequence, each scene in order to identify, for example, whether it was shot outside or inside, recognize the characters/actors involved, identify objects or locations of interest and the relationships between them, or even extract emotional or aesthetic features. Our goal is to make the machine capable of progressing automatically towards interpreting scenes in a way that is semantically close to that of humans”, says Titus Zaharia, a researcher at Télécom SudParis and specialist in AI applied to multimedia content.

Researchers have already obtained convincing results. Is this scene set in a car? In a park? Inside a bus? The tool can suggest the most relevant categories by order of probability. The algorithm can also determine the types of shots in the sequences analyzed: wide, general or close-up shots. “This did not exist until now on the market,” says Matthieu Parmentier enthusiastically. “And as well as detecting changes from one scene to another, the algorithm can also identify changes of shot within the same scene.”
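A very rough idea of what shot-change detection involves can be given with a classical histogram-difference baseline. This is only an illustrative sketch (it assumes OpenCV and a hypothetical local video file), not the algorithm developed by the joint laboratory:

```python
import cv2  # assumes opencv-python is installed

def detect_shot_changes(video_path: str, threshold: float = 0.5):
    """Very simple shot-change detector: flags frames whose colour histogram
    differs sharply from the previous frame. A classical baseline only."""
    cap = cv2.VideoCapture(video_path)
    changes, prev_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation close to 1 = similar frames; a sharp drop = new shot
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                changes.append(index)
        prev_hist, index = hist, index + 1
    cap.release()
    return changes

# Hypothetical file name, for illustration only
print(detect_shot_changes("fiction_episode.mp4"))
```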

According to France Télévisions, there are many possible applications. First, the automatic extraction of key frames, i.e. the most representative image of a fiction program’s content, selected for each sequence according to aesthetic criteria. Then there is the identification of the “ideal” moments in a program at which to insert ad breaks. “Currently, we are working on fixed shots, but one of our next aims is to be able to characterize moving shots such as zooms, tracking or panoramic shots. This could be very interesting for us, as it could help to edit or reuse contents,” adds Matthieu Parmentier.
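As a purely illustrative sketch of what “most representative image” can mean in practice, here is a crude heuristic that picks the frame closest to a shot’s average colour histogram; it assumes OpenCV and NumPy and does not reflect the laboratory’s aesthetic criteria:

```python
import cv2
import numpy as np

def pick_key_frame(frames):
    """Pick the frame whose colour histogram is closest to the shot's average
    histogram - a crude stand-in for 'most representative image'."""
    hists = []
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(h, h)
        hists.append(h.flatten())
    mean_hist = np.mean(hists, axis=0)
    distances = [np.linalg.norm(h - mean_hist) for h in hists]
    return int(np.argmin(distances))  # index of the key frame within the shot

# Example with synthetic frames (stand-ins for the decoded frames of one shot)
shot = [np.random.randint(0, 255, (90, 160, 3), dtype=np.uint8) for _ in range(5)]
print("key frame index:", pick_key_frame(shot))
```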

Multimodal AI solutions

In order to adapt to the new digital habits of viewers, the teams of France Télévisions and Télécom SudParis have been working together for over five years. They have contributed to the creation of artificial intelligence solutions and tools applied to digital images, but also to other forms of content, such as text and sound. In 2014, the two entities launched a collaborative project, Média4Dplayer, a prototype media player designed for all four types of screens (TV, PC, tablet and smartphone) and accessible to all, especially elderly people and people with disabilities. A few months later, they began looking into the automatic generation of subtitles. There are several advantages to this: equal access to content and the possibility of viewing a video without sound.

“In the case of television news, for example, subtitles are typed live by professionals, but as we have all seen, this can sometimes lead to errors or to delays between what is heard and what appears on screen,” explains Titus Zaharia. The solution developed by the two teams provides automatic synchronization for the Replay content offered by France TV. The teams were able to file a joint patent after two and a half years of development.
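To make the problem concrete, here is a toy sketch of one way live subtitles could be re-anchored to word timestamps produced by a speech recognizer; the data structures and the matching heuristic are invented for illustration and are not the patented method:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Cue:
    text: str
    start: float  # seconds, as typed live (often late)

@dataclass
class Word:
    text: str
    start: float  # seconds, as timed by a speech recognizer

def resync(cues, words, window=8):
    """Toy re-synchronization: anchor each subtitle cue to the recognizer
    timestamp of the word window that best matches its text."""
    resynced = []
    for cue in cues:
        best_score, best_start = 0.0, cue.start
        for i in range(max(1, len(words) - window + 1)):
            chunk = " ".join(w.text for w in words[i:i + window])
            score = SequenceMatcher(None, cue.text.lower(), chunk.lower()).ratio()
            if score > best_score:
                best_score, best_start = score, words[i].start
        resynced.append(Cue(cue.text, best_start))
    return resynced

# Invented example: the live cue arrives about 4 s after the words were spoken
words = [Word(w, 10.0 + 0.4 * i) for i, w in enumerate(
    "good evening here is tonight's news".split())]
cues = [Cue("Good evening, here is tonight's news.", 14.2)]
print(resync(cues, words)[0].start)  # -> 10.0, the recognizer's timing
```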

“In time, we are hoping to be able to offer perfectly synchronized subtitles just a few seconds after the broadcast of any type of live television program,” continues Matthieu Parmentier.

France Télévisions still has many issues that can be addressed by scientific research, and especially by artificial intelligence. “What we are interested in is developing tools which can be used and put on the market rapidly, but also tools that will be sufficiently general in their methodology to find other fields of application in the future,” concludes Titus Zaharia.