Audio and machine learning: Gaël Richard’s award-winning project

Gaël Richard, a researcher in Information Processing at Télécom Paris, has been awarded an Advanced Grant from the European Research Council (ERC) for his project entitled HI-Audio. This initiative aims to develop hybrid approaches that combine signal processing with deep machine learning for the purpose of understanding and analyzing sound.

“Artificial intelligence now relies heavily on deep neural networks, which have a major shortcoming: they require very large databases for learning,” says Gaël Richard, a researcher in Information Processing at Télécom Paris. He believes that “using signal models, or physical sound propagation models, in a deep learning algorithm would reduce the amount of data needed for learning while still allowing for high controllability of the algorithm.” Gaël Richard plans to pursue this breakthrough via his HI-Audio* project, which won an ERC Advanced Grant on April 26, 2022.

For example, the integration of physical sound propagation models can improve the characterization and configuration of the types of sound analyzed and help to develop an automatic sound recognition system. “The applications for the methods developed in this project focus on the analysis of music signals and the recognition of sound scenes, which is the identification of the recording’s sound environment (outside, inside, airport) and all the sound sources present,” Gaël Richard explains.

Industrial applications

Learning sound scenes could help autonomous cars identify their surroundings. Using microphones, the algorithm would identify the surrounding sounds; the vehicle could, for instance, recognize the sound of a siren and variations in its intensity. Autonomous cars would then be able to change lanes to let an ambulance or fire engine pass, without having to “see” it with their detection cameras. The processes developed in the HI-Audio project could be applied to many other areas. The algorithms could be used in predictive maintenance to check the quality of parts on a production line. A car part, such as a bumper, is typically checked by analyzing the sound resonance produced when a non-destructive impact is applied.

The other key applications for the HI-Audio project are in the field of AI for music, particularly to assist musical creation by developing new interpretable methods for sound synthesis and transformation.

Machine learning and music

“One of the goals of this project is to build a database of music recordings from a wide variety of styles and different cultures,” Gaël Richard explains. “This database, which will be automatically annotated (with precise semantic information), will expand the research to include less studied or less distributed music, especially from audio streaming platforms,” he says. One of the challenges of this project is that of developing algorithms capable of recognizing the words and phrases spoken by the performers, retranscribing the music regardless of its recording location, and contributing new musical transformation capabilities (style transfer, rhythmic transformation, word changes).

“One important aspect of the project will also be the separation of sound sources,” Gaël Richard says. In an audio file, the separation of sources, which in the case of music are each linked to a different instrument, is generally achieved via filtering or “masking”. The idea is to hide all other sources until only the target source remains. A less common approach is to isolate the instrument via sound synthesis. This involves analyzing the music to characterize the sound source to be extracted in order to reproduce it. “The advantage is that, in principle, artifacts from other sources are entirely absent. In addition, the synthesized source can be controlled by a few interpretable parameters, such as the fundamental frequency, which is directly related to the sound’s perceived pitch,” Gaël Richard says. “This type of approach opens up tremendous opportunities for sound manipulation and transformation, with real potential for developing new tools to assist music creation.”
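The masking idea described above can be sketched in a deliberately simplified form: a mixture of two tones is moved to the frequency domain, the bins belonging to the unwanted source are hidden, and the target is reconstructed. The frequencies and mask width are invented for the example; this is an illustration of the general technique, not of the HI-Audio method itself.

```python
# Toy source separation by frequency-domain "masking": hide everything
# except the bins belonging to the target source, then invert.
import numpy as np

sr = 8000                                # sample rate (Hz), chosen for the example
t = np.arange(sr) / sr                   # one second of audio
target = np.sin(2 * np.pi * 440 * t)     # "instrument" we want to keep
other = np.sin(2 * np.pi * 1000 * t)     # source to mask out
mix = target + other

spectrum = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / sr)

# Binary mask: keep only the bins near the target's frequency.
mask = np.abs(freqs - 440) < 50
separated = np.fft.irfft(spectrum * mask, n=len(mix))

# With these idealized signals, the reconstruction matches the target tone.
error = float(np.max(np.abs(separated - target)))
```

Real music requires time-varying masks over a spectrogram and sources that overlap in frequency, which is exactly where learned models come in.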

*HI-Audio will start on October 1st, 2022 and will be funded by the ERC Advanced Grant for five years for a total amount of €2.48 million.

Rémy Fauvel


Encrypting and watermarking health data to protect it

As medicine and genetics make increasing use of data science and AI, the question of how to protect this sensitive information is becoming increasingly important to all those involved in health. A team from the LaTIM laboratory is working on these issues, with solutions such as encryption and watermarking. It has just been accredited by Inserm.

The original version of this article has been published on the website of IMT Atlantique

Securing medical data

Securing medical data, preventing it from being misused for commercial or malicious purposes, from being distorted or even destroyed has become a major challenge for both health players and public authorities. This is particularly relevant at a time when progress in medicine (and genetics) is increasingly based on the use of huge quantities of data, particularly with the rise of artificial intelligence. Several recent incidents (cyber-attacks, data leaks, etc.) have highlighted the urgent need to act against this type of risk. The issue also concerns each and every one of us: no one wants their medical information to be accessible to everyone.

“Health data, which is particularly sensitive, can be sold at a higher price than bank data,” points out Gouenou Coatrieux, a teacher-researcher at LaTIM (the Medical Information Processing Laboratory, shared by IMT Atlantique, the University of Western Brittany (UBO) and Inserm), who is working on this subject in conjunction with Brest University Hospital. To enable this data to be shared while limiting the risks, LaTIM is using two techniques: secure computing and watermarking.

Secure computing, which combines cryptographic techniques for distributed computing with other approaches, ensures confidentiality: the externalized data is encoded in such a way that calculations can still be performed on it. The research organisation that receives the data – be it a public laboratory or a private company – can study it, but has no access to its initial version, which it cannot reconstruct. The data therefore remains protected.
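One of the simplest secure-computing building blocks is additive secret sharing, sketched below: a sensitive value is split into random shares, each computing party sees only a share (which reveals nothing on its own), yet sums can still be computed on the shared data. This is a minimal illustration of the principle, not LaTIM's actual protocol; the values and party count are invented.

```python
# Additive secret sharing: split a value into random shares modulo a
# large prime, compute on the shares, recombine only the result.
import secrets

P = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n=3):
    """Split `value` into n random shares that sum to it modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts

# Two patients' (hypothetical) measurements, each split between three parties.
a_shares = share(120)
b_shares = share(80)

# Each party adds the shares it holds, never seeing the real values...
partial_sums = [(a + b) % P for a, b in zip(a_shares, b_shares)]

# ...and only the recombination reveals the aggregate result.
total = sum(partial_sums) % P
print(total)  # 200
```

Each individual share is a uniformly random number, so no single party learns anything about the original values.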



Discreet but effective watermarking

Watermarking involves introducing a minor, imperceptible modification into the medical images or data entrusted to a third party. “We simply modify a few pixels on an image, for example to change the colour a little, a subtle change that makes it possible to code a message,” explains Gouenou Coatrieux. We can thus watermark the identifier of the last person to access the data. This method does not prevent the file from being used, but if a problem occurs, it makes it very easy to identify the person who leaked it. The watermark thus guarantees traceability. It also acts as a deterrent, because users are informed of the mechanism. This technique has long been used to combat digital video piracy. Encryption and watermarking can also be combined: this is called crypto-watermarking.
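The pixel-level marking described above can be sketched as least-significant-bit embedding: the identifier of the last user is hidden in the lowest bit of the first pixels, changing each value by at most 1. This is a textbook illustration only; real medical-image watermarking schemes are far more robust and imperceptibility-aware, and the identifier here is invented.

```python
# LSB watermarking sketch: hide a numeric user ID in the least
# significant bits of an image, then recover it for traceability.
import numpy as np

def embed(pixels, user_id, nbits=16):
    """Write the nbits of user_id into the LSBs of the first pixels."""
    bits = [(user_id >> i) & 1 for i in range(nbits)]
    flat = pixels.flatten()                      # copy of the image
    flat[:nbits] = (flat[:nbits] & 0xFFFE) | bits  # overwrite LSBs only
    return flat.reshape(pixels.shape)

def extract(pixels, nbits=16):
    """Read the hidden identifier back from the LSBs."""
    bits = pixels.flatten()[:nbits] & 1
    return int(sum(int(b) << i for i, b in enumerate(bits)))

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint16)
marked = embed(image, user_id=4242)

recovered = extract(marked)                      # 4242
max_change = int(np.max(np.abs(marked.astype(int) - image.astype(int))))  # at most 1
```

Since only the lowest bit of a handful of pixels moves, the image is visually unchanged, yet anyone with the extraction routine can identify who last handled the file.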

Initially, the LaTIM team focused on the protection of medical images. A joint laboratory was thus created with Medecom, a Breton company specialising in this field, which produces software dedicated to radiology.

Multiple fields of application

Subsequently, LaTIM extended its research to the entire field of cyber-health. This work has led to the filing of several patents. A former doctoral student and engineer from the school has also founded a company, WaToo, specialising in data watermarking. A Cyber Health team at LaTIM, the first in this field, has just been accredited by Inserm. This multidisciplinary team includes researchers, research engineers, doctoral students and post-docs, and covers several fields of application: protection of medical images and genetic data, and ‘big data’ in health. In particular, it works on the databases used for AI and deep learning, and on the security of treatments that use AI. “For all these subjects, we need to be in constant contact with health and genetics specialists,” stresses Gouenou Coatrieux, head of the new entity. “We also take into account standards in the field, such as DICOM, the international standard for medical imaging, and legal issues such as those relating to privacy rights under the European GDPR regulation.”

The Cyber Health team recently contributed to a project called PrivGen, selected by the Labex (laboratory of excellence) CominLabs. The ongoing work which started with PrivGen aims to identify markers of certain diseases in a secure manner, by comparing the genomes of patients with those of healthy people, and to analyse some of the patients’ genomes. But the volumes of data and the computing power required to analyse them are so large that they have to be shared and taken out of their original information systems and sent to supercomputers. “This data sharing creates an additional risk of leakage or disclosure,” warns the researcher. “PrivGen’s partners are currently working to find a technical solution to secure the treatments, in particular to prevent patient identification”.

Towards the launch of a chaire (French research consortium)

An industrial chaire called Cybaile, dedicated to cybersecurity for trusted artificial intelligence in health, will also be launched next fall. LaTIM will partner with three other organizations: Thales group, Sophia Genetics and the start-up Aiintense, a specialist in neuroscience data. With the support of Inserm, and with the backing of the Regional Council of Brittany, it will focus in particular on securing the learning of AI models in health, in order to help with decision-making – screening, diagnoses, and treatment advice. “If we have a large amount of data, and therefore representations of the disease, we can use AI to detect signs of anomalies and set up decision support systems,” says Gouenou Coatrieux. “In ophthalmology, for example, we rely on a large quantity of images of the back of the eye to identify or detect pathologies and treat them better.”


BeePMN: Monitoring bees to take better care of them

At the crossroads between Industry 4.0 and the Internet of Things, the BeePMN research project aims to help amateur and professional beekeepers. It will feature an intuitive smartphone app that combines business processes with real-time measurements of apiaries. 

When a swarm of bees becomes too crowded for its hive, the queen stops laying eggs and the worker bees leave in search of a place to start a new colony. The hive splits into two groups: those who follow the queen to explore new horizons, and those who stay and choose a new queen to take over the leadership of the colony. As exciting as this new adventure is for the bees, for the beekeeper who maintains the hive, this new beginning brings complications. In particular, the loss of part of the colony also leads to a decrease in honey production. Bee losses can also have a much worse cause, such as the emergence of a virus or an invasion that threatens the health of the colony. 

Beekeepers therefore monitor these events in the life of the bees very closely, but keeping track of the hives on a daily basis is a major problem, and a question of time. The BeePMN project, at the crossroads between the processes of Industry 4.0 and the Internet of Things, wants to give the beekeepers eyes in the back of their heads to be able to monitor the health of their hives in real time. BeePMN combines a non-invasive sensor system, to provide real-time data, with an intuitive and easy-to-use application, to provide decision-making support. 

This project was launched as part of the Hubert Curien Partnerships which support scientific and technological exchanges between countries, offering the installation of sites both in France, near Alès, and in Lebanon, with the beekeeping cooperative Atelier du Miel. It is supported by a collaboration between a team led by Gregory Zacharewicz, Nicolas Daclin and François Trousset at IMT Mines Alès, a team led by Charles Yaacoub and Adib Akl at the Holy Spirit University of Kaslik in Lebanon, and the company ConnectHive. This company, which specializes in engineering as applied to the beekeeping industry, was founded by François Pfister, a retired IMT Mines Alès researcher and beekeeping enthusiast.

BeePMN has several goals: to monitor the health of the hives, to increase honey production, and to facilitate the sharing of knowledge between amateurs and professionals. 

“I actually work on business process problems in industry,” says Grégory Zacharewicz, a researcher at IMT Mines Alès on the project. “But the synergy with these different partners has directed us more towards the craft sector, and specifically beekeeping,” with the aim of providing tools to accelerate their tasks or reminders about certain activities. “I often compare BeePMN to a GPS: it is of course possible to drive without it, but it’s a tool that guides the driver to optimize his choices,” he explains. 

Making better decisions 

The different sites, both in France and Lebanon, are equipped with connected sensors, non-invasive for the bee colonies, which gather real-time data on their health, as well as on humidity, temperature, and weight. To measure weight, the team has developed ‘nomad’ scales, which are less expensive than the usual fixed equivalent. This data is then recorded in an application to help guide the beekeepers in their daily choices. Though professionals are used to making these kinds of decisions, they may not necessarily have all the information at hand, nor the time to monitor all their apiaries. 

The data observed by the sensors is paired with other environmental information such as the current season, weather conditions, and the flowering period. This allows for precise information on each hive and its environment, and improves the relevance of possible actions and choices. 

“If, for example, we observe a sudden 60% weight loss in a hive, there is no other option than to harvest it,” says Charbel Kady, a PhD student at IMT Mines Alès who is also working on the BeePMN project. On the other hand, if the weight loss happens gradually over the course of the week, that might be the result of lots of other factors, like a virus attacking the colony, or part of the colony moving elsewhere. That is the whole point of combining this essential data, like weight, with environmental variables, to provide more certainty on the cause of an event. “It’s about making sense of the information to identify the cause,” notes Charbel Kady. 

The researchers would also like to add vegetation maps to the environmental information. This is an important aspect, especially with regard to honey plants, but this information is difficult to find for certain regions, and complex to install in an application. The project also aims to progress towards prevention aspects: a PhD student, Marianne El Kassis, joined the BeePMN team to work on simulations and to integrate them into the application, to be able to prevent potential risks. 

Learn through play 

The two researchers stressed that one of the points of the application is for beekeepers to help each other. “Beekeepers can share information with each other, and the interesting model of one colleague can be copied and integrated into the everyday life of another,” says Charbel Kady. The application centralizes the data for a set of apiaries and the beekeepers can share their results with each other, or make them available to beginners. That’s the core of the second part of the project, a ‘serious’ game to offer a simplified and fun version to amateur beekeepers who are less independent. 

Professionals are accustomed to repeating a certain set of actions, so it is possible to formalize them with digital tools in the form of business processes to guide amateurs in their activities. “We organized several meetings with beekeepers to define these business rules and to integrate them into the application, and when the sensors receive the information, it triggers certain actions or alerts, for example taking care of the honey harvest, or needing to add wax to the hive,” explains Grégory Zacharewicz. 
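Business rules of the kind just described can be sketched as simple threshold checks on sensor data, using the weight scenario from earlier in the article. The thresholds and alert wording below are invented for illustration; the real project formalizes such rules as business processes rather than hard-coded functions.

```python
# Hypothetical BeePMN-style rule: interpret a hive's daily weight history
# (kg, oldest first) and raise the appropriate alert.

def hive_alerts(weight_history):
    alerts = []
    # Sudden large drop from one reading to the next: time to act now.
    if len(weight_history) >= 2:
        drop = 1 - weight_history[-1] / weight_history[-2]
        if drop >= 0.6:
            alerts.append("sudden loss: harvest or inspect immediately")
    # Gradual loss over a week: could be disease or part of the colony leaving.
    if len(weight_history) >= 7:
        week_drop = 1 - weight_history[-1] / weight_history[-7]
        if week_drop >= 0.2 and not alerts:
            alerts.append("gradual loss: check for disease or swarming")
    return alerts

print(hive_alerts([40, 39, 38, 35, 34, 33, 31]))  # gradual loss over the week
print(hive_alerts([40, 16]))                      # sudden loss
```

Pairing such rules with environmental context (season, weather, flowering) is what turns a raw alert into a suggested action.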

“There is a strong aspect of knowledge and skill transfer. We can imagine it like a sort of companionship to pass on the experience acquired,” says the researcher. The GPS analogy is applicable here too: “It makes available a whole range of past choices from professionals and other users, so that when you encounter a particular situation, it suggests the best response based on what has been decided by other users in the past,” the researcher adds. The concept of the app is very similar, in offering the possibility to capitalize on professionals’ knowledge of business processes to educate yourself and learn, while being guided at the same time. 

The BeePMN project is based on beekeeping activities, but as the researchers point out, the concept itself can be applied to various fields. “We can think of a lot of human and industrial activities where this project could be replicated to support decision-making processes and make them stronger,” explains Grégory Zacharewicz.

Tiphaine Claveau


What is the metaverse?

Although it is only in the prototype stage, the metaverse is already making quite a name for itself. This term, which comes straight out of a science fiction novel from the 1990s, now describes the concept of a connected virtual world, heralded as the future of the Internet. So what’s hiding on the other side of the metaverse? Guillaume Moreau, a Virtual Reality researcher at IMT Atlantique, explains.

How can we define the metaverse?

Guillaume Moreau: The metaverse offers an immersive and interactive experience in a virtual and connected world. Immersion is achieved through the use of technical devices, mainly Virtual Reality headsets, which allow you to feel present in an artificial world. This world can be imaginary, or a more or less faithful copy of reality, depending on whether we’re talking about an adventure video game or the reproduction of a museum, for example. The other key aspect is interaction. The user is a participant, so when they do something, the world around them immediately reacts.

The metaverse is not a revolution, but a democratization of Virtual Reality. Its novelty lies in the commitment of stakeholders like Meta, aka Facebook – a major investor in the concept – to turn experiences that were previously solitary or limited to small groups into massive, multi-user experiences – in other words, to simultaneously interconnect a large number of people in three-dimensional virtual worlds, and to monetize the whole concept. This raises questions of IT infrastructure, uses, ethics, and health.

What are its intended uses?

GM: Meta wants to move all internet services into the metaverse. This is not realistic, because there will be, for example, no point in buying a train ticket in a virtual world. On the other hand, I think there will be not one, but many metaverses, depending on different uses.

One potential use is video games, which are already massively multi-user, but also virtual tourism, concerts, sports events, and e-commerce. A professional use allowing face-to-face meetings is also being considered. What the metaverse will bring to these experiences remains an open question, and there are sure to be many failures out of thousands of attempts. I am sure that we will see the emergence of meaningful uses that we have not yet thought of.

In any case, the metaverse will raise challenges of interoperability, i.e. the possibility of moving seamlessly from one universe to another. This will require the establishment of standards that do not yet exist and that should, as is often the case, be enforced by the largest players on the market.

What technological advances have made the development of these metaverses possible today?

GM: There have been notable material advances in graphics cards that offer significant display capabilities, and Virtual Reality headsets have reached a resolution equivalent to the limits of human eyesight. Combining these two technologies results in a wonderful contradiction.

On the one hand, the headsets work on a compromise: they must offer the largest possible field of view whilst still remaining light, small and energy self-sufficient. On the other hand, graphics cards consume a lot of power and give off a lot of heat. Therefore, in order to preserve the battery life of the headsets, the calculations behind the metaverse display have to be done on remote server farms before the images are transferred. That’s where 5G networks come in, whose potential for new applications, like the metaverse, is yet to be explored.

Could the metaverse support the development of new technologies that would increase immersion and interactivity?

GM: One way to increase the action of the user is to set them in motion. There is an interesting research topic on the development of multidirectional treadmills. This is a much more complicated problem than it seems, and it only takes the horizontal plane into account – so no slopes, steps, etc.

Otherwise, immersion is mainly achieved through sensory integration, i.e. our ability to feel all our senses at the same time and to detect inconsistencies. Currently, immersion systems only stimulate sight and hearing, but another sense that would be of interest in the metaverse is touch.

However, there are a number of challenges associated with so-called ‘haptic’ devices. Firstly, complex computer calculations must be performed to detect a user’s actions to the nearest millisecond, so that they can be felt without the feedback seeming strange and delayed. Secondly, there are technological challenges. The fantasy of an exoskeleton that responds strongly, quickly, and safely in a virtual world will never work. Beyond a certain level of power, robots must be kept in cages for safety reasons. Furthermore, we currently only know how to do force feedback on one point of the body – not yet on the whole thing.

Does that mean it is not possible to stimulate senses other than sight and hearing?

GM: Ultra-realism is not inevitable; it is possible to cheat and trick the brain by using sensory substitution, i.e. by mixing a little haptics with visual effects. By modifying the visual stimulus, it is possible to make haptic stimuli appear more diverse than they actually are. There is a lot of research to be done on this subject. As far as the other senses are concerned, we don’t know how to do very much. This is not a major problem for a typical audience, but it calls into question the accessibility of virtual worlds for people with disabilities.

One of the questions raised by the metaverse is its health impact. What effects might it have on our health?

GM: We already know that the effects of screens on our health are not insignificant. In 2021, the French National Agency for Food, Environmental and Occupational Health & Safety (ANSES) published a report specifically targeting the health impact of Virtual Reality, which is a crucial part of the metaverse. Visual disorders and Virtual Reality Sickness – a simulator sickness that affects many people – are therefore certain consequences of exposure to the metaverse.

We also know that virtual worlds can be used to influence people’s behavior. Currently, this has a positive goal and is being used for therapeutic purposes, including the treatment of certain phobias. However, it would be utopian to think that the opposite is not possible. For ethical and logical reasons, we cannot conduct research aiming to demonstrate that the technology can be used to cause harm. It will therefore be the uses that dictate the potentially harmful psychological impact of the metaverse.

Will the metaverses be used to capture more user data?

GM: Yes, that much is obvious. The owners and operators of the metaverse will be able to retrieve information on the direction of your gaze in the headset, or on the distance you have traveled, for example. It is difficult to say how this data will be used at the moment. However, the metaverse is going to make its use more widespread. Currently, each website has data on us, but this information is not linked together. In the metaverse, all this data will be grouped together to form even richer user profiles. This is the other side of the coin, i.e. the exploitation and monetization side. Moreover, given that the business model of an application like Facebook is based on the sale of targeted advertising, the virtual environment that the company wants to develop will certainly feed into a new advertising revolution.

What is missing to make the metaverse a reality?

GM: Technically, all the ingredients are there except perhaps the equipment for individuals. A Virtual Reality headset costs between €300 and €600 – an investment that is not accessible to everyone. There is, however, a plateau in technical improvement that could lower prices. In any case, this is a crucial element in the viability of the metaverse, which, let us not forget, is supposed to be a massively multi-user experience.

Anaïs Culot


Cryptography: what are the random numbers for?

Hervé Debar, Télécom SudParis – Institut Mines-Télécom and Olivier Levillain, Télécom SudParis – Institut Mines-Télécom

The original purpose of cryptography is to allow two parties (traditionally referred to as Alice and Bob) to exchange messages without another party (traditionally known as Eve) being able to read them. Alice and Bob will therefore agree on a method to exchange each message, M, in an encrypted form, C. Eve can observe the medium through which the encrypted message (or ciphertext) C is sent, but she cannot retrieve the information exchanged without knowing the necessary secret information, called the key.

This is a very old exercise, since we speak, for example, of the ‘Julius Caesar Cipher’. However, it has become very important in recent years, due to the increasing need to exchange information. Cryptography has therefore become an essential part of our everyday lives. Besides the exchange of messages, cryptographic mechanisms are used in many everyday objects to identify and authenticate users and their transactions. We find these mechanisms in phones, for example, to encrypt and authenticate communication between the telephone and radio antennas, or in car keys, and bank cards.
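The ‘Julius Caesar Cipher’ mentioned above can be shown as a short runnable sketch: each letter of the message M is shifted by a fixed amount, and that shift is the key. Its security is nil by modern standards (there are only 25 useful keys), but it illustrates the M-to-C transformation with a shared secret.

```python
# Caesar cipher: shift every letter by `shift` positions in the alphabet.

def caesar(message, shift):
    out = []
    for ch in message:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return ''.join(out)

C = caesar("HELLO BOB", 3)   # Alice encrypts with key 3
print(C)                     # KHOOR ERE
print(caesar(C, -3))         # Bob decrypts: HELLO BOB
```

Eve can break this by simply trying all 25 shifts, which is exactly why modern cryptography relies on much larger key spaces and on unpredictable random values.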

The internet has also popularized the ‘padlock’ in browsers to indicate that the communication between the browser and the server is protected by cryptographic mechanisms. To function correctly, these mechanisms require random numbers, whose quality (or more precisely, unpredictability) contributes to the security of the protocols.

Cryptographic algorithms

To transform a message M into an encrypted message C, by means of an algorithm A, keys are used. In so-called symmetric algorithms, we speak of secret keys (Ks), which are shared and kept secret by Alice and Bob. In asymmetric algorithms, there are public (KPu) and private (KPr) key pairs. For each user, KPu is known to all, whereas KPr must be kept safe by its owner. Algorithm A is also public, which means that the secrecy of communication relies solely on the secrecy of the keys (secret or private).

Sometimes, the message M being transmitted is not important in itself, and the purpose of encrypting said message M is only to verify that the correspondent can decrypt it. This proof of possession of Ks or KPr can be used in some authentication schemes. In this case, it is important never to use the same message M more than once, since this would allow Eve to find out information pertaining to the keys. Therefore, it is necessary to generate a random message NA, which will change each time that Alice and Bob want to communicate.
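The challenge-response idea just described can be sketched with a fresh random message NA and a shared secret key Ks. Here an HMAC stands in for “encrypting M” (a common choice in real authentication schemes, though the article does not name a specific one): Bob proves possession of Ks without ever sending it, and a new NA is drawn for every exchange so that responses can never be replayed.

```python
# Challenge-response authentication sketch: a fresh nonce NA per exchange,
# answered with a keyed digest that only the holder of Ks can compute.
import hashlib
import hmac
import secrets

Ks = secrets.token_bytes(32)     # secret key shared by Alice and Bob

# Alice draws a fresh random challenge NA and sends it to Bob.
NA = secrets.token_bytes(16)

# Bob answers with a keyed digest of the challenge.
response = hmac.new(Ks, NA, hashlib.sha256).digest()

# Alice recomputes the digest; a match proves Bob holds Ks.
expected = hmac.new(Ks, NA, hashlib.sha256).digest()
print(hmac.compare_digest(response, expected))   # True

# Eve, without Ks, cannot produce a valid response.
eve = hmac.new(secrets.token_bytes(32), NA, hashlib.sha256).digest()
print(hmac.compare_digest(eve, expected))        # False
```

If NA were ever reused, Eve could replay a previously observed response, which is why its unpredictability matters as much as that of the key.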

The best-known and probably most widely used example of this mechanism is the Diffie-Hellman algorithm. This algorithm allows a browser (Alice) and a website (Bob) to obtain an identical secret key K, different for each connection, by exchanging their respective KPu beforehand. This process is performed, for example, when connecting to a retail website. It allows the browser and the website to exchange encrypted messages with a key that is destroyed at the end of each session. This means there is no need to keep it (which helps both usability and security, since there is less chance of losing the key). It also means that not much traffic is encrypted with the same key, which makes cryptanalysis attacks more difficult than if the same key were always used.
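A toy Diffie-Hellman exchange can be written in a few lines, using deliberately tiny numbers so that the arithmetic is visible. Real deployments use primes of 2048 bits or more (or elliptic curves); the values of p, g and the secrets below are chosen purely for illustration.

```python
# Toy Diffie-Hellman: Alice and Bob derive the same key K while Eve,
# who observes p, g, A and B, cannot easily compute it.

p, g = 23, 5            # public parameters (tiny, for illustration only)

a = 6                   # Alice's secret exponent
b = 15                  # Bob's secret exponent

A = pow(g, a, p)        # Alice sends A to Bob
B = pow(g, b, p)        # Bob sends B to Alice

K_alice = pow(B, a, p)  # Alice computes the shared key
K_bob = pow(A, b, p)    # Bob computes the same key

print(K_alice == K_bob)  # True: both sides hold the same K
```

The secrets a and b must themselves be fresh, unpredictable random numbers for each connection, which is precisely where the random number generation discussed below comes in.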

Generating random numbers

To ensure Eve is unable to obtain the secret key, it is very important that she cannot guess the message NA. In practice, this message is often a large random number used in the calculations required by the chosen algorithm.

Initially, random number generation was mainly used for simulation work. To obtain relevant results, it is important not to run the simulation repeatedly with the same parameters, but to repeat it hundreds or even thousands of times with different parameters. The aim is to generate numbers that respect certain statistical properties, and that cannot be distinguished from a sequence obtained by rolling dice, for example.

To generate a random number NA that can be used in these simulations, so-called pseudo-random generators are normally used, which apply a reprocessing algorithm to an initial value, known as the ‘seed’.  These pseudo-random generators aim to produce a sequence of numbers that resembles a random sequence, according to these statistical criteria. However, using the same seed twice will result in obtaining the same sequence twice.
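The defining property of a seeded pseudo-random generator can be demonstrated with Python's standard library: the same seed always reproduces the same sequence, which is exactly what simulations want and exactly what cryptography must avoid if the seed can be guessed.

```python
# Same seed, same sequence: determinism of a pseudo-random generator.
import random

gen1 = random.Random(42)
gen2 = random.Random(42)

seq1 = [gen1.randint(0, 999) for _ in range(5)]
seq2 = [gen2.randint(0, 999) for _ in range(5)]

print(seq1 == seq2)   # True: the "random" numbers are fully reproducible
```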

The pseudo-random generator algorithm is usually public. If attackers are able to guess the seed, they will be able to generate the random sequence and thus obtain the random numbers used by the cryptographic algorithms. In the specific case of cryptography, the attacker does not necessarily even need to know the exact value of the seed: being able to narrow it down to a set of candidate values is enough to quickly compute all the possible keys and crack the encryption.

In the 2000s, programmers used seeds that could easily be guessed, based on the time for example, making systems vulnerable. Since then, to prevent the seed (or a set of candidate values for it) from being guessed, operating systems derive it from a mixture of physical elements of the system (e.g. processor temperature, bus connections, etc.). These physical elements are impossible for an attacker to observe and vary frequently, so they provide a good seed source for pseudo-random generators.
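This is why cryptographic code should draw from the operating system's entropy pool rather than from a manually seeded generator. In Python, the `secrets` module (backed by `os.urandom`) does exactly that:

```python
# OS-backed randomness: unpredictable values suitable for keys and nonces.
import secrets

key1 = secrets.token_hex(16)   # 16 bytes = 128 bits of OS-provided entropy
key2 = secrets.token_hex(16)

print(len(key1))      # 32 hex characters
print(key1 != key2)   # True: values are unpredictable and never repeat in practice
```

Unlike `random.Random(seed)`, there is no seed for an attacker to guess: the operating system continuously mixes hardware noise into its entropy pool.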

What about vulnerabilities?

Although the field is now well understood, random number generators are still sometimes subject to vulnerabilities. For example, between 2017 and 2021, cybersecurity researchers found 53 such vulnerabilities (CWE-338). This represents only a small number of software flaws (less than 1 in 1000). Several of these flaws, however, are of a high or critical level, meaning they can be used quite easily by attackers and are widespread.

A prime example in 2010 was Sony’s error on the PS3 software signature system. In this case, the reuse of a random variable for two different signatures allowed an attacker to find the manufacturer’s private key: it then became possible to install any software on the console, including pirated software and malware.
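The mechanics of this kind of attack can be sketched with the textbook (EC)DSA signing equations (a general derivation, not a claim about the details of Sony's implementation). Two signatures on message hashes $z_1$ and $z_2$ that reuse the same random value $k$ share the same $r$, and the private key $d$ falls out algebraically:

```latex
\begin{aligned}
s_1 &= k^{-1}(z_1 + r\,d) \bmod n, \qquad s_2 = k^{-1}(z_2 + r\,d) \bmod n,\\
s_1 - s_2 &= k^{-1}(z_1 - z_2) \bmod n
\;\;\Longrightarrow\;\;
k = \frac{z_1 - z_2}{s_1 - s_2} \bmod n,\\
d &= \frac{s_1\,k - z_1}{r} \bmod n.
\end{aligned}
```

With a fresh random $k$ for every signature, the subtraction step yields nothing, which is exactly why the unpredictability of these values is a security requirement.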

Between 2017 and 2021, flaws also affected physical components, such as Intel Xeon processors, Broadcom communication chips and Qualcomm Snapdragon processors embedded in mobile phones. These flaws degrade the quality of random number generation. For example, CVE-2018-5871 and CVE-2018-11290 relate to a seed generator whose period is too short, i.e. one that quickly repeats the same sequence of seeds. These flaws have since been fixed and affect only certain functions of the hardware, which limits the risk.

The quality of random number generation is therefore a security issue. Operating systems running on recent processors (less than 10 years old) have hardware-based random number generation mechanisms. This generally ensures good-quality random numbers and thus the proper functioning of cryptographic algorithms, even if occasional vulnerabilities may still arise. The difficulty is most acute for connected objects, whose hardware capacities do not allow random generators as powerful as those available on computers and smartphones, and which often prove more vulnerable.

Hervé Debar, Director of Research and Doctoral Training, Deputy Director, Télécom SudParis – Institut Mines-Télécom and Olivier Levillain, Assistant Professor, Télécom SudParis – Institut Mines-Télécom

This article has been republished from The Conversation under a Creative Commons license. Read the original article.

MP4 for Streaming

Streaming services are now part of our everyday life, and it’s all thanks to MP4. This computer standard allows videos to be played online and on a wide range of devices. Jean-Claude Dufourd and Jean Le Feuvre, researchers in Computer Science at Télécom Paris, have been recognized by the Emmy Awards Academy for their work on this format, among other things.

In 2021, the File Format working group of the MPEG Committee received an Emmy Award for its work in developing ISOBMFF. Behind this acronym lies a file format that served as the basis for MP4, the famous video standard we have all encountered when saving a file with the ‘.mp4’ extension. “The Emmy’s decision to give an award to the File Format group is justified; this file format has had a great impact on the world of video by creating a whole ecosystem that brings together very different types of research,” explains Jean-Claude Dufourd, a computer scientist at Télécom Paris and a member of the File Format group.

MP4, which can contain both sound and video, “is used for live or on-demand media broadcasting, but not for the real-time broadcasting needed to stream games or video conferences,” explains Jean Le Feuvre, also a computer scientist at Télécom Paris and a member of the File Format group. Several features of this format have contributed to its success, including the ability to store long videos like movies while still remaining very compact.

The smaller the file, the easier it is to circulate on networks. The compactness of MP4 is therefore an advantage for streaming movies and series. Another explanation for its success is its adaptability to different types of devices. “This technology can be used on a wide variety of everyday devices such as telephones, computers, and televisions,” explains Jean-Claude Dufourd. MP4 is playable on different devices because “the HTTP file distribution protocol has been reused to distribute video,” says the researcher.

Improving streaming quality

HTTP (Hypertext Transfer Protocol), which has been prevalent since the 1990s, is typically used to deliver web pages. Researchers modified this protocol so that it could be used to broadcast video files online. Their studies led to the development of HTTP streaming, and then to an improved version called DASH (Dynamic Adaptive Streaming over HTTP), a protocol that “cuts up the information in the MP4 file into chunks of a few seconds each,” says Jean-Claude Dufourd. The segments obtained at the end of this process are successively retrieved by the player to reconstruct the movie or the episode of the series being watched.

This cutting process allows the playback of the video file to be adjusted according to the connection speed. “For each time range, different quality encoding is provided, and the media player is responsible for deciding which quality is best for its conditions of use,” explains Jean Le Feuvre. Typically, if a viewer’s connection speed is low, the streaming player will select the video file with the least amount of data in order to facilitate traffic. The player will therefore select the lowest streaming quality. This feature allows content to continue playing on the platform with minimal risk of interruption.
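A deliberately simplified, hypothetical version of this player-side decision can be sketched as follows. The bitrate ladder and the 20% safety margin below are invented for the example; real DASH players use richer throughput and buffer heuristics:

```python
# Hypothetical adaptive-bitrate rule of the kind a DASH player applies:
# pick the highest-bitrate rendition that fits the measured throughput,
# falling back to the lowest one when the connection is too slow.
RENDITIONS_KBPS = [350, 800, 1800, 4500, 8000]   # example encoding ladder

def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> int:
    """Return the bitrate (kbps) of the next segment to request."""
    budget = throughput_kbps * safety    # keep headroom for bandwidth dips
    eligible = [b for b in RENDITIONS_KBPS if b <= budget]
    return max(eligible) if eligible else min(RENDITIONS_KBPS)

print(pick_rendition(10_000))   # fast link: 8000 kbps rendition
print(pick_rendition(1_000))    # slow link: 800 kbps rendition
print(pick_rendition(200))      # too slow: fall back to 350 kbps
```

Because the decision is re-evaluated for every few-second chunk, the player can ramp quality up or down as conditions change, which is what keeps playback from being interrupted.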

In order to achieve this ability to adapt to different usage scenarios, tests have been carried out by scientists and manufacturers. “Tests were conducted to determine the network profile of a phone and a computer,” explains Jean-Claude Dufourd. “The results showed that the profiles were very different depending on the device and the situation, so the content is not delivered with the same fluidity,” he adds.

Economic interests

“Today, we are benefiting from 15 years of technological refinement that have allowed us to make the algorithms efficient enough to stream videos,” says Jean-Claude Dufourd. Since the beginning of streaming, one of the goals has been to broadcast videos with the best possible quality, while also reducing loading lag and putting as little strain on the network capacity as possible.

The challenge is primarily economic: the more strain streaming platforms put on network capacity to deliver their content, the more they have to pay. Research is therefore underway into how to reduce broadcasters’ internet bills. One solution would be to circulate video files mainly among users, creating a less centralized streaming system; this is what peer-to-peer (P2P) file-sharing systems already allow. This alternative is currently being considered by streaming companies, as it would reduce the cost of broadcasting content.

Rémy Fauvel

Preparing the networks of the future

Whether in the medical, agricultural or academic sectors, the Internet of Things will become part of many areas of society. At this point, however, this sector, sometimes known as Web 3.0, faces many challenges. How can we make objects communicate with each other, no matter how different they might be? With the Semantic Web, a window and a thermometer can be connected, or a CO2 sensor and a computer. Research in this area of the web also aims to open up the technological borders that separate networks, as does research around Open RAN. This network architecture, based on free software, could put an end to the domination of the small number of equipment manufacturers that telecommunications operators rely on.

However, the number of devices accessing networks is constantly rising. This trend risks complicating the movement of data and generating interference, signal-jamming phenomena caused in particular by the sheer number of connected devices. By exploring the nature of different kinds of interference and their unique features, we can deal with them and limit them more efficiently.

Furthermore, interference also occurs in alternative telecommunications systems, such as Non-Orthogonal Multiple Access (NOMA). While this system makes it possible to host more users on networks by sharing frequency sub-bands more efficiently, interference remains an intrinsic problem. All these challenges must be overcome for networks to interconnect efficiently and facilitate data-sharing between consumers, data processors, storage centers and authorities in the future.


Better network-sharing with NOMA

The rise in the number of connected devices will lead to increased congestion of frequencies available for data circulation. Non-Orthogonal Multiple Access (NOMA) is one of the techniques currently being studied to improve the hosting capacity of networks and avoid their saturation.

To access the internet, a mobile phone must exchange information with base stations, devices commonly known as relay antennas. These data exchanges take place on frequency sub-bands, channels specific to each base station. To serve multiple users, a channel is attributed to each of them. With the rise in the number of connected objects, there will not be enough sub-bands available to host them all.

To mitigate this problem, Catherine Douillard and Charbel Abdel Nour, telecommunications researchers at IMT Atlantique, have been working on NOMA: a system that places multiple users on the same channel, unlike the current system. “Rather than allocating a frequency band to each user, device signals are superposed on the same frequency band,” explains Douillard.

Sharing resources

“The essential idea of NOMA involves making a single antenna work to serve multiple users at the same time,” says Abdel Nour. To go even further, the researchers are working on Power-Domain NOMA, “an approach that aims to separate users sharing the same frequency on one or more antennas, according to their transmitting power,” continues Douillard. This system provides more equitable access to spectrum resources and available antennas across users. Typically, when a device has difficulty accessing the network, it may try to access a resource already occupied by another user. In that case, the antenna’s transmitting power is adapted so that the information sent by the device reaches its destination while limiting ‘disturbances’ for the user.

Superposing multiple users on the same resource, however, creates problems of its own. For communication to work, the signals sent by the machines need to be received at sufficiently different strengths, so that the antennas can tell them apart. If the signal strengths are similar, the antennas will mix them up. This can cause interference, in other words, signal-jamming phenomena that can hinder the smooth playing of a video or a round of an online game.

Interference: an intrinsic problem for NOMA

To avoid interference, receivers are fitted with decoders that differentiate between signals according to their reception quality. When the antenna receives the superposed signals, it first decodes the one with the best reception quality and removes it from the received signal; it then recovers the lower-quality signal. Once the signals are separated, the base station gives each one access to the network. “This means of handling interference is quite simple to implement in the case of two signals, but much less so when there are many,” states Douillard.
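This decoding order is known as successive interference cancellation (SIC). The NumPy sketch below is an idealized illustration, not the researchers’ system: it assumes BPSK symbols, a fixed 80/20 power split and mild noise, with all values invented for the example. The strong signal is decoded first and subtracted, after which the weak signal becomes readable:

```python
import numpy as np

# Idealized power-domain NOMA sketch with successive interference
# cancellation (SIC). BPSK symbols, fixed power split, no fading.
rng = np.random.default_rng(0)
bits_a = rng.integers(0, 2, 1000)        # "strong" user (more power)
bits_b = rng.integers(0, 2, 1000)        # "weak" user (less power)
sym_a = 2 * bits_a - 1                   # BPSK mapping: {0,1} to {-1,+1}
sym_b = 2 * bits_b - 1

p_a, p_b = 0.8, 0.2                      # powers must differ clearly for SIC
y = np.sqrt(p_a) * sym_a + np.sqrt(p_b) * sym_b   # superposed transmission
y = y + 0.05 * rng.standard_normal(y.size)        # mild receiver noise

est_a = np.sign(y)                       # 1) decode the strongest signal
residual = y - np.sqrt(p_a) * est_a      # 2) subtract ("cancel") it
est_b = np.sign(residual)                # 3) the weak signal is now readable

print((est_a == sym_a).mean(), (est_b == sym_b).mean())  # both close to 1.0
```

If the two powers are made similar (say 0.5 and 0.5), the first decoding step can no longer separate the users and both error rates collapse, which is exactly the mixing-up problem described above.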

“To handle interference, there are two main possibilities,” explains Abdel Nour. “One involves canceling the interference: the device receivers detect which signals are not intended for them and eliminate them, keeping only those sent to them,” adds the researcher. This approach can be facilitated by interference models, such as those studied at IMT Nord Europe. The second solution involves making the antennas work together. By exchanging information about the quality of connections, they can run algorithms to determine which devices should be served by NOMA, while avoiding interference from their signals.

Intelligent allocation

“We are trying to ensure that resource allocation techniques adapt to user needs, while adjusting the power they need, with no excess,” states Abdel Nour. Depending on the number of users and the applications being used, the number of antennas involved will vary. If many machines are trying to access the network, multiple antennas can be used at the same time. Otherwise, a single antenna may be enough.

Thanks to algorithms, the base stations learn to recognize the different characteristics of devices, like the kinds of applications being used when the device is connected. This allows the intensity of the signal emitted by the antennas to be adapted, in order to serve users appropriately. For example, a streaming service will need a higher bit rate, and therefore a stronger transmitting power than a messaging application.

“One of the challenges is to design high-performing algorithms that are energy-efficient,” explains Abdel Nour. By reducing energy consumption, the objective is to achieve lower operating costs than the current network architecture, while allowing for a significant rise in the number of connected users. NOMA and other research into interference are part of an overall effort to increase network hosting capacity. With the development of the Internet of Things in particular, this work will prove necessary to avoid information traffic jams.

Rémy Fauvel


Interference: a source of telecommunications problems

The growing number of connected objects is set to cause a concurrent increase in interference, a phenomenon which has remained an issue since the birth of telecommunications. In the past decade, more and more research has been undertaken in this area, leading us to revisit the way in which devices handle interference.

“Throughout the history of telecommunications, we have observed an increase in the quantities of information being exchanged,” states Laurent Clavier, telecommunications researcher at IMT Nord Europe. “This phenomenon can be explained by network densification in particular,” adds the researcher. The increase in the amount of data circulating goes hand in hand with a rise in interference, which is a problem for network operations.

To understand what interference is, first, we need to understand what a receiver is. In the field of telecommunications, a receiver is a device that converts a signal into usable information — like an electromagnetic wave into a voice. Sometimes, undesired signals disrupt the functioning of a receiver and damage the communication between several devices. This phenomenon is known as interference and the undesired signal, noise. It can cause voice distortion during a telephone call, for example.

Interference occurs when multiple machines use the same frequency band at the same time. To avoid interference, receivers choose which signals they pick up and which they drop. While telephone networks are organized to avoid two smartphones interfering with each other, this is not the case for the Internet of Things, where interference is becoming critical.

Read on I’MTech: Better network-sharing with NOMA

Different kinds of noise causing interference

With the boom in the number of connected devices, the amount of interference will increase and cause the network to deteriorate. By improving machine receivers, it appears possible to mitigate this damage. Most connected devices are equipped with receivers adapted for Gaussian noise. These receivers make the best decisions possible as long as the signal received is powerful enough.

By studying how interference occurs, scientists have understood that it does not follow a Gaussian model, but rather an impulsive one. “Generally, there are very few objects that function together at the same time as ours and near our receiver,” explains Clavier. “Distant devices generate weak interference, whereas closer devices generate strong interference: this is the phenomenon that characterizes impulsive interference,” he specifies.

Reception strategies implemented for Gaussian noise do not account for the presence of these strong noise values. They are therefore easily misled by impulsive noise, with receivers no longer able to recover the useful information. “By designing receivers capable of processing the different kinds of interference that occur in real life, the network will be more robust and able to host more devices,” adds the researcher.
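The contrast Clavier describes is easy to reproduce numerically. The NumPy sketch below models impulsive noise as a Bernoulli-Gaussian mixture, an assumption chosen for brevity (the models used in this research area are often heavier-tailed, such as alpha-stable distributions), with all parameter values invented:

```python
import numpy as np

# Gaussian vs. impulsive noise, contrasted numerically. Impulsive noise is
# modeled here as a Bernoulli-Gaussian mixture: mostly quiet samples, with
# a small fraction of strong bursts from nearby devices.
rng = np.random.default_rng(1)
n = 100_000
gaussian = rng.standard_normal(n)

burst = rng.random(n) < 0.01             # ~1% of samples are strong bursts
impulsive = rng.standard_normal(n) * np.where(burst, 30.0, 1.0)

# Similar behavior most of the time, but rare, very large spikes: exactly
# the values that mislead a receiver designed for Gaussian noise.
print(np.abs(gaussian).max())            # a few standard deviations
print(np.abs(impulsive).max())           # roughly an order of magnitude larger
```

A decision rule tuned to the Gaussian case treats those rare spikes as strong evidence, which is why it fails; a receiver that models the heavy tail discounts them instead.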

Adaptable receivers

For a receiver to be able to understand Gaussian and non-Gaussian noise, it needs to be able to identify its environment. If a device receives a signal that it wishes to decode while the signal of another nearby device is generating interference, it will use an impulsive model to deal with the interference and decode the useful signal properly. If it is in an environment in which the devices are all relatively far away, it will analyze the interference with a Gaussian model.

To correctly decode a message, the receiver must adapt its decision-making rule to the context. To do so, Clavier indicates that a “receiver may be equipped with mechanisms that allow it to calculate the level of trust in the data it receives in a way that is adapted to the properties of the noise. It will therefore be capable of adapting to both Gaussian and impulsive noise.” This method, used by the researcher to design receivers, means that the machine does not have to automatically know its environment.

Currently, industrial actors are not particularly concerned with the nature of interference. However, they are interested in the means available to avoid it. In other words, they do not see the usefulness of questioning the Gaussian model and undertaking research into the way in which interference is produced. For Clavier, this lack of interest will be temporary, and “in time, we will realize that we will need to use this kind of receiver in devices,” he notes. “From then on, engineers will probably start to include these devices more and more in the tools they develop,” the researcher hopes.

Rémy Fauvel

Photograph of a 5G cell tower

Open RAN opening mobile networks

With the objective of standardizing equipment in base stations, EURECOM is working on the Open RAN. This project aims to open the equipment manufacturing market to new companies, to encourage the design of innovative material for telecommunications networks.

Base stations, often called relay antennas, are the systems that allow telephones and computers to connect to the network. They are owned by telecommunications operators, and their equipment is provided by a small number of specialized companies. The components manufactured by some are incompatible with those designed by others, which prevents operators from building antennas with the elements of their choice. The roll-out of networks such as 5G depends on this proprietary technology.

To allow new companies to bring innovation to networks without being caught up in the games between the various parties, EURECOM is working on the Open RAN (Open Radio Access Network) project. It aims to standardize the operation of base station components to make them compatible regardless of their manufacturer, using a new network architecture that gets around each manufacturer’s specific component technology. For this, EURECOM is using the OpenAirInterface platform, which allows industrial and academic actors to develop and test new software solutions and architectures for 4G and 5G networks. This work is carried out in an open-source framework, which allows all actors to find common ground for collaboration on interoperability, eliminating questions about the components’ origin.

Read on I’MTech: OpenAirInterface: An open platform for establishing the 5G system of the future

“The Open RAN can be broken down into three key blocks: the radio antenna, the distributed unit and the centralized unit,” describes Florian Kaltenberger, computer science researcher at EURECOM. The role of the antenna is to receive and send signals to and from telephones, while the latter two elements connect the radio signal to the network, so that users can watch videos or send messages, for example. Unlike radio units, which require specific equipment, “distributed units and centralized units can function with conventional IT hardware, like servers and PCs,” explains the researcher. There is no longer any need to rely on specially developed, proprietary equipment: servers and PCs already know how to interact with one another, independently of their components.

RIC: the key to adaptability

This standardization would allow users of one network to use antennas from another, in the event that their operator’s antennas are too far away. To make the Open RAN function, researchers have developed the RAN Intelligent Controller (RIC), software that represents the heart of this architecture, in a way. The RIC functions thanks to artificial intelligence, which provides indications about a network’s status and guides the activity of base stations to adapt to various scenarios.

“For example, if we want to set up a network in a university, we would not approach it in the same way as if we wanted to set one up in a factory, as the issues to be resolved are not the same,” explains Kaltenberger. “In a factory, the interface makes it possible to connect machines to the network in order to make them work together and receive information,” he adds. The RIC is also capable of locating users and adjusting antenna configurations, optimizing the network’s operation by allowing more equitable access between users. For industry, Open RAN represents an interesting alternative to the telecommunications networks of major operators, due to its low cost and its ability to manage energy consumption in a more considered way, evaluating the transmission strength required by users. The system can therefore provide the power users need, and no more.

Simple tools in service of free software

According to Kaltenberger, “the Open RAN architecture would allow for complete control of networks, which could contribute to greater sovereignty.” For the researcher, the fact that this system is controlled by an open-source program ensures a certain level of transparency. The companies involved in developing the software are not the only ones to have access to it. Users can also improve and check the code. Furthermore, if the companies in charge of the Open RAN were to shut down, the system would remain functional, as it was created to exist independently of industrial actors.

“At present, multiple research projects around the world have shown that the Open RAN functions, but it is not yet ready to be deployed,” explains Kaltenberger. One reason is the reluctance of equipment manufacturers to standardize their hardware, as this would open the market to new competitors and thereby put an end to their commercial domination. Kaltenberger believes it will be necessary “to wait for perhaps five more years before standardized systems come on the market”.

Rémy Fauvel