
CoronaCheck : separating fact from fiction in the Covid-19 epidemic

Rumors about the origins of the Covid-19 epidemic and news about miracle cures are rampant. And some leaders have taken the liberty of putting forth questionable figures. To combat such misinformation, Paolo Papotti and his team at EURECOM have developed an algorithmic tool for the general public, which can determine the accuracy of the figures. In addition to its potential for informing the public about the epidemic, this work illustrates the challenges and current limitations of automated fact-checking tools.

 

The world is in the midst of an unprecedented health crisis, which has unfortunately been accompanied by an onslaught of incorrect or misleading information. Described by the World Health Organization (WHO) as an ‘infodemic’, such ‘fake news’ – which is not a new problem – has been spread over social media and by public figures. We can see the effects this may have on the public’s overall perception of the epidemic. One notable example is when public figures such as the president of the United States use incorrect figures to underestimate the impact of this virus and justify continuing the country’s economic activity.

“As IT researchers in data processing and information quality, we can contribute by providing an algorithmic tool to help with fact-checking,” says Paolo Papotti, a researcher at EURECOM. Working with PhD student Mohammed Saeed, and with support from Master’s student Youssef Doubli, he developed a tool that can check this information, following research previously carried out with Professor Immanuel Trummer from Cornell University.

Originally intended for the energy industry – where data is constantly changing and must be painstakingly verified – this tool, which is called CoronaCheck and is now available in French, was adapted in early March to meet current needs.

This fact-checking work is a job in its own right for many journalists: they must use reliable sources to check whether the information heard in various places is correct. And if it turns out to be a rumor, journalists must find sources and explanations to set the record straight. “Our tool does not seek to replace journalists’ investigative work,” explains Paolo Papotti, “but a certain amount of this information can be checked by an algorithm. Our goal is therefore to help social media moderators and journalists manage the wealth of information that is constantly springing up online.”

With millions of messages exchanged every day on networks like Twitter or Facebook, it is impossible for humans to accomplish such a task. Before checking information, at-risk claims must first be identified. But an algorithm can be used by these networks to analyze various data simultaneously and target misinformation. This is the aim of the research program funded by Google to combat misinformation online, which includes the CoronaCheck project. The goal is to provide the general public with a tool to verify figures relating to the epidemic.

A statistical tool

CoronaCheck is a statistical tool that is able to compare quantitative data with the proposed queries. The site works a bit like a search engine: the user enters a query – a claim – and CoronaCheck says whether it is true or false. For example, “there are more coronavirus cases in Italy than in France.”  It’s a tool that speaks with statistics. It can handle logical statements using terms such as “less than” or “constant” but will not understand queries such as “Donald Trump has coronavirus.”

“We think that it’s important for users to be able to understand CoronaCheck’s response,” adds Paolo Papotti. To go back to the previous example, the software will not only respond as to whether the statement is true or false, but will also provide details in its response. It will specify the number of cases in each country and the date for which these data are correct. “If the date is not specified, it will take the most recent results by default, meaning for the month of March,” says the researcher.
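In broad terms, the verification step amounts to comparing a structured claim against the reference figures and returning both a verdict and the supporting numbers. The sketch below illustrates that idea on a toy dataset; the figures, the claim structure and the function names are illustrative assumptions, not CoronaCheck’s actual implementation.

```python
# Minimal sketch of a statistical claim checker in the spirit of CoronaCheck.
# The case counts below are hypothetical placeholders, not real epidemic data.

# Toy dataset: confirmed cases per country for a given date.
CASES = {
    ("Italy", "2020-03-31"): 105_000,
    ("France", "2020-03-31"): 52_000,
}

def check_claim(country_a: str, country_b: str, date: str) -> dict:
    """Check the claim 'there are more cases in country_a than in country_b'."""
    a = CASES[(country_a, date)]
    b = CASES[(country_b, date)]
    return {
        "claim": f"more cases in {country_a} than in {country_b}",
        "verdict": a > b,
        # The supporting figures are returned so the user can understand the answer.
        "evidence": {country_a: a, country_b: b, "date": date},
    }

if __name__ == "__main__":
    print(check_claim("Italy", "France", "2020-03-31"))
```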

This means that it is essential to update the data regularly. “Every day, we enter the new data compiled by Johns Hopkins University,” he says. The university also collects data from several official sources such as the WHO and the European Centre for Disease Prevention and Control.

“We know that this tool isn’t perfect,” says Paolo Papotti. The system relies on machine learning, so the model must be trained. “We know that it is not exhaustive and that a user may enter a word that is unknown to the model.” User feedback is therefore essential in order to improve the system. Comments are analyzed to incorporate questions or wording of statements that have not been taken into account. Users must also follow CoronaCheck’s instructions and speak a language the system understands.

Ambiguity of language

It is important to recognize that language can be a significant barrier for an automatic verification tool since it is ambiguous.  The term “death rate” is a perfect example of such ambiguity. For the general public it refers to the mortality rate, meaning the number of deaths in relation to a population for a given period of time. However, the “death rate” can also mean the case fatality rate, meaning the number of deaths in relation to the total number of cases of the disease. The results will therefore differ greatly depending on the meaning of the term.

Such differences in interpretation are always possible in human language, but must not be possible in this verification work. “So the system has to be able to provide two responses, one for each interpretation of death rate,” explains Paolo Papotti. This would also work in cases where a lack of rigor may lead to an interpretation problem.
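A minimal way to handle this ambiguity is to compute and return both readings of the term, as in the sketch below (placeholder figures, not real statistics):

```python
# Sketch of how an ambiguous query such as "death rate" can be answered twice,
# once per interpretation. The numbers below are placeholders for illustration.

def death_rate_answers(deaths: int, confirmed_cases: int, population: int) -> dict:
    """Return both readings of 'death rate' for the same country."""
    return {
        # Mortality rate: deaths relative to the whole population.
        "mortality_rate": deaths / population,
        # Case fatality rate: deaths relative to confirmed cases.
        "case_fatality_rate": deaths / confirmed_cases,
    }

# The two interpretations differ by orders of magnitude for the same inputs.
print(death_rate_answers(deaths=10_000, confirmed_cases=100_000, population=60_000_000))
# -> mortality_rate ≈ 0.00017 (0.017%), case_fatality_rate = 0.1 (10%)
```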

If the user enters the query, “there are more cases in Italy than in the United States,” this may be true for February, but false for April. “Optimally, we would have to evolve towards a system that gives different responses, which are more complex than true or false,” says Paolo Papotti. “This is the direction we’re focusing on in order to solve this interpretation problem and go further than a statistical tool,” he adds.

The team is working on another system, which could respond to queries that cannot be  determined with statistics, for example, “Donald Trump has coronavirus.” This requires developing a different algorithm and the goal would be to combine the two systems. “We will then have to figure out how to assign a query to one system or another, and combine it all in a single interface that is accessible and easy to use.”

Tiphaine Claveau for I’MTech

 


High expectations for AI to ensure the security of new networks

As networks increasingly rely on new, primarily software-based architectures, the issue of security cannot be overlooked. Artificial intelligence is one field researchers are exploring to provide sufficient protection for these new networks, such as 5G (and beyond) and constrained networks such as the IoT. This approach is being explored in particular by cybersecurity researchers at IMT Lille Douai.

 

“In a matter of a few dozen milliseconds, a well-targeted attack can wipe out an entire network and the services that go with it.” Ahmed Meddahi, a research professor at IMT Lille Douai, offers this frightening reminder about the threats to Internet-of-Things networks in particular. Behind their control screens in their security operations centers, network operators can identify a multitude of diverse attacks in the blink of an eye. Granted, not all attacks are carried out so quickly. But this example is a good illustration of the constraints weighing on the researchers and engineers who develop cyberdefense systems. Such systems must be able to monitor, analyze, sort, detect and react, all in just a few milliseconds.

For this, humans have two complementary technological tools on their side: new network architectures and artificial intelligence. 5G or WPAN (a network technology for the Internet of Things) are based on two important characteristics with cryptic acronyms: SDN — SDN-WISE for the IoT — and NFV. SDN, which stands for software-defined network, “is a network’s capacity to be programmed, configured and controlled in a centralized, dynamic  way,” explains Ahmed Meddahi, who has been working on architecture security for the past several years. As for NFV, “it’s the virtualization of the IT world, adapted to the world of networks. The network functions which were purely hardware-based up to now are becoming software functions.” SDN and NFV are complementary and their primary aim is to reduce the development cycle for telecom services as well as the cost of network operations and maintenance.

Read more on I’MTech: SDN and Virtualization: More Intelligence in 5G networks

As far as cybersecurity is concerned, NFV and SDN could serve as a basis for providing an overview of the network, or could take on a portion of the complexity of IoT networks. The network operator in charge of security could therefore establish an overall security policy from his control post, defining the rules and baseline behavior of the network. He could then allow the network to make its own decisions instantaneously. The goal is to move towards more autonomous network security.

In the event of a threat or an attack, such an organization makes it possible to rapidly deny access or introduce filter rules for computer traffic, and therefore isolate or migrate segments of the network that are under attack.  This sort of architecture or approach is an advantage for effectively responding to threats and making the network more resilient. But sometimes, the speed at which humans can analyze situations and make decisions does not suffice. That’s where artificial intelligence comes in.

Detecting more quickly than humans

“It’s one of the major areas of research in cybersecurity: first of all, how can we collect the most relevant information about network activity, out of a huge and widely-varying volume of traffic data, which, on top of that, is ever-growing? And second, how can we detect, identify and isolate an attack that only lasts a fraction of a second, or even anticipate it, to prevent the worst from happening?” says Ahmed Meddahi. New SDN and NFV architectures could help answer this question since these technologies will facilitate the integration of learning algorithms in network control systems. This is another promising new area of research for network and computer security scientists, which is naturally of interest to researchers at IMT Lille Douai.

The first challenge is to choose the right approach. Which algorithms should be used? Supervised, unsupervised or hybrid? And with which data? Traditional learning methods consist of showing the algorithm how the network behaves in a normal situation, and how it behaves in an abnormal situation or when under attack. It will then be able to learn and recognize situations that are almost identical to those it has learned. But there’s a problem: these learning methods based on examples or records are not compatible with the reality of cyberthreats.

“Attacks are dynamic and are constantly changing,” says Ahmed Meddahi. “Hackers can get past even the strongest counter-measures and defenses since they change their approach on a regular basis and constantly change their signature.” But with supervised learning, an algorithm is trained with existing attacks, at the risk of quickly becoming outpaced by the attacks of tomorrow.

That’ s why researchers and industry stakeholders are instead focusing on an unsupervised or hybrid learning approach, and even on new AI algorithms designed especially for cybersecurity purposes. In this case, an algorithm would learn by itself what qualifies as normal or abnormal network operation. Rather than detecting the trace or signature of an attack, it will learn how to recognize the conditions in which an attack has occurred in the past, and notify operators if the same conditions occur or are being brought together.

“The unsupervised approach also poses another problem: it requires constant learning on the network, which implies a significant cost in terms of resources,” says the IMT Lille Douai researcher. That is precisely the challenge facing scientists: finding a realistic approach to learning in an extremely dynamic, ever-changing environment. While researchers are beginning to work on new security issues for 5G and IoT networks, businesses naturally have high expectations. With 5G set to launch in France in 2020, operators and managers of these next-generation networks are more concerned than ever about the security of users and their data.

 


Joint AI: a platform to facilitate German-French research in AI

In 2019, the German-French Academy for the Industry of the Future launched the Joint AI platform project. This platform, which brings together IMT and the Technical University of Munich, promotes collaboration between researchers and industry to develop artificial intelligence tools. Its secure environment ensures intellectual property protection and the reproducibility of scientific results.

 

“The primary aim is to support artificial intelligence research projects between France and Germany.” This is how Anne-Sophie Taillandier begins her description of the Joint AI platform launched in 2019 by IMT and the Technical University of Munich. Since 2015, the two institutions have been working together through the German-French Academy for the Industry of the Future. This partnership has given rise to a number of research projects, some of which have focused on artificial intelligence. Researchers working in this area face a recurring problem: intellectual property protection for the results.

“One of the major risks for AI researchers is presenting their work to academic peers or industry stakeholders and having it stolen,” explains Anne-Sophie Taillandier. For several years, this French artificial intelligence expert has headed IMT’s TeraLab, which aims to facilitate AI research in a secure environment. “Through discussions with our colleagues at the Technical University of Munich, we realized that we each had infrastructures to host and develop AI projects, but that there was no transnational equivalent,” she explains. This gave rise to the Joint AI platform project: a shared, reliable, protected site for German-French research on artificial intelligence.

Read more on I’MTech: TeraLab, a European Data Sanctuary

The platform is based on technological and legal tools. The hardware architecture and workspaces are designed to host data and work on it with the desired security level. Using a set of APIs, the results of a project can be highlighted and shared on both sides of the border, without having to move the data or the software developed. “Everyone can work with confidence, without having to provide access to their executable or data,” says Anne-Sophie Taillandier.

A tool for researchers…

For researchers working on AI — as well as other scientific disciplines — facilitating cooperation means facilitating the progress of research projects and results. This is especially true for all research related to Industry 4.0, as is the case for the German-French Academy for the Industry of the Future projects that the Joint AI platform currently hosts. “Research on industry involves complex infrastructures, made up of human users and sensors that link the physical and digital dimensions,” says  Georg Carle, holder of the Network Architectures and Services Chair at the Technical University of Munich, and co-director of the project with Anne-Sophie Taillandier.

He explains that, “In order to be valuable, this research must be based on real data and generate realistic models.” And the more the data is shared and worked on by different teams of researchers,  the more effective the resulting algorithms will be. For Georg Carle, “the Joint AI platform makes it possible to improve the reproducibility of results” between the French and German teams. “This leads to higher-quality results, with a bigger impact for the industry stakeholders.”

And for companies!

In addition to providing a collaborative tool for researchers, the Joint AI platform also provides innovation opportunities for companies involved in partnership-based research. When a German industry stakeholder seeks to collaborate with French researchers or vice versa, the legal constraints for moving data represent a major hurdle. Such collaboration is further limited by the fact that, even within the same large company, it can be difficult for the French and German branches to exchange data. “This can be for a variety of reasons: human resources personal data, data related to industrial property, or data concerning clients with whom there is a confidentiality guarantee,” says Anne-Sophie Taillandier.

Companies therefore need a secure location, from both a technological and legal standpoint, to facilitate joint research. Joint AI therefore makes it easier for private stakeholders to take part in research projects at the European level, such as Horizon 2020 framework program projects — or Horizon Europe for future European research projects as of next year. Such a platform offers a prototype for a solution to one of the biggest problems facing AI and digital innovation: secure data sharing between different stakeholders.



Putting drones to the 5G test

5G!Drones, a European H2020 project bringing together industrial partners, network operators and research centers, was launched in June 2019 for a three-year period. It should ultimately validate the use of 5G for delivery services by drone. Adlen Ksentini, a researcher at EURECOM, a key partner in the project, explains the challenges involved.

 

What was the context for developing the European 5G!Drones project?

Adlen Ksentini: The H2020 5G!Drones project is funded by the European Commission as part of phase 3 of the 5G PPP projects (5G Infrastructure Public Private Partnership). This phase aims to test  use cases for vertical industry applications (IoT, industry 4.0, autonomous cars etc.) on 5G test platforms. 5G!Drones focuses on use cases involving flying drones, or Unmanned Aerial Vehicles (UAV), such as for transport of packages, extension of network coverage with drones, public security etc.

What is the aim of this project?

AK: The aim is twofold. First, to test eight use cases for UAV services on 5G platforms located in Sophia Antipolis, Athens (Greece), Espoo and Oulu (Finland) to collect information that will allow us to validate the use of 5G for a wider roll-out of UAV services. And second, the project seeks to highlight the ways in which 5G must be improved to guarantee these services.

What technological and scientific challenges do you face?

AK: A number of obstacles will have to be overcome during the project: these obstacles are related to safeguarding drone flights. To fly drones, certain conditions are required. First, there has to be a reliable network with low latency, since remote control of the drones requires low latency in order to correct the flight path and monitor the drones’ position in real time. And there also has to be strong interaction between the U-Space service (see below) and the network operator to plan flights and check conditions: weather, availability of network coverage etc. In addition to these obstacles to be overcome, the 5G!Drones project will develop a software system that will be placed above the platforms, to automate the trials and display the results in real time.


The U-Space service is in charge of approving the flight plan submitted by drone operators. Its job is to check whether the flight plan is feasible, meaning ensuring that there are no other flights planned on the selected path and determining whether the weather conditions are favorable.


How are EURECOM researchers contributing to this project?

AK: EURECOM is a key partner in the project. EURECOM will provide its 5G testing platform based on its OpenAirInterface (OAI) tool, which provides Network Function Virtualization (NFV) and Multi-access Edge Computing (MEC) solutions. It will host two trials on public safety using flying drones, led by partners representing the vertical industry. In addition, EURECOM will be studying and proposing a solution for developing a 5G network dedicated to UAVs, based on the concept of network slicing.

Who are your partners and what collaborations are important for you?

AK: The project counts 20 partners, including network operators (Orange France and Poland, COSMOTE), specialists in the UAV field (Alerion, INVOLI, Hepta Airborne’s, Unmanned System Limited, CAFA Tech, Frequentis, DRONERADAR), industrial groups (NOKIA, Thalès and AIRBUS), an SME (INFOLYSIS) and research centers and universities (Oulu University, Aalto University, DEMOKRITOS, EURECOM), as well as the municipality of Egaleo in Greece. EURECOM is playing a central role in the project by collaborating with all the members of the consortium and acting as a liaison between the UAV vertical industry partners, industrial groups and network operators.

What are the expected benefits of the project?

AK: In addition to the scientific benefits in terms of publications, the project will allow us to verify whether 5G networks are ready to deliver UAV services. Feedback will be provided to 3GPP standards organizations, as well as to the authorities that control the airspace for UAVs.

What are the next important steps for the project?

AK: After a first year in which the consortium focused on studying an architecture that would make it possible to establish a link between the vision of UAV industry stakeholders and 5G networks, as well as a detailed description of the use cases to be tested, the project is starting its second year, which will focus on deploying the trial infrastructure at the various sites and then beginning the tests.

Learn more about the 5G!Drones project

Interview by Véronique Charlet for I’MTech

 


DNA as the data storage medium

By 2025 the volume of data produced in the world will have reached 250 zettabytes (1 zettabyte = 10²¹ bytes). Current storage media have insufficient storage capacity or suffer from obsolescence. Preserving even a fraction of this data means finding a storage device with density and durability characteristics significantly superior to those of existing systems. The European H2020 OligoArchive project, launched in October 2019 for three years, proposes to use DNA (DeoxyriboNucleic Acid) as a storage medium. Raja Appuswamy, a researcher at EURECOM, a partner of the project, explains further.

 

In what global context did the European OligoArchive project come about?

Raja Appuswamy: Today, everything in our society is driven by data. If data is the oil that fuels the metaphorical AI vehicle, storage technologies are the cog that keeps the wheel spinning. For decades, we wanted fast storage devices that can quickly deliver data, and optical, magnetic, and solid-state storage technologies evolved to meet this requirement. As data-driven decision-making becomes a part of our society, we are increasingly faced with a new need – one for cheap, long-term storage devices that can safely store the collective knowledge we generate for hundreds or even thousands of years. Imagine you have a photograph that you would like to pass down to your great-great-grandchildren. Where would you store it? How much space would it take? How much energy would it use? How much would it cost? Would your storage media still be readable two generations from now? This is the context for project OligoArchive.

What is at stake in this project?

RA: Today, tape drives are the gold standard when it comes to data archival across all disciplines, from Hollywood movie archives to particle accelerator facilities. But tape media suffers from several fundamental limitations that make it unsuitable for long-term data storage. First, the storage density of tape – the amount of data you can store per inch – is improving at a 30% rate annually; archival data, in contrast, is growing at a rate of 60% per year. Second, if one stores 1 PB in 100 tape drives today, within five years it would be possible to store the same data in just 25 drives. While this might sound like a good thing, using tape for archival storage implies constant data migration with each new generation of tape, and such migrations cost millions of dollars.

This problem is so acute that Hollywood movie archives have openly admitted that we are living in a dead period during which the productions of several independent artists will not be saved for the future! At the rate at which we are generating data for feeding our AI machinery, enterprises will soon be at this point. Thus, the storage industry as a whole has come to the realization that a radically new storage technology is required if we are to preserve data across generations.

What will be the advantages of the technology developed by OligoArchive?

RA: Project OligoArchive undertakes the ambitious goal of retasking DNA – a biological building block – to function as a radically new digital storage medium. DNA possesses three key properties that make it relevant for digital data storage. First, it is an extremely dense three-dimensional storage medium that has the theoretical ability to store 455 exabytes in 1 gram. The sum total of all data generated worldwide (the global datasphere) is projected to be 175 zettabytes by 2025. This could be stored in just under half a kilogram of DNA. Second, DNA can last several millennia, as demonstrated by experiments that have read the DNA of ancient, extinct animal species from fossils dating back thousands of years. If we can bring the woolly mammoth back to life from its DNA, we can store data in DNA for millennia. Third, the density of DNA is fixed by nature, and we will always have the ability and the need to read DNA – everything from archeology to precision medicine depends on it. Thus, DNA is an immortal storage medium that does not have the media obsolescence problem and hence can never become outdated, unlike other storage media (remember floppy disks?).
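The density figure can be checked with a quick back-of-the-envelope calculation (theoretical capacity only, ignoring the redundancy a real encoding would add):

```python
# Back-of-the-envelope check of the figures above (theoretical density, no encoding overhead).
DENSITY_EB_PER_GRAM = 455      # theoretical exabytes per gram of DNA
GLOBAL_DATASPHERE_ZB = 175     # projected zettabytes by 2025

mass_grams = GLOBAL_DATASPHERE_ZB * 1000 / DENSITY_EB_PER_GRAM   # 1 ZB = 1000 EB
print(f"{mass_grams:.0f} g")   # ≈ 385 g, i.e. just under half a kilogram
```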

What expertise do EURECOM researchers bring?

RA: The Data Science department at EURECOM is contributing to several aspects of this project. First, we are building on our deep expertise in storage systems to architect various aspects of using DNA as a storage medium, like developing solutions for implementing a block abstraction over DNA, or providing random access to data stored in DNA. Second, we are combining our expertise in data management and machine learning to develop novel, structure-aware encoding and decoding algorithms that can reliably store and retrieve data in DNA, even though the underlying biological tasks of synthesis (writing) and sequencing (reading) introduce several errors.

Who are your partners and what are their respective contributions?

RA: The consortium brings together a truly multi-disciplinary group of people with diverse expertise across Europe. The Institute of Molecular and Cellular Pharmacology (IPMC) in Sophia Antipolis, home to the largest sequencing facility in the PACA region, is a partner that contributes its biological expertise to the project. Our partners at I3S, CNRS, are working on new compression techniques customized for DNA storage that will drastically reduce the amount of DNA needed to store digital content. Our colleagues at Imperial College London (UK) are building on our work and pushing the envelope further by using DNA not just as a storage medium, but as a computational substrate, showing that some SQL database operations that run in-silico (on a CPU) today can be translated efficiently into in-vitro biochemical reactions directly on DNA. Finally, we also have HelixWorks, a startup from Ireland that is investigating novel enzymatic synthesis techniques for reducing the cost of generating DNA, as an industrial partner.

What results are expected and ultimately what will be the applications?

RA: The ambitious end goal of the project is to build a DNA disk – a fully working end-to-end prototype that shows that DNA can indeed function as a replacement for current archival storage technology like tape. Application-wise, archival storage is a billion-dollar industry, and we believe that DNA is a fundamentally disruptive technology that has the potential to reshape this market. But we believe that our project will have an impact on areas beyond archival storage.

First, our work on DNA computation opens up an entirely new field of research on near-molecule data processing that mirrors the current trend of moving computation closer to data to avoid time-consuming data movement. Second, most of the models and tools we develop for DNA storage are actually applicable for analyzing genetic data in other contexts. For instance, the algorithm we are developing for reading data back from DNA provides a scalable solution for sequence clustering–a classic computational genomics problem with several applications. Thus, our work will also contribute to advances in computational genomics.

Learn more about OligoArchive


C in your Browser

In the academic world, teaching and carrying out research often go hand-in-hand. This is especially true for Rémi Sharrock, a computer science researcher at Télécom Paris, who has developed a C language learning program comprising 7 MOOCs. The teaching approach used for his online courses called for the development of innovative tools, drawing on the researcher’s expertise. Rémi Sharrock was rewarded for this work in November 2019 by the edX platform, a leading global MOOC provider, which presented him with its 2019 edX Prize. He talked to us about the story behind this digital learning program developed in partnership with Dartmouth College in the United States.

 

What led you to undertake research in order to create an online learning program?

Rémi Sharrock: The original aim was to propose a new way of learning C language. To do so, we had to develop a number of tools that didn’t exist at the time. This work carried out with Dartmouth College gave rise to research opportunities. Our goal was always to facilitate  exchange with the learner, and to make it a central part of the learning process. The tools we developed made it possible to carry out learning activities directly on the user’s computer, with many features that had never been seen before.

What are some examples of the tools you developed?

RS: The idea of a MOOC is that it’s open to as many people as possible. We didn’t know what type of computer users would connect with, or what operating system or browser they would use. But regardless of their system, we had to be able to provide users with a high-quality learning experience. The first tool we developed for this was WebLinux. It met the challenge of being able to code in C Language with Linux from any computer, using any browser. We didn’t want to make learners download an application, since that could discourage beginners. WebLinux therefore allowed us to emulate Linux for everyone, directly on the web-based learning platform.

How did you do this from a technical perspective?

RS: Technically, we run Linux directly in the browser, without going through a server. To do so, we use an OpenRISC processor emulator that runs in the browser, and a Linux build that is compatible with this type of processor. That allows us to do without servers that run Linux, and therefore operate on a large scale with limited server resources.

That’s an advantage in terms of access to education, but does the tool also facilitate educational activities?  

RS: For that part we had to develop an additional tool, called Codecast. It’s a C language emulator that runs on any browser and is synchronized with the professor’s audio explanation. It was a real challenge to develop this tool because we wanted to make it possible for anyone to run C language instructions directly on their browser, without having to go through a remote computer server, or use third party software on their computer. We created a specialized C language interpreter for the web, which works with all browsers. When you’re watching the professor’s course in the video, you can directly edit lines of code and run them in your browser, right from the course web page. And on top of that, when the teacher integrates an instruction to be learned and tested that he’s sent you as part of the lesson, you can pause the video, edit the instruction and try different things, then resume the video without any consequences.

You also responded to another challenge with this type of MOOC: assessing learners.

RS: Yes, with a third tool, Taskgrader. In a traditional classroom course, the teacher assesses codes proposed by students one by one, and corrects them. This is inconceivable with a MOOC since you have tens or hundreds of thousands of learners to correct.  Taskgrader makes it possible to automatically assess students’ codes in real time, without the professor having to look them over, by providing personalized feedback.
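In principle, this kind of automated assessment boils down to compiling the learner’s submission, running it against prepared test cases and returning targeted feedback. The sketch below illustrates that loop; the file name, exercise and messages are hypothetical, and this is not the actual Taskgrader implementation.

```python
# Minimal sketch of automated grading: compile a C submission, run it on test cases,
# and return feedback. Hypothetical exercise: read two integers and print their sum.
import os
import subprocess
import tempfile

TESTS = [                      # (stdin, expected stdout) pairs for the hypothetical exercise
    ("1 2\n", "3\n"),
    ("10 -4\n", "6\n"),
]

def grade(source_path: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        binary = os.path.join(tmp, "submission")
        build = subprocess.run(["gcc", source_path, "-o", binary],
                               capture_output=True, text=True)
        if build.returncode != 0:
            return "Compilation failed:\n" + build.stderr
        for stdin, expected in TESTS:
            run = subprocess.run([binary], input=stdin, capture_output=True,
                                 text=True, timeout=2)
            if run.stdout != expected:
                return f"Wrong answer for input {stdin!r}: got {run.stdout!r}, expected {expected!r}"
        return "All tests passed"

print(grade("sum.c"))   # 'sum.c' is a hypothetical learner submission
```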

Do all these tools have applications outside the scope of the MOOC C language learning program?

RS: Codecast could be of interest to big community-driven development websites like GitHub. Amateur and professional developers share bits of code for applications on the website. But cooperation is often difficult: to correct someone’s code you have to download the incorrect version, correct it, then send it back to the person who then has to download it again. An emulator in the browser would make it possible to work directly online in real time. And as for Taskgrader, it’s a valuable tool for all computer language teachers, even outside the world of MOOCs.

Is your research work in connection with these MOOCs over now that the learning program has been completed?  

RS: No, since we’ve also committed to a second type of research. We’ve teamed up with Cornell and Stanford universities to carry out large-scale sociological experiments on these MOOC learners in an effort to better understand our learner communities.

What kind of research are you conducting to that end?

RS: We have 160,000 learners in the MOOC program worldwide, from a wide range of social, ethnic and demographic backgrounds. We wanted to find out whether there are differences in the way in which men and women learn, for example, or between older and younger people. We therefore introduce differences in the courses according to individuals’ profiles, based on A/B testing – the sample of learners is split in two, and each group has one learning parameter that changes, such as the teacher’s age, voice or gender. This should eventually allow us to better understand learning processes and adapt them to provide each individual with a program that facilitates knowledge transfer.
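A minimal sketch of such an A/B split is shown below: each learner is deterministically assigned to one variant of a course parameter, so their experience stays consistent throughout the course. The identifiers and experiment name are hypothetical; the actual experimental design belongs to the research teams.

```python
# Toy sketch of A/B assignment for a course parameter, assuming a stable learner ID.
import hashlib

def variant(learner_id: str, experiment: str = "teacher_voice") -> str:
    """Deterministically assign a learner to group A or B for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{learner_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same learner always gets the same variant, so the course stays coherent for them.
print(variant("learner-42"), variant("learner-42"), variant("learner-43"))
```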


A dictionary for connected devices

The field of connected devices is growing at a staggering pace across all industries. There is a growing need to develop a communication standard, meaning a ‘common language’ that different smart systems could understand and interpret. To contribute to this goal, ETSI (European Telecommunications Standards Institute) is funding a European project in which Mines Saint-Étienne researchers Maxime Lefrançois and Antoine Zimmermann[1] are taking part.

 

In order to work together, connected devices must be able to communicate with one another. This characteristic, known as ‘semantic interoperability,’ is one of the key challenges of the digital transition. To be effective, semantic interoperability must be based on the adoption of an agreed-upon set of best practices. This would culminate in the creation of a standard adopted by the IoT community. At the European level, ETSI (European Telecommunications Standards Institute) is in charge of setting standards for information and communication technologies. “For example, ETSI standardized the SIM card, which acts as an identifier in mobile phone networks to this day,” explains Maxime Lefrançois. He and his colleague Antoine Zimmermann are researchers at Mines Saint-Étienne and specialize in the semantic web and knowledge representation. They are taking part in the STF 578 project on the interoperability of connected devices funded by ETSI, in partnership with two researchers from Universidad Politécnica de Madrid.

“Instead of proposing a standard that strictly defines the content of communications between connected devices, we define and formally identify the concepts involved, through what is known as an ontology,” says Antoine Zimmermann. This provides IoT players with greater flexibility since the content of messages exchanged may use the language and format best suited to the device, as long as an explicit link is made with the concept identified in the reference ontology. The two researchers are working on the SAREF reference ontology (Smart Applications Reference Ontology), a set of ETSI specifications which include a generic base and specializations for the various sectors related to the IoT: energy, environment, building, agriculture, smart cities, smart manufacturing, industry and manufacturing, water, automotive, e-health, wearables.

“The SAREF standard describes smart devices, their functions and the services they provide, as well as the various properties of the physical systems these devices can control,” explains Maxime Lefrançois. For example, a light bulb can say, “I can provide light” by using a concept defined by SAREF. A system or application may then refer to the same lighting concept to tell the object to turn on. “Ultimately, this knowledge should be described following the same standard models within each industry to facilitate harmonization between industries.” adds the researcher. The aim of the project is therefore to develop a public web portal for the standard SAREF ontology to facilitate its adoption by companies and collect their feedback and suggestions for improvement.
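As a rough illustration of what “referring to a SAREF concept” can look like in practice, the sketch below builds a tiny device description with the rdflib Python library. The class and property names (LightingDevice, hasFunction, OnOffFunction) follow SAREF-style examples but should be checked against the published ontology; treat them, along with the device identifiers, as assumptions.

```python
# Sketch of a device description that points at SAREF concepts, using rdflib.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

SAREF = Namespace("https://saref.etsi.org/core/")   # SAREF core namespace
EX = Namespace("http://example.org/devices/")       # hypothetical device namespace

g = Graph()
bulb = EX["bulb-1"]
on_off = EX["bulb-1-onoff"]

g.add((bulb, RDF.type, SAREF.LightingDevice))   # "I am a lighting device"
g.add((bulb, SAREF.hasFunction, on_off))        # "I offer this function..."
g.add((on_off, RDF.type, SAREF.OnOffFunction))  # "...namely being switched on and off"

print(g.serialize(format="turtle"))
```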

A specially-designed ‘dictionary’

“The SAREF public web portal is a little bit like a ‘dictionary’ for connected devices,” explains Maxime Lefrançois. “If we take the example of a water heater that can measure energy consumption and can be remotely-controlled, SAREF will describe its possible actions, the services it can provide, and how it can be used to lower energy costs or improve household comfort.” But his colleague Antoine Zimmermann explains, “It isn’t a dictionary in the traditional sense. SAREF specifies in particular the technical and IT-related constraints we may encounter when communicating with the water heater.”

Imagine if one day all water heaters and heat pumps were connected to the IoT and could be remotely controlled. They could then theoretically be used as an energy resource that could ensure the stability and energy efficiency of the country’s electricity grid. If, in addition, there was a uniform way to describe and communicate with these devices, companies in the smart building and energy sectors would waste less time individually integrating products made by different manufacturers. They could then focus instead on developing innovative services connected to their core business, giving them a competitive advantage. “The goal of semantic interoperability is to develop a service for a certain type of smart equipment, and then reuse this service for all similar types of equipment,” says Maxime Lefrançois. “That’s the heart of SAREF”.

Read more on I’MTech: How the SEAS project is redefining the energy market

At present, the existing standards are compartmentalized by sector. The energy industry has standards for describing and communicating with the electrical equipment of a water tower, but the water tower must then implement different standards to interface with other equipment in the water distribution network. “There are several different consortia for each sector,” explain the researchers, “but we now have to bridge the gap between these consortia, in order to harmonize their standards.” Thus the need for a ‘dictionary,’ a common vocabulary that can be used by connected devices in all industries.

Take the example of automotive manufacturers who are developing new batteries for electric vehicles. Such batteries could theoretically be used by energy suppliers to regulate the voltage and frequency of the electricity grid. “The automotive and energy industries are two sectors that had absolutely no need to communicate until now,” says Maxime Lefrançois, “in the future, they may have to work together to develop a common language, and SAREF could be the solution.”

A multilingual ‘dictionary’

The IoT community is currently engaged in something of a ‘standards war’ in which everyone is developing their own specification and hoping that it will become the standard. Impetus from public authorities is therefore needed to channel the existing initiatives – which is the role of SAREF at the European level. “We can well imagine that in the future, there will only be a single, shared vocabulary for everyone,” says Antoine Zimmermann. “But we may find ourselves with different vocabularies being developed at the same time, which then remain. That would be problematic. This is how it is today, for example, with electrical outlets. A machine intended to be used in the United States will not work with European outlets and vice versa.”

“The development of the SAREF public web portal is an important step since it encourages companies to take part in creating this dictionary,” adds Maxime Lefrançois. The more companies are involved in the project, the more comprehensive and competitive it will be. “The value of a standard is related to the size of the community that adopts it,” he says.

“The semantic web is particularly useful in this respect,” says Antoine Zimmermann, “it allows everyone to agree. Companies are all engaged in digital transformation and use the web as a common platform to get in touch with clients and partners. They use the same protocols. We think the semantic web is also a good way to build these common vocabularies that will work in various sectors. We aren’t looking for the right solution, but to demonstrate best practices and make them more widespread so that companies look beyond their own community.” 

A collaborative ‘dictionary’

The researchers’ work also involves developing a methodology for building this standard: a company must be able to suggest a new addition to the vocabulary that is highly specific to a certain field, while ensuring that this contribution aligns with the standard models and best practices that have been established for the entire ‘dictionary.’

“And that’s the tricky part,” says Maxime Lefrançois. How can the SAREF public portal be improved and updated to make sure that companies use it? “We know how to write ‘dictionaries’ but supporting companies is no simple task.” Because there are a number of constraints involved: all these different vocabularies and jargons must be assimilated, and companies may not necessarily be familiar with them.

“So we have to reinvent collaborative support methods for this dictionary. That’s where DevOps approaches implemented for software development are useful,” he says. These approaches make it possible to automatically check the suggestions based on a set of quality criteria, then automatically make a new version of the portal available online if the criteria are  fulfilled. “The goal is to shorten SAREF development cycles while maintaining an optimal level of quality,” concludes the researcher.
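As a rough sketch of what such an automated quality gate might look like, the script below parses a proposed vocabulary file and rejects it if any new class lacks a label or a comment. The file layout and the specific criteria are illustrative assumptions, not the actual SAREF portal pipeline.

```python
# Sketch of an automated check a DevOps pipeline might run on a proposed vocabulary addition.
import sys
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

def check_contribution(ttl_file: str) -> list:
    g = Graph().parse(ttl_file, format="turtle")   # fails loudly if the Turtle syntax is invalid
    problems = []
    for term in g.subjects(RDF.type, OWL.Class):
        if (term, RDFS.label, None) not in g:
            problems.append(f"{term} has no rdfs:label")
        if (term, RDFS.comment, None) not in g:
            problems.append(f"{term} has no rdfs:comment")
    return problems

if __name__ == "__main__":
    issues = check_contribution(sys.argv[1])       # path to the contributed .ttl file
    print("\n".join(issues) or "Contribution passes the basic checks")
    sys.exit(1 if issues else 0)
```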

There are other hurdles to overcome to get the connected devices themselves to ‘speak SAREF,’ due to the specific limitations of connected devices –  limited storage and computing capacity, low battery life, limited bandwidth, intermittent connectivity. The use of ontologies for communication and ‘reasoning’ was first thought up without these constraints, and must be reinvented for these types of ‘edge computing’ configurations. These issues will be explored in the upcoming ANR CoSWoT project (Constrained Semantic Web of Things) which will include researchers from LIRIS, Mines Saint-Étienne, INRAE (merger of INRA and IRSTEA), Université Jean-Monnet and the company Mondeca.

 

[1] Maxime Lefrançois and Antoine Zimmermann are researchers at the Hubert Curien Laboratory, a joint research unit of CNRS, Mines Saint-Étienne and Université Jean Monnet.


Understanding the resilience of the immune system through mathematical modeling

Gaining insight into how the immune system works using mathematics is the ultimate goal of the research carried out by IMT Atlantique researcher Dominique Pastor, along with his team. Although the study involves a great degree of abstraction, the scientists never lose sight of practical applications, and not only in relation to biology.

 

In many industries, the notion of “resilience” is a key issue, even though there is no clear consensus on the definition of the term. From the Latin verb meaning “to rebound,” the term does not exactly refer to the same thing as resistance or robustness. A resilient system is not unaffected by external events, but it is able to fulfill its function, even in a degraded mode, in a hostile environment. For example, in computer science, resilience means the ability to provide an acceptable level of service in the event of a failure.

This capacity is also found in the human body ­— and in general, in all living beings. For example, when you have a cold, your abilities may be reduced, but in most cases you can keep living more or less normally.

This phenomenon is regularly observed in all biological systems, but remains quite complex. It is still difficult to understand how resilience works and the set of behaviors to which it gives rise.

A special case of functional redundancy: degeneracy

It was through discussions with Véronique Thomas-Vaslin, a biologist at Sorbonne University, that Dominique Pastor, a telecommunications researcher at IMT Atlantique, became particularly aware of this property of biological systems. Working with Roger Waldeck, who is also a researcher at IMT Atlantique, and PhD student Erwan Beurier, he carried out research to mathematically model this resilience, in order to demonstrate its basic principles and better understand how it works.

To do so, they drew on publications by other scientists, including American biologist Gerald Edelman (Nobel prize winner for medicine in 1972), underscoring another property of living organisms: degeneracy. (This term is usually translated in French as dégénérescence, which means ‘degeneration,’ but this word is misleading). “Degeneracy” refers to the ability of two structurally different elements to perform the same function. It is therefore a kind of functional redundancy, which also implies different structures. This characteristic can be found at multiple levels in living beings.

For example, amino acids, which are the building blocks of essential proteins, are produced from “messages” included in portions of DNA. More specifically, each message is called a “codon”: a sequence of three molecules, known as nucleotides. However, there are 4 possible nucleotides, meaning there are 64 possible combinations, for only 22 amino acids. That means that some codons correspond to the same amino acid: a perfect example of degeneracy.
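The combinatorics can be checked in a couple of lines (a simple enumeration, just to illustrate the redundancy):

```python
# Quick check of the combinatorics above: 4 nucleotides taken 3 at a time give 64 codons,
# far more than the amino acids they encode.
from itertools import product

nucleotides = "ACGU"
codons = ["".join(c) for c in product(nucleotides, repeat=3)]
print(len(codons))   # 64 possible codons
# Several codons map to the same amino acid: GGU, GGC, GGA and GGG all code for glycine.
# Structurally different codons, same function -- degeneracy.
```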

“My hunch is that degeneracy is central to any resilient system,” explains Dominique Pastor. “But it’s just a hunch. The aim of our research is to formalize and test this idea based on mathematical results. This can be referred to as the mathematics of resilience.”

To this end, he relied on the work of French mathematician Andrée Ehresmann, Emeritus Professor at the University of Picardie Jules Verne, who established a mathematical model of degeneracy, known as the “Multiplicity Principle,” with Jean-Paul Vanbremeersch, an Amiens-based physician who specializes in gerontology.

Recreating resilience in the form of mathematical modeling

Dominique Pastor and his team therefore started out with biologists’ concrete observations of the human body, and then focused on theoretical study. Their goal was to develop a mathematical model that could imitate both the degeneracy and resilience of the immune system in order to “establish a link between the notion of resilience, this Multiplicity Principle, and statistics.” Once this link was established, it would then be possible to study it and gain insight into how the systems work in real life.

The researchers therefore examined the performance of two categories of statistical testing for a given problem, namely detecting a phenomenon. The first category is called “Neyman-Pearson testing,” and is optimal for determining whether or not an event has occurred. The second category, RDT (Random Distortion Testing), is also optimal, but for a different task: detecting whether an event has moved away from an initial model.

The two types of procedures were not created with the same objective. However, the researchers  successfully demonstrated that RDT testing could also be used, in a “degenerative” manner, to detect a phenomenon, with a comparable performance to Neyman-Pearson testing. That means that in the theoretical case of an infinite amount of data, they can detect the presence or absence of a phenomenon with the same level of precision. The two categories therefore perform the same function, although they are structurally different. “We therefore made two sub-systems in line with the Multiplicity Principle,” concludes the IMT Atlantique researcher.

What’s more, the nature of RDT testing gives it an advantage over Neyman-Pearson testing since the latter only works optimally when real events follow a certain mathematical model.  If this is not the case — as so often happens in nature — it is more likely to be incorrect. RDT testing can adapt to a variable environment, since it is designed to detect such variations, and is therefore more robust. Combining the two types of testing can result in a system with the inherent characteristics of resilience, meaning the ability to function in a variety of situations.
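A toy illustration of the idea – two structurally different detectors performing the same detection task – is sketched below on simulated Gaussian data. It is only a caricature of the actual Neyman-Pearson and RDT frameworks, with arbitrary thresholds chosen for the example.

```python
# Toy illustration: a Neyman-Pearson-style threshold test and a distortion-style test
# both detect the same phenomenon (a mean shift), despite being built differently.
import numpy as np

rng = np.random.default_rng(1)
nominal, shift, sigma, n = 0.0, 1.0, 1.0, 200

def np_style_test(x, threshold=0.2):
    # Tuned for the specific alternative "mean = nominal + shift" under an exact Gaussian model.
    return x.mean() > nominal + threshold

def rdt_style_test(x, tau=0.2):
    # Flags any departure of the observed mean from the nominal model, whatever its form.
    return abs(x.mean() - nominal) > tau

quiet = rng.normal(nominal, sigma, n)          # no phenomenon
event = rng.normal(nominal + shift, sigma, n)  # phenomenon present

print(np_style_test(quiet), np_style_test(event))    # expected: False, True
print(rdt_style_test(quiet), rdt_style_test(event))  # expected: False, True
```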

From biology to cybersecurity

These findings are not intended to remain confined to a theoretical universe. “We don’t work with theory for the sake of theory,” says Dominique Pastor. “We never forget the practical side: we continually seek to apply our findings.” The goal is therefore to return to the real world, and not only in relation to biology. In this respect, the approach is similar to that used in research on neural networks – initially focused on understanding how the human brain works, it ultimately resulted in systems used in the field of computer science.

“The difference is that neural networks are like black boxes: we don’t know how they make their decisions,” explains the researcher. “Our mathematical approach, on the other hand, provides an understanding of the principles underlying the workings of another black box: the immune system.” This understanding is also supported by collaboration with David Spivak, a mathematician at MIT (United States), again in the field of mathematical modeling of biological systems.

The first application Dominique Pastor is working on falls within the realm of cybersecurity. The idea is to imitate the resilient behavior of an immune system for protective purposes. For example, many industrial sites are equipped with sensors to monitor various factors (light, opening and closing of doors, filling a container etc.) To protect these devices, they could be combined with a system to detect external attacks. This could be made up of a network, which would receive data recorded by the sensors and run a series of tests to determine whether there has been an incident. Since these tests could be subject to attacks themselves, they would have to be resilient in order to be effective – hence the importance of using different types of tests, in keeping with the previously obtained results.

For now it is still too early to actually apply these theories. It remains to be proven that the Multiplicity Principle is a sufficient guarantee of resilience, given that this notion does not have a mathematical definition as of today. This is one of Dominique Pastor’s ambitions. The researcher admits that it is still his “pipe dream” and says, “My ultimate goal would still be to go back to biology. If our research could help biologists better understand and model the immune system, in order to develop better care strategies, that would be wonderful.”


A sorting algorithm to improve plastic recycling

Producing high-quality raw materials from waste is contingent on effective sorting. Plastics from waste electrical and electronic equipment (WEEE) are no exception. To help solve this problem, researchers at IMT Mines Alès have developed a selective automation algorithm designed for these plastics. It can be integrated in new industrial-scale sorting machines.

 

How will your coffee maker be reincarnated after it dies? This electrical appliance composed primarily of plastic, metal and glass falls into the category of waste electrical and electronic equipment (WEEE). Your smartphone and washing machine are also included in this category. After it is thrown away, the coffee maker will find itself drowning in what amounts to over 750,000 tons of WEEE collected every year in France, before it is recovered by a specialized recycling center. There, it is dismantled, crushed and separated from its ferrous and non-ferrous metals, such as copper or aluminum, until all that’s left of the machine is a heap of plastic. Plastic is the second-largest component of WEEE after steel, so recycling it is a major concern.

And successful recycling starts with effective sorting. 20% of plastic materials are recovered through flotation after being placed in a tank filled with water. But how are the remaining 80% processed? “Samples measuring 1 cm² are placed on a conveyor belt equipped with an infrared camera at the end, which scans the material and determines what type of plastic it’s made of,” says Didier Perrin, a physical chemist at IMT Mines Alès. The radiation excites the atomic bonds of the molecules and creates a spectral signature that characterizes the plastic to be identified. A technique using a near infrared source (NIRS) is especially rapid but cannot be used to identify dark plastics, which absorb the radiation. But black plastic, which holds up over time better than colored plastic, represents nearly 50% of the waste. “Accurate and effective identification of the material is therefore crucial to generate high-quality raw material to be recycled, combining purity and mechanical performance,” adds the researcher. However, this method does not always make it possible to determine the exact type of plastic contained within a sample.

An automated sorting algorithm

Researchers at IMT Mines Alès have therefore developed an automated method for sorting plastic by working with SUEZ and Pellenc ST, a company that develops smart, connected sorting machines. The focus of their collaboration was on establishing a classification of the plastics contained in WEEE. The researchers generated a database in which each plastic has its own clearly-defined spectral identity. WEEE plastics were therefore divided into four major families: ABS (acrylonitrile butadiene styrene), a polymer commonly used in industry which represents 50 to 60% of plastic waste (cases, covers, etc.); HIPS (high-impact polystyrene), similar to ABS but less expensive and with lower mechanical performance (refrigerator racks, cups); polypropylene, a material more ductile than ABS and HIPS (soft covers for food containers, cups); and what is referred to as ‘crystal’ polystyrene (refrigerator interiors, clear organic glass).

Their first step was to better recognize the plastics to be sorted. “We used a supervised learning method on the data measured in the laboratory and then analyzed the same samples in industrial conditions,” explains PhD student Lucie Jacquin. Nevertheless, it is not always easy to characterize the type of plastic contained in waste. First of all, plastic degrades over time, which modifies its properties and makes it difficult to identify. And second, industrial conditions — with 3,000 kg of waste analyzed per hour — often result in incomplete spectral measurements.

Beyond measurement uncertainties, traditional sorting methods also have their flaws. They are generally based on probabilistic classification algorithms, which determine how similar a sample is to those in a reference database. But these algorithms do not distinguish between equiprobability and ignorance. In the case of equiprobability, the spectrum of a sample is 50% similar to the spectrum of plastic A and 50% similar to that of plastic B. In the case of ignorance, the spectrum of a sample is not similar to any element in the database, yet the algorithm gives the same result as in the case of equiprobability (50% A and 50% B). So how can we tell whether the output of the algorithm reflects equiprobability or ignorance? The researchers’ aim is therefore to better manage uncertainty in measurements taken in real conditions.
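A toy example helps illustrate the problem. The reference spectra and the distance-based scoring below are invented for illustration: a classifier that simply normalizes similarities into probabilities returns the same 50/50 answer for a genuinely ambiguous sample and for a sample that resembles nothing in the database.

```python
# Toy illustration (not the project's code): a similarity-based classifier
# that normalises similarities into probabilities cannot tell the difference
# between an ambiguous sample and a completely unknown one.
import numpy as np

ref_A = np.array([1.0, 0.0, 0.0])   # made-up reference spectrum of plastic A
ref_B = np.array([0.0, 1.0, 0.0])   # made-up reference spectrum of plastic B

def proba(sample):
    sim_a = np.exp(-np.linalg.norm(sample - ref_A))
    sim_b = np.exp(-np.linalg.norm(sample - ref_B))
    total = sim_a + sim_b
    return sim_a / total, sim_b / total

ambiguous = np.array([0.5, 0.5, 0.0])   # genuinely halfway between A and B
unknown   = np.array([0.0, 0.0, 9.0])   # resembles neither reference

print(proba(ambiguous))  # ~ (0.5, 0.5): equiprobability
print(proba(unknown))    # also ~ (0.5, 0.5): ignorance looks identical
```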

Understanding the material to recycle it better

“We approached this problem using modern uncertainty theories, which allow us to better represent uncertainty in the classification of a sample, based on the uncertainty in its spectrum obtained in real conditions. Belief functions can distinguish between equiprobability and ignorance, for example,” explains computer science researcher Abdelhak Imoussaten. The algorithm attempts to determine the class of plastic to which a sample belongs. When there is doubt, it determines the set of classes to which the sample may belong and eliminates the others. For example, it may be certain that a sample is either ABS or HIPS, but definitely not polypropylene. “In this way, we use ‘cautious’ machine learning to control what the machine will send to the sorting bins,” adds Abdelhak Imoussaten. That is the real goal: determining, in an automated way, to which sorting bin each small bit of plastic will be sent.
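The following sketch shows, in schematic form, how a Dempster-Shafer mass assignment separates the two situations that plain probabilities confuse. It illustrates the general formalism only, with made-up numbers, and is not the researchers’ actual model.

```python
# Minimal sketch of belief functions: masses are assigned to *sets* of classes,
# which lets equiprobability and ignorance be told apart.

# Equiprobability: the evidence points equally at ABS and at HIPS.
mass_equiprobable = {frozenset({"ABS"}): 0.5, frozenset({"HIPS"}): 0.5}

# Ignorance: the spectrum matches nothing, so all the mass goes to the
# whole frame of discernment ("it could be any of these plastics").
mass_ignorant = {frozenset({"ABS", "HIPS", "PP", "crystal PS"}): 1.0}

def belief(mass, subset):
    """Total mass committed to sets entirely contained in `subset`."""
    return sum(m for s, m in mass.items() if s <= subset)

def plausibility(mass, subset):
    """Total mass that does not contradict `subset` (sets intersecting it)."""
    return sum(m for s, m in mass.items() if s & subset)

target = frozenset({"ABS"})
print(belief(mass_equiprobable, target), plausibility(mass_equiprobable, target))  # 0.5 0.5
print(belief(mass_ignorant, target), plausibility(mass_ignorant, target))          # 0.0 1.0
```

The point probability of ABS can look the same in both cases, but the belief-plausibility interval is [0.5, 0.5] under equiprobability and [0, 1] under ignorance, which is exactly the distinction a cautious classifier needs in order to output a set of plausible classes rather than a single guess.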

“Each category of plastic accepts a certain quantity of other plastics without affecting the matrix of the recycled material,” says Didier Perrin. In practice, this means that it is possible to send a plastic to a sorting bin with some certainty, even if its exact type is unclear (A or B, but not C). While completing his PhD at IMT Mines Alès under the supervision of Didier Perrin, Charles Signoret studied all the possible mixtures of the various plastics and their compatibility. For example, ABS may contain only 1% polypropylene and still maintain its mechanical properties, but it may contain up to 8% HIPS.
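A sketch of how such compatibility thresholds could feed the sorting decision is given below. The 1% polypropylene and 8% HIPS tolerances for ABS are the figures quoted above; the decision rule, the minimum-tolerance threshold and the helper names are hypothetical.

```python
# Illustrative sketch: deciding whether a cautiously classified fragment
# (a *set* of possible plastics) can still be sent to a given bin.
# Only the ABS tolerances (8% HIPS, 1% PP) come from the article.

# Maximum share of a foreign plastic that the target material tolerates.
TOLERANCE = {"ABS": {"HIPS": 0.08, "PP": 0.01}}

def safe_for_bin(bin_material, candidate_classes, min_tolerance=0.05):
    """Send a fragment to the bin only if every plastic it *might* be is
    either the bin's material or tolerated above `min_tolerance`."""
    tol = TOLERANCE.get(bin_material, {})
    return all(c == bin_material or tol.get(c, 0.0) >= min_tolerance
               for c in candidate_classes)

print(safe_for_bin("ABS", {"ABS", "HIPS"}))  # True:  HIPS is tolerated up to 8%
print(safe_for_bin("ABS", {"ABS", "PP"}))    # False: PP is tolerated only at 1%
```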

While the presence of impurities is inevitable in recycling, the researchers consider a sorting method to be effective when it results in materials with 5% impurities or less. One thing is certain: the collaborative work of the researchers, SUEZ and Pellenc ST has proved effective in terms of sorting quality. It has already resulted in a demonstration machine, and the approach will subsequently be implemented in the production of new sorting machines.

Improving the effectiveness of sorting systems is crucial to the economic viability of the recycling industry. ADEME, the French environment and energy management agency, estimates that 1.88 million tons of household appliances are put on the market every year in France. These products will eventually have to be sorted in order to provide high-quality material for the future equipment of this ever-growing market. “Our goal is also to ensure that the term ‘recycled,’ when referring to plastics, does not mean low quality, as has already been achieved with glass and steel, two recycled materials whose quality is no longer questioned,” concludes Didier Perrin.

 

Article written in French by Anaïs Culot for I’MTech

 

Hiboo

Tracking mobile industrial equipment in real time with Hiboo

The Hiboo start-up, incubated at Télécom Paris, provides a platform to help companies better manage their mobile assets: equipment, merchandise, vehicles, etc. Its solution is now widely used in the construction industry.

 

In 2016, the start-up Hiboo, which was incubated at Télécom Paris at the time, created a connected device in order to bring this type of technology to the construction industry. But the industry was already facing an overwhelming amount of unused data, and resolving that problem was not one of its top priorities. Instead, the sector sought to solve an ongoing issue: although it generated significant annual revenue, its profit margins remained low.

“We started out with the idea that one of the best ways to optimize this profit margin was to better understand the equipment’s activity using the available data,” explains François Jacob, co-founder of Hiboo. The start-up therefore made a radical shift to become a super-aggregator of data sources. This transformation gave rise to its current platform, which helps companies manage their operations more effectively.

Hiboo helps collect and sort data

On paper, all construction companies face the same problems when it comes to mobile industrial assets: equipment rental, inventory, time spent on site, consumption, identifying machine problems, etc. But on site, they lack an overall vision, and taking inventory of their equipment is time-consuming and not always thorough. Hiboo collects information provided by three categories of equipment: connected vehicles, unpowered equipment, and non-connected equipment containing an onboard computer.

In the construction industry, companies may manage thousands of pieces of equipment at the same time, including connected vehicles from some twenty different brands. If a company wants to understand how each brand fits into its overall equipment operations, users must log into each brand’s individual platform to retrieve the information, which is impossible to do on a daily basis.

Hiboo solves this problem by aggregating key data such as GPS coordinates, energy consumption and machine error codes, logging in to all of the manufacturers’ individual servers on the client’s behalf. The data are then harmonized before being automatically analyzed by ‘workers’. These software robots isolate outliers, such as a vehicle reported to have consumed 2,500 liters of fuel in one day. The results are then checked by engineers at Hiboo, who send a final report to the clients. Users can therefore access all operational inputs and outputs for their connected equipment on a single website.
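In schematic terms, the aggregation step might resemble the sketch below. The brand adapters, field names and outlier threshold are all invented for illustration; only the idea of mapping heterogeneous manufacturer data onto a common schema and flagging implausible values (such as 2,500 liters of fuel in one day) comes from the description above.

```python
# Hypothetical sketch: normalise records from different manufacturers into a
# common schema, then flag obviously implausible values for human review.
from dataclasses import dataclass

@dataclass
class AssetRecord:
    asset_id: str
    brand: str
    latitude: float
    longitude: float
    fuel_litres_per_day: float
    error_codes: list

# Each brand exposes its data differently; per-brand adapters normalise it.
def from_brand_a(raw: dict) -> AssetRecord:
    return AssetRecord(raw["serial"], "BrandA", raw["lat"], raw["lon"],
                       raw["fuel_l"], raw.get("dtc", []))

def from_brand_b(raw: dict) -> AssetRecord:
    return AssetRecord(raw["machineId"], "BrandB", raw["position"]["lat"],
                       raw["position"]["lng"], raw["consumption"]["fuelPerDay"],
                       raw.get("errors", []))

FUEL_OUTLIER_THRESHOLD = 2000.0  # litres/day; arbitrary illustrative limit

def flag_outliers(records):
    """Return records whose daily fuel consumption is implausibly high."""
    return [r for r in records if r.fuel_litres_per_day > FUEL_OUTLIER_THRESHOLD]

records = [
    from_brand_a({"serial": "A-1", "lat": 48.8, "lon": 2.3, "fuel_l": 2500.0}),
    from_brand_b({"machineId": "B-7", "position": {"lat": 45.7, "lng": 4.8},
                  "consumption": {"fuelPerDay": 60.0}}),
]
print(flag_outliers(records))  # only the implausible 2,500 L/day record
```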

Solutions hidden in data

Hiboo also equips unpowered equipment such as crane parts, dumpsters and trailers with connected devices that communicate via low-frequency networks. They are energy-efficient, and make it possible to obtain GPS coordinates and track equipment’s activity over a number of years. The information is sent to Hiboo using traditional telephone networks. With the help of a partner, the start-up also equips non-connected vehicles with devices in order to collect the information obtained in their on-board computers. “So we provide equipment managers with a comprehensive solution for practically all of their assets,” adds François Jacob.

All of this data is made available to users on the Hiboo platform, but it can also be integrated into applications such as invoicing software. The start-up helped the Swiss company Bernasconi shorten its invoicing process by one week every month by eliminating paper invoices. And a major industrial equipment rental company was able to save up to 700 billable days a month by identifying the over-usage of its equipment. “By processing data from the field, we can help companies revolutionize asset management, maintenance, assignment, invoicing, etc.,” explains François Jacob.

A versatile technology

Hiboo wishes to go further in leveraging data, in particular machine error codes and their severity levels. “Using this data and maintenance records, we want to provide predictive maintenance, so that we can predict the probability of a machine breaking down,” explains François Jacob. This could involve a compressor failure, an oil leak, a motor running at low voltage, etc. To do so, the start-up team combines information about the errors with the computerized maintenance management systems (CMMS) already used by companies to monitor machines and keep them in good working order.
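As a rough illustration of the principle (not Hiboo’s actual model), combining error-code history with past CMMS records can be framed as a simple supervised problem: past breakdowns provide the labels, and recent error counts and maintenance intervals provide the features. The feature set, numbers and choice of model below are assumptions.

```python
# Generic sketch: estimate the probability that a machine breaks down soon
# from its recent error codes and CMMS maintenance history.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per machine: [severe error codes in the last 30 days,
# days since last maintenance, machine age in years]
X_history = np.array([
    [0,  10, 1.0],
    [5,  90, 4.0],
    [1,  30, 2.0],
    [8, 200, 6.0],
])
y_failed_within_30d = np.array([0, 1, 0, 1])   # labels from past CMMS records

model = LogisticRegression().fit(X_history, y_failed_within_30d)

new_machine = np.array([[4, 120, 5.0]])
print(model.predict_proba(new_machine)[0, 1])  # estimated breakdown probability
```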

Although originally intended for the construction industry, Hiboo’s solution can be used for other applications, given its ability to control the flow of data between different networks. For example, the start-up will be covering the Dakar rally in 2020. “By connecting to Marlink, the satellite communication network used to track the rally participants, we can collect information about the various vehicles and track their performance on our platform,” explains François Jacob.

Learn more about Hiboo