Posts

cryptography, random numbers

Cryptography: what are random numbers for?

Hervé Debar, Télécom SudParis – Institut Mines-Télécom and Olivier Levillain, Télécom SudParis – Institut Mines-Télécom

The original purpose of cryptography is to allow two parties (traditionally referred to as Alice and Bob) to exchange messages without another party (traditionally known as Eve) being able to read them. Alice and Bob will therefore agree on a method to exchange each message, M, in an encrypted form, C. Eve can observe the medium through which the encrypted message (or ciphertext) C is sent, but she cannot retrieve the information exchanged without knowing the necessary secret information, called the key.

This is a very old exercise, since we speak, for example, of the ‘Julius Caesar Cipher’. However, it has become very important in recent years, due to the increasing need to exchange information. Cryptography has therefore become an essential part of our everyday lives. Besides the exchange of messages, cryptographic mechanisms are used in many everyday objects to identify and authenticate users and their transactions. We find these mechanisms in phones, for example, to encrypt and authenticate communication between the telephone and radio antennas, or in car keys, and bank cards.

The internet has also popularized the ‘padlock’ displayed in browsers to indicate that communication between the browser and the server is protected by cryptographic mechanisms. To function correctly, these mechanisms require random numbers, whose quality (or, more precisely, unpredictability) contributes to the security of the protocols.

Cryptographic algorithms

To transform a message M into an encrypted message C by means of an algorithm A, keys are used. In so-called symmetric algorithms, we speak of secret keys (Ks), which are shared and kept secret by Alice and Bob. In asymmetric algorithms, there are public (KPu) and private (KPr) key pairs. For each user, KPu is known to all, whereas KPr must be kept safe by its owner. Algorithm A is also public, which means that the secrecy of communication relies solely on the secrecy of the keys (secret or private).
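As a rough illustration of the symmetric case, the short Python sketch below uses the third-party cryptography package (an arbitrary choice for illustration, not one prescribed by the article): the algorithm is public, and everything rests on keeping the key Ks secret.

```python
from cryptography.fernet import Fernet

# Minimal symmetric-encryption sketch: the algorithm is public,
# security rests entirely on keeping the key Ks secret.
Ks = Fernet.generate_key()        # shared secret key between Alice and Bob
cipher = Fernet(Ks)

C = cipher.encrypt(b"Meet at noon")   # Alice turns the message M into C
M = cipher.decrypt(C)                 # Bob recovers M with the same key Ks
assert M == b"Meet at noon"
```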

Sometimes, the message M being transmitted is not important in itself, and the purpose of encrypting said message M is only to verify that the correspondent can decrypt it. This proof of possession of Ks or KPr can be used in some authentication schemes. In this case, it is important never to use the same message M more than once, since this would allow Eve to find out information pertaining to the keys. Therefore, it is necessary to generate a random message NA, which will change each time that Alice and Bob want to communicate.

The best known and probably most widely used example of this mechanism is the Diffie-Hellman algorithm. This algorithm allows a browser (Alice) and a website (Bob) to obtain an identical secret key K, different for each connection, by having exchanged their respective KPu beforehand. This process is performed, for example, when connecting to a retail website. It allows the browser and the website to exchange encrypted messages with a key that is destroyed at the end of each session. This means that there is no need to keep the key (improving both ease of use and security, since there is less chance of losing it). It also means that little traffic is encrypted with the same key, which makes cryptanalysis attacks more difficult than if the same key were always used.
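The sketch below illustrates the Diffie-Hellman idea with deliberately tiny, textbook numbers; real deployments use very large primes or elliptic curves, and the values drawn by each side are fresh for every connection, as described above.

```python
import secrets

# Toy Diffie-Hellman sketch (illustrative only: real systems use very
# large primes or elliptic curves, never values this small).
p = 23  # a small public prime
g = 5   # a public generator

# Alice and Bob each draw a fresh random private value for this session.
a = secrets.randbelow(p - 2) + 1
b = secrets.randbelow(p - 2) + 1

# They exchange only the public values g^a mod p and g^b mod p.
A = pow(g, a, p)
B = pow(g, b, p)

# Each side combines its own private value with the other's public value...
key_alice = pow(B, a, p)
key_bob = pow(A, b, p)

# ...and both obtain the same shared secret, which is never sent on the wire.
assert key_alice == key_bob
```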

Generating random numbers

To ensure Eve is unable to obtain the secret key, it is very important that she cannot guess the message NA. In practice, this message is often a large random number used in the calculations required by the chosen algorithm.

Originally, random number generation was mostly used for simulation work. To obtain relevant results, it is important not to repeat the simulation with the same parameters, but to run it hundreds or even thousands of times with different parameters. The aim is to generate numbers that respect certain statistical properties and that cannot be distinguished from a sequence obtained by rolling dice, for example.

To generate a random number NA that can be used in these simulations, so-called pseudo-random generators are normally used: they apply a deterministic processing algorithm to an initial value, known as the ‘seed’. These pseudo-random generators aim to produce a sequence of numbers that resembles a random sequence, according to these statistical criteria. However, using the same seed twice will produce the same sequence twice.
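A minimal illustration in Python: the standard library's pseudo-random generator, seeded twice with the same value, reproduces exactly the same sequence (perfectly acceptable for simulation, but not for cryptography).

```python
import random

# Two pseudo-random generators seeded with the same value...
gen1 = random.Random(42)
gen2 = random.Random(42)

# ...produce exactly the same "random-looking" sequence.
seq1 = [gen1.randint(0, 9) for _ in range(10)]
seq2 = [gen2.randint(0, 9) for _ in range(10)]
assert seq1 == seq2  # same seed, same sequence

# A different seed gives a different sequence.
gen3 = random.Random(43)
seq3 = [gen3.randint(0, 9) for _ in range(10)]
print(seq1, seq3)
```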

The pseudo-random generator algorithm is usually public. If attackers are able to guess the seed, they will be able to regenerate the random sequence and thus obtain the random numbers used by the cryptographic algorithms. In the specific case of cryptography, the attackers do not even need to know the exact value of the seed: if they can narrow it down to a small set of possible values, that is enough to quickly compute all possible keys and crack the encryption.

In the 2000s, programmers used seeds that could easily be guessed, based on the time, for example, making systems vulnerable. Since then, to prevent the seed (or a small set of possible seed values) from being guessed, operating systems rely on a mixture of physical measurements of the system (e.g. processor temperature, bus activity, etc.). These physical elements are practically impossible for an attacker to observe and vary frequently, and therefore provide a good seed source for pseudo-random generators.
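In practice, cryptographic code should therefore draw its randomness from the operating system's entropy pool rather than seed a statistical generator itself; a minimal Python sketch:

```python
import os
import secrets

# Cryptographic code should draw randomness from the operating system,
# which mixes hard-to-observe physical measurements into its entropy pool,
# rather than seeding a statistical PRNG with something guessable like time.
nonce = os.urandom(16)               # 16 bytes straight from the OS pool
session_id = secrets.token_hex(16)   # convenience wrapper, also OS-backed
print(nonce.hex(), session_id)
```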

What about vulnerabilities?

Although the field is now well understood, random number generators are still sometimes subject to vulnerabilities. For example, between 2017 and 2021, cybersecurity researchers found 53 such vulnerabilities (CWE-338). This represents only a small fraction of all software flaws (less than 1 in 1,000). Several of these flaws, however, are rated high or critical, meaning they can be exploited quite easily by attackers and are widespread.

A prime example, in 2010, was Sony's error in the PS3 software signature system. In this case, the reuse of the same random value for two different signatures allowed attackers to recover the manufacturer's private key: it then became possible to install any software on the console, including pirated software and malware.

Between 2017 and 2021, flaws have also affected hardware components, such as Intel Xeon processors, Broadcom communication chips and Qualcomm Snapdragon processors embedded in mobile phones. These flaws affect the quality of random number generation. For example, CVE-2018-5871 and CVE-2018-11290 relate to a seed generator whose period is too short, i.e. one that repeats the same sequence of seeds quickly. These flaws have been fixed and only affect certain functions of the hardware, which limits the risk.

The quality of random number generation is therefore a security issue. Operating systems running on recent processors (less than 10 years old) have hardware-based random number generation mechanisms. This generally ensures good-quality random numbers and thus the proper functioning of cryptographic algorithms, even if occasional vulnerabilities may arise. The difficulty is more prominent for connected objects, whose hardware capacities do not allow the implementation of random generators as powerful as those available on computers and smartphones, and which often prove to be more vulnerable.

Hervé Debar, Director of Research and Doctoral Training, Deputy Director, Télécom SudParis – Institut Mines-Télécom and Olivier Levillain, Assistant Professor, Télécom SudParis – Institut Mines-Télécom

This article has been republished from The Conversation under a Creative Commons license. Read the original article.

MuTAS, urban mobility

“En route” to more equitable urban mobility, thanks to artificial intelligence

Individual cars are a major source of pollution. But how can you move away from using your own car when you live far from the city center, in an area with little access to public transport? Andrea Araldo, a researcher at Télécom SudParis, is undertaking a research project that aims to redesign city accessibility, to benefit those excluded from urban mobility.

The transport sector is responsible for 30% of greenhouse gas emissions in France. And when we look more closely, the main culprit appears clearly: individual cars, responsible for over half of the CO2 discharged into the atmosphere by all modes of transport.

To protect the environment, car drivers are therefore thoroughly encouraged to avoid using their car, instead opting for a means of transport that pollutes less. However, this shift is impeded by the uneven distribution of public transport in urban areas: while city centers are generally well connected, accessibility is far worse on the whole in the suburbs (where walking and waiting times are much longer). This means that personal cars appear to be the only viable option in these areas.

The MuTAS (Multimodal Transit for Accessibility and Sustainability) project, selected by the National Research Agency (ANR) as part of the 2021 general call for projects, aims to reduce these accessibility inequalities at the scale of large cities. The idea is to provide the keys to offering a comprehensive, equitable and multimodal range of mobility options, combining public transport with fixed routes and schedules with on-demand transport services, such as chauffeured cars or rideshares. These modes of transport could pick up where buses and trains leave off in less-connected zones. “In this way, it is a matter of improving accessibility of the suburbs, which would allow residents to leave their personal car in the garage and take public transport, thereby contributing to reducing pollution and congestion on the roads”, says Andrea Araldo, researcher at Télécom SudParis and head of the MuTAS project, but formerly a driving school owner and instructor!

Improving accessibility without sending costs sky-high

But how can on-demand mobility be combined with the range of public transport, without leading to overblown costs for local authorities? The budget issue remains a central challenge for MuTAS. The idea is not to deploy thousands of vehicles on-demand to improve accessibility, but rather to make public transport more equitable within urban areas, for an equivalent cost (or with a limited increase).

This means that many questions must be answered, while respecting this constraint. In which zones should on-demand mobility services be added? How many vehicles need to be deployed? How can these services be adapted to different times throughout the day? And there are also questions regarding public transport. How can bus and train lines be optimized, to efficiently coordinate with on-demand mobility? Which are the best routes to take? Which stations can be eliminated, definitively or only at certain times?

To resolve this complex optimization issue, Araldo and his teams have put forward a strategy using artificial intelligence, in three phases.

Optimizing a graph…

The first involves modeling the problem in the form of a graph. In this graph, the points correspond to bus stops or train stations, with each line represented by a series of arcs, each with a certain trip time. “What must be noted here is that we are only using real-life, public data,” emphasizes Araldo. “Other research has been undertaken around these issues, but at a more abstract level. As part of MuTAS, we are using openly available, standardized data, provided by several cities around the world, including routes, schedules, trip times etc., but also population density statistics. This means we are modeling real public transport systems.” On-demand mobility is also added to the graph in the form of arcs, connecting less accessible areas to points in the network. This translates the idea of allowing residents far from the city center to get to a bus or train station using chauffeured cars or rideshares.

To optimize travel in a certain area, researchers start by modeling public transport lines with a graph.
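A toy version of such a graph, sketched with the networkx library and entirely invented stops and trip times, might look like the following; the on-demand arcs link a poorly served suburb to the fixed network.

```python
import networkx as nx

# Minimal sketch (hypothetical stops and times) of the kind of graph
# described above: nodes are stops, arcs carry a trip time in minutes.
G = nx.DiGraph()
bus_line = ["Suburb-A", "Station-1", "Station-2", "CityCenter"]
for u, v, minutes in zip(bus_line, bus_line[1:], [12, 7, 5]):
    G.add_edge(u, v, time=minutes, mode="bus")
    G.add_edge(v, u, time=minutes, mode="bus")

# On-demand mobility is added as extra arcs connecting a poorly served
# area to the fixed network (values are illustrative assumptions).
G.add_edge("Suburb-B", "Station-1", time=9, mode="on-demand")
G.add_edge("Station-1", "Suburb-B", time=9, mode="on-demand")

# Shortest travel time from the remote suburb to the city center.
print(nx.shortest_path_length(G, source="Suburb-B", target="CityCenter", weight="time"))
```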

…using artificial intelligence

This modeled graph acts as the starting point for the second phase. In this phase, a reinforcement learning algorithm is introduced, a method from the field of machine learning. After several iterations, this is what will determine what improvements need to be made to the network, for example, deactivating stations, eliminating lines, adding on-demand mobility services, etc. “Moreover, the system must be capable of adapting its structure dynamically, according to shifts in demand throughout the day,” adds the researcher. “The traditional transport network needs to be dense and extended during peak hours, but it can contract significantly in off-peak hours, with on-demand mobility taking over for the last kilometers, which is more efficient for lower numbers of passengers.”

And that is not the only complex part. Various decisions influence each other: for example, if a bus line is removed from a certain place, more rideshares or chauffeured car services will be needed to replace it. So, the algorithm applies to both public transport and on-demand mobility. The objective will therefore be to reach an optimal situation in terms of equitable distribution of accessibility.

But how can this accessibility be evaluated? There are multiple ways to do so, but the researchers have chosen two measures suited to optimization on a graph. The first is a ‘velocity score’, corresponding to the maximum distance that can be traveled from a departure point in a limited time (30 minutes, for example). The second is a ‘sociality score’, representing the number of people one can meet from a specific area, also within a limited time.
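On a toy graph, the two scores can be sketched as follows (stations, travel times, distances and populations are all invented for illustration):

```python
import networkx as nx

# Sketch of the two accessibility measures on a toy graph whose arcs
# carry travel times in minutes (all values are assumptions).
G = nx.Graph()
G.add_weighted_edges_from(
    [("A", "B", 10), ("B", "C", 15), ("C", "D", 20), ("A", "E", 25)],
    weight="time",
)
population = {"A": 500, "B": 2000, "C": 5000, "D": 8000, "E": 300}
distance_km = {"A": 0, "B": 3, "C": 7, "D": 12, "E": 4}  # from departure "A"

# All nodes reachable from "A" within a 30-minute budget.
reachable = nx.single_source_dijkstra_path_length(G, "A", cutoff=30, weight="time")

velocity_score = max(distance_km[n] for n in reachable)   # farthest point reached
sociality_score = sum(population[n] for n in reachable)   # people reachable
print(reachable, velocity_score, sociality_score)
```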

In concrete terms, the algorithm takes as its reference indicator the accessibility of the least accessible place in the area. Since the aim is to make transport options as equitable as possible, it seeks to optimize this indicator (a ‘max-min’ optimization), while respecting certain constraints such as cost. To achieve this, it makes a series of decisions concerning the network, initially at random. Then, at the end of each iteration, by analyzing the flow of passengers, it calculates the associated ‘reward’: the improvement in the reference indicator. The algorithm stops when the optimum is reached, or else after a pre-determined time.
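The simplified sketch below conveys the max-min logic only: a plain random search stands in for the reinforcement learning algorithm, and stand-in accessibility and cost functions replace the project's simulation-based evaluations.

```python
import random

# Simplified sketch of the max-min search described above: each candidate
# "configuration" is a set of yes/no decisions (keep a station, add an
# on-demand zone, ...). accessibility() and cost() are stand-ins for the
# real simulation-based evaluations used in the project.
N_DECISIONS, BUDGET = 8, 4

def accessibility(config):
    # Hypothetical per-zone accessibility scores derived from the decisions.
    return [sum(config[i] * ((i + z) % 3 + 1) for i in range(N_DECISIONS))
            for z in range(5)]

def cost(config):
    return sum(config)  # each activated decision costs one unit

best, best_score = None, float("-inf")
for _ in range(2000):  # random search standing in for reinforcement learning
    config = [random.randint(0, 1) for _ in range(N_DECISIONS)]
    if cost(config) > BUDGET:
        continue  # respect the cost constraint
    score = min(accessibility(config))  # accessibility of the worst-served zone
    if score > best_score:              # max-min objective: raise the minimum
        best, best_score = config, score

print(best, best_score)
```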

This approach will allow it to establish knowledge of its environment, associating each network structure (according to the decisions made) with the expected reward. “The advantage of such an approach is that once the algorithm is trained, the knowledge base can be used for another network,” explains Araldo. “For example, I can use the optimization performed for Paris as a starting point for a similar project in Berlin. This represents a precious time-saver compared to traditional methods used to structure transport networks, in which you have to start each new project from zero.”

Testing results on (virtual) users in Ile-de-France

The final phase aims to validate the results obtained using a detailed model. While the models of the first phase aim to reproduce reality, they only represent a simplified version of it. This simplification is deliberate, given that these models are used across the many iterations of the reinforcement learning process: with a very high level of detail, the algorithm would require a huge amount of computing power, or far too much processing time.

The third phase therefore involves first modeling the transport network of an urban area in fine detail (in this case, the Ile-de-France region), still using real-life data, but more detailed this time. To integrate all this information, the researchers use a simulator called SimMobility, developed at MIT in a project to which Araldo contributed. The tool makes it possible to simulate the behavior of populations at an individual level, each person being represented by an ‘agent’ with their own characteristics and preferences (activities planned during the day, trips to take, desire to reduce walking time or minimize the number of changes, etc.). The simulator builds on the work of Daniel McFadden (Nobel Prize in Economics in 2000) and Moshe Ben-Akiva on ‘discrete choice models’, which make it possible to predict choices between multiple modes of transport.
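For readers unfamiliar with discrete choice models, here is a minimal multinomial-logit sketch (the coefficients, times and costs are invented, not taken from the project): each mode's utility for one traveller is turned into a choice probability.

```python
import math

# Minimal multinomial-logit sketch of a discrete choice model: each mode
# gets a utility for one traveller (coefficients and values are invented),
# and the model turns utilities into choice probabilities.
utilities = {
    "car":     -0.05 * 25 - 0.4 * 3.0,   # 25 min travel, 3.0 EUR cost
    "transit": -0.05 * 40 - 0.4 * 1.9,   # 40 min travel, 1.9 EUR fare
    "walk":    -0.05 * 70,               # 70 min travel, no fare
}

denom = sum(math.exp(u) for u in utilities.values())
probabilities = {mode: math.exp(u) / denom for mode, u in utilities.items()}
print(probabilities)  # mode shares predicted for this traveller's profile
```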

With the help of this simulator and public databases (socio-demographic studies, road networks, numbers of passengers, etc.), Araldo and his team, in collaboration with MIT, will generate a synthetic population, representing Ile-de-France users, with a calibration phase. Once the model faithfully reproduces reality, it will be possible to submit it to the new optimized transport system and simulate user reactions. “It is important to always remember that it’s only a simulation,” reminds the researcher. “While our approach allows us to realistically predict user behavior, it certainly does not correspond 100% to reality. To get closer, more detailed analysis and deeper collaborations with transport management bodies will be needed.”

Nevertheless, results obtained could serve to support more equitable urban mobility and in time, reduce its environmental footprint. Especially since the rise of electric vehicles and automation could increase the environmental benefits. However, according to Araldo, “electric, self-driving cars do not represent a miracle solution to save the planet. They will only prove to be a truly eco-friendly option as part of a multimodal public transport network.”

Bastien Contreras

cybersecurity, computer attacks

Governments, banks, and hospitals: all victims of cyber-attacks

Hervé Debar, Télécom SudParis – Institut Mines-Télécom

Cyber-attacks are not a new phenomenon. The first computer worm distributed via the Internet, known as the “Morris worm” after its creator, infected 10% of the 60,000 computers connected to the Internet at the time, back in 1988.

Published back in 1989, the novel The Cuckoo’s Egg was based on a true story of computer espionage. Since then, there have been any number of malicious events, whose multiple causes have evolved over time. The initial motivation of many hackers was their curiosity about this new technology, largely out of reach of ordinary people at the time. This curiosity was replaced by the lure of financial reward, leading first to messaging campaigns encouraging people to buy products online, and later to denial-of-service attacks.

Over the past few years, there have been three main motivations:

  • Direct financial gain, most notably through the use of ransomware, which has claimed many victims.
  • Espionage and information-gathering, mostly state-sponsored, but also in the private sphere.
  • Data collection and manipulation (normally personal data) for propaganda or control purposes.

These motivations have been associated with two types of attack: targeted attacks, where hackers select their targets and find ways to penetrate their systems, and large-scale attacks, where the attacker’s aim is to claim as many victims as possible over an extended period of time, as their gains are directly proportional to their number of victims.

The era of ransomware

Ransomware is a type of malware which gains access to a victim’s computer through a back door before encrypting their files. A message is then displayed demanding a ransom in exchange for decrypting these files.

The Kaseya software attack

In July 2021, an attack was launched against Kaseya's IT management software, which several store chains use to run systems such as their cash registers. It affected the cloud part of the service and shut down the payment systems of several retail chains.

The Colonial Pipeline attack

One recent example is the attack on the Colonial Pipeline, an oil pipeline which supplies the eastern United States. The attack took down the software used to control the flow of oil through the pipeline, leading to fuel shortages at petrol stations and airports.

This is a striking example because it affected a visible infrastructure and had a significant economic impact. However, other infrastructure – in banks, factories, and hospitals – regularly falls victim to this phenomenon. It should also be noted that these attacks are very often destructive, and that paying the ransom is not always sufficient to guarantee the recovery of one’s files.

Unfortunately, such attacks look set to continue, at least in the short-term, given the financial rewards for the perpetrators: some victims pay the ransom despite the legal and ethical questions this raises. Insurance mechanisms protecting against cyber-crime may have a detrimental effect, as the payment of ransoms only encourages hackers to continue. Governments have also introduced controls on cryptocurrencies, which are often used to pay these ransoms, in order to make payments more difficult. Paradoxically, however, payments made using cryptocurrency can be traced in a way that would be impossible with traditional methods of payment. We can therefore hope that this type of attack will become less profitable and riskier for hackers, leading to a reduction in this type of phenomenon.

Targeted, state-sponsored attacks

Infrastructure, including sovereign infrastructure (economy, finance, justice, etc.), is frequently controlled by digital systems. As a result, we have seen the development of new practices, sponsored either by governments or extremely powerful players, which implement sophisticated methods over an extended time frame in order to attain their objectives. Documented examples include the Stuxnet/Flame attack on Iran’s nuclear program, and the SolarWinds software hack.

SolarWinds

The attack targeting SolarWinds and its Orion software is a textbook example of the degree of complexity that can be employed by certain perpetrators during an attack. As a network management tool, Orion plays a pivotal role in the running of computer systems and is used by many major companies as well as the American government.

The initial attack took place between January and September of 2019, targeting the SolarWinds compilation environment. Between the fall of 2019 and February 2020, the attacker interacted with this environment, embedding additional features. In February 2020, this interaction enabled the introduction of a Trojan horse called “Sunburst”, which was subsequently incorporated into SolarWinds’ updates. In this way, it became embedded in all of Orion’s clients’ systems, infecting as many as 18,000 organizations. The exploitation phase began in late 2020 when further malicious codes downloaded by Sunburst were injected, and the hacker eventually managed to breach the Office365 cloud used by the compromised companies. Malicious activity was first detected in December 2020, with the theft of software tools from the company FireEye.

The attack's effects continued throughout 2021 and have had significant impacts, underlining both the complexity and the longevity of certain types of attack. American intelligence agencies believe this attack to be the work of the SVR, Russia’s foreign intelligence service, which has denied the accusation. It is likely that the strategic importance of certain targets will lead to future developments of this type of deep, targeted attack. The vital role played by digital tools in the running of our critical infrastructure will doubtless encourage states to develop cyber weapons, a phenomenon that is likely to increase in the coming years.

Social control

Revelations surrounding the Pegasus software developed by NSO have shown that certain countries can benefit significantly from compromising their adversaries’ IT equipment (including smartphones).

The example of Tetris

Tetris is the name given to a tool used (potentially by the Chinese government) to infiltrate online chat rooms and reveal the identities of possible opponents. This tool has been used on 58 sites and uses relatively complex methods to steal visitors’ identities.

“Zero-click” attacks

The Pegasus revelations shed light on what are known as “zero-click” attacks. Many attacks on messaging clients or browsers assume that the victim will click a link, and that this click will then cause the infection. With zero-click attacks, targets are infected without any action on their part. A recent example is the ForcedEntry vulnerability (CVE-2021-30860), which affected the iMessage app on iPhones.

Like many others, this application accepts data in a wide range of formats and must carry out a range of complex operations in order to present it to users in an elegant way on a small display. This complexity expands the opportunities for attack. An attacker who knows a victim’s phone number can send them a malicious message, which will trigger an infection as it is processed by the phone. Certain vulnerabilities even make it possible to delete any traces (at least visible traces) that the message was received, in order to avoid alerting the target.

Despite the efforts to make IT platforms harder to hack, it is likely that certain states and private companies will remain capable of hacking into IT systems and connected objects, either directly (via smartphones, for example) or via the cloud services to which they are connected (e.g. voice assistants). This takes us into the world of politics, and indeed geopolitics.

The biggest problem with cyber-attacks remains identifying the origin of the attack and who was behind it. This is made even more difficult by attackers trying to cover their tracks, which the Internet gives them multiple opportunities to do.

How can you prevent an attack?

The best way of preventing an attack is to install the latest updates for systems and applications, and perhaps ensure that they are installed automatically. The majority of computers, phones and tablets can be updated on a monthly basis, or perhaps even more frequently. Another way is to activate existing means of protection such as firewalls or anti-virus software, which will eliminate most threats.

Saving your work on a regular basis is essential, whether onto a hard drive or in the Cloud, as is disconnecting from these back-ups once they have been completed. Back-up copies are only useful if they are kept separate from your computer, otherwise ransomware will attack your back-up drive as well as your main drive. Backing up twice, or saving key information such as the passwords to your main applications (messenger accounts, online banking, etc.) in paper form, is another must.

Digital tools should also be used with caution. Follow this simple rule of thumb: if it seems too good to be true in the real world, then there is every chance that it is also the case in the virtual world. By paying attention to any messages that appear on our screens and looking out for spelling mistakes or odd turns of phrase, we can often identify unusual behavior on the part of our computers and tablets and check their status.

Lastly, users must be aware that certain activities are risky. Unofficial app stores or downloads of executables in order to obtain software without a license often contain malware. VPNs, which are widely used to watch channels from other regions, are also popular attack vectors.

What should you do if your data is compromised?

Being compromised or hacked is highly stressful, and hackers constantly try to make their victims feel even more stressed by putting pressure on them or by sending them alarming messages. It is crucial to keep a cool head and find a second device, such as a computer or a phone, which you can use to find a tool that will enable you to work on the compromised machine.

It is essential to return the compromised machine to a healthy state. This means a full system recovery, without trying to retrieve anything from the previous installation, in order to prevent the risk of reinfection. Before recovery, you must analyze your back-up to make sure that no malicious code has been transferred to it. This makes it useful to know where the infection came from in the first place.

Unfortunately, the loss of a few hours of work has to be accepted, and you simply have to find the quickest and safest way of getting up and running again. Paying a ransom is often pointless, given that many ransomware programs are incapable of decrypting files. When decryption is possible, you can often find a free program to do it, provided by security software developers. This teaches us to back up our work more frequently and more extensively.

Finally, if you lack in-house cybersecurity expertise, it is highly beneficial to obtain assistance with the development of an approach that includes risk analyses, the implementation of protective mechanisms, the exclusive use of certified cloud services, and the performance of regular audits carried out by certified professionals capable of detecting and handling cybersecurity incidents.

Hervé Debar, Director of Research and PhDs, Deputy Director of Télécom SudParis.

This article has been republished from The Conversation under a Creative Commons licence. Read the original article (in French).

zero-click attacks

Zero-click attacks: spying in the smartphone era

Zero-click attacks exploit security breaches in smartphones in order to hack into a target’s device without the target having to do anything. They are now a threat to everyone, from governments to medium-sized companies.

“Zero-click attacks are not a new phenomenon”, says Hervé Debar, a researcher in cybersecurity at Télécom SudParis. “In 1988 the first computer worm, named the “Morris worm” after its creator, infected 6,000 computers in the USA (10% of the internet at the time) without any human intervention, causing damage estimated at several million dollars.” By connecting to messenger servers which were open access by necessity, this program exploited weaknesses in server software, infecting it. It could be argued that this was one of the very first zero-click attacks, a type of attack which exploits security breaches in target devices without the victim having to do anything.

There are two reasons why this type of attack is now so easy to carry out on smartphones. Firstly, the protective mechanisms for these devices are not as effective as those on computers. Secondly, more complex processes are required in order to present videos and images, meaning that the codes enabling such content to be displayed are often more complex than those on computers. This makes it easier for attackers to hack in and exploit security breaches in order to spread malware. As Hervé Debar explains, “attackers must, however, know certain information about their target – such as their mobile number or their IP address – in order to identify their phone. This is a targeted type of attack which is difficult to deploy on a larger scale as this would require collecting data on many users.”

Zero-click attacks tend to follow the same pattern: the attacker sends a message to their target containing specific content which is received in an app. This may be a sound file, an image, a video, a gif or a pdf file containing malware. Once the message has been received, the recipient’s phone processes it using apps to display the content without the user having to click on it. While these applications are running, the attacker exploits breaches in their code in order to run programs resulting in spy software being installed on the target device, without the victim knowing.

Zero-days: vulnerabilities with economic and political impact

Breaches exploited in zero-click attacks are known as “zero-days”, vulnerabilities which are unknown to the manufacturer or which have yet to be corrected. There is now a global market for the detection of these vulnerabilities: the zero-day market, which is made up of companies looking for hackers to identify these breaches. Once the breach has been identified, the hacker will produce a document explaining it in detail, with the company who commissioned the document often paying several thousand dollars to get their hands on it. In some cases the manufacturer themselves might buy such a document in an attempt to rectify the breach. But it may also be bought by another company looking to sell the breach to their clients – often governments – for espionage purposes. According to Hervé Debar, between 100 and 1,000 vulnerabilities are detected on devices each year. 

Zero-click attacks are regularly carried out for theft or espionage purposes. For theft, the aim may be to validate a payment made by the victim in order to divert their money. For espionage, the goal might be to recover sensitive data about a specific individual. The most recent example was the Pegasus affair, which affected around 50,000 potential victims, including politicians and media figures. “These attacks may be a way of uncovering secret information about industrial, economic or political projects. Whoever is responsible is able to conceal themselves and to make it difficult to identify the origin of the attack, which is why they’re so dangerous”, stresses Hervé Debar. But it is not only governments and multinationals who are affected by this sort of attack – small and medium-sized companies are too. They are particularly vulnerable in that, owing to a lack of financial resources, they don’t have IT professionals running their systems, unlike major organisations.

Also read on I’MTech Cybersecurity: high costs for companies

More secure computer languages

But there are things that can be done to limit the risk of such attacks affecting you. According to Hervé Debar, “the first thing to do is use your common sense. Too many people fall into the trap of opening suspicious messages.” Personal phones should also be kept separate from work phones, as this prevents attackers from gaining access to all of a victim’s data. Another handy tip is to back up your files onto an external hard drive. “By transferring your data onto an external hard drive, it won’t only be available on the network. In the event of an attack, you will safely be able to recover your data, provided you disconnected the disc after backing up.” To protect against attacks, organisations may also choose to set up intrusion detection systems (IDS) or intrusion prevention systems (IPS) in order to monitor flows of data and access to information.

In the fight against cyber-attacks, researchers have developed alternative computing languages. Ada, a programming language which dates back to the 1980s, is now used in the aeronautic industry, in railways and in aviation safety. For the past ten years or so the computing language Rust has been used to solve problems linked to the management of buffer memory which were often encountered with C and C++, languages widely used in the development of operating systems. “These new languages are better controlled than traditional programming languages. They feature automatic protective mechanisms to prevent errors committed by programmers, eliminating certain breaches and certain types of attack.” However, “writing programs takes time, requiring significant financial investment on the part of companies, which they aren’t always willing to provide. This can result in programming errors leading to breaches which can be exploited by malicious individuals or organisations.”

Rémy Fauvel

Facebook

Facebook: a small update causes major disruption

Hervé Debar, Télécom SudParis – Institut Mines-Télécom

Late on October 4, many users of Facebook, Instagram and WhatsApp were unable to access their accounts. All of these platforms belong to the company Facebook and were all affected by the same type of error: an accidental and erroneous update to the routing information for Facebook’s servers.

The internet employs various different types of technology, two of which were involved in yesterday’s incident: BGP (border gateway protocol) and DNS (domain name system).

In order to communicate, each machine must have an IP address. Online communication involves linking two IP addresses together. The contents of each communication are broken down into packets, which are exchanged by the network between a source and a destination.

How BGP (border gateway protocol) works

The internet is made up of tens of thousands of “autonomous systems”, or AS, some very large, and others very small. Some AS are interconnected via exchange points, enabling them to exchange data. Each of these systems is comprised of a network of routers, which are connected using either optical or electrical communication links. Communication online circulates using these links, with routers responsible for transferring communications between links in accordance with routing rules. Each AS is connected to at least one other AS, and often several at once.

When a user connects their machine to the internet, they generally do so via an internet service provider or ISP. These ISPs are themselves “autonomous systems”, with address ranges which they allocate to each of their clients’ machines. Each router receiving a packet will analyse both the source and the destination address before deciding to transfer the packet to the next link, following their routing rules.

In order to populate these routing rules, each autonomous system shares information with other autonomous systems describing how to associate a range of addresses in their possession with an autonomous system path. This is done step by step through the use of the BGP or border gateway protocol, ensuring each router has the information it needs to transfer a packet.
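Conceptually (and leaving aside everything that makes real BGP complicated), this sharing of routes can be pictured as in the sketch below, with invented autonomous systems and an illustrative address range; each AS keeps the shortest AS path it has heard for a prefix and re-advertises it to its neighbours.

```python
# Conceptual sketch of BGP-style route sharing (not a real BGP implementation).
neighbours = {"AS1": ["AS2"], "AS2": ["AS1", "AS3"], "AS3": ["AS2"]}
# AS3 originates the prefix (an empty AS path means "this is my own range").
routes = {"AS1": {}, "AS2": {}, "AS3": {"157.240.196.0/24": []}}

changed = True
while changed:          # keep exchanging updates until the tables are stable
    changed = False
    for asn in routes:
        for prefix, path in list(routes[asn].items()):
            advert = [asn] + path            # the AS prepends itself, as in BGP
            for nb in neighbours[asn]:
                known = routes[nb].get(prefix)
                if nb not in advert and (known is None or len(advert) < len(known)):
                    routes[nb][prefix] = advert   # accept the shorter route
                    changed = True

print(routes["AS1"])   # AS1 learns it can reach the prefix via AS2 then AS3
```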

DNS (domain name system)

The domain name system was devised because numerical IP addresses are meaningless to end users. For servers available on the internet, it links a name such as “facebook.com” to the corresponding IP address, “157.240.196.35”.

Each holder of a domain name sets up (or delegates) a DNS server, which links domain names to IP addresses. Such servers are considered the most reliable source (or authority) for DNS information, but they are also often the first cause of an outage: if the machine is unable to resolve a name (i.e. to connect the name requested by the user to an address), the end user is simply sent an error message.
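In code, this resolution step is what happens before any connection can be opened; a minimal Python example (any name would do, the one below simply matches the article):

```python
import socket

# A DNS lookup in practice: the resolver turns the name into IP addresses
# before any connection can be opened. If resolution fails (as during the
# outage described below), the user simply gets an error.
try:
    addresses = {info[4][0] for info in socket.getaddrinfo("facebook.com", 443)}
    print(addresses)
except socket.gaierror as err:
    print("name resolution failed:", err)
```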

Each major internet operator – not just Facebook, but also Google, Netflix, Orange, OVH, etc. – has one or more autonomous systems and coordinates the respective BGP in conjunction with their peers. They also each have one or more DNS servers, which act as an authority over their domains.

The outage

Towards the end of the morning of October 4, Facebook made a modification to its BGP configuration which it then shared with the autonomous systems it is connected to. This modification resulted in all of the routes leading to Facebook disappearing, across the entire internet.

Ongoing communications with Facebook’s servers were interrupted as a result, as the deletion of the routes spread from one autonomous system to the next, since the routers were no longer able to transfer packets.

The most visible consequence for users was an interruption to the DNS and an error message: as a result of the BGP error, the DNS servers of ISPs could no longer contact Facebook's authoritative servers.

This outage also caused major disruption on Facebook’s end as it rendered remote access and, therefore, teleworking, impossible. Because they had been using the same tools for communication, Facebook employees found themselves unable to communicate with each other, and so repairs had to be carried out at their data centres. With building security also online, access proved more complex than first thought.

Finally, with the domain name “facebook.com” no longer referenced, it was identified as free by a number of specialist sites for the duration of the outage, and was even put up for auction.

Impact on users

Facebook users were unable to access any information for the duration of the outage. Facebook has become vitally important for many communities of users, with both professionals and students using it to communicate via private groups. During the outage, these users were unable to continue working as normal.

Facebook is also an identity provider for many online services, enabling “single sign-on”, which involves users reusing their Facebook accounts in order to access services offered by other platforms. Unable to access Facebook, users were forced to use other login details (which they may have forgotten) in order to gain access.

Throughout the outage, users continued to request access to Facebook, leading to an increase in the number of DNS requests made online and a temporary but very much visible overload of DNS activity worldwide.

This outage demonstrated the critical role played by online services in our daily lives, while also illustrating just how fragile these services still are and how difficult it can be to control them. As a consequence, we must now look for these services to be operated with the same level of professionalism and care as other critical services.

Banking, for example, now takes place almost entirely online. A breakdown like the one that affected Facebook is less likely to happen to a bank, given the standards and regulations in place for banking, such as the Directive on the Security of Network and Information Systems (NIS), the General Data Protection Regulation or PCI-DSS.

In contrast, Facebook writes its own rules and is partially able to evade regulations such as the GDPR. Introducing service obligations for these major platforms could improve service quality. It is worth pointing out that no bank operates a network as impressive as Facebook’s infrastructure, the size of which exacerbates any operating errors.

More generally, after several years of research and standardisation, safety mechanisms for BGP and DNS are now being deployed, the aim being to prevent attacks which could have a similar impact. The deployment of these security mechanisms will need to be accelerated in order to make the internet more reliable.

Hervé Debar, Director of Research and PhDs, Deputy director, Télécom SudParis – Institut Mines-Télécom

This article has been republished from The Conversation under a Creative Commons licence. Read the original article.

IMPETUS: towards improved urban safety and security

How can traffic and public transport be managed more effectively in a city, while controlling pollution, ensuring the safety of users and at the same time, taking into account ethical issues related to the use of data and mechanisms to ensure its protection? This is the challenge facing IMPETUS, a €9.3 million project receiving funding of €7.9 million from the Horizon 2020 programme of the European Union[1]. The two-year project launched in September 2020 will develop a tool to increase cities’ resilience to security-related events in public areas. An interview with Gilles Dusserre, a researcher at IMT Mines Alès, a partner in the project.

What was the overall context in which the IMPETUS project was developed?

Gilles Dusserre The IMPETUS project was the result of my encounter with Matthieu Branlat, the scientific coordinator of IMPETUS, who is a researcher at SINTEF (Norwegian Foundation for Scientific and Industrial Research) which supports research and development activities. Matthieu and I have been working together for many years. As part of the eNOTICE European project, he came to take part in a use case organized by IMT Mines Alès on health emergencies and the resilience of hospital organizations. Furthermore, IMPETUS is the concrete outcome of efforts made by research teams at Télécom SudParis and IMT Mines Alès for years to promote joint R&D opportunities between IMT schools.

What are the security issues in smart cities?

GD A smart city can be described as an interconnected urban network of sensors, such as cameras and environmental sensors; it therefore produces a wealth of valuable data. In addition to enabling better management of traffic and public transport and better pollution control, this data allows for improved police surveillance and adequate crowd control. But these smart systems increase the risk of unethical use of personal data, in particular given the growing use of AI (artificial intelligence) combined with video surveillance networks. Moreover, they increase the attack surface of a city, since several interconnected IoT (Internet of Things) and cloud systems control critical infrastructure such as transport, energy, water supply and hospitals (which play a central role in current problems). These two types of risks associated with new security technologies are taken very seriously by the project: a significant part of its activities is dedicated to the impact of the use of these technologies on operational, ethical and cybersecurity aspects. We have groups within the project and external actors overseeing ethical and data privacy issues. They work with project management to ensure that the solutions we develop and deploy adhere to ethical principles and data privacy regulations. Guidelines and other decision-making tools will also be developed to help cities identify and take into account the ethical and legal aspects related to the use of intelligent systems in security operations.

What is the goal of IMPETUS?

GD In order to respond to these increasing threats for smart cities, the IMPETUS project will develop an integrated toolbox that covers the entire physical and cybersecurity value chain. The tools will advance the state of the art in several key areas such as detection (social media, web-based threats), simulation and analysis (AI-based tests) and intervention (human-machine interface and eye tracking, optimization of the physical and cyber response based on AI). Although the toolbox will be tailored to the needs of smart city operators, many of the technological components and best practices will be transferable to other types of critical infrastructure.

What expertise are researchers from IMT schools contributing to the project?  

GD The work carried out by Hervé Debar's team at Télécom SudParis, in connection with researchers at IMT Mines Alès, resulted in the creation of the overall architecture of the IMPETUS platform, which will integrate the various smart city modules proposed in the project. Within this framework, the specification of the various system components, and of the system as a whole, will be designed to meet the requirements of the final users (the cities of Oslo and Padua), but also to be scalable to future needs.

What technological barriers must be overcome?

GD The architecture has to be modular, so that each individual component can be independently upgraded by the provider of the technology involved. The architecture also has to be integrated, which means that the various IMPETUS modules can exchange information, thereby providing significant added value compared to independent smart city and security solutions that work as silos.  

To provide greater flexibility and efficiency in terms of collecting, analyzing, storing and access to data, the IMPETUS platform architecture will combine IoT and cloud computing approaches. Such an approach will reduce the risks associated with an excessive centralization of large amounts of smart city data and is in line with the expected changes in communication infrastructure, which will be explored at a later date.  

This task will also develop a testing plan. The plan will include the prerequisites, the execution of tests, and the expected results. The acceptance criteria will be defined based on the priority and percentage of successful test cases. In close collaboration with the University of Nîmes, IMT Mines Alès will work on an innovative approach to environmental risks, in particular those related to chemical or biological agents, and to hazard assessment processes.

The consortium includes 17 partners from 11 EU member states and associated countries. What are their respective roles?

GD The consortium was formed to bring together a group of 17 organizations that are complementary in terms of basic knowledge, technical skills, ability to create new knowledge, business experience and expertise. The consortium comprises a complementary group of academic institutions (universities) and research organizations, innovative SMEs, industry representatives, NGOs and final users.

The work is divided into a set of interdependent work packages. It involves interdisciplinary innovation activities that require a high level of collaboration. The overall strategy consists of an iterative exploration, an assessment and a validation, involving the final users at every step.

[1] This project receives funding from Horizon 2020, the European Union’s Framework Programme for Research and Innovation (H2020) under grant agreement N° 883286. Learn more about IMPETUS.

El Niño

El Niño: communities in the face of weather’s bad boy

South America must regularly face a climate event with far-ranging effects:  El Niño, which leads to localized flooding. This type of catastrophe also results in changes in the behavior of local communities – a topic which has been little studied. Yet these changes provide a good example of individuals’ resilience to crises. By studying consumption habits in the regions affected by El Niño, Vincent Gauthier, a researcher at Télécom SudParis, seeks to understand how communities react to this event.

El Niño is a recurring climate event, which takes place every two to seven years on the equatorial Pacific coast of South America. It leads to localized heavy precipitation with serious consequences. “The  2017 El Niño phenomenon was especially violent and was characterized by two periods of heavy rainfall, resulting in human casualties and extensive destruction of physical structures,” says Vincent Gauthier, a researcher at Télécom SudParis who studies complex networks and is analyzing the impact of the 2017 episode on the behavior of the local community.  

Peru was strongly impacted by the most recent El Niño phenomenon, especially in the north of the country and on its Pacific coast, which includes the Lima region. The episodes of rainfall gave rise to two periods of flooding: the first in February and the second in early April. Vincent Gauthier’s research seeks to understand how economic behavior changes before, during and after these periods.

To study these changes, the researcher uses data about consumption in the region. “Our approach is to analyze banking transaction data, with different granularity levels,” he explains. Studies were carried out in partnership with the Pacific University in Lima and led to the publication of a research article in the journal Plos One.

At the countrywide level, the results are conclusive: during each period of heavy rainfall there is a significant drop in the number and volume of transactions overall, therefore indicating that individuals consume less during the weather event. Transactions return to normal in the days following the rainfall, indicating that the overall impact is fairly limited in duration.   

Resilience to El Niño

The study then focused specifically on the Lima region, which includes the capital and surrounding rural areas. This made it possible to categorize areas according to dynamic changes in consumption. Unsurprisingly, the areas recording the most significant drops in transactions were those most affected by the rainfall. However, certain areas recorded rises in consumption before and during the episode, a behavior which may reflect purchasing as a precautionary measure.

To better understand such variations, Vincent Gauthier established a retail network model. This representation indicates not only consumers’ purchases, but also consumption paths. Such a model shows the various connections between stores, based on how busy they are, their ranking and the type of products sold. For example, a consumer who carries out a transaction at a pharmacy and then another at a supermarket strengthens the link between these two types of stores within the network. This makes it possible to study which links are the strongest in the event of a disturbance.  
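A highly simplified sketch of how such a network can be built from anonymized transactions (the data below is invented, and real paths would of course be ordered by timestamp): consecutive purchases by the same customer strengthen the link between two store categories.

```python
from collections import defaultdict

# Sketch of the retail-network idea described above: consecutive purchases
# by the same (anonymised) customer strengthen the link between two store
# categories. The transactions below are invented for illustration.
transactions = [
    ("cust1", "pharmacy"), ("cust1", "supermarket"),
    ("cust2", "supermarket"), ("cust2", "fuel"),
    ("cust3", "pharmacy"), ("cust3", "supermarket"), ("cust3", "fuel"),
]

paths = defaultdict(list)
for customer, store_type in transactions:   # rebuild each consumption path
    paths[customer].append(store_type)

edges = defaultdict(int)
for path in paths.values():                 # each consecutive pair adds weight
    for a, b in zip(path, path[1:]):
        edges[(a, b)] += 1

print(dict(edges))  # e.g. the pharmacy -> supermarket link has weight 2
```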

“During periods of heavy rainfall, the size of the network was strongly impacted,” says the researcher. “The connections were therefore reduced to stores that sell food, medical supplies and fuel. These connections represent the core of the network and if this core collapses, so does the whole system,” explains Vincent Gauthier. Modeling and studying resilience therefore allow us to understand the vulnerability and risks to this core network.

Using this approach, it can be seen that the first episode of rainfall had a stronger impact than the second on the size of the core network, as well as on the time it took to rebuild a wider network. Yet the second period of rainfall was more violent from a weather perspective. This counterintuitive observation may be explained by better community preparedness for the second period of heavy rainfall and flooding. This difference in behavior, highlighted by the modeling, is a marker of the resilience of the Peruvian people.

Understanding people through their purchases

To put these models in place, the researchers used all the metadata associated with banking transactions. “Each transaction produces data accompanied by nomenclatures, which contain information about the type of store in which it was carried out, for example supermarkets, restaurants, pharmacies or service stations,” says Vincent Gauthier. “This nomenclature also contains the purchase date and the anonymized identity of the person who made the purchase,” he continues.

This means that each individual’s purchasing path can be traced over time to provide an overview of his or her specific economic behavior during various periods. This analysis makes it possible to determine which stores are most often visited after one another by consumers, which is influenced both by the geographical proximity of the businesses to one another and similar interests among consumers.

“By analyzing this data, stores can be ranked according to the number and volume of transactions carried out there; then divergence measurements can be taken to identify changes in these rankings,” explains the researcher. The divergence measurements focus on differences between stores’ rankings at the time of the El Niño phenomenon and the original distribution. Such differences can also be seen during festive events, when there is a greater number of transactions in certain kinds of stores. “We therefore categorized stores based on the variation in their ranking during the El Niño phenomenon,” says Vincent Gauthier.
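As an illustration of the idea (the study's exact divergence measure is not detailed here), one can compare the share of transactions per store category before and during the event, for instance with a Kullback-Leibler divergence; all figures below are invented.

```python
import math

# Illustrative divergence between the "normal" distribution of transactions
# across store categories and the distribution observed during the event.
baseline = {"supermarket": 400, "restaurant": 300, "pharmacy": 150, "fuel": 150}
during_event = {"supermarket": 380, "restaurant": 60, "pharmacy": 300, "fuel": 260}

def normalise(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

p, q = normalise(during_event), normalise(baseline)

# Kullback-Leibler divergence D(P || Q): large values flag categories whose
# share of consumption shifted most compared with the usual distribution.
kl = sum(p[k] * math.log(p[k] / q[k]) for k in p)
print(round(kl, 3))
```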

This approach allows the researchers to build a profile of the various stores over time, so that they can see how their ranking varies during particular events. For example, the ranking of restaurants fell sharply during the short periods of heavy rainfall, while the ranking of stores selling medical supplies increased for a relatively long period of time. Supermarkets were the type of store whose ranking was generally the most stable.

Better preparing for crises

“Future climate change will lead to an increase in extreme phenomena. Community resilience to these events will become an important issue to understand,” says Vincent Gauthier. The research carried out in relation to El Niño offers insights into community preparedness. It provides valuable knowledge for regions that are not used to dealing with extreme climate events, but that may have to face them in the years to come.

“That would make it possible to identify which services to develop and what logistics to put in place in order to mitigate the effects of future crises, by organizing supply and inventory and keeping essential services open during crises. For example, we observed serious gasoline supply problems, even though demand for this product was high during the crisis and in its aftermath, as well as significant delays in consumption in geographic areas that were less exposed to the crisis,” says the researcher.

Beyond the climate issue, the wider issue of preparedness and resilience to crisis was studied. Understanding how the consumption network varies, what parts must be strengthened, or on the other hand, what parts are secondary, makes it possible to better focus efforts in an exceptional situation. The study of the current health crisis is a part of this work. “We’re studying the effects of the Covid-19 pandemic on the behavior of the Peruvian people, by analyzing consumption data as well as mobility data.”  The analysis of mobility patterns could have a major impact on decisions to make in the event of a lockdown. “The methodology for the Covid-19 health crisis will be a bit different since the impact will be measured over a longer term unlike the crisis caused by El Niño where the underlying processes were essentially transitional,”  concludes Vincent Gauthier.

Antonin Counillon

e-VITA

e-VITA, a virtual coach for seniors

Virtual coaching can play a crucial role in maintaining healthy and active ageing through early detection of risks and intervention tailored to the individual needs of senior citizens. However, current technologies do not meet these requirements: they offer limited interaction and are often intrusive. The 22 European and Japanese partners of the e-VITA project will develop a “multi-modal personal voice coach” to assist and safeguard elderly people at home. With a budget of €4m funded by the European Union and an equivalent amount funded by the Japanese MIC (Ministry of Internal Affairs and Communications), the project began in January 2021 for a duration of 3 years. Interview with Jérôme Boudy, researcher at Télécom SudParis and project partner.

How did the European e-VITA project come about?

Jérôme Boudy – In a context of ageing populations, the idea of this project gradually took shape from 2016 onwards. Initially, there were ongoing projects such as EMPATHIC, of which Télécom SudParis is a partner, followed by a collaboration with Brazil, and finally the e-VITA (European-Japanese virtual coach for smart ageing) project with Japan, which aims to develop tools to ensure active and healthy ageing (AHA) through the early detection of the risks associated with old age. 

Read more on I’MTech: AI to assist the elderly

What is the goal of e-VITA?

JB – The aim is to keep the elderly at home in a secure environment. Founded on international cooperation between Europe and Japan, e-VITA offers an innovative approach to “virtual coaching” that addresses the crucial areas of active and healthy ageing: cognition, physical activity, mobility, mood, social interaction, leisure… enabling older people to better manage their own health and daily activities.

What method will be used?

JB – By taking into account different cultural factors in European countries and in Japan, in particular the acceptability of the interfaces preferred in these countries (smartphones, 3D holograms, social robots, etc.), e-VITA will develop an automatic multi-modal human-machine interface. Based on Natural Language Processing (NLP) and automatic spoken dialog management, it will also be equipped with several complementary non-verbal modalities such as recognition of a person’s gestures, emotions, and situation.

This “virtual coach” will detect potential risks in the user’s daily environment and how these risks could be prevented by collecting data from external sources and non-intrusive sensors. It will provide individualized profiling and personalized recommendations based on big data analytics and socio-emotional informatics. Interoperability and data confidentiality will be guaranteed through FIWARE and a federated data AI platform.

What expertise will Télécom SudParis and IMT Atlantique researchers involved in e-VITA bring to the table?

JB – Researchers from IMT schools will mainly ensure the interoperability and processing of the data provided by the different sensors, as well as the automatic monitoring of emotions on the face. In addition, our two living labs (Experiment’HaaL for IMT Atlantique and Evident for Télécom SudParis) will be made available to project partners. Finally, we will be in charge of the management of the “dissemination and exploitation” work package.

The project brings together a large number of partners. What are their roles in this project?

JB – The consortium brings together 12 partners in Europe and 10 in Japan, each with complementary roles. Siegen University (Germany) and Tohoku University are coordinating the project for Europe and for Japan respectively. The partners fall into three major groups: end users responsible for needs specification and field assessment, such as APHP (France), AGE Platform Europe (Belgium), IRNCA (Italy), Caritas Germany, and NCGG and IGOU (Japan); academic and research organizations specializing in AI algorithms (machine learning, fusion, expression recognition, etc.), including, alongside the IMT schools, Fraunhofer and INFAI (Germany), UNIVPM (Italy), Tohoku University, AIST and Waseda University (Japan); and lastly, industrial partners in charge of technical definition and process integration, mainly SMEs: IXP (Germany), Ingegneria Informatica (Italy), Delta Dore (France), Gatebox and NEU (Japan), and a single large group, Misawa (Japan).

What are the expected results?

JB – The creation of a “multi-modal personal voice coach” whose job is to assist, accompany and safeguard the elderly at home, and the operation of this system through several physical interfaces (smartphones, robots, etc.), thanks to the integration of start-up incubators in our living labs and structures.

The coaching system will be installed in the living environments of healthy elderly people in France, Germany, Italy and Japan to evaluate its feasibility and effectiveness. The expected results of the e-VITA project also include new standards and policies that go beyond technology, which will therefore be explored and transferred across Europe, Japan and worldwide.

What are the next big steps for the e-VITA project?

JB – The next step is the phase of specifying user needs according to cultural factors, and defining the architecture of the targeted system, which requires the organization of several workshops.

Find out more about e-VITA

Interview by Véronique Charlet

hack, drone, attack, UAVs

Hacked in mid-flight: detecting attacks on UAVs

A UAV (or drone) in flight can fall victim to different types of attacks. At Télécom SudParis, Alexandre Vervisch-Picois is working on a method for detecting attacks that spoof drones concerning their position. This research could be used for both military and civilian applications.

It set out to deliver your package one morning, but the package never arrived. Don’t worry, nothing has happened to your mailman: this is a story about an autonomous drone. These small flying vehicles are capable of following a flight path without a pilot, and are now ahead of the competition in the race for the fastest delivery.

While drone deliveries are technically possible, for now they remain the stuff of science fiction in France, for both legal reasons and because of certain vulnerabilities in these systems. At Télécom SudParis, Alexandre Vervisch-Picois, a researcher specializing in global navigation satellite systems (GNSS), and his team are working with Thales to detect what are referred to as “spoofing” attacks. In order to prevent these attacks, the researchers are studying how they work, with the goal of establishing a protocol to help detect them.

How do you spoof a drone?

In order to move around independently, a drone must know its position and the direction in which it is moving. It therefore continuously receives signals from a satellite constellation, which enable it to calculate the coordinates of its position. These coordinates can then be used to follow a predefined flight path by moving through a succession of waypoints until the drone reaches its destination. However, the drone’s heavy reliance on satellite geolocation to find its way makes it vulnerable to cyber attacks. “If we can succeed in getting the drone to believe it is somewhere other than its actual position, then we can indirectly control its flight path,” Alexandre Vervisch-Picois explains. This flaw is all the more critical given that drones’ GPS receivers can easily be deceived by false signals transmitted at the same frequency as those of the satellites.

This is what the researchers call a spoofing attack. This type of cyber attack is not new: it was used in 2011 by the Iranian army to capture an American stealth drone that flew over its border. The technique involves transmitting a sufficiently powerful false radio-frequency signal to replace the satellite signal picked up by the drone. This spoofing technique does not cancel the drone’s geolocation capacities as a jammer would. Instead, it forces the GPS receiver to calculate an incorrect position, causing it to deviate from its flight path. “For example, an attacker who succeeds in identifying the next waypoint can then determine a wrong position to be sent in order to lead the drone right to a location where it can be captured,” the researcher explains.

Resetting the clocks

Several techniques can be used to identify these attacks, but they often entail additional costs, both in terms of hardware and energy. Through the DIGUE project (French acronym for GNSS Interference Detection for Autonomous UAV)[1], conducted with Thales Six, Alexandre Vervisch-Picois and his team have developed a method for detecting spoofing attempts. “Our approach uses the GPS receivers present in the drones, which makes this solution less expensive,” says the researcher. This is referred to as the “clock bias” method. Time is a key parameter in satellite position calculations. The satellites have their own time base and so does the GPS receiver. Once the GPS receiver has calculated its position, it therefore measures the “bias”, which is the difference between these two time bases. When a spoofing attack occurs, however, the researchers observed variations in this calculation in the form of a jump. The underlying reason for this jump is that the spoofer has its own time base, which is different from that of the satellites. “In practice, it is impossible for the spoofer to use the same clock as a satellite. All it can do is move closer to the time base, but we always notice a jump,” Alexandre Vervisch-Picois explains. To put it simply, the satellites and the spoofer are not set to the same time.
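As a rough illustration of the principle (and not the DIGUE implementation, whose filtering and thresholds are not described here), the clock bias reported with each position fix can be monitored over time, and an alert raised when it deviates from the recent drift by more than a chosen margin. The window size, threshold and simulated values below are arbitrary assumptions.

```python
# Minimal sketch of clock-bias jump detection, assuming the receiver
# exposes one clock-bias estimate (in seconds) per position fix.
# The window size and threshold are illustrative, not values from the study.
from collections import deque

def detect_bias_jump(bias_stream, window=20, threshold=1e-6):
    """Yield (index, bias, deviation) whenever the clock bias deviates
    from the recent trend by more than `threshold` seconds."""
    recent = deque(maxlen=window)
    for i, bias in enumerate(bias_stream):
        if len(recent) == recent.maxlen:
            # Predict the next bias by extrapolating the mean drift per epoch.
            drift = (recent[-1] - recent[0]) / (len(recent) - 1)
            predicted = recent[-1] + drift
            deviation = abs(bias - predicted)
            if deviation > threshold:
                yield i, bias, deviation
        recent.append(bias)

# Example: a slowly drifting clock bias with a sudden jump at epoch 50,
# as might happen when a spoofer with its own time base takes over.
clean = [1e-3 + 2e-9 * t for t in range(50)]
spoofed = [1e-3 + 2e-9 * t + 5e-6 for t in range(50, 100)]
for epoch, bias, dev in detect_bias_jump(clean + spoofed):
    print(f"possible spoofing at epoch {epoch}: deviation {dev:.2e} s")
```

In this toy example, the simulated bias drifts slowly until epoch 50, where a jump of a few microseconds, of the kind a spoofer with a different time base would cause, triggers the alert.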

One advantage of this method is that it does not require any additional components or computing power to retrieve the data, since the necessary information is already present in the drone. Nor does it require expensive signal-processing analyses of the information received by the drone, which is another defense method used to determine whether or not a signal originated from a satellite.

But couldn’t the attacker work around this problem by synchronizing with the satellites’ time setting? “It is very rare but still possible in the case of a very sophisticated spoofer. This is a classic case of measures and countermeasures, like the interplay between a sword and a shield. In response to an attack, we set up defense systems and the threats become more sophisticated to bypass them,” the researcher explains. This is one reason why research in this area has so much to offer.

After obtaining successful results in the laboratory, the researchers are now planning to develop an algorithm based on monitoring the clock bias over time. This could be implemented on a flying drone for a test under real conditions.

What happens after an attack is detected?

Once the attack has been detected, the researchers try to locate the source of the false signal in order to find the attacker. To do so, they propose using a fleet of connected drones. The idea is to program movements within the fleet in order to determine the angle of arrival of the false signal. One of the drones would then send a message to the relevant authorities in order to stop the spoofer. This method is still in its infancy and is expected to be further developed with Thales in a military context, for battlefield situations in which the spoofer must be eliminated. But in the context of a parcel delivery, what could be used to defend a single drone? “There could be a protocol involving rising to a higher altitude to move out of the spoofer’s range, which can reach up to several kilometers. But it would certainly not be as easy to escape its influence,” the researcher says. Another alternative would be to use signal-processing methods, but these solutions would increase the costs associated with the device. “If too much of the drone’s energy is required for its protection, we need to ask whether this mode of transport is helpful and perhaps consider other more conventional methods, which are less burdensome to implement,” says Alexandre Vervisch-Picois.

[1] Victor Truong’s thesis research

Anaïs Culot

How to better track cyber hate: AI to the rescue

The widespread use of social media, sometimes under cover of anonymity, has freed up speech and led to a proliferation of ideas, discussions and opinions on the internet. It has also led to a flood of hateful, sexist, racist and abusive speech. Confronted with this phenomenon, more and more platforms today are using automated solutions to combat cyber hate. These solutions are based on algorithms that can themselves introduce biases, sometimes discriminating against certain communities, and they still leave considerable room for improvement. In this context, French researchers are developing new, increasingly effective models to detect hate speech and reduce bias.

On September 16 this year, internet users launched a movement calling for a one-day boycott of Instagram. Supported by many American celebrities, the “Stop Hate for Profit” day aimed to challenge Facebook, the parent company of the photo and video sharing app, over the proliferation of hate, propaganda and misinformation on its platforms. Back in May 2019, in its bi-annual report on the state of moderation on its network, Facebook had announced significant progress in the automated detection of hateful content. According to the company, between January and April 2019, more than 65% of these messages were detected and moderated before users even reported them, compared with 38% during the same period in 2018.

Strongly encouraged to combat online hate content, in particular by the “Avia law” (named after the member of parliament for Paris, Lætitia Avia), platforms use various techniques such as detection by keywords, reporting by users and solutions based on artificial intelligence (AI). Machine learning allows predictive models to be developed from corpora of data. This is where biases can be damaging. “We realized that the automated tools themselves had biases against gender or the user’s identity and, most importantly, had a disproportionately negative impact on certain minority groups such as Afro-Americans,” explains Marzieh Mozafari, PhD student at Télécom SudParis. On Twitter, for example, it is difficult for AI-based programs to take into account the social context of tweets, the identity and dialect of the speaker and the immediate context of the tweet all at once. Some content is thus removed despite being neither hateful nor offensive.

So how can we minimize these biases and erroneous detections without creating a form of censorship? Researchers at Télécom SudParis have been using a public dataset collected on Twitter, distinguishing between tweets written in Afro-American English (AAE) and Standard American English (SAE), as well as two reference databases annotated (as sexist, racist, hateful or offensive) by experts and through crowdsourcing. “In this study, due to the lack of data, we mainly relied on cutting-edge language processing techniques such as transfer learning and the BERT language model, a pre-trained, unsupervised model,” the researchers explain.

Developed by Google, the BERT (Bidirectional Encoder Representations from Transformers) model is trained on a vast corpus of textual content, including, among other things, the entire content of the English version of Wikipedia. “We were able to ‘customize’ BERT [1] to make it do a specific task, to adjust it for our hateful and offensive corpus,” explains Reza Farahbakhsh, a researcher in data science at Télécom SudParis. To begin with, the researchers tried to identify word sequences in their datasets that were strongly correlated with a hateful or offensive category. Their results showed that tweets written in AAE were almost 10 times more likely to be classed as racist, sexist, hateful or offensive than tweets written in SAE. “We therefore used a reweighting mechanism to mitigate biases based on data and algorithms,” says Marzieh Mozafari. For example, the number of tweets containing “n*gga” and “b*tch” is 35 times higher among tweeters in AAE than in SAE, and these tweets will often be wrongly identified as racist or sexist. However, this type of word is common in AAE dialects and is used in everyday conversation. It is therefore likely that such words will be considered hateful or offensive when they are written in SAE by an associated group.
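To give a sense of what “customizing” a pre-trained BERT model and reweighting training examples can look like in practice, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name, the two-label scheme, the example weights and the hyperparameters are assumptions made for the illustration; they are not the configuration used in the study [1].

```python
# Illustrative sketch of fine-tuning a pre-trained BERT classifier with
# per-example weights to down-weight over-represented surface patterns.
# Checkpoint, labels, weights and hyperparameters are assumptions for this
# example, not the configuration used in the cited study.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"          # hypothetical choice of checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["example of a neutral tweet", "example of an abusive tweet"]
labels = torch.tensor([0, 1])
# Hypothetical reweighting: give less influence to examples whose label
# correlates with dialect markers rather than with actual hatefulness.
weights = torch.tensor([1.0, 0.5])

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels, weights)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")  # keep per-example losses

model.train()
for epoch in range(1):                     # a single epoch for illustration
    for input_ids, attention_mask, y, w in loader:
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        loss = (loss_fn(logits, y) * w).mean()   # weighted fine-tuning loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The per-example weight vector is where a bias-mitigation rule of the kind described above would plug in; in a real setting it would be computed from the training data rather than hard-coded.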

In fact, these biases are also cultural: certain expressions considered hateful or offensive are not so within a certain community or in a certain context. “In French, too, we use certain bird names to address our loved ones! Platforms are faced with a sort of dilemma: if the aim is to perfectly identify all hateful content, too great a number of false detections could have an impact on users’ ‘natural’ ways of expressing themselves,” explains Noël Crespi, a researcher at Télécom SudParis. After the effect of the most frequently used words in the training data was reduced through the reweighting mechanism, the probability of false positives fell considerably. “Finally, we transmitted these results to the pre-trained BERT model to refine it even further using new datasets,” says the researcher.

Can automatic detection be scaled up?

Despite these promising results, many problems still need to be solved in order to better detect hate speech. These include the possibility of deploying these automated tools for all the languages spoken on social networks. This issue is the subject of a data science challenge launched for the second consecutive year: HASOC (Hate Speech and Offensive Content Identification in Indo-European Languages), in which a team from IMT Mines Alès is participating. “The challenge aims to accomplish three tasks: determine whether or not content is hateful or offensive, classify this content into one of three categories (hateful, offensive or obscene), and identify whether the insult is directed towards an individual or a specific group,” explains Sébastien Harispe, a researcher at IMT Mines Alès.

“We are mainly focusing on the first three tasks. Using our expertise in natural language processing, we have proposed a method of analysis based on supervised machine learning techniques that take advantage of examples and counter-examples of the classes to be distinguished.” In this case, the researchers’ work focuses on small datasets in English, German and Hindi. In particular, the team is studying the role of emojis, some of which can have direct connotations with hate expressions. The researchers have also studied how to adapt various standard approaches in automatic language processing in order to obtain classifiers able to efficiently exploit such markers.
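As an illustration of a supervised classifier that can exploit surface markers such as emojis, here is a minimal sketch with scikit-learn. The tiny training set, the HOF/NOT labels and the combination of word and character n-gram features are assumptions made for the example, not the team’s actual HASOC submission.

```python
# Minimal sketch of a supervised hate/offensive classifier that lets the
# model exploit surface markers such as emojis via character n-grams.
# The tiny dataset, features and hyperparameters are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

train_texts = [
    "have a great day 😊",
    "you people are the worst 😡",
    "thanks for the help!",
    "get lost, nobody wants you here",
]
train_labels = ["NOT", "HOF", "NOT", "HOF"]   # hateful/offensive vs. not

features = FeatureUnion([
    # Word-level features capture ordinary vocabulary.
    ("words", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    # Character n-grams keep emojis and unusual spellings as usable signals.
    ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
])

classifier = Pipeline([
    ("features", features),
    ("model", LogisticRegression(max_iter=1000)),
])

classifier.fit(train_texts, train_labels)
print(classifier.predict(["what a lovely 😊 surprise", "you are awful 😡"]))
```

Character n-grams are one simple way to let emojis and unusual spellings contribute to the decision without building a dedicated emoji lexicon.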

They have also measured their classifiers’ ability to capture these markers, in particular through their overall performance. “In English, for example, our model was able to correctly classify content in 78% of cases, whereas only 77% of human annotators initially agreed on the annotation to be given to the content of the dataset used,” explains Sébastien Harispe. Indeed, in 23% of cases the annotators expressed divergent opinions when confronted with ambiguous content, which would probably have needed to be assessed in light of its context.

What can we expect from AI? The researcher believes we are faced with a complex question: what are we willing to accept in the use of this type of technology? “Although remarkable progress has been made in almost a decade of data science, we have to admit that we are dealing with a young discipline in which much remains to be developed from a theoretical point of view and, especially, whose applications must be supported in order to allow ethical and informed uses. Nevertheless, I believe that in terms of the detection of hate speech, there is a sort of glass ceiling created by the difficulty of the task as it is reflected in our current datasets. With regard to this particular aspect, there can be no perfect or flawless system if we ourselves cannot be perfect.”

Besides the multilingual challenge, the researchers face other obstacles, such as the availability of data for model training and for the evaluation of results, or the difficulty of assessing the ambiguity of certain content, due for example to variations in writing style. Finally, the very characterization of hate speech, subjective as it is, is also a challenge. “Our work can provide material for the humanities and social sciences, which are beginning to address these questions: why, when, who, what content? What role does culture play in this phenomenon? The spread of cyber hate is, at the end of the day, less of a technical problem than a societal one,” says Reza Farahbakhsh.

[1] M. Mozafari, R. Farahbakhsh, N. Crespi, “Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model”, PLoS ONE 15(8): e0237861. https://doi.org/10.1371/journal.pone.0237861

Anne-Sophie Boutaud

Also read on I’MTech