

GDPR: Impact on data collection at the international level

The European General Data Protection Regulation (GDPR), which came into force in 2018, set limits on the use of trackers that collect personal data, which is used to target advertising at users. Vincent Lefrère, associate professor in digital economy at Institut Mines-Télécom Business School, worked with Alessandro Acquisti of Carnegie Mellon University to study the impact of the GDPR on the tracking of users in Europe and internationally.

What was your strategy for analyzing the impact of GDPR on tracking users in different countries?

Vincent Lefrère: We conducted our research on online media such as Le Monde in France or the New York Times in the United States. We looked at whether the introduction of the GDPR has had an impact on the extent to which users are tracked and the amount of personal data collected.

How were you able to carry out these analyses at the international level?

VL: The work was carried out in partnership with researchers at Carnegie Mellon University in the United States, in particular Alessandro Acquisti, one of the world’s leading specialists in personal digital data. We worked together to devise the experimental design and to build a wider partnership with researchers at other American universities, in particular the University of Minnesota’s Carlson School of Management and Cornell University in New York.

How does the GDPR limit the collection of personal data?

VL: One of the fundamental principles of the GDPR is consent. It requires websites that collect data to obtain users’ consent before tracking them. In our study, we neither gave our consent nor explicitly refused data collection. That way, we could observe how a website behaves toward a neutral user. Moreover, one of the important features of the GDPR is that it applies to all parties who wish to process data pertaining to European citizens. As such, the New York Times must comply with the GDPR when a visitor to its website is European.

How did you compare the impact of the GDPR on different media?

VL: We visited different media sites using IP addresses from different countries, in particular French and American IP addresses.

We observed that American websites limit tracking more than European websites, and therefore comply better with the GDPR, but only when we used a European IP address. It would therefore appear that the GDPR has had a stronger deterrent effect on American websites for these users. However, American websites increased their tracking of American users, to whom the GDPR does not apply. One hypothesis is that this increase offsets the loss of data from European users.
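The interview does not describe the measurement tooling, but the idea can be sketched. Below is a deliberately crude, hypothetical proxy for this kind of audit: fetch a news page from a given vantage point and list the distinct third-party domains its script tags load from. Real tracker measurement, as in the study, is far more involved; the function name and approach here are illustrative assumptions.

```python
# Crude sketch of a tracker audit: list third-party script domains on a page.
# Run once from a French IP and once from a US IP (e.g. via a VPN), then compare.
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def third_party_script_domains(url: str) -> set[str]:
    site = urlparse(url).netloc.removeprefix("www.")
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    domains = set()
    for tag in soup.find_all("script", src=True):
        dom = urlparse(tag["src"]).netloc
        if dom and site not in dom:  # scripts served from another domain
            domains.add(dom)
    return domains

print(sorted(third_party_script_domains("https://www.lemonde.fr")))
```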

How have online media adapted to the GDPR?

VL: We were able to observe a number of effects. First of all, online media websites have not really played along. Since consent mechanisms are somewhat vague, the formats developed in recent years have often nudged users toward accepting personal data collection rather than rejecting it. There are reasons for this: data collection has become crucial to these websites’ business model, and little has been done to offset the loss of data resulting from the introduction of the GDPR, so it is understandable that they have stretched the limits of the law in order to keep offering high-quality content for free. With the French National Commission on Information Technology and Liberties (CNIL) having recently updated its guidelines to fight these practices, consent mechanisms should become clearer and more standardized.

In addition, the GDPR has limited tracking of users by third parties, replacing it with tracking by first parties. Before, when a user visited a news site, other companies such as Google, Amazon or Facebook could collect their data directly on that website. Now, the website itself collects the data, which may then be shared with third parties.

Following the introduction of the GDPR, the market share of Google’s online advertising service increased in Europe, since Google is one of the few companies that could afford the cost of the regulation, meaning it could pay the price of ensuring compliance. This is an unintended, perverse consequence: smaller competitors have disappeared, and ownership of data has become concentrated in Google’s hands.

Has the GDPR had an effect on the content produced by the media?

VL: We measured the quantity and quality of content produced by the media. Quantity simply reflects the number of posts. The quality is assessed by the user engagement rate, meaning the number of comments or likes, as well as the number of pages viewed each time a user visits the website.
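As a toy illustration of these measures (the field names are assumptions, not the study’s actual data schema), quantity is a simple count and the two quality measures are averages:

```python
# Toy illustration of the content metrics described above.
posts = [
    {"comments": 120, "likes": 340},
    {"comments": 45, "likes": 210},
    {"comments": 80, "likes": 150},
]
visits = [{"pages_viewed": 3}, {"pages_viewed": 1}, {"pages_viewed": 5}]

quantity = len(posts)  # number of posts published
engagement_per_post = sum(p["comments"] + p["likes"] for p in posts) / quantity
pages_per_visit = sum(v["pages_viewed"] for v in visits) / len(visits)

print(quantity, engagement_per_post, pages_per_visit)  # 3 315.0 3.0
```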

In the theoretical framework for our research, online media websites use targeted advertising to generate revenue. Since the GDPR makes access to data more difficult, it could decrease websites’ financing capacity and therefore lead to a reduction in content quality or quantity. By verifying these aspects, we can gain insights into the role of personal data and targeted advertising in the business model for this system.   

Our preliminary results show that after the introduction of the GDPR, the quantity of content produced by European websites was not affected, and engagement remained stable. However, European users reduced the amount of time they spent on European websites compared to American websites. This could be because certain American websites may have blocked access for European users, or because American websites covered European topics less, since attracting European users had become less profitable. These are hypotheses that we are currently discussing.

We are assessing these possible explanations by analyzing data about the newspapers’ business models, in order to estimate how important personal data and targeted advertising are to these business models.  

By Antonin Counillon


SONATA: an approach to make data sound better

Telecommunications must transport data at an ever-faster pace to meet the needs of current technologies. But this data can be voluminous and difficult to transport at times. Communication channels are congested and transmission limits are reached quickly. Marios Kountouris, a telecommunications researcher at EURECOM, has recently received ERC funding to launch his SONATA project. It aims to shift the paradigm for processing information to speed up its transmission and make future networks more efficient.

“We are close to the fundamental limit for transmitting data from one point to another,” explains Marios Kountouris, a telecommunications researcher at EURECOM. Most current research in the discipline focuses on how to organize complex networks and on improving the algorithms that optimize them. Few projects, however, focus on improving the transfer of data between transmitters and receivers. This is precisely the focus of Marios Kountouris’ SONATA project, funded by a Consolidator Grant from the European Research Council (ERC).

“Telecommunications are generally based on Shannon’s information theory, which was established in the late 1940s,” says the researcher. In this theory, a transmitter sends information through a transmission channel, which carries it to a receiver that then reconstructs it. The main obstacle to overcome is the noise accompanying the signal as it passes through the transmission channel. This constraint can be overcome by algorithm-based signal processing and by increasing throughput. “This usually takes place in the same way, regardless of the message being transmitted. Back in the early days, and until recently, this was the right approach,” says the researcher.
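The “fundamental limit” in question is Shannon’s channel capacity, a standard result (not specific to SONATA) that bounds the rate at which data can be transmitted reliably over a noisy channel:

```latex
% Shannon capacity: the maximum reliable rate C (bits/s) of a channel
% with bandwidth B (Hz) and signal-to-noise ratio SNR.
C = B \log_2\left(1 + \mathrm{SNR}\right)
```

No amount of clever signal processing can push a reliable data rate past this bound, which is why the researcher describes current systems as close to the limit.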

Read more on I’MTech: Claude Shannon, a legacy transcending digital technology

Transmission speed for real-time communication

Today, there is an increasing amount of communication between machines that reason in milliseconds. “Certain messages must be transmitted quickly or they’re useless,” says Marios Kountouris. For example, in the development of autonomous cars, if the message relates to detecting a pedestrian on the road so that the vehicle can brake, it is only useful for a very short period of time. “This is what we call the age, or freshness, of information, which is a very important parameter in some cases,” explains Marios Kountouris.
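The age of information mentioned here has a standard definition in the research literature (a general formula, not one specific to SONATA): if u(t) is the generation timestamp of the most recent update received by time t, the instantaneous age is

```latex
% Age of Information: how stale the receiver's newest update is at time t,
% where u(t) is the generation time of that update.
\Delta(t) = t - u(t)
```

The age grows linearly until a fresher update arrives, at which point it drops; for messages like the pedestrian alert above, keeping this age below a deadline matters more than raw throughput.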

Yet most transmission and reconstruction is slowed down by surplus information accompanying the message. In the previous example, if the pedestrian-detection system is a camera that captures images with details of all the surrounding objects, a great deal of the information transmitted and processed will not contribute to the system’s purpose. For the researcher, “the sampling, transmission and reconstruction of the message must no longer be carried out independently of one another. If excess, redundant or useless data accompanies this process, there can be communication bottlenecks and security problems.”

The semantics of messages

For real-time communication, the semantics of the message (its meaning and usefulness) take on particular importance. Semantics make it possible to take the attributes of the message into account and adjust the format of its transmission to its purpose. For example, if a temperature sensor is meant to activate the heating system automatically when the room temperature falls below 18°C, the attribute of the transmitted message is simply a binary breakdown of temperature: above or below 18°C.
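A minimal sketch of this thermostat example (illustrative code, not from the SONATA project): instead of streaming raw readings, the sensor transmits a single semantic attribute, below or above 18°C, and only when that attribute changes.

```python
# Semantic filtering of a temperature stream: send one bit, and only on change.
THRESHOLD_C = 18.0

def semantic_filter(readings):
    """Yield (sample_index, below_threshold) only when the attribute flips."""
    last_state = None
    for i, temp in enumerate(readings):
        state = temp < THRESHOLD_C
        if state != last_state:  # transmit only on a semantic change
            yield i, state
            last_state = state

readings = [19.2, 18.6, 17.9, 17.5, 18.1, 18.4, 17.8]
for i, below in semantic_filter(readings):
    print(f"sample {i}: heating {'ON' if below else 'OFF'}")
# Seven raw readings collapse to four one-bit messages.
```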

Through the SONATA project, Marios Kountouris seeks to develop a new communication paradigm that takes the semantic value of information into account. This would make it possible to synchronize different types of information collected at the same time through various samples, and to make better decisions. It would also significantly reduce the volume of transported data, along with the energy and resources this transport requires.

“The success of this project depends on establishing semantic metrics that are concrete, informative and traceable,” explains the researcher. Establishing the semantics of a message means preprocessing the sampling at the transmitter according to how the message will be used by the receiver. The aim is therefore to identify the most important, meaningful or useful information in order to determine the qualifying attributes of the message. “Various semantic attributes can be taken into account to obtain a conformal representation of the information, but they must be determined in advance, and we have to be careful not to implement too many attributes at once,” he says.

The goal, then, is to build communication networks with key stages for processing the semantics associated with information. First, semantic filters must be used to avoid unnecessary redundancy when collecting information. Then, semantic preprocessing must be carried out to associate the data with its purposes. Signal reconstruction by the receiver would likewise be adapted to those purposes. All of this would be semantically controlled, making it possible to orchestrate the information collected in an agile way and to reuse it efficiently, which is especially important as networks become more complex.
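Chaining these stages, a toy end-to-end sketch might look as follows (all names are illustrative assumptions; SONATA is a research program, not a codebase). It reuses the thermostat scenario: filter near-duplicate samples, reduce each survivor to the attribute the receiver acts on, then reconstruct only the decision.

```python
# Toy staged pipeline: semantic filter -> semantic preprocessing -> reconstruction.

def semantic_filter(samples, tolerance=0.2):
    """Drop samples that barely differ from the last kept one (redundancy)."""
    kept, last = [], None
    for s in samples:
        if last is None or abs(s - last) > tolerance:
            kept.append(s)
            last = s
    return kept

def preprocess(samples, threshold=18.0):
    """Reduce each sample to the attribute the receiver acts on."""
    return [s < threshold for s in samples]

def reconstruct(attributes):
    """Receiver side: map attributes straight to decisions."""
    return ["heating ON" if below else "heating OFF" for below in attributes]

raw = [19.2, 19.1, 18.9, 17.6, 17.5, 18.3]
print(reconstruct(preprocess(semantic_filter(raw))))
# ['heating OFF', 'heating OFF', 'heating ON', 'heating OFF']
```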

This is a new approach from a structural perspective, one that would help create links between communication theory, sampling and optimal decision-making. ERC Consolidator Grants fund high-risk, high-reward projects that aim to revolutionize a field, which is why SONATA received this funding. “The sonata was the most sophisticated form of classical music and was pivotal to its development. I hope that SONATA will be a major step forward in telecommunications optimization,” concludes Marios Kountouris.

By Antonin Counillon