Part II Chapter 10

Privacy


Introduction

This chapter of the Web Almanac gives an overview of the current state of privacy on the web. Privacy has received growing attention recently, and users have become more aware of it. The need for rules has been met with various regulations, such as the GDPR in Europe, the LGPD in Brazil and the CCPA in California, to name but a few. These aim to increase the accountability of data processors and their transparency towards users. In this chapter, we discuss the prevalence of online tracking through different techniques, and the adoption of cookie consent banners and privacy policies by websites.

Online tracking

Third-party trackers collect user data to build profiles of users' behavior, which are then monetized for advertising purposes. This has raised privacy concerns among web users and led to the emergence of various tracking protections. However, as we will see in this section, online tracking is still widely used. Besides its negative impact on privacy, online tracking also has an environmental cost, since the extra requests and scripts it adds consume bandwidth and energy, and avoiding it can lead to better performance.

We examine the prevalence of the two most common types of third-party tracking: third-party cookies and fingerprinting. Online tracking is not limited to these two techniques, however; new ones keep emerging to circumvent existing countermeasures.
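To make the third-party cookie mechanism concrete, here is a minimal simulation, assuming a hypothetical tracker whose resource is embedded on several sites. Because the browser attaches the same tracker cookie to every request for that tracker's domain, and the request reveals which site embedded it (modelled here as a plain site name rather than a real `Referer` header), the tracker can link visits across unrelated sites into one profile. The class names, the `uid` cookie and the example sites are all illustrative assumptions, not data from the crawl.

```python
import secrets
from collections import defaultdict


class TrackerServer:
    """Hypothetical third-party tracker that links visits via its own cookie."""

    def __init__(self) -> None:
        # Maps a tracking ID (the cookie value) to the sites it was seen on.
        self.profiles: dict[str, list[str]] = defaultdict(list)

    def handle_request(self, cookies: dict[str, str], embedding_site: str) -> dict[str, str]:
        """Serve an embedded resource and return the cookies to set."""
        uid = cookies.get("uid")
        if uid is None:
            # First sighting: assign a random ID, sent back via Set-Cookie.
            uid = secrets.token_hex(8)
        self.profiles[uid].append(embedding_site)
        return {"uid": uid}


class Browser:
    """Keeps one cookie jar per domain, with no third-party cookie blocking."""

    def __init__(self) -> None:
        self.jar: dict[str, dict[str, str]] = defaultdict(dict)

    def visit(self, site: str, tracker_domain: str, tracker: TrackerServer) -> None:
        # The page on `site` embeds a resource from `tracker_domain`, so the
        # browser sends the cookies it already holds for that domain.
        new_cookies = tracker.handle_request(self.jar[tracker_domain], site)
        self.jar[tracker_domain].update(new_cookies)


tracker = TrackerServer()
browser = Browser()
for site in ["news.example", "shop.example", "health.example"]:
    browser.visit(site, "tracker.example", tracker)

for uid, sites in tracker.profiles.items():
    print(uid, sites)  # a single ID linked to three first-party sites
```

Blocking third-party cookies breaks this link, because the tracker then sees a fresh, empty jar on each site; this is one reason trackers look for other identifiers such as fingerprints.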

Third-party trackers

We use WhoTracksMe’s tracker list to determine the percentage of websites that issue a request to a potential tracker. As shown in the following figure, we have found that at least one potential tracker is present on roughly 93% of websites.

Figure 10.1. Websites including at least one potential tracker.
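A check of this kind can be sketched as follows, assuming the tracker list has been reduced to a set of domains and each crawled page to the set of hostnames it requested. A request counts as going to a potential tracker if its hostname is a listed domain or a subdomain of one, compared label by label so that `notdoubleclick.net` does not match `doubleclick.net`. The sample pages are made up, and WhoTracksMe's actual list format and the crawl's exact matching rules may differ.

```python
def matches_tracker(host: str, tracker_domains: set[str]) -> bool:
    """True if host is a listed domain or a subdomain of one."""
    labels = host.lower().rstrip(".").split(".")
    # Try every suffix: "a.b.c" -> "a.b.c", "b.c", "c".
    return any(".".join(labels[i:]) in tracker_domains for i in range(len(labels)))


def share_with_tracker(pages: dict[str, set[str]], tracker_domains: set[str]) -> float:
    """Fraction of pages that requested at least one potential tracker."""
    if not pages:
        return 0.0
    hits = sum(
        any(matches_tracker(host, tracker_domains) for host in hosts)
        for hosts in pages.values()
    )
    return hits / len(pages)


# Illustrative inputs only; not taken from the crawl.
trackers = {"doubleclick.net", "google-analytics.com", "facebook.net"}
pages = {
    "https://a.example/": {"a.example", "stats.g.doubleclick.net"},
    "https://b.example/": {"b.example", "cdn.b.example"},
    "https://c.example/": {"connect.facebook.net"},
    "https://d.example/": {"www.google-analytics.com", "fonts.example"},
}
print(share_with_tracker(pages, trackers))  # 0.75
```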

We examined the most widely used trackers and plotted the prevalence of the 10 most popular ones.

Figure 10.2. Top 10 Potential Trackers.

The largest player in the online tracking market is, without doubt, Google, with eight of its domains in the top 10 potential trackers and a presence on at least 70% of websites. Google is followed by Facebook and Cloudflare, though Cloudflare's position probably reflects its popularity as a CDN and hosting provider more than tracking.

WhoTracksMe’s tracker list also assigns each tracker to a category. If we remove CDNs and hosting providers from our statistics, on the assumption that they may not track (or at least that tracking is not their primary function), we get a slightly different top 10.

Figure 10.3. Top 10 Trackers.
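The ranking and the category filter can be sketched together, assuming each page has already been reduced to the set of tracker domains it requested and each tracker mapped to a category label. The labels (`cdn`, `hosting`, `advertising`) and the domains below are illustrative assumptions; they are not the crawl's data or necessarily the list's exact category names.

```python
from collections import Counter


def top_trackers(
    pages: dict[str, set[str]],
    category_of: dict[str, str],
    exclude: frozenset[str] = frozenset(),
    n: int = 10,
) -> list[tuple[str, float]]:
    """Rank tracker domains by the share of pages that request them.

    pages maps a page URL to the set of tracker domains it requested;
    category_of maps a tracker domain to its (assumed) category label.
    Trackers whose category is in `exclude` are left out of the ranking.
    """
    counts: Counter[str] = Counter()
    for trackers in pages.values():
        # A set per page, so each tracker is counted at most once per page.
        for tracker in trackers:
            if category_of.get(tracker) not in exclude:
                counts[tracker] += 1
    total = len(pages)
    return [(tracker, c / total) for tracker, c in counts.most_common(n)]


# Hypothetical domains and categories for illustration.
pages = {
    "https://a.example/": {"ads.example", "cdn.example", "social.example"},
    "https://b.example/": {"ads.example", "cdn.example"},
    "https://c.example/": {"cdn.example"},
}
category_of = {
    "ads.example": "advertising",
    "social.example": "advertising",
    "cdn.example": "cdn",
}
print(top_trackers(pages, category_of, n=2))
print(top_trackers(pages, category_of, exclude=frozenset({"cdn", "hosting"}), n=2))
```

With the CDN included it tops the ranking; excluding the `cdn` and `hosting` categories lets the advertising domains move up, which mirrors the shift between the two top 10 figures.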

Here, Google still accounts for seven of the top 10 domains. The following figure shows the distribution of categories among the 100 most popular potential trackers.

Figure 10.4. Categories of the 100 most popular potential trackers.

Nearly 60% of the most popular trackers are advertising-related. This may reflect a perception in the online advertising market that profitability depends on the amount of tracking.
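The category breakdown amounts to tallying the category of each of the top-ranked trackers. A minimal sketch, assuming the trackers are already ranked by popularity and mapped to categories; the domains, labels and resulting shares below are invented and do not reproduce the chapter's numbers.

```python
from collections import Counter


def category_shares(ranked: list[str], category_of: dict[str, str], top: int = 100) -> dict[str, float]:
    """Share of each category among the `top` highest-ranked trackers."""
    head = ranked[:top]
    if not head:
        return {}
    # Trackers missing from the mapping are counted as "unknown".
    counts = Counter(category_of.get(t, "unknown") for t in head)
    # most_common() orders categories from largest to smallest share.
    return {cat: c / len(head) for cat, c in counts.most_common()}


# Hypothetical ranking and categories.
ranked = ["ads-a.example", "stats-a.example", "ads-b.example", "ads-c.example", "stats-b.example"]
category_of = {
    "ads-a.example": "advertising",
    "ads-b.example": "advertising",
    "ads-c.example": "advertising",
    "stats-a.example": "site_analytics",
    "stats-b.example": "site_analytics",
}
print(category_shares(ranked, category_of))  # {'advertising': 0.6, 'site_analytics': 0.4}
```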

Cookies

We looked at the most popular cookies set on websites via HTTP response headers, grouped by cookie name and domain.

Domain            Cookie name          Websites
doubleclick.net   test_cookie          24%
facebook.com      fr                   10%
youtube.com       VISITOR_INFO1_LIVE   10%
youtube.com       YSC                  10%
doubleclick.net   IDE                  9%
doubleclick.net   unknown              9%
youtube.com       GPS                  9%
doubleclick.net   unknown              8%
google.com        NID                  6%
doubleclick.net   unknown              6%
Figure 10.5. Top cookies on desktop sites.

Domain            Cookie name          Websites
doubleclick.net   test_cookie          32%
doubleclick.net   IDE                  21%
facebook.com      fr                   10%
youtube.com       VISITOR_INFO1_LIVE   10%
youtube.com       YSC                  10%
google.com        NID                  10%
youtube.com       GPS                  8%
doubleclick.net   DSID                 7%
yandex.ru         yandexuid            6%
yandex.ru         i                    6%
Figure 10.6. Top cookies on mobile sites.

As the tables show, Google’s tracking domain “doubleclick.net” sets cookies on roughly a quarter of websites on desktop clients and a third of websites on mobile clients. Again, Google dominates: nine of the ten most popular cookies on desktop and seven of the ten on mobile are set by a Google domain. These figures are a lower bound on the number of websites each cookie is set on, since we only count cookies set via an HTTP header; a large number of tracking cookies are set by third-party scripts instead.
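Counting cookies by name and domain from response headers can be sketched as follows. It assumes each crawled page is represented as a list of (response host, `Set-Cookie` header value) pairs, and it attributes a cookie to its `Domain` attribute when present, otherwise to the responding host. This is a simplification: hosts are not mapped to registrable domains, and cookies written by JavaScript through `document.cookie` never appear in response headers, which is why such counts are a lower bound. The example headers are illustrative.

```python
from collections import Counter


def parse_set_cookie(header: str, response_host: str) -> tuple[str, str]:
    """Return (domain, cookie name) for one Set-Cookie header value."""
    first, *attributes = header.split(";")
    name = first.split("=", 1)[0].strip()
    domain = response_host
    for attr in attributes:
        key, _, value = attr.strip().partition("=")
        if key.lower() == "domain" and value:
            # RFC 6265 says a leading dot in the Domain attribute is ignored.
            domain = value.strip().lstrip(".").lower()
    return domain, name


def cookie_prevalence(pages: dict[str, list[tuple[str, str]]]) -> Counter[tuple[str, str]]:
    """Number of pages on which each (domain, name) cookie was set."""
    counts: Counter[tuple[str, str]] = Counter()
    for responses in pages.values():
        # A set, so a cookie set several times on one page counts once.
        seen = {parse_set_cookie(header, host) for host, header in responses}
        counts.update(seen)
    return counts


# Illustrative crawl data.
pages = {
    "https://a.example/": [
        ("ad.doubleclick.net",
         "test_cookie=CheckForPermission; expires=Tue, 01-Jan-2030 00:00:00 GMT; "
         "path=/; domain=.doubleclick.net; Secure; SameSite=none"),
        ("www.facebook.com", "fr=abc; Path=/; Domain=.facebook.com; HttpOnly"),
    ],
    "https://b.example/": [
        ("ad.doubleclick.net", "test_cookie=CheckForPermission; domain=.doubleclick.net"),
    ],
}
for (domain, name), n in cookie_prevalence(pages).most_common():
    print(domain, name, f"{n / len(pages):.0%}")
```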

Fingerprinting

Another widely used tracking technique is fingerprinting, which consists of collecting different kinds of information about the user with the goal of building a unique “fingerprint” for them. Trackers use different types of fingerprinting on the web. Browser fingerprinting uses characteristics specific to the user’s browser, relying on the fact that, if enough variables are tracked, the chance of another user having exactly the same browser set-up is fairly small. In our crawl, we examined the presence of the FingerprintJS library, which provides browser fingerprinting as a service.

Figure 10.7. Websites using FingerprintJS.

Although the library is present on only a small percentage of websites, fingerprinting is persistent: the fingerprint is derived from the browser itself, so clearing cookies does not reset it, and even small usage can have a big impact. Furthermore, FingerprintJS is not the only means of fingerprinting. Other libraries, tools and native code can serve the same purpose, so this is just one example.
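The idea can be illustrated with a toy fingerprint: serialize a set of browser attributes in a fixed order and hash them. The attribute names and values are hypothetical stand-ins for what a script might read from APIs such as `navigator.userAgent` or `screen.width`; this is not how FingerprintJS computes its identifier. The second function shows why many attributes make a fingerprint identifying: under the simplifying assumption that attributes are independent, the probability that another user matches on all of them is the product of the individual probabilities, and each halving of that probability adds one bit of identifying information.

```python
import hashlib
import json
import math


def fingerprint(attributes: dict[str, object]) -> str:
    """Hash browser attributes into a stable identifier (toy example only)."""
    # sort_keys makes the serialization independent of attribute order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def identifying_bits(match_probabilities: list[float]) -> float:
    """Bits of information, assuming independent attributes: -log2(prod p_i)."""
    return -sum(math.log2(p) for p in match_probabilities)


# Hypothetical attribute values.
visitor = {
    "userAgent": "ExampleBrowser/1.0",
    "language": "fr-FR",
    "screen": [2560, 1440],
    "timezone": "Europe/Paris",
    "fonts": ["Arial", "Fira Sans", "Noto Serif"],
}
reordered = dict(reversed(list(visitor.items())))
print(fingerprint(visitor) == fingerprint(reordered))  # True

# Hypothetical match probabilities of 1/8, 1/16, 1/32 and 1/64.
print(identifying_bits([1 / 8, 1 / 16, 1 / 32, 1 / 64]))  # 18.0
```

For scale, 2^33 is about 8.6 billion, so roughly 33 bits are enough to single out one person among everyone on Earth; a handful of moderately rare attributes gets a tracker a long way there.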

Cookie consent banners

Cookie consent banners have now become common. They increase transparency about the use of cookies and often allow users to specify their cookie choices. While many websites use their own implementation of cookie banners, third-party solutions called Consent Management Platforms have recently emerged. These platforms provide an easy way for websites to collect users’ consent for different types of cookies. We see that 4.4% of websites on desktop clients and 4.0% on mobile clients use a consent management platform to manage cookie choices.

Figure 10.8. Websites using a consent management platform.
Figure 10.9. Popularity of consent management platforms.

When looking at the popularity of the different consent management solutions, we can see that Osano and Quantcast Choice are the leading platforms.
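Conceptually, a consent management platform records the user’s choice per cookie category and only lets cookies in consented categories be set. A minimal sketch of that gating logic, assuming hypothetical category names (`necessary`, `analytics`, `advertising`) and a hypothetical mapping from cookie names to categories; real platforms differ in their categories, storage and APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """User's choices per (hypothetical) cookie category."""
    granted: set[str] = field(default_factory=set)

    def allows(self, category: str) -> bool:
        # Strictly necessary cookies are assumed to need no consent.
        return category == "necessary" or category in self.granted


def filter_cookies(
    cookies: dict[str, str], category_of: dict[str, str], consent: ConsentRecord
) -> dict[str, str]:
    """Keep only cookies whose category the user consented to.

    Cookies with no known category are treated as non-essential and dropped.
    """
    return {
        name: value
        for name, value in cookies.items()
        if consent.allows(category_of.get(name, "unclassified"))
    }


category_of = {"session": "necessary", "_ga": "analytics", "ad_id": "advertising"}
wanted = {"session": "s1", "_ga": "GA1.2.3", "ad_id": "x9", "mystery": "?"}

print(filter_cookies(wanted, category_of, ConsentRecord()))
# {'session': 's1'}
print(filter_cookies(wanted, category_of, ConsentRecord(granted={"analytics"})))
# {'session': 's1', '_ga': 'GA1.2.3'}
```

Defaulting unknown cookies to "not allowed" is a design choice of this sketch: it errs on the side of the user when the mapping is incomplete.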

IAB Europe (Interactive Advertising Bureau Europe) is a European association for digital marketing and advertising. It proposed the Transparency and Consent Framework (TCF) as a GDPR-compliant way to obtain users’ consent regarding digital advertising. The framework is intended to provide an industry standard for communication between publishers and advertisers about consumer consent.

Figure 10.10. Adoption rate of TCF banner.

While our results show that the TCF banner is not yet the “industry standard”, it is a step in the right direction. Considering that IAB Europe mainly targets European publishers and our crawl is global, an adoption rate of 1.5% of websites on desktop clients and 1.4% on mobile clients is not too bad.
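Under the TCF, the user’s choices are serialized into a “TC string” that is passed from the consent management platform to advertising vendors. As a sketch, here is a decoder for the first fields of the core segment, based on our reading of the TCF v2 specification: each character of the base64url alphabet carries 6 bits, and the fields start with Version (6 bits), Created and LastUpdated (36 bits each, in deciseconds since the Unix epoch), CmpId (12), CmpVersion (12), ConsentScreen (6), ConsentLanguage (12, two 6-bit letters) and VendorListVersion (12). Check the layout against IAB Europe’s published specification before relying on it; the purpose and vendor consents that follow these fields are not decoded here.

```python
from datetime import datetime, timezone

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

# (field, width in bits) at the start of the TCF v2 core segment, as we read
# the specification; later fields are not decoded in this sketch.
CORE_FIELDS = [
    ("version", 6),
    ("created", 36),           # deciseconds since the Unix epoch
    ("last_updated", 36),      # deciseconds since the Unix epoch
    ("cmp_id", 12),
    ("cmp_version", 12),
    ("consent_screen", 6),
    ("consent_language", 12),  # two 6-bit letters, A = 0
    ("vendor_list_version", 12),
]


def decode_core_header(tc_string: str) -> dict[str, object]:
    """Decode the leading fields of a TC string's core segment."""
    core = tc_string.split(".")[0]  # optional segments follow after "."
    # Each base64url character encodes 6 bits; decode character by character
    # so that strings of any length (unpadded) are accepted.
    bits = "".join(f"{ALPHABET.index(c):06b}" for c in core)
    if len(bits) < sum(width for _, width in CORE_FIELDS):
        raise ValueError("TC string too short for the core header")

    fields: dict[str, int] = {}
    pos = 0
    for name, width in CORE_FIELDS:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width

    lang = fields["consent_language"]
    return {
        **fields,
        "created": datetime.fromtimestamp(fields["created"] / 10, tz=timezone.utc),
        "last_updated": datetime.fromtimestamp(fields["last_updated"] / 10, tz=timezone.utc),
        "consent_language": chr(ord("A") + (lang >> 6)) + chr(ord("A") + (lang & 0x3F)),
    }
```

In a version 2 string the first 6 bits are `000010`, which is why such strings begin with the letter “C”. On web pages, vendors typically obtain the string from the platform through the framework’s `__tcfapi` JavaScript function rather than decoding it themselves.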

Privacy Policies

Privacy policies are widely used by websites to meet legal obligations and increase transparency towards users about data collection practices. In our crawl, we searched for keywords indicating the presence of a privacy policy text on each visited website.

Figure 10.11. Websites that have a privacy policy.

The results show that almost half of the websites in the dataset include a privacy policy, which is positive. However, studies have shown that most internet users do not bother reading privacy policies, and those who do often struggle to understand them because of the length and complexity of most policy texts. Still, having a policy at all is a step in the right direction!
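A keyword search for privacy policies might look like the following sketch. The keyword list is a hypothetical, partial multilingual example rather than the list used for the crawl, and looking only at link text and link targets (instead of the full page text) is an assumption of this sketch. It will miss policies that use other wording and can match pages that merely mention privacy.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical keywords; not the crawl's actual list.
PRIVACY_KEYWORDS = (
    "privacy policy",
    "privacy notice",
    "datenschutz",                   # German
    "politique de confidentialité",  # French
    "política de privacidad",        # Spanish
)


class LinkTextCollector(HTMLParser):
    """Collects the visible text and href of every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def has_privacy_policy_link(html: str) -> bool:
    """True if any link's text or URL contains a privacy-policy keyword."""
    parser = LinkTextCollector()
    parser.feed(html)
    for text, href in parser.links:
        # Normalize "privacy-policy" / "privacy_policy" in URLs to spaces.
        haystack = f"{text} {href}".lower().replace("-", " ").replace("_", " ")
        if any(keyword in haystack for keyword in PRIVACY_KEYWORDS):
            return True
    return False


print(has_privacy_policy_link('<footer><a href="/privacy-policy">Privacy</a></footer>'))  # True
print(has_privacy_policy_link('<a href="/about">About us</a>'))  # False
```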

Conclusion

This chapter has shown that third-party tracking remains prominent on both desktop and mobile clients, with Google tracking users on the largest share of websites. Consent Management Platforms are used on a small percentage of websites; however, many websites implement their own cookie consent banners.

Lastly, roughly half of the websites include a privacy policy, which greatly benefits transparency towards users about data processing practices. This is undoubtedly a step forward, but there is still a lot to be done. Outside of this analysis, we know that privacy policies are hard to read and understand, and that many cookie consent banners are designed to nudge users into giving consent.

For the web to truly respect users, privacy has to be part of the design from the start, not an afterthought. Regulation is a good thing in this regard, and it is reassuring to see an increase in privacy regulation worldwide. Privacy by design should be the norm, rather than deploying policies and tools merely to meet minimum legal requirements and avoid financial penalties.


Citation

BibTeX
@inbook{WebAlmanac.2020.Privacy,
author = "Dimova, Yana and Satyagraha, Laurent Devernay and Ostapenko, Max and Pollard, Barry",
title = "Privacy",
booktitle = "The 2020 Web Almanac",
chapter = 10,
publisher = "HTTP Archive",
year = "2020",
language = "English",
url = "https://almanac.httparchive.org/en/2020/privacy"
}