Titel

From Whonix

<translate>


Data Collection Techniques

Some of the actual techniques employed by the data miners on the Web are briefly introduced below.

Cookies

Cookies are used to identify and remember a web surfer. Without cookies, certain services would be complicated or impossible to implement. When a user requests a page, the webserver cannot readily match that request to earlier requests from the same user, because HTTP is a stateless protocol.

Nevertheless, some services require a sort of memory. Shopping portals are an example: a server has to remember what goods were placed into the virtual shopping cart. This "memory" is usually kept in the form of cookies, i.e. small pieces of text which the server sends to you along with a page and which your browser stores. When your browser contacts the server again, it automatically sends back the cookie stored earlier. The server can thereby assign the right shopping cart to you.
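
The round trip described above can be sketched with Python's standard http.cookies module. This is only a minimal illustration under stated assumptions, not how any particular shop works: the cookie name cart_id, the CARTS table, and the handle_request function are hypothetical names invented for the example.

```python
from http.cookies import SimpleCookie
import secrets

# Hypothetical in-memory server state: cart id -> list of items.
CARTS = {}

def handle_request(cookie_header, add_item=None):
    """Return (Set-Cookie value or None, current cart contents)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    set_cookie = None
    if "cart_id" in cookie and cookie["cart_id"].value in CARTS:
        # Returning visitor: the browser sent back the stored cookie.
        cart_id = cookie["cart_id"].value
    else:
        # New visitor: create a cart and ask the browser to remember its id.
        cart_id = secrets.token_hex(8)
        CARTS[cart_id] = []
        out = SimpleCookie()
        out["cart_id"] = cart_id
        set_cookie = out["cart_id"].OutputString()
    if add_item is not None:
        CARTS[cart_id].append(add_item)
    return set_cookie, list(CARTS[cart_id])

# First request carries no cookie; the second sends back what was stored.
stored, cart = handle_request(None, add_item="book")
_, cart = handle_request(stored, add_item="pen")
```

Without the cookie, the second request would look like a new visitor and receive an empty cart, which is the statelessness of HTTP in practice.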

But cookies can also be abused to track your steps on the Internet. This works exceptionally well for web portals (e.g. Yahoo) and search engines (e.g. Google), because you use them frequently to reach other websites. With cookies, a web host can record large parts of your surfing behavior over a span of years and easily relate it to you as a person through your accumulated profile data. Most Internet users have collected hundreds of cookies from various websites on their PC without their knowledge. The following example shows just a small number of the cookies you receive when you request www.nytimes.com:

Cookies set by nytimes.com

Websites embedding several external ad and tracking services are not unusual and are becoming increasingly common. A 2011 study by the University of California, Berkeley analyzed the top 100 websites and found 5,675 cookies. Of these, 4,914 cookies were set by third parties, i.e. not by the website the user visited intentionally. While surfing on these 100 websites, data was transmitted to 600 servers with the help of those cookies.

Modern browsers usually integrate an optional function to block cookies, but it has to be activated by the user first. In Tor Browser (which comes with Whonix) it is activated by default, and in the future, more functions to manage your cookie collection and preferences may be available.

Evercookies

More than 80% of users disapprove of tracking while surfing the web. Many surfers use browser settings which prevent long-term tracking. Therefore, ad and tracking networks are moving on to use more sophisticated methods to distinguish each user.

  • Flash cookies (LSOs) have been deployed for several years to restore deleted cookies with the same identifier. Clearspring Technologies, Inc. used this technique successfully (until it was sued in 2010) and boasted of its precise data collection on 200 million Internet users.
  • A study by the University of California, Berkeley exposed the methods of Space Pencil, Inc., aka KISSmetrics, which, in addition to cookies and Flash cookies, used cache cookies via ETags, DOMStorage and IE userData in order to distinguish each user. KISSmetrics got sued as well and is going to dispense with using ETags. It is also going to respect the "Do Not Track" HTTP header.
  • The tracking service Yahoo! Web Analytics claims to be able to set cookies for 99.9% of users. This indicates that cookie-generating JavaScript and/or Flash cookies are deployed.
  • Samy Kamkar shows further possible methods to mark Internet users individually in his article, "evercookie - never forget".

Tor Browser (which comes with Whonix) resists evercookies.

Active Web Contents

Web content that relies on browser plugins such as Flash, Java, ActiveX and Silverlight makes the Web more dynamic and colorful but also more dangerous, because these plugins allow websites to execute code on your PC. Once executed, they are able to read many details about your computer and network configuration and send them to a remote server. With certain techniques, they can also read and edit files on your machine and in an extreme case even gain complete control over it.

Especially beware of signed Java applets: if you accept the signature, and by extension the applet, the visited webserver automatically receives all of your user rights on your machine. In particular, it may then read your IP address, your MAC address and even hard disk contents. It does not help to only surf websites you deem trustworthy either. This concept is outdated, since nowadays even numerous large and authentic websites are being hacked and filled with malicious code. Only blocking, deactivating, and removing these plugins provides real security.

In Whonix, learning the IP address wouldn't give an adversary any meaningful results. It is either a local IP address shared among all Whonix users or the IP of a Tor exit relay, which is safe as well. The MAC address is a virtual one which is shared among all Whonix users as well, and is therefore meaningless. Active content, although it couldn't reveal your real IP address, is deactivated in Tor Browser by default. See Browser Plugins for a detailed anonymity, security, and privacy discussion of browser plugins in Whonix.

JavaScript

The browser is a bit better protected against attacks on your privacy using JavaScript ("scripts", "active scripting") than against those using the aforementioned plugins, but it is not completely safe. Do not confuse JavaScript with Java or the Java plugin, which is a completely different thing despite the similar name (see above).

It is possible to compromise the browser or operating system using software exploits or a maliciously crafted website. An attacker can, for example, inject malicious JavaScript code by Cross Site Scripting and thus try phishing for login credentials, bank accounts or other sensitive data.

Using JavaScript, it is possible for webmasters to access lots of information about your browser, your desktop settings, and your hardware. All this information may be accumulated into an individual fingerprint of a particular user. A user may be recognized by this fingerprint. The anonymity test ip-check.info shows only some examples of values which may be gathered (JavaScript needs to be enabled). It demonstrates the labeling of users by JavaScript, too (same effect as cookies).

See also: ip-check.info returns some false values and confuses TBB users

Therefore, we recommend that you block JavaScript by default and only activate it if needed. However, there are fingerprinting concerns about disabling JavaScript. See: Tor Browser disabling Javascript anonymity set reduction

Session Replay Scripts

Enabling JavaScript does more than reveal additional information about a user's system and increase the probability of a successful browser exploit. It can also lead to a complete, literal recording of the entire browsing session if the user is unlucky enough to browse one of nearly 500 sites in the Alexa top 50,000:

You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make. But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.

The law as it stands allows corporate entities to embed JavaScript functions on sites in order to record highly personal information. This includes what is typed, exact movements of the mouse, and even "co-browsing", whereby an unseen intruder can watch what is done in real time, without any form of notification. There are few limits to the data harvested; name, email, phone number, address, social security numbers and date of birth are all considered fair game by companies like FullStory, Hotjar, and Smartlook. Many offer the option to explicitly link recordings to real identities.

Although full or part redactions are attempted on passwords, credit card numbers, CVC numbers, and credit card expiry dates, sensitive information was found to leak in many instances, such as: [1]

  • Passwords entered into registration forms.
  • Leaking of credit card details on payment pages, even in real time.
  • Leaking of specific medical conditions and prescriptions.


The same tracking companies often use insecure HTTP pages to deliver the recording playbacks or publisher page contents, providing an enticing man-in-the-middle attack opportunity for advanced adversaries. [2] Fortunately, disabling JavaScript is sufficient to prevent this activity completely, and ad-blocking lists are also useful in preventing data exfiltration. [3]

Conclusion

You should not surf the Internet without a well secured browser, as your PC is otherwise in danger of being exploited quite soon. Instead of configuring the browser yourself, which takes quite some experience and is prone to error, you may use Tor Browser (which comes with Whonix). It is a secured version of Mozilla Firefox that not only blocks most dangerous technologies by default but additionally is equipped with further ample security mechanisms. Most websites will still be reachable. YouTube videos and videos of other such portals which are rendered by Flash may be downloaded with special software and then viewed with a video player. Websites which demand usage of active plugins should be avoided if at all possible; otherwise see Browser Plugins.

Fingerprinting of Browser (HTTP) Header

With every request for a webpage, browsers send information within the framework of the HTTP protocol that can be analyzed by the visited site: language, browser name and version, operating system and version, supported character sets, files, codecs and the last visited webpage. Sending these headers is usually not necessary for rendering websites, but it can be exploited for reidentifying, profiling, and analyzing websurfers. The EFF's Panopticlick project demonstrated browser fingerprinting. Most surfers are traceable by a unique browser fingerprint.
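
To make the idea concrete, the following sketch hashes a handful of request headers into a short identifier. It is a simplified illustration only: the function name header_fingerprint and the choice of four headers are assumptions made for this example, and real fingerprinting projects combine many more signals than headers alone.

```python
import hashlib

# Assumed selection of headers; a real tracker may use others.
HEADER_NAMES = ["User-Agent", "Accept", "Accept-Language", "Accept-Encoding"]

def header_fingerprint(headers):
    """Hash selected request headers into a 16-character identifier."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    material = "\n".join(
        f"{name.lower()}:{lowered.get(name.lower(), '')}" for name in HEADER_NAMES
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

visitor = {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-US"}
fingerprint = header_fingerprint(visitor)
```

Two visitors who send identical values for these headers receive the same identifier, while any individually customized value makes the identifier stand out.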

Of late, different filter applications and services have been developed that allow hiding or changing problematic browser headers (e.g. Privoxy). Unfortunately, these applications cannot filter encrypted connections: once you load a presumably "secure" website (HTTPS), all filtering fails. In addition, these programs allow every user to define the header data individually, but setting an individual browser type is precisely what renders you perfectly trackable. Tor Browser always sends the same profile, for encrypted connections too. This guarantees that websites may at most realize that a Tor Browser user is visiting; they will not be able to tell who the user is or what else the user is visiting.

Tor Blog: EFF's Panopticlick and Torbutton

Fingerprinting defense is not perfect yet in any browser. There are still open bugs. (See tbb-linkability and tbb-fingerprinting.)

Browser History and Cache

A publication from the University of California provides an analysis of the top 50,000 websites. 1% of these websites collect information about web surfers via history sniffing. Using malicious JavaScript code and CSS hacks, information about previously visited websites was collected. Webmasters who are not familiar with sniffing technologies can use services like Tealium or Beencounter for real-time history sniffing.

Collected information is not only used for advertisements. It can be used for de-anonymization of surfers too. A publication from Isec shows one possible way. Using the browser history, the visited groups of the social network site Xing were collected. Because there are generally no two people who are members of the exact same set of groups in a social network, it was possible to get the real names and e-mail addresses of the users.
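
The group-based matching can be illustrated with plain set operations. The sketch below is not the method of the cited publication; the membership table, the user names, and the function matching_members are hypothetical and exist only to show why a set of groups narrows down to one person.

```python
def matching_members(sniffed_groups, membership):
    """Return users whose known groups include every sniffed group."""
    sniffed = set(sniffed_groups)
    return sorted(
        user for user, groups in membership.items()
        if sniffed and sniffed <= set(groups)
    )

# Hypothetical directory (user -> groups), as a crawler might compile it.
membership = {
    "alice": {"python", "privacy", "sailing"},
    "bob": {"python", "privacy"},
    "carol": {"python", "chess"},
}

one_group = matching_members({"python"}, membership)
three_groups = matching_members({"python", "privacy", "sailing"}, membership)
```

A single sniffed group still matches every user in this toy table, but the combination of three groups leaves only one candidate.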

By certain trickery, websites can tell which websites are saved in your browser history. For this, the visited website embeds special formatting commands (CSS Stylesheets) that contain external links "of interest" on the pages you visit. If you have visited one of the external websites before, your browser will react by executing a command defined in the format, e.g. download a small picture from the website. The website can thereby largely guess the contents of your browser history.

From the contents of your browser cache one can infer which websites are already cached and thus were previously visited. Together with every website, an ETag is sent by the server and stored in the browser cache. When the website is requested again, the ETag is sent first to ask whether the content has changed. This tag may contain a unique user ID. KISSmetrics was using ETags in this way to identify visitors of some top 100 websites.
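
A server that abuses ETags as identifiers can be modeled in a few lines. This is a toy sketch under stated assumptions, not the implementation of any named company: the class EtagTrackingServer and its handle method are hypothetical, and the if_none_match argument stands for the value a browser sends in the If-None-Match request header when revalidating a cached resource.

```python
import secrets

class EtagTrackingServer:
    """Toy model of a server that hands every new visitor a unique ETag."""

    def __init__(self):
        self.visits = {}  # etag -> number of requests seen with it

    def handle(self, if_none_match=None):
        """Return (status code, ETag) for one request."""
        if if_none_match in self.visits:
            # The cached tag came back: same browser as before.
            self.visits[if_none_match] += 1
            return 304, if_none_match
        # Unknown or missing tag: issue a fresh unique identifier.
        etag = '"' + secrets.token_hex(8) + '"'
        self.visits[etag] = 1
        return 200, etag

server = EtagTrackingServer()
status, tag = server.handle()          # first visit, empty cache
status_again, _ = server.handle(tag)   # revalidation with the cached tag
```

No cookie is involved at any point; clearing the cache is what removes this identifier.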

Additionally, the time required for loading a website changes when part of it is already in the browser cache. By subtle placement of the images on the website, the server can analyze the cache one by one.

At the moment, there is no reliable protection against the analysis of browser histories apart from deactivating this feature, which has been made the default in Tor Browser.

Unlike deactivating your browser history, deactivating your cache would have tremendous effect on your surfing speed, which is why we don't recommend it. In Tor Browser a protective mechanism has been integrated instead which bypasses the cache for third party content. Also, the cache is deleted automatically when you close the browser. A website can thus only gain information about itself, not about other websites.

Webbugs and Banner Ads

Very likely, you will find one or more cookies in your browser from data miners such as doubleclick.com, advertisement.com or Google, although you may have never even visited their websites. This is because these enterprises use a simple trick on other websites to plant cookies in your browser and watch your browsing nevertheless: webbugs.

"Webbugs" are usually pictures of 1 pixel by 1 pixel which are therefore invisible to the viewer. However, they can also be coded into banner ads embedded in a website. The website contains a picture (webbug) that is loaded from another server running a statistics service (such as Doubleclick or Google Analytics). Thereby the statistics service may set or edit a cookie in your browser unnoticeably. The browser will then send this cookie back to the statistics service with every new request for a site where any webbug of this service is embedded. If the service is used on many different websites, it can now track large parts of your browsing session. If the owner of the statistics service moreover collaborates with the owner of your preferred search engine, he gets an almost complete picture of your Internet activities.

The privacy functions of most current browsers, which simply reject cookies and/or delete them upon browser shutdown, do not achieve optimal protection from webbugs.
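
The webbug mechanism described above can be modeled as follows. This is a simplified sketch, not the code of any real statistics service: the class TrackingService, its serve_pixel method, and the example page names are hypothetical, and the referer argument stands for the page address a browser typically reveals when it loads an embedded image.

```python
import secrets

class TrackingService:
    """Toy model of a third-party server that serves 1x1 images."""

    def __init__(self):
        self.profiles = {}  # cookie id -> pages on which the pixel was loaded

    def serve_pixel(self, cookie, referer):
        """Handle one image request and return the cookie the browser keeps."""
        if cookie not in self.profiles:
            # No known cookie yet: plant a new one.
            cookie = secrets.token_hex(8)
            self.profiles[cookie] = []
        self.profiles[cookie].append(referer)
        return cookie

service = TrackingService()
cookie = None
for page in ["news.example/article", "shop.example/cart", "forum.example/thread"]:
    # Each site embeds the same pixel, so the same cookie is sent every time.
    cookie = service.serve_pixel(cookie, page)
```

After three unrelated sites, the service holds one profile listing all three pages, although the user never visited the service itself.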

To prevent session tracking, all cookies should be blocked by default if possible and only allowed in if needed for the duration of the session. Tor Browser is therefore preconfigured to deny all cookies but allow single websites at the expense of two mouse clicks. We recommend allowing cookies only on a temporary basis, so that they will be automatically blocked again after the session.

Another nasty feature of webbugs is that, besides cookies, every request for them also sends your IP address to the statistics service. Even with a very good browser configuration, including switching off cookies and using webbug filters, you are never able to reliably prevent this. The only effective methods of protection against this are anonymization services like Tor.

TCP Timestamps

The Transmission Control Protocol (TCP) is a transport-layer protocol for transferring data between computers. It is necessary for using Internet protocols like HTTP (WWW), SMTP (e-mail) and FTP. When your computer sends a request for a website, for example, this data is sent within many small TCP packets. Besides the request data, a TCP packet also contains some optional information fields in the header (metadata). One of those optional fields is the TCP timestamp. The value of this timestamp is proportional to the current time of your computer and is incremented according to your computer's internal clock.

The timestamp may be used by the client and/or server machine for performance metrics and optimization. However, an Internet server may recognize and track your computer by observing those timestamps: by measuring the clock skew of the timestamps, it may calculate an individual clock skew profile for your computer. Moreover, it may estimate the time when your machine was last booted. These tricks work even if you have otherwise perfectly anonymized your Internet connections.
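
How a skew profile might be derived from observed timestamps is sketched below. This is an illustration under stated assumptions, not the estimator of the linked research: it fits an ordinary least-squares line, the function name estimate_skew_ppm is hypothetical, and the tick rate of 1000 per second is an assumed value that differs between systems.

```python
def estimate_skew_ppm(samples, hz=1000):
    """Estimate clock skew in parts per million.

    samples: list of (receive_time_seconds, tcp_timestamp_value) pairs.
    hz: assumed number of timestamp ticks per second on the sender.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0, ts0 = samples[0]
    # Elapsed time according to the observer's clock.
    xs = [t - t0 for t, _ in samples]
    # How far the sender's clock has drifted from the observer's clock.
    ys = [(ts - ts0) / hz - x for (_, ts), x in zip(samples, xs)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must span more than one instant")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of drift over elapsed time is the skew.
    return cov / var_x * 1e6

# Synthetic observations of a clock that gains 30 ms every 600 seconds.
samples = [(600 * k, 600030 * k) for k in range(7)]
skew = estimate_skew_ppm(samples)
```

Because the slope depends on the hardware clock rather than on the network path, the same machine tends to show the same value wherever it connects from.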

If you are using Whonix, however, you are protected against being observed this way. The clock in Whonix-Workstation does not match the clock on your host. It is set securely by sdwdate over HTTPS, which gives slightly different results compared to the more accurate NTP.

If you are using Tor, you are also being protected against being profiled in another way: The Tor relays automatically replace your potentially insecure TCP packets with their own.

Source 1: Tor trac #8169, replace TCP timestamp
Source 2: Tor wiki FAQ

IP Address

The IP address is given to you by your provider when you connect to the Internet. The provider usually saves it for months or even years together with your customer data and your online time. It is your distinct identifier on the Internet and is sent along whenever you make a direct connection to any Internet service. The IP address tells the server where to send its response. As long as your IP address does not change, it is easy to monitor when you contacted which website. The IP address also reveals your provider, in many cases your location, and sometimes (in case of a company or computer center) even what terminal you are on. In many cases, an IP address relates directly to one person.

All that your IP-address is revealing:

  • Your current whereabouts

The country and the city/region where you are. With the help of free or commercial databases, even districts and office buildings can be identified. This is called geolocation.

  • Your Internet-provider

Personal data can be retrieved using your provider.

  • Your access technology

With the help of databases one can find out whether you are using, for instance, DSL, a modem or a mobile device to surf the Web.

  • Your company / your authority

In the case where you are surfing from within the network of a company or an authority, its name can be found out.

Some of the information that is given away by your IP address or browser can be reviewed on ip-check.info.

While the traces mentioned so far can be blurred without any special services, the same cannot be said about your IP address. That is why Tor has been developed: to blur any connection between your IP address and the websites you visit. Whonix connects to the Tor network. See also: How Tor works

MAC Address

The MAC address (MAC = Media Access Control, sometimes also called Ethernet ID, Airport ID or physical address) is the hardware address of each individual network device. Each computer may have several such physical or virtual network devices (cable (LAN), wireless (WLAN), mobile (GPRS, UMTS), virtual (VPS), ...). The MAC address serves as a unique identifier for the respective device in a local area network. On the Internet, it is neither used nor transmitted. Also, your access provider may only see it if your computer is not connected to the Internet over a router, but directly, for example by a modem. Moreover, you may change the MAC address yourself.

There is an extra chapter about the MAC address later in documentation in the Computer Security Education.

HTML5 Canvas Image Data

Websites routinely request browser configuration settings in order to help select the best page format for the visitor. One of those variables is HTML5 canvas image data, because it relates to graphical rendering. Canvas is a drawable region in HTML code with height and width attributes, and JavaScript code can access this area through a large set of drawing functions related to animation, games, images and so on. [4]

When combined with other exposed browser settings, this can be enough to uniquely identify an individual, even without access to the specific IP address. [5]

The Tor Project provides a good explanation of this fingerprinting method: [6]

After plugins and plugin-provided information, we believe that the HTML5 Canvas is the single largest fingerprinting threat browsers face today. Studies show that the Canvas can provide an easy-access fingerprinting target: The adversary simply renders WebGL, font, and named color data to a Canvas element, extracts the image buffer, and computes a hash of that image data. Subtle differences in the video card, font packs, and even font and graphics library versions allow the adversary to produce a stable, simple, high-entropy fingerprint of a computer. In fact, the hash of the rendered image can be used almost identically to a tracking cookie by the web server.

The Tor Browser is patched to prompt before returning valid image data to the Canvas APIs. By default, if the site hasn't been given previous permission to extract canvas image data, then white image data is returned to the JavaScript APIs. Third parties are not allowed to extract canvas image data, though.
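
The hashing step, and the effect of returning white image data, can be modeled outside the browser. The sketch below is only an illustration: real extraction happens in JavaScript through the canvas APIs, whereas here the pixel buffers are made-up byte strings and the function names canvas_fingerprint and blank_canvas are hypothetical.

```python
import hashlib

def canvas_fingerprint(pixels):
    """Hash raw RGBA image bytes into a stable identifier."""
    return hashlib.sha256(pixels).hexdigest()

def blank_canvas(width, height):
    """All-white RGBA buffer, standing in for a blocked extraction."""
    return bytes([255, 255, 255, 255]) * (width * height)

# Two invented renderings that differ in a single colour channel of one
# pixel, as might happen with different font or graphics library versions.
machine_a = bytes([10, 20, 30, 255]) * 4
machine_b = bytes([10, 20, 31, 255]) + bytes([10, 20, 30, 255]) * 3

distinct = canvas_fingerprint(machine_a) != canvas_fingerprint(machine_b)
shared = canvas_fingerprint(blank_canvas(2, 2)) == canvas_fingerprint(blank_canvas(2, 2))
```

A one-byte rendering difference yields a completely different hash, while every browser that returns the same white buffer yields the same one and therefore cannot be told apart by this method.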

If users are browsing and are prompted with a message like the following, they are recommended to select n.

This website (github.com) attempted to extract HTML5 canvas image data, which may be used to uniquely identify your computer.

Should Tor browser allow this website to extract HTML5 canvas image data? 

License

Gratitude is expressed to JonDos for permission to use material from their website. The DataCollectionTechniques page contains content from the JonDonym documentation DataCollectionTechniques page.


</translate>

  1. https://freedom-to-tinker.com/2017/11/15/no-boundaries-exfiltration-of-personal-data-by-session-replay-scripts/
  2. Reinforcing the perception that the private sector really is a comfortable and principal ally in the surveillance-industrial complex.
  3. For instance, the EasyList and EasyPrivacy blocking lists that are available in popular extensions. However, they did not block all the major companies at the time of writing.
  4. https://en.wikipedia.org/wiki/Canvas_element
  5. https://tor.stackexchange.com/questions/4029/html-5-canvas-imagedata-extraction-what-does-it-actually-mean
  6. https://www.torproject.org/projects/torbrowser/design/