The World Wide Web And Your Privacy
You can probably no longer imagine day-to-day life without the World Wide Web. Its numerous services, such as online banking, travel information and encyclopedias, make common tasks much more convenient. You probably also visit entertainment and shopping portals, stay in touch with friends via social networks, or share common interests with others in forums. To access the Web you can choose from dozens of stable, highly functional yet easy-to-use applications: web browsers. The most popular browsers are Microsoft's Internet Explorer, Mozilla Firefox, Opera, Apple's Safari and Google Chrome.
Any communication on the Internet leaves behind all kinds of digital traces that can be automatically collected, stored and analyzed. Some companies have therefore specialized in building individual user profiles from surfing-related data. These databases are of high economic value because they allow an enterprise to profile its customers comprehensively, and that means you. This process is called data accumulation or data enhancement, or "data mining" in computer jargon.
There are many reasons to avoid leaving digital traces when surfing:
- Part of the collected data finds its way into credit scoring systems, which are used to evaluate loan requests, to create individually priced offers or to decide on eligibility for cash-on-delivery (C.O.D.) payment.
- Employers may compile or request a character profile of job applicants from their traces on the Net before hiring them.
- Freedom of opinion is restricted when governments or institutions track individuals and what they read and say. These entities may even deny access to certain web pages or services.
- Companies may recognize employees of other businesses, even those of their competitors, and subsequently harass them with promotional calls or e-mail spam.
- Browser-related data can expose vulnerabilities of the computer used for surfing. A hacker may then contact that computer directly and attack it.
A further problem is that these traces are collected, stored, transmitted and processed without your consent and most likely without your knowledge, yet decisions about you may be based on this information.
The following section briefly introduces four companies that collect and mine data: Google, RapLeaf, Facebook and Twitter.
It is common knowledge that Google's business model is based on collecting and analyzing data. But many users have no idea how comprehensive the personal profiles are or how valuable this data is.
Economic figures: According to expert estimates, about 1.5 million servers work for Google in various data centers, and this number grows by 100,000 every three months. The annual cost of this infrastructure is approximately 2 billion dollars. Google's overall revenue is about 30 billion dollars per year, with an annual profit of 7 billion dollars. 96 percent of Google's revenue is generated by personalized advertising. (as of 2009)
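As a rough back-of-the-envelope check, the estimates above can be related to each other. The sketch below reuses only the numbers given in this paragraph; the derived values (servers added per year, infrastructure cost per server, advertising revenue, profit margin) are simple arithmetic, not additional data.

```python
# Figures as stated in the text (expert estimates, as of 2009)
servers = 1_500_000
growth_per_quarter = 100_000
infrastructure_cost_usd = 2e9   # per year
revenue_usd = 30e9              # per year
profit_usd = 7e9                # per year
ad_share = 0.96                 # share of revenue from personalized advertising

servers_added_per_year = growth_per_quarter * 4          # four quarters per year
cost_per_server_usd = infrastructure_cost_usd / servers   # yearly cost per server
ad_revenue_usd = revenue_usd * ad_share                   # revenue from advertising
profit_margin = profit_usd / revenue_usd                  # profit as share of revenue

print(f"servers added per year: {servers_added_per_year:,}")
print(f"infrastructure cost per server: {cost_per_server_usd:,.0f} USD/year")
print(f"advertising revenue: {ad_revenue_usd / 1e9:.1f} billion USD")
print(f"profit margin: {profit_margin:.0%}")
```

In other words, roughly 400,000 servers are added per year, each costing on the order of 1,300 dollars per year, and nearly all of the roughly 29 billion dollars of advertising revenue depends on user data.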
The whole infrastructure can be used free of charge. It is paid for not with money but with data. According to the Electronic Frontier Foundation (EFF), Google logs traffic that can be unambiguously linked to a particular person by examining various characteristics. This applies to the search engine, to Google services such as YouTube or Google Earth, to advertisements displayed on other web sites and, of course, to tracking tools such as Google Analytics. The collected data is combined into comprehensive profiles of individual users surfing the web. Because of its popularity, Google is able to capture almost all searching and surfing behavior. In Germany, 89 percent of search requests go directly to Google. In addition, 85 percent of German web sites embed elements (Google Analytics, displayed advertisements et cetera) that allow Google to track users across multiple web pages.
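The mechanism described above can be illustrated with a minimal sketch: a single third-party element embedded on several unrelated sites receives a request on every page view, derives an identifier from the browser's characteristics (a simple form of browser fingerprinting) and appends each visited page to the same profile. The class `ThirdPartyTracker`, the `fingerprint()` helper and the chosen characteristics are hypothetical; real trackers also rely on cookies and many more attributes.

```python
import hashlib
from collections import defaultdict


def fingerprint(characteristics):
    """Derive a stable identifier from browser characteristics.

    Keys are sorted, so the order in which attributes arrive does not matter.
    """
    canonical = "|".join(f"{key}={characteristics[key]}" for key in sorted(characteristics))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ThirdPartyTracker:
    """Hypothetical tracker whose script or ad is embedded on many unrelated sites."""

    def __init__(self):
        self.profiles = defaultdict(list)  # visitor id -> list of visited pages

    def record_hit(self, page_url, characteristics):
        visitor_id = fingerprint(characteristics)
        self.profiles[visitor_id].append(page_url)
        return visitor_id


# Made-up characteristics of one browser, sent along with every request
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "language": "de-DE",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "fonts": "DejaVu Sans,Liberation Serif",
}

tracker = ThirdPartyTracker()
visited = [
    "https://news.example/politics",
    "https://shop.example/shoes",
    "https://forum.example/health",
]
for url in visited:
    vid = tracker.record_hit(url, browser)

print(vid, tracker.profiles[vid])  # three unrelated sites, one profile
```

Because the same characteristics produce the same identifier on every site, no login is needed to link the visits; changing even one attribute (for example the language) yields a different identifier.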
How precise and comprehensive Google's personal profiles are is hard to say. One basis for an estimate is the data the company provides to its advertising partners. The following figure shows the aggregated statistics of an unnamed web site:
Besides age and gender, Google is able to estimate the education level and income of almost all web surfers. In addition, their interests, political orientation and contact addresses (e-mail, instant messaging), which are not shown here, are collected by Google as well. According to an analysis by the Wall Street Journal, there are even ways to assess the likelihood that a person pays by credit card. The researchers Bin Cheng and Paul Francis from the Max Planck Institute for Software Systems have shown that it is possible to identify gay users by analyzing their clicks on advertisements. Their method can be adapted to any kind of question and could, for instance, be used to deliver individually targeted advertisements.
The tracking and observation of users is especially easy to notice in the case of retargeting: if you visit a web shop without buying anything, you are often overwhelmed with advertisements for similar products afterwards. Google offers a special AdSense program with retargeting.
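Retargeting can be sketched in two steps: the ad element embedded in the shop records which product categories a visitor looked at, and when the same visitor (recognized by the same identifier) later loads another page carrying the ad network's element, the server picks an advertisement for a category that was viewed but not bought. The class and method names below are hypothetical and do not describe the AdSense API.

```python
from collections import defaultdict


class RetargetingAdServer:
    """Hypothetical ad server embedded both in a web shop and on other sites."""

    def __init__(self):
        self.viewed = defaultdict(set)  # visitor id -> categories seen in the shop
        self.bought = defaultdict(set)  # visitor id -> categories purchased

    def shop_view(self, visitor_id, category):
        self.viewed[visitor_id].add(category)

    def shop_purchase(self, visitor_id, category):
        self.bought[visitor_id].add(category)

    def choose_ad(self, visitor_id, default="generic ad"):
        """On another site: advertise a category that was viewed but not bought."""
        abandoned = self.viewed[visitor_id] - self.bought[visitor_id]
        if abandoned:
            return f"ad for {sorted(abandoned)[0]}"
        return default


server = RetargetingAdServer()
server.shop_view("visitor-42", "running shoes")      # visited the shop, bought nothing
ad_before = server.choose_ad("visitor-42")           # "ad for running shoes"
server.shop_purchase("visitor-42", "running shoes")
ad_after = server.choose_ad("visitor-42")            # "generic ad"
print(ad_before, "|", ad_after)
```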
The company RapLeaf collects data profiles linked to e-mail addresses. The data is not used for personalized advertising; instead, it is simply sold. A potential buyer passes a list of e-mail addresses to RapLeaf and, after paying the bill, gets the profiles back at the desired level of detail. The following is a short excerpt from the price list (as of 2011):
- Age, Gender and Location: 0 cents (loss leader)
- Household income: 1 cent per e-mail address
- Marital Status: 1 cent per e-mail address
- Presence of Children: 1 cent per e-mail address
- Home Market Value: 1 cent per e-mail address
- Loan-to-Value Ratio: 1 cent per e-mail address
- Available Credit Cards: 1 cent per e-mail address
- Cars in Household: 1 cent per e-mail address
- Likely Smartphone User: 3 cents per e-mail address
- Occupation and Education: 2 cents per e-mail address
- Blogger: 3 cents per e-mail address
- Charitable Donor: 3 cents per e-mail address
- High-End Brand Buyer: 3 cents per e-mail address
- Interested in Books/Magazines: 3 cents per e-mail address
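To put the price list in perspective, the per-address prices can be added up. The sketch assumes that prices are simply additive per selected attribute and per address; the `order_cost_usd()` helper is hypothetical and ignores any volume discounts or minimum orders that may have applied.

```python
# Prices in US cents per e-mail address (excerpt from the 2011 price list)
PRICE_CENTS = {
    "Age, Gender and Location": 0,
    "Household income": 1,
    "Marital Status": 1,
    "Presence of Children": 1,
    "Home Market Value": 1,
    "Loan-to-Value Ratio": 1,
    "Available Credit Cards": 1,
    "Cars in Household": 1,
    "Likely Smartphone User": 3,
    "Occupation and Education": 2,
    "Blogger": 3,
    "Charitable Donor": 3,
    "High-End Brand Buyer": 3,
    "Interested in Books/Magazines": 3,
}


def order_cost_usd(num_addresses, attributes):
    """Cost in USD of buying the given attributes for a list of addresses."""
    cents_per_address = sum(PRICE_CENTS[attribute] for attribute in attributes)
    return num_addresses * cents_per_address / 100


full_profile_cents = sum(PRICE_CENTS.values())  # every attribute in the excerpt
print(f"full profile: {full_profile_cents} cents per address")
print(f"full profiles for 1 million addresses: {order_cost_usd(1_000_000, PRICE_CENTS):,.0f} USD")
print(f"income + marital status for 10,000 addresses: "
      f"{order_cost_usd(10_000, ['Household income', 'Marital Status']):,.0f} USD")
```

Under these assumptions, every attribute in the excerpt costs 24 cents per address, so detailed profiles of a million people cost on the order of 240,000 dollars.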
The data is gathered by correlating e-mail usage with surfing behavior, or obtained from the data leaks that occur time and again on online merchants' platforms. One of the main investors is Peter Thiel, who co-founded PayPal and significantly influences the development of Facebook behind the scenes. It must be assumed that RapLeaf uses the data collections of these internet companies as well. Furthermore, data from Twitter and other databases offering commercial access is included in the processing.
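The correlation step amounts to joining records from unrelated sources on a shared key, here the e-mail address. The following sketch shows the idea with made-up sample records; the sources, field names and the `merge_by_email()` helper are hypothetical and do not describe RapLeaf's actual processing.

```python
from collections import defaultdict


def normalize_email(address):
    """Lower-case and strip whitespace so that differently typed addresses match."""
    return address.strip().lower()


def merge_by_email(*sources):
    """Combine records from several sources into one profile per e-mail address."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            for field, value in record.items():
                if field != "email":
                    # The first source that provides a field wins on conflicts.
                    profiles[key].setdefault(field, value)
    return dict(profiles)


# Made-up records from three unrelated sources
surfing_data = [{"email": "Alice@Example.org", "interests": "hiking, finance blogs"}]
leaked_shop_data = [{"email": "alice@example.org ", "address": "Example Street 1", "last_order": "tent"}]
public_posts = [{"email": "alice@example.org", "twitter_handle": "@alice_example"}]

profile = merge_by_email(surfing_data, leaked_shop_data, public_posts)
print(profile)  # one combined profile for alice@example.org
```

Each source alone reveals little, but the shared e-mail address is enough to fuse them into a single, much more revealing profile.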
Economic figures: The "social network" Facebook reportedly has 600 million users worldwide. The company's market value is estimated at 50 billion dollars, and its profit was 353 million dollars in 2010. Revenue amounts to 4-5 dollars per user per year. (as of 2011)
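Multiplying out the stated estimates shows how little each individual profile is worth on its own and how much they are worth in aggregate. The sketch only reuses the figures from the paragraph above.

```python
# Figures as stated in the text (estimates, as of 2011)
users = 600e6
profit_usd = 353e6                    # profit in 2010
revenue_per_user_usd = (4, 5)         # range per user and year

profit_per_user = profit_usd / users  # roughly 0.59 USD per user
implied_revenue_usd = tuple(users * r for r in revenue_per_user_usd)

print(f"profit per user: {profit_per_user:.2f} USD/year")
print(f"implied revenue: {implied_revenue_usd[0] / 1e9:.1f} to "
      f"{implied_revenue_usd[1] / 1e9:.1f} billion USD/year")
```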
Facebook is free of charge, and here, too, the users' data is the basis of the commercial success. Data that users enter intentionally and in a controlled way plays only a minor role. Far more important is the information extracted from the users' behavior. Users have no control over this data and the information derived from it, but they agreed to its commercial use when they registered.
Connections between friends
The contact relationships of users are analyzed with different goals in mind. First of all, they are used for friend-to-friend advertising; many users are not aware that they serve as an advertising medium. Targeted advertising is further improved by identifying opinion leaders in contact networks, who are then addressed specifically (e.g. to publish sponsored stories).
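How opinion leaders might be identified can be sketched with the simplest network measure, degree centrality: users with the most contacts reach the largest audience. This choice is an assumption for illustration; Facebook does not publish its method, and real systems likely combine many more signals. The friendship data and function names are made up.

```python
from collections import defaultdict


def degree_centrality(friendships):
    """Count the contacts of each user in an undirected list of friendships."""
    degree = defaultdict(int)
    for a, b in friendships:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def top_influencers(friendships, k=1):
    """Return the k users with the most contacts (ties broken alphabetically)."""
    degree = degree_centrality(friendships)
    return sorted(degree, key=lambda user: (-degree[user], user))[:k]


friendships = [
    ("anna", "ben"),
    ("anna", "carl"),
    ("anna", "dora"),
    ("ben", "carl"),
    ("dora", "emil"),
]
print(top_influencers(friendships, k=2))
```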
Another way of analyzing contacts was demonstrated by the Gaydar project: by looking at the contacts in Facebook profiles, MIT students inferred the sexual orientation of the respective account owners. This kind of information could influence a person's career.
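The underlying idea is that a user's contacts can reveal information the user never disclosed. A deliberately simplified version can be sketched as follows: compute the share of a user's contacts who publicly state a given attribute and flag the user if that share exceeds a threshold. This is not the MIT students' actual model; the threshold, the data and the function names are hypothetical.

```python
def disclosed_share(user, friends_of, disclosed):
    """Fraction of a user's contacts who publicly disclosed a given attribute."""
    friends = friends_of.get(user, [])
    if not friends:
        return 0.0
    return sum(1 for friend in friends if disclosed.get(friend, False)) / len(friends)


def infer_attribute(user, friends_of, disclosed, threshold=0.5):
    """Guess that the user shares the attribute if the share exceeds the threshold."""
    return disclosed_share(user, friends_of, disclosed) > threshold


# Made-up network: who is friends with whom, and who disclosed the attribute
friends_of = {"x": ["a", "b", "c", "d"], "y": ["c", "d"]}
disclosed = {"a": True, "b": True, "c": True, "d": False}

print(infer_attribute("x", friends_of, disclosed))  # 3 of 4 contacts disclosed it
print(infer_attribute("y", friends_of, disclosed))  # 1 of 2 is not above 0.5
```

Note that neither "x" nor "y" disclosed anything; the guess is based entirely on what their contacts published.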
Analyzing gaming behavior
Facebook offers various games for its members, such as "Farmville" or "Mafia Wars". The participants' moves are analyzed and character traits are derived from them. These profiles are then commercialized: companies can buy profiles of potential job applicants.
Publishing private pictures is one of the most popular activities on Facebook. Meanwhile, facial recognition software has been deployed. According to Facebook, several million people are identified on uploaded pictures every day.
This creates a huge database that is intended for commercial use. In the future, customers entering a shop could be identified by a camera, and the salesperson could access a comprehensive personal profile (for a fee).
"Show us 14 photos of yourself and we can identify who you are. You think you don't have 14 photos of yourself on the internet? You've got Facebook photos!" said Eric Schmidt (then CEO of Google) at the Techonomy conference on August 4, 2010.
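Identification from photos typically works by mapping each face to a numeric feature vector and comparing vectors. The sketch below assumes such vectors already exist (producing them requires a trained face-recognition model, which is not shown), averages several photos per person into a reference template, since more photos generally give a more reliable template, and matches a new photo to the most similar template by cosine similarity. The vectors, names and the 0.9 threshold are made up.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def enroll(photos_by_person):
    """Average the face vectors of each person's photos into one template."""
    templates = {}
    for person, vectors in photos_by_person.items():
        n = len(vectors)
        templates[person] = [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
    return templates


def identify(probe, templates, min_similarity=0.9):
    """Return the best-matching person, or None if no template is similar enough."""
    best_person, best_score = None, -1.0
    for person, template in templates.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_person, best_score = person, score
    return best_person if best_score >= min_similarity else None


# Made-up 3-dimensional "face vectors" from tagged photos
gallery = {
    "alice": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "bob": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
}
templates = enroll(gallery)

match = identify([0.95, 0.05, 0.0], templates)  # e.g. a shop camera image
unknown = identify([0.0, 0.0, 1.0], templates)  # resembles nobody in the gallery
print(match, unknown)
```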
Secret services, army and law enforcement agencies
"I am not aware of any Facebook data protection provisions that deserve that name. They are terms of use that roughly follow this pattern: You, the user, are responsible for everything you do with us. And we are then allowed to do whatever we like with the data." (Dr. Thilo Weichert, data protection commissioner of Schleswig-Holstein, Germany)
Facebook is becoming a "net within the net" with Orwellian traits.
Economic figures: Twitter has an estimated 200 million users. Each user causes costs of about 1 dollar per year. (as of 2010)
Unlike Google and Facebook, Twitter has not managed to use the data profitably itself. Advertising revenue fell far short of expectations.
Twitter's business model is selling access to its database. Twitter provides 40 parameters (content, location, date, account, software used, language, retweets...) for every tweet. For 60,000 dollars a year one can access 5 percent of the tweets; for 300,000 dollars a year, one can access the whole database. Twitter's database is a rich source of information, not available anywhere else, for market research, advertising or secret services.
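Multiplying out the stated figures gives a rough picture of Twitter's economics: the user base costs about 200 million dollars per year (as of 2010), and the full database is far cheaper per percentage point of tweets than the 5 percent sample. The sketch only reuses the numbers from this section; the pricing tiers are not dated in the text.

```python
# Figures as stated in the text
users = 200e6
cost_per_user_usd = 1                           # per year (as of 2010)
annual_cost_usd = users * cost_per_user_usd     # total yearly cost of the user base

tiers = {5: 60_000, 100: 300_000}               # percent of tweets -> USD per year
cost_per_percent = {share: price / share for share, price in tiers.items()}

print(f"annual cost of the user base: {annual_cost_usd / 1e6:.0f} million USD")
for share, price in cost_per_percent.items():
    print(f"{share:>3} % of tweets: {price:,.0f} USD per percentage point and year")
```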
Gratitude is expressed to JonDos for permission to use material from their website. The TheWorldWideWebAndYourPrivacy page contains content from the JonDonym documentation TheWorldWideWebAndYourPrivacy page.
Whonix is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Libre Software license as Whonix itself.