The World Wide Web And Your Privacy
- 1 Introduction
- 2 Google
- 3 RapLeaf
- 4 Facebook
- 5 Twitter
- 6 Footnotes
- 7 License
Introduction
It is difficult for most to imagine day-to-day life without the Internet. Online services provide convenience and the ability to solve common tasks and problems. Indeed, the reader may have today used online banking services, purchased goods or services from websites, researched a specific topic, sourced travel information, browsed entertainment or news sites, communicated with friends on social media, or shared opinions in forums with like-minded people.
To access the Internet, the user is offered dozens of stable, highly functional, and yet easy-to-use applications: web browsers. In 2018, the most popular browser by far is Chrome, with nearly 80 per cent of browser market share. Lagging well behind are Mozilla's Firefox, Microsoft's Internet Explorer / Edge, Apple's Safari, and Opera.
Without precautions, any communication over the Internet leaves a raft of digital traces which can be automatically acquired, saved, attributed, and analyzed.
Unsurprisingly, some companies have entered the legal and regulatory void and specialized in harvesting all accessible data. An untold number of individual user profiles are generated, storing data from browsing, communications and other activities. These databases have a high commercial value because they let an enterprise comprehensively profile the behavior and interests of its customers, which in turn enables targeted advertising.
Modern surveillance capitalism dictates that you are the product, leading to profiles being on-sold to other third parties. In computer lingo, this process is called data accumulation or data enhancement, but the layperson knows it as "data mining". There are many reasons to avoid leaving digital traces when browsing. Consider the following examples:
- Some of the data collected is used by credit scoring systems which are used to evaluate loan requests, to create individually priced offers or to decide on eligibility for "Cash/Collect on Delivery" service.
- Employers may create or request a character profile of their job applicants from traces found on the Internet prior to hiring them.
- Freedom of opinion is limited by governments or institutions that track individuals and what they read and say. These entities may even censor (block) certain web pages or services.
- Companies may recognize employees of other businesses or even those of their competition and subsequently harass them with promotional calls or email spam.
- Browser-related data exposes vulnerabilities in the user's computer, smartphone or peripheral electronic devices. A hacker may subsequently establish direct communications with the computer or device and attack it.
- Perhaps worst of all, digital traces are collected, saved, sent and processed without the explicit or implicit consent of the user, and often without their knowledge.
The key message is that decisions about an individual's life are frequently based on the digital litter they unknowingly scatter throughout the Internet. In the following section, the user is briefly introduced to several popular companies that are notorious for collecting and mining data.
Google
It is generally known that Google's business model is based on the collection of data and the analysis thereof. However, many users have no concept of the comprehensiveness of Google's surveillance, nor the extensive (and highly profitable) personal profiles generated from harvested data.
In general, Google is dismissive of the universal right to privacy, which may explain its unsavory history of working hand-in-glove with government. The then CEO Eric Schmidt declared in 2009:
"If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place. If you really need that kind of privacy, the reality is that search engines including Google do retain this information for some time and it's important, for example, that we are all subject in the United States to the Patriot Act and it is possible that all that information could be made available to the authorities."
Eric Schmidt is also supremely confident of the extent of Google's data collection, boasting in 2010: 
"With your permission, you give us more information about you, about your friends, and we can improve the quality of our searches. We don't need you to type at all. We know where you are. We know where you've been. We can more or less know what you're thinking about."
Ever the accommodating corporate partner, Google is also fond of censoring user content when government pressure is applied. This includes the recent decisions to tweak its search algorithm to bury left-wing or independent media sites and articles, and the removal of the RT broadcaster from its premium YouTube video inventory. Popular social media platforms Facebook and Twitter also have a history of kowtowing to government demands in the era of McCarthyite and terrorist hysteria.
Google Data Collection Techniques
The following is a non-exhaustive list of Google techniques to collect personalized data:
- Pixel tags: Often used in combination with cookies, pixel tags are placed on websites or within the body of an email for the purpose of tracking activity on websites, or when emails are opened or accessed. 
- Data aggregators: Google collects and aggregates data about users through developer tools such as Google Analytics, Google Fonts and Google APIs. This enables IP address tracking across successive sites (cross-domain web tracking). Developers are encouraged to use these tools and communicate IP addresses to Google.
- Google search engine: Google records what a user types into the web address field, sending that information to Google servers which populate search suggestions. If the user is signed into a Google account, those searches are saved in the account's web history. Searches are saved for years, if not indefinitely.
- Google Chrome: A host of user information is captured by Google's browser which is usually set to opt-in (with opt-out often impossible). For instance, even in Incognito mode, the history of programs used to stream media is captured. Chrome enables enhanced user tracking through web advertising, as indicated by Google's 2011 patent on this very issue.
- Other browsers: Google has previously bypassed privacy settings set by other browsers (for example, Safari's cookie blocking mechanism) in order to track online activities.
- Personal information - name, email address, and telephone number.
- Device-specific information - hardware model, operating system, unique device identifiers, mobile network information.
- Log information - search queries, telephony log information (phone number, calling-party number, forwarding numbers, time and date of calls, duration of calls, SMS routing information and types of calls), IP address, browser-specific cookies, and device event information (crashes, system activity, hardware settings, browser type, browser language, the date and time of your request and referral URL).
- Location information - IP address, GPS, and other sensors providing information on nearby services such as Wi-Fi access points and cell towers. It was recently discovered that Google continues to track users even after they opt out of Location History.
- Unique application numbers - information on application types and version numbers.
- Local storage - storing personal information locally with local browser storage (like HTML5) and application data caches.
- Google Play: A host of clandestine trackers are placed in Android apps that are downloaded from Google Play. Researchers recently discovered 44 trackers in more than 300 apps for the Android OS that have been collectively downloaded billions of times.
- Gmail: Email content is processed and read (scanned) by a computer for targeted advertising purposes and spam prevention. Under Google policies, there is an unlimited period of data retention and the potential for unintended secondary use of this information. Google has already admitted users have "no reasonable expectation" of confidentiality regarding personal emails.
- Google Street View: Google's online map service has extensively captured pictures of people's private homes, and collected a trove of data on unencrypted public and private WiFi networks world-wide. Google initially defied critics and stated they would not destroy the data until forced by regulators, despite the activity constituting widespread wiretapping.
- Satellite imaging: Google has acquired Skybox Imaging, which runs a network of high-definition, modern satellites which are used to collect and analyze micro-geographical and human data.  
- Student Chromebook users: Google has decided to track and build behavioral profiles of students using school-supplied Chromebooks and Google education apps, leading to the capture of their entire Internet browsing history.  
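To make the pixel tag technique from the list above more concrete, the following is a minimal Python sketch under stated assumptions, not Google's actual implementation. The domain `tracker.example`, the `/pixel.gif` path and the parameter names `uid` and `page` are hypothetical and chosen only for illustration. It shows how a web page or email can embed an invisible 1x1 image whose URL carries identifying parameters, and how the server receiving the image request can read them back.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical tracker endpoint (assumption, not a real service).
TRACKER = "https://tracker.example/pixel.gif"

def build_pixel_tag(user_id: str, page: str) -> str:
    """Return HTML for an invisible 1x1 image that reports a page view."""
    query = urlencode({"uid": user_id, "page": page})
    return f'<img src="{TRACKER}?{query}" width="1" height="1" alt="">'

def read_pixel_request(url: str) -> dict:
    """Recover the identifying parameters on the tracker's side."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}

tag = build_pixel_tag("abc123", "/newsletter/2018-05")
url = tag.split('"')[1]  # the src attribute the browser or mail client fetches
print(tag)
print(read_pixel_request(url))
```

Simply loading the image is enough to trigger the request, which is why blocking remote images in email clients is a common countermeasure.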
In summary, Google has built the infrastructure to build a complete profile of a user by name, which combines detailed hardware and software identifiers with everything written in email, every website visited, every search conducted, and nearly all interactions occurring within Google ecosystems and applications.
As well as creating databases on entire populations, Google has previously acquired companies that have received capital from the investment arm of the CIA (In-Q-Tel), and cooperated with the NSA's PRISM surveillance program, giving the agency direct access to company servers. Google also readily responds to US government requests for data on Google users worldwide.
Google is the greatest corporate threat to privacy and liberty worldwide, perhaps explaining why their motto was changed from "Don't be evil" to "Do the right thing" in 2015, since their blatant data-mining practices and hostility to anonymity were no longer defensible. 
Google Revenue Model
Google is a multi-national Internet company based in California, which is itself part of the larger Alphabet parent company as of November 2016. Although it focuses on online search, advertising, cloud computing, and other digital products and services, its primary source of revenue is advertising. In 2016, Google earned $89.5 billion in global revenue and had a net income of $19.5 billion. Advertising revenue comprised the lion's share at nearly $80 billion. This proportion is similar to 2009 figures, when 96 percent of Google's revenue was generated by personalized advertisements.
The advertising revenue itself is a product of profiling based on captured browsing, search engine, and other data. For instance, in November 2016 Google was ranked first among the most visited websites with 246 million unique visitors, and held around 63 percent of market share among the major US search engine providers. In 2017, Google's share of the global search engine market is around 77 percent, with the number of annual Google searches in the realm of 1-2 trillion. Other estimates provide a daily figure of 4.5 billion Google searches.
As of 2009, it was estimated by experts that there were around 1.5 million servers working for Google in different data centers, with a growth rate of an extra 100,000 every three months. The annual costs of this infrastructure are approximately 2 billion dollars. Some later estimates from 2013 are lower at a little more than 1 million servers, but nevertheless the infrastructure is extensive.
The whole infrastructure may be used for "free" in the monetary sense, but the real cost is the loss of control over extremely personal data. According to the Electronic Frontier Foundation (EFF), Google is logging the traffic which can be unambiguously linked to a particular person and examining various characteristics. This in turn affects the deployment of the search engine and Google services like YouTube or Google Earth. It similarly applies to pop-up/flashing advertisements on other web sites and of course to tracking tools like Google Analytics.
The accumulation of basic data over time leads to comprehensive profiles of individual users browsing the Internet. Due to its popularity and extensive server network, Google is almost able to capture the entire searching and browsing behavior of individuals. In Germany, 89 percent of search requests go directly to Google. Furthermore, 85 percent of German web sites have embedded elements (Google Analytics, flashing advertisements, Google+ widgets and so on) which allow Google to track users across multiple web pages.
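How embedded elements let a single third party link visits across unrelated sites can be sketched as follows. This is a simplified Python model under stated assumptions, not a description of Google's systems: the `ThirdPartyTracker` class, the cookie values and the site names are all hypothetical. The idea is that each embedded element causes the browser to send the third party its cookie together with the address of the embedding page, so grouping requests by cookie yields a browsing profile.

```python
from collections import defaultdict

class ThirdPartyTracker:
    """Toy model of a tracker embedded on many unrelated websites."""

    def __init__(self):
        # cookie ID -> list of embedding pages, in the order they were seen
        self.profiles = defaultdict(list)

    def on_embed_request(self, cookie_id: str, referer: str) -> None:
        """Called whenever a browser loads the embedded element."""
        self.profiles[cookie_id].append(referer)

tracker = ThirdPartyTracker()
# The same browser (hypothetical cookie "u-42") visits three unrelated sites.
tracker.on_embed_request("u-42", "https://news.example/politics")
tracker.on_embed_request("u-42", "https://shop.example/running-shoes")
tracker.on_embed_request("u-7", "https://news.example/sports")
tracker.on_embed_request("u-42", "https://health.example/back-pain")

# One cookie now ties together news, shopping and health interests.
print(tracker.profiles["u-42"])
```

None of the three sites needs to share data with the others; the common embedded element is what makes the linkage possible.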
It is impossible to know how exact and comprehensive Google's personal profiles are. One rough estimate can be obtained by using the data the company is providing to its advertisement partners. For example, the following figure shows the aggregated statistics of an unnamed website.
Figure: Google Visitor Statistics
In addition to age and gender, Google is able to estimate the education level and income of almost all Internet surfers. Additionally, they gather data in relation to interests, political orientation and contact addresses (e-mail, instant messaging) just to name a few variables. As the Wall Street Journal noted in an analytical piece, there are even ways to assess the likelihood of a person paying by credit card.
Researchers Bin Cheng and Paul Francis from the Max Planck Institute for Software Systems have shown that it is even possible to ferret out gay users by analyzing clicks on advertisements. This method can obviously be adapted to identify people with different interests, allowing the delivery of more individualized advertisements. "Big data" is also used for retargeting users who do not buy anything when visiting an online store. The strategy is to overwhelm those users with advertisements of similar products in the aftermath of their decision; Google is even offering a special AdSense program with retargeting features.
RapLeaf
The company RapLeaf collects data profiles via e-mail addresses. The data is not used for personalized advertisements; rather, it is simply sold. Potential buyers pass a list of e-mail addresses to RapLeaf and receive the profiles back after paying the bill. The cost depends on how comprehensive the profiling is. For example, the following is a short excerpt from the 2011 price list:
- Age, Gender and Location: 0 cents (loss leader)
- Household income: 1 cent per e-mail address
- Marital Status: 1 cent per e-mail address
- Presence of Children: 1 cent per e-mail address
- Home Market Value: 1 cent per e-mail address
- Loan-to-Value Ratio: 1 cent per e-mail address
- Available Credit Cards: 1 cent per e-mail address
- Cars in Household: 1 cent per e-mail address
- Likely Smartphone User: 3 cents per e-mail address
- Occupation and Education: 2 cents per e-mail address
- Blogger: 3 cents per e-mail address
- Charitable Donor: 3 cents per e-mail address
- High-End Brand Buyer: 3 cents per e-mail address
- Interested in Books/Magazines: 3 cents per e-mail address
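To give a sense of scale, the arithmetic behind such an order can be sketched in a few lines of Python. The prices are taken from the list above; the function name `order_cost_cents` and the example order of 10,000 addresses are hypothetical and only illustrate the calculation.

```python
# Prices in cents per e-mail address, copied from the 2011 price list above.
PRICE_CENTS = {
    "Age, Gender and Location": 0,
    "Household income": 1,
    "Marital Status": 1,
    "Presence of Children": 1,
    "Home Market Value": 1,
    "Loan-to-Value Ratio": 1,
    "Available Credit Cards": 1,
    "Cars in Household": 1,
    "Likely Smartphone User": 3,
    "Occupation and Education": 2,
    "Blogger": 3,
    "Charitable Donor": 3,
    "High-End Brand Buyer": 3,
    "Interested in Books/Magazines": 3,
}

def order_cost_cents(attributes, address_count):
    """Total cost in cents for the chosen attributes on a list of addresses."""
    per_address = sum(PRICE_CENTS[name] for name in attributes)
    return per_address * address_count

# Hypothetical order: three attributes for 10,000 e-mail addresses.
wanted = ["Household income", "Marital Status", "Occupation and Education"]
print(order_cost_cents(wanted, 10_000) / 100)  # 400.0 (dollars)

# Every listed attribute together costs 24 cents per address.
print(sum(PRICE_CENTS.values()))  # 24
```

In other words, every attribute in this excerpt could be bought for under a quarter of a dollar per person.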
The data is gathered by correlating e-mail usage with browsing behavior or via data leaks which frequently occur when using online merchants' platforms. Also, data arising from the use of Twitter and other commercially available databases from large Internet companies is included in the processing. It can be reasonably assumed that RapLeaf also uses the data collection services of other profiling companies as well. One of the major RapLeaf investors is Peter Thiel, who founded PayPal and is significantly contributing to Facebook development in the background.
Facebook
Thirteen years after its founding, Facebook has around 2 billion total users and 1.3 billion daily active users across all devices.  The amount of activity conducted on the site is incredible: 
- Daily, around 300 million photos are uploaded.
- In 2013, 4.75 billion pieces of content were shared daily.
- Every 60 seconds, 510,000 comments are posted and 293,000 statuses are updated.
- In 2012, one in every 5 page views in the US occurred on Facebook.
- Also in 2012, five new profiles were created every second.
Facebook Revenue Model
Facebook is a veritable paradise for data-miners and advertisers. In fact, 42% of marketers report Facebook is critical or important to their business.  In early 2016, Facebook confirmed it had three million active advertisers, and 70% of those were outside the US. 
The majority of Facebook's revenue relies on click-through rates for various advertised products and services, and the ability to build an extensive profile of each user, thus allowing targeted advertisements and creation of valuable data-sets. At the end of 2016, Facebook's revenue jumped to $8.8 billion, with advertisement revenue comprising 94% of the total ($8.3 billion). The estimated net worth of the company is $500 billion.
Facebook has nearly perfected an expansive ecosystem that entices users to comprehensively populate their own monetized, digital profile over time. Extensive profiling and tracking is made possible via user profiles, user groups, personal timelines, comments, network connections with other individuals, photos, software applications, games, "like" and "subscribe" buttons, reading and sharing of news feeds, instant messaging, video/voice chat, and cross-domain tracking via "like" and "share" buttons on more than 10 million third-party websites (in 2014). 
While Facebook is extremely profitable, it comes with great societal, emotional and political costs. Not only is Facebook a toxic host sucking users' digital wells dry, but it has also enforced a real user name policy to aid profiling, and regularly censors users and news it finds unpalatable. Further, it is regularly targeted by advanced adversaries because the treasure trove of personal data makes it an attractive target. 
Chronic use of Facebook is linked with negative psychological effects like jealousy, stress, and social media addiction. Facebook is also fond of conducting unannounced psychological experiments, such as feeding users selective (biased) articles via its News Feed algorithm to skew opinions. In another case, Facebook manipulated the emotions of users by secretly changing information that was posted on hundreds of thousands of home pages.
Perhaps worst of all, any data harvested by Facebook is likely to be kept indefinitely. Facebook keeps broadening the scope of their data collection over time, and is now threatening the privacy of individuals who avoid their services altogether. For example, automatic facial recognition software has already been applied to uploaded photos, which means that if a person's photo is also tagged by name, Facebook now has a permanent and identifiable face-print linked to a unique individual.  The significant privacy concerns posed by Facebook are explored in further depth below.
Facebook's CEO, Mark Zuckerberg, has shown complete disregard for the privacy rights of users in the past. When the fledgling social media enterprise was first founded in 2004, a then 19-year-old Zuckerberg revealed his true thoughts in a number of instant messages to Harvard friends:
Zuckerberg jokes in another exchange that 4,000 people have submitted emails, pictures and addresses to his budding Harvard social network. "People just submitted it ... I don't know why ... They 'trust me' ... dumb fucks."
Facebook would not confirm or deny that the messages were authentic when asked on Friday, but Zuckerberg told the New Yorker in September 2010 that he absolutely regretted sending them.
"If you're going to go on to build a service that is influential and that a lot of people rely on, then you need to be mature, right? I think I've grown and learned a lot," Zuckerberg told the magazine in 2010.
It is true that Zuckerberg has matured, but only in the sense that he has perfected a data-mining business empire of unparalleled proportions.
Connections Between Friends
The contact relationships of users are analyzed with different goals in mind. First of all, they are used for friend-to-friend advertisements. Users are often blissfully unaware that they serve as an advertising medium. Targeted advertising is improved by identifying opinion leaders in contact networks, so sponsored stories can be published.
Another way of analyzing contacts is demonstrated by Gaydar. Looking at the contacts in Facebook profiles, MIT students were able to extract the sexual orientation of the respective account owner. This kind of information can significantly damage a person's career or even place them in jeopardy in certain jurisdictions. Further, similar analysis can be used to determine political orientation, interests and other variables by public or private entities.
Analyzing Gaming Behavior
Facebook offers different games for its members like "Farmville" or "Mafia Wars". The moves of the participants are analyzed and character traits are derived. Eventually, the user profiles are commercialized, allowing companies to buy the profiles of potential applicants.
Publishing private pictures is one of the most popular activities on Facebook. However, most do not realize that facial recognition software has been deployed, allowing Facebook to identify several million people daily: 
Every time one of its 1.65 billion users uploads a photo to Facebook and tags someone, that person is helping the facial recognition algorithm. The tag shows the algorithm what someone looks like from different angles and in different lights, Frankle says. If you give Facebook a face to identify, it has fewer photos to parse through, because it's only looking at photos of you and your friends.
Facebook, according to the company, is able to accurately identify a person 98 percent of the time. Compare that with the FBI's facial recognition technology, Next Generation Identification, which according to the FBI, identifies the correct person in the list of the top 50 people only 85 percent of the time.
Facebook is sure to commercialize such a huge database of verifiable faceprints eventually. In one possible dystopian future, customers may enter a shop and be automatically identified by camera, thus allowing the salesperson to immediately access a comprehensive personal profile (if Facebook services have been purchased). As the then CEO of Google, Eric Schmidt, noted in 2010:
"Show us 14 photos of yourself and we can identify who you are. You think you don't have 14 photos of yourself on the internet? You've got Facebook photos!"
It is logical that a data warehouse of people's entire lives should become a prime target for attack from agencies who crave access to it. Public authorities and secret services are already accessing the information gathered by Facebook to snoop in users' private lives.
"I am not aware of any Facebook privacy provisions that deserve the name. They are terms of use that roughly follow this pattern: You, the user, are responsible for everything you do on our platform. And we may then do whatever we like with the data." (Dr. Thilo Weichert, data protection commissioner of Schleswig-Holstein (Germany))
In summary, if Facebook is analyzed objectively, it is clear that it has already morphed into a "net within the net" with horribly Orwellian overtones.
Twitter
Twitter was founded in March 2006 and is based in San Francisco, California. In 2017, Twitter is valued at $16 billion and it is a very active platform:
- There are around 330 million users.
- 100 million users are active daily.
- 500 million tweets are sent daily.
- Most users prefer mobile platforms (80%).
- Nearly 80% of all accounts are outside the US.
Twitter Revenue Model
Twitter's revenue in 2016 was $2.5 billion, but the company made a net loss of $457 million. The majority of revenue stemmed from advertising (85%), which in turn was sourced primarily from mobile advertising (88% of total advertising revenue).
Computer systems are already aggregating trillions of tweets from the microblogging site, sorting and sifting through countless conversations, following the banter and blustering, ideas and opinions of its 288 million users in search of commercial opportunities.
Selling data is as yet a small part of Twitter’s overall income ($70m out of a total of $1.3bn last year), with the lion’s share of cash coming from advertising, but the social network has big plans to increase that. Its acquisition of Chris Moody’s analytics company Gnip for $130m last April is a sign of that intent.
Twitter can match users against a company's customer database for targeted advertising. There are various matching methods available, including by using emails. For example, if an auto company knows that a user is interested in buying a new car, then Twitter can be used to send a direct advertisement. The data profiles are also onsold to other social networks like Facebook and photo-sharing site Tumblr.  
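Matching a customer database against platform accounts by e-mail address can be sketched as below. This is an illustrative Python sketch only: the text above does not say how Twitter performs the match, so the use of SHA-256 fingerprints of normalized addresses is an assumption (a common industry practice), and the function names, addresses and usernames are hypothetical.

```python
import hashlib

def normalize(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()

def fingerprint(email: str) -> str:
    """SHA-256 hex digest of the normalized address (assumed matching key)."""
    return hashlib.sha256(normalize(email).encode("utf-8")).hexdigest()

def match_audience(customer_emails, platform_accounts):
    """Return usernames whose e-mail fingerprint appears in the customer list.

    platform_accounts maps e-mail address -> username (hypothetical layout).
    """
    index = {fingerprint(mail): user for mail, user in platform_accounts.items()}
    hits = (fingerprint(mail) for mail in customer_emails)
    return sorted(index[h] for h in hits if h in index)

# Hypothetical data: a car dealer's customer list and two platform accounts.
customers = [" Alice@Example.com ", "carol@example.com"]
accounts = {"alice@example.com": "@alice", "bob@example.com": "@bob"}
print(match_audience(customers, accounts))  # ['@alice']
```

Hashing hides the raw addresses in transit, but it does not prevent the match itself: anyone holding the same address can compute the same fingerprint.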
In an identical fashion to Facebook and Google, Twitter has also adopted tracking of Internet browsing via "tweet" widgets embedded in millions of websites.
- Basic account and contact information like name, username, email address, or phone number.
- Any additional information provided by the user such as address book contacts.
- Obviously public tweets, following, lists, profile and other visible information.
- Location information via GPS, wireless networks, cell towers, and IP address.
- Interactions with links across Twitter services such as email notifications, third-party services, and client applications.
- Website usage data via persistent and session cookies.
- Interaction with Twitter content, even if a user has not created an account. "Log Data" includes: 
- IP address.
- Browser type.
- Operating system.
- Referring web page.
- Pages visited.
- Mobile carrier.
- Device information.
- Search terms.
- Cookie information.
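Most of the "Log Data" items above arrive with every ordinary web request, whether or not the visitor has an account. The following Python sketch shows how a server could assemble such a record. It is a generic illustration, not Twitter's code: the `LogEntry` class, the `log_request` function and the sample values are hypothetical, while `User-Agent`, `Referer` and `Cookie` are standard HTTP request headers.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    ip_address: str
    user_agent: str  # reveals browser type and operating system
    referrer: str    # the referring web page
    page: str        # the page visited
    cookie: str      # links this request to earlier ones

def log_request(client_ip: str, path: str, headers: dict) -> LogEntry:
    """Build a log record from data the browser sends automatically."""
    return LogEntry(
        ip_address=client_ip,
        user_agent=headers.get("User-Agent", ""),
        referrer=headers.get("Referer", ""),
        page=path,
        cookie=headers.get("Cookie", ""),
    )

# Hypothetical request; 203.0.113.7 is an address reserved for documentation.
entry = log_request(
    "203.0.113.7",
    "/some-profile",
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/60.0",
        "Referer": "https://news.example/article",
        "Cookie": "id=u-42",
    },
)
print(entry)
```

No form needs to be filled in for this record to exist; the browser volunteers each field as part of the request.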
If the reader is in any doubt that Twitter users are the actual commodity, consider this: 
In conclusion, Twitter is yet another US-based data hoarder and trader preying on a multitude of Internet users. Yet Twitter is statistically small fry in comparison to Facebook and Google, since it has not yet learned to fully leverage its personal data-sets to turn a consistent profit. However, with losses growing smaller by the quarter and rapid growth in the user base, it may soon cement its position as the third major player in the digital data trade.
Footnotes
- Opera is owned by a Chinese consortium - Golden Brick Capital - as of 2016.
- The corollary is that government bodies pursue the same profiling behavior for targeting persons or entities of interest.
- Of interest is that in 2007, Google's cookies had a life span of more than 32 years. Google historically has shared information with law enforcement and government agencies, without review or approval of any court.
- Google's insistence on real-name policies for Gmail and Youtube accounts, along with strict measures to prevent signing up via Tor, have significantly contributed to user profiling. Google has also dropped its ban on personally-identifiable information in advertisement services.
- Meaning Google applications continue to store time-stamped location data without user input.
- The satellites are allegedly powerful enough to see what is on your desk from orbit.
- Ironically, in this case the "right thing" is whatever government tells them it is; usually unfettered access to their extensive data records and assistance with military projects (such as the recent Pentagon contract for advanced AI technology for drones).
- Facebook was founded in 2004.
- For example, Facebook forms a core part of the NSA's PRISM program.
- Twitter has also decided to censor tweets it does not like, for instance, those pertaining to political figures or Wikileaks.
- Snowden records also show that advanced adversaries monitor Twitter and collect profiles.
- This is allegedly deleted after 18 months.
License
Gratitude is expressed to JonDos for permission to use material from their website. The TheWorldWideWebAndYourPrivacy page contains content from the JonDonym documentation TheWorldWideWebAndYourPrivacy page.