
Stylometry

From Whonix

Whonix does not obfuscate a user's writing style. Consequently, unless precautions are taken (see below), users are at risk from stylometric analysis of their linguistic style. Research suggests that a few thousand words (or fewer) may be enough to positively identify an author, and a range of software tools is available to conduct this analysis.

Advanced adversaries use this technique to attribute authorship of anonymous documents, online texts (web pages, blogs, etc.), electronic messages (emails, tweets, posts, etc.) and more. The field is dominated by artificial intelligence techniques such as neural networks and statistical pattern recognition, and it is highly relevant to privacy and security. Current anonymity and circumvention systems focus on location-based privacy but ignore the identifying information leaked through the content itself, even though content-based authorship recognition can be highly accurate (over 90 percent). [1]
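A minimal sketch of the statistical pattern-recognition idea: represent each text by the relative frequencies of a few function words, then attribute an anonymous text to the known author whose profile is most similar. The word list, the function names (`profile`, `cosine`, `attribute`) and the use of cosine similarity are illustrative assumptions. This is a toy nearest-neighbour classifier, not the neural-network or trained-classifier methods used by real tools.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: relative frequencies of common
# function words, which authors tend to use without conscious control.
FUNCTION_WORDS = ("the", "of", "and", "a", "to", "in", "is", "that", "it", "i")


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(anonymous_text, known_samples):
    """Return the candidate whose known writing is most similar to the anonymous text."""
    target = profile(anonymous_text)
    return max(known_samples, key=lambda author: cosine(target, profile(known_samples[author])))


if __name__ == "__main__":
    samples = {
        "alice": "the cat and the dog and the bird",
        "bob": "it is that it is to be in it",
    }
    print(attribute("the fox and the hen", samples))
```

With samples this short the frequencies are very noisy, which is one reason practical attacks rely on larger samples and far richer feature sets.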

There are multiple ways to conduct statistical analysis on "anonymous" texts, including: [1] [2]

  • Keystroke fingerprinting, for example in conjunction with JavaScript.
  • Stylistic flourishes.
  • Abbreviations.
  • Spelling preferences and misspellings.
  • Language preferences.
  • Word frequency.
  • Number of unique words.
  • Regional linguistic preferences in slang, idioms and so on.
  • Sentence/phrasing patterns.
  • Word collocation (pairs).
  • Use of formal/informal language.
  • Function words.
  • Vocabulary usage and lexical density.
  • Character count including white space.
  • Average sentence length.
  • Average syllables per word.
  • Synonym choice.
  • Expressive elements like colors, layout, fonts, graphics, emoticons and so on.
  • Analysis of grammatical structure and syntax.
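Several of the text-based features above are simple to compute. The following Python sketch extracts a handful of them (character count, average sentence length, average syllables per word, unique-word ratio, most frequent words, function-word frequencies and word pairs) from a single text. The function name `style_features`, the function-word subset and the regex-based sentence and syllable heuristics are illustrative assumptions, not the method of any particular stylometry tool.

```python
import re
from collections import Counter

# Illustrative subset of English function words (an assumption; real
# stylometry tools use lists of a few hundred words).
FUNCTION_WORDS = ("the", "of", "and", "a", "to", "in", "is", "that", "it", "but")


def style_features(text: str) -> dict:
    """Compute a few of the stylometric features listed above for one text."""
    # Naive sentence split: runs of . ! ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n = len(words)
    # Crude syllable estimate: each run of vowels counts as one syllable.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)
    return {
        "char_count_with_whitespace": len(text),
        "avg_sentence_length": n / len(sentences) if sentences else 0.0,
        "avg_syllables_per_word": syllables / n if n else 0.0,
        "unique_word_ratio": len(counts) / n if n else 0.0,
        "most_frequent_words": counts.most_common(3),
        "function_word_freq": {w: (counts[w] / n if n else 0.0) for w in FUNCTION_WORDS},
        # Adjacent word pairs (collocations), most common first.
        "word_pairs": Counter(zip(words, words[1:])).most_common(3),
    }


if __name__ == "__main__":
    for name, value in style_features("The cat sat. The cat ran!").items():
        print(name, value)
```

Each feature on its own is weak evidence; attribution systems combine many of them into a single vector per text before comparing authors.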

Fortunately, research suggests that when users purposefully obfuscate their linguistic style or imitate the style of another known author, the stylometric analysis methods tested are largely defeated, performing no better than randomly guessing the correct author of a document. However, automated methods such as machine translation services do not appear to be a viable means of circumvention. [1]

Further ways to mitigate the threat of stylometric analysis are documented on the Surfing Posting Blogging page.

See also:

Interesting things for further research and development:

Footnotes



Copyright (C) 2012 - 2019 ENCRYPTED SUPPORT LP. Whonix ™ is a trademark. Whonix ™ is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Freedom Software license as Whonix ™ itself.
