Metadata

Introduction

Warning: Office documents, pictures, videos and other files contain significant information in the meta tags that may de-anonymize the author. Before they are uploaded to the Internet or shared, this metadata should be removed.

For more information about metadata, refer to the Metadata anonymisation toolkit v2 (MAT2) Debian package or the MAT2 homepage. Additional information can be found on the Warning page; see Whonix ™ does not clear Document Metadata.

Metadata Risk

Following the guidelines in this section greatly reduces the risk that metadata attached to files is used to de-anonymize the user. However, whistleblowers should be aware of a host of other metadata and techniques that can be used to narrow the search for (or identify) leakers, including: [1]

  • A list of persons who searched for, accessed or printed relevant documents.
  • Persons inserting hardware devices like USBs into corporate computers, or those taking screenshots.
  • Location data for handheld devices.
  • Downloads and use of Tor Browser, Tails, Whonix or other anonymity, privacy, security and encryption (or related) software which is relatively unpopular.
  • Inspection of ISP/corporate metadata associated with:
    • Usernames, email addresses, physical addresses, phone numbers and credit card numbers.
    • Internet IP addresses and log on data.
    • Clearnet browsing and use of the Tor network.
    • All communications metadata, including the type, source and destination, and the file size and duration of the communication. This includes emails and (encrypted) messaging.
  • Via search warrants, sourcing all data from Google, Facebook and other corporate accounts; for example, all Gmail messages, Google History, web browser activity based on web browser cookies, and backups of (Android) phones.
  • Other information discovered after forensic analysis of personal computers, external HDDs/SSDs, phones and other devices.

Be aware that most whistleblowers are identified by events and patterns of behavior that happen before they decide to blow the whistle or contact the media.

Guidelines

General Principles

  • Always think twice before uploading/sharing anything.
  • Only upload/share files which were either created or downloaded inside the Whonix-Workstation ™ and personally stripped of metadata.
  • Before uploading/sharing photos or videos, it is safest to utilize a separate camera that is only used for anonymous purposes (unless the user is an expert).

Specific File Format Data Leakage

  • Anonymous photo sharing requires consideration of both metadata and fingerprintable camera anomalies.
  • Files created by editing software -- such as Microsoft Word, LibreOffice, Excel and so on -- can leak information about incremental edits and updates. Re-saving a final copy of the document might be enough to mitigate this risk, but further research is required.
  • If JPEG images are stored in PDFs in their complete form without modification, EXIF data can be leaked.
  • It is possible for adversaries to link 'anonymous' audio recordings to specific hardware (microphone) that is used, as well as fingerprint embedded audio acoustics associated with particular speakers -- the same operational security advice recommended for photographs must be followed.
  • This list of file format leaks is not exhaustive. Users should understand that file format specifications are not designed with potential adversaries in mind. [2]
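
As a rough illustration of the EXIF point above, the following Python sketch walks the segment headers of a JPEG stream and reports which metadata-bearing segments (EXIF, XMP, comments) appear before the image data. The function name jpeg_metadata_segments is hypothetical and the parser is deliberately simplified. An empty result does not prove a file is safe to share; the sketch only shows where this kind of metadata lives.

```python
import struct


def jpeg_metadata_segments(data: bytes) -> list:
    """Return labels of metadata segments found before the compressed image data.

    Simplified sketch: assumes every segment before SOS carries a 2-byte length,
    which holds for typical camera output but not for every valid JPEG.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure; stop scanning
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        # The length field counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.append("XMP")
        elif marker == 0xFE:
            found.append("COM")
        pos += 2 + length
    return found
```

Reading a file with open(path, "rb").read() and passing the bytes to this function gives a quick view of whether a photo still carries an EXIF block before it is cleaned with a dedicated tool.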

Scrubbing Metadata

Generally speaking, the only reliable way to scrub any type of document and avoid unintended leaks is to first use ImageMagick to convert it to images, then import those images into a new PDF before distribution. This technique is reportedly used by advanced adversaries. [2]

This recommendation comes with an important caveat: untrusted files that are downloaded cannot be sanitized in this way, since malicious data can be crafted to remain intact even if processed by a format encoder. Therefore, the best way to interact with these files is to utilize the Whonix-Workstation ™ and sanitize them with the pre-installed MAT2 program. [3]
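
The image-conversion approach described above could be scripted roughly as follows. This is a minimal sketch, not a vetted procedure: it assumes the ImageMagick 6 `convert` binary is on the PATH (ImageMagick 7 installs it as `magick`), that the local ImageMagick policy permits reading and writing PDFs, and that 150 DPI is an acceptable resolution. The helper names are hypothetical. The rebuilt PDF may still carry producer information written by ImageMagick itself, so it should be checked before sharing, and the caveat about untrusted files applies unchanged.

```python
import glob
import os
import subprocess
import tempfile


def rasterize_command(src: str, pages_dir: str, dpi: int = 150) -> list:
    """Build the command that renders each page of src to a PNG image."""
    pattern = os.path.join(pages_dir, "page-%04d.png")
    # -density sets the rendering resolution; -strip drops profiles and comments.
    return ["convert", "-density", str(dpi), src, "-strip", pattern]


def rebuild_command(page_files: list, dst: str) -> list:
    """Build the command that assembles the page images into a new PDF."""
    return ["convert", *sorted(page_files), "-strip", dst]


def flatten_to_images(src: str, dst: str, dpi: int = 150) -> None:
    """Render src to images in a temporary directory, then write dst from them."""
    with tempfile.TemporaryDirectory() as pages_dir:
        subprocess.run(rasterize_command(src, pages_dir, dpi), check=True)
        pages = glob.glob(os.path.join(pages_dir, "page-*.png"))
        if not pages:
            raise RuntimeError("no page images were produced")
        subprocess.run(rebuild_command(pages, dst), check=True)
```

Text in the resulting PDF is no longer selectable or searchable, which is the point: only pixels survive the round trip.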

Failure to remove metadata does not always lead to de-anonymization, but it may still allow separate pseudonyms to be correlated with the same identity. Consider the following example:

  • A video is created with media software and uploaded to a popular video portal under pseudonym A.
  • Another video is created using the same software and computer and uploaded under pseudonym B.
  • An adversary who checks the metadata of both video files would quickly correlate both pseudonyms.
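
The correlation in this example needs nothing more sophisticated than comparing metadata fields. The sketch below uses made-up field names and values (encoder, creation_time, device_serial); real video containers expose different fields, and the function name shared_metadata is hypothetical.

```python
def shared_metadata(meta_a: dict, meta_b: dict) -> dict:
    """Return the fields that have identical values in both metadata sets."""
    return {key: value for key, value in meta_a.items() if meta_b.get(key) == value}


# Hypothetical metadata extracted from the two uploads.
video_pseudonym_a = {
    "encoder": "ExampleEncoder 1.2",
    "creation_time": "2019-05-01T10:00:00",
    "device_serial": "SN-0001",
}
video_pseudonym_b = {
    "encoder": "ExampleEncoder 1.2",
    "creation_time": "2019-06-11T18:30:00",
    "device_serial": "SN-0001",
}

overlap = shared_metadata(video_pseudonym_a, video_pseudonym_b)
```

Here overlap contains the encoder and device_serial fields. A single rare value shared by both files, such as a serial number, is enough to link the two pseudonyms.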

Warning on Leaking Original Source Documents

Warning: In recent times, leakers of high-value or high-security source documents have been identified (and jailed) via embedded steganographic messages or the zero-width character technique (related to homoglyph substitution). [4]

It is highly unlikely that file cleaners will defeat these advanced fingerprinting methods. Persons who are considering leaking valuable, original source documents should adopt a far safer approach to avoid the threat of embedded signatures. Recommendations include: [5]

  • Manually retype the related disclosures in a basic text editor, producing a plain-text file that can easily be stripped of metadata.
  • Only leak short excerpts so the amount of information shared is kept to a minimum.
  • At all times, avoid releasing the original documents in their raw form.
  • Source the same documents from multiple leakers to confirm the content is identical byte-for-byte.

Specific cleaning tools do exist that strip non-whitelisted characters from the text. However, this is the least preferred approach for "safely" sharing documents if personal liberty is at stake.
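
To make the zero-width technique and the whitelist approach concrete, here is a small Python sketch. It flags invisible Unicode format characters (category "Cf", which includes the zero-width space, non-joiner and joiner) and then keeps only characters from an explicit whitelist. The function names and the choice of whitelist (printable ASCII plus space, tab and newline) are assumptions for illustration. As stated above, this kind of cleaning is the least preferred option: it does nothing against fingerprints encoded in wording, spacing or punctuation.

```python
import string
import unicodedata

# Assumed whitelist: ASCII letters, digits, punctuation, space, tab and newline.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def find_hidden_characters(text: str) -> list:
    """Return (index, code point) pairs for invisible format characters."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if unicodedata.category(char) == "Cf"
    ]


def keep_whitelisted(text: str, allowed: set = ALLOWED) -> str:
    """Drop every character that is not explicitly whitelisted."""
    return "".join(char for char in text if char in allowed)


marked = "top\u200bsecret\u200c plan"
hidden = find_hidden_characters(marked)
cleaned = keep_whitelisted(marked)
```

The whitelist also removes look-alike letters from other scripts, but it silently deletes them rather than restoring the intended character, so the output needs to be proofread.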

MAT2: Metadata Anonymisation Toolkit v2

At the time of writing, the latest version of MAT2 supports the following file formats: [6]

  • Audio Video Interleave (.avi)
  • Electronic Publication (.epub)
  • Free Lossless Audio Codec (.flac)
  • Graphics Interchange Format (.gif)
  • Hypertext Markup Language (.html)
  • Portable Network Graphics (.png)
  • JPEG (.jpeg, .jpg, ...)
  • MPEG Audio (.mp3, .mp2, .mp1, .mpa)
  • MPEG-4 (.mp4)
  • Office Open XML (.docx, .pptx, .xlsx, ...)
  • Ogg Vorbis (.ogg)
  • Open Document (.odt, .odx, .ods, ...)
  • Portable Document Format (.pdf)
  • Tape ARchive (.tar, .tar.bz2, .tar.gz)
  • Torrent (.torrent)
  • Windows Media Video (.wmv)
  • ZIP (.zip)

Take careful note of MAT2's limitations: [6]

MAT2 only removes metadata from your files, it does not anonymise their content, nor can it handle watermarking, steganography, or any too custom metadata field/system.

If you really want to be anonymous, use file formats that do not contain any metadata, or better: use plain-text.

Use Instructions

MAT2 does not have a GUI option and must be run from the command line. For a list of available MAT2 options, launch a terminal in Whonix-Workstation ™ and run:

mat2

Note: MAT2 does not clean files in-place. Instead, once 'dirty' files (with removable metadata) are cleaned, the clean files are created in the same directory with the .cleaned extension. For example, "myfile.png" will lead to a new version named "myfile.cleaned.png".
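
When MAT2 is driven from a script, the output name has to be derived from the input name. The following Python sketch does this under two assumptions: the `mat2` executable is on the PATH, and the naming rule is the one shown in the note above (".cleaned" inserted before the final extension). The helper names are hypothetical.

```python
import os
import shutil
import subprocess


def cleaned_name(path: str) -> str:
    """Return the expected output path, e.g. myfile.png -> myfile.cleaned.png."""
    root, extension = os.path.splitext(path)
    return f"{root}.cleaned{extension}"


def clean_with_mat2(path: str) -> str:
    """Run mat2 on one file and return the path of the cleaned copy."""
    if shutil.which("mat2") is None:
        raise FileNotFoundError("mat2 was not found on the PATH")
    subprocess.run(["mat2", path], check=True)
    output = cleaned_name(path)
    if not os.path.exists(output):
        # Do not fall back to the original file: it may still contain metadata.
        raise RuntimeError(f"expected cleaned file is missing: {output}")
    return output
```

The function refuses to return the original path when no cleaned copy appears, so a failed or skipped cleaning step cannot be mistaken for success by the calling script.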

Other Tools

  • ExifTool - a Perl application for editing metadata in a wide variety of files.
  • exiv2 - a C++ application to manage image metadata.
  • jhead - a JPEG header manipulation tool.
  • pdfparanoia - a tool to remove watermarks from academic papers.
  • pdf-redact-tools - another tool to securely redact and strip metadata from PDF documents.

License

Gratitude is expressed to JonDos for permission to use material from their website. [7] The Metadata page contains content from the JonDonym documentation Anonymizing Documents and Pictures page.

Footnotes

  1. https://theintercept.com/2019/08/04/whistleblowers-surveillance-fbi-trump/
  2. https://speakerdeck.com/ange/an-overview-of-pdf-potential-leaks
  3. Refer to the Metadata anonymisation toolkit v2 website for further information.
  4. In the latter method, the leaker is unable to see additional zero-width or zero-width non-joiner characters which are used to fingerprint text. Even a single type of zero-width character provides enough bits of entropy to fingerprint the relevant text.
  5. https://www.zachaysan.com/writing/2017-12-30-zero-width-characters
  6. https://packages.debian.org/buster/mat2
  7. Broken link: https://anonymous-proxy-servers.net/forum/viewtopic.php?p=31220#p31220

Copyright (C) 2012 - 2019 ENCRYPTED SUPPORT LP. Whonix ™ is a trademark. Whonix ™ is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Freedom Software license as Whonix ™ itself. (Why?)

Whonix ™ is a derivative of and not affiliated with Debian. Debian is a registered trademark owned by Software in the Public Interest, Inc.

Whonix ™ is produced independently from the Tor® anonymity software and carries no guarantee from The Tor Project about quality, suitability or anything else.