For more information about metadata, refer to the Metadata anonymisation toolkit v2 (MAT2) Debian package or the MAT2 homepage. Additional information can be found on the Warning page; see Whonix ™ does not clear Document Metadata.
Metadata attached to files cannot be used to de-anonymize the user if the guidelines in this section are followed. However, whistleblowers should be aware of a host of other metadata and techniques that can be used to narrow the search for (or identify) leakers, including: 
- A list of persons who searched for, accessed or printed relevant documents.
- Persons inserting hardware devices like USBs into corporate computers, or those taking screenshots.
- Location data for handheld devices.
- Downloads and use of Tor Browser, Tails, Whonix or other anonymity, privacy, security and encryption (or related) software which is relatively unpopular.
- Inspection of ISP/corporate metadata associated with:
  - Usernames, email addresses, physical addresses, phone numbers and credit card numbers.
  - Internet IP addresses and log on data.
  - Clearnet browsing and use of the Tor network.
  - All communications metadata, including the type, source and destination, and the file size and duration of the communication. This includes emails and (encrypted) messaging.
- Via search warrants, sourcing all data from Google, Facebook and other corporate accounts; for example, all Gmail messages, Google History, web browser activity based on web browser cookies, and backups of (Android) phones.
- Other information discovered after forensic analysis of personal computers, external HDDs/SSDs, phones and other devices.
Be aware that most whistleblowers are identified by events and patterns of behavior that happen before they decide to blow the whistle or contact the media.
- Always think twice before uploading/sharing anything.
- Only upload/share files which were either created or downloaded inside the Whonix-Workstation ™ and personally stripped of metadata.
- Before uploading/sharing photos or videos, it is safest to utilize a separate camera that is only used for anonymous purposes (unless the user is an expert).
Specific File Format Data Leakage
- Anonymous photo sharing requires consideration of both metadata and fingerprintable camera anomalies.
- Files created by editing software -- such as Microsoft Word, LibreOffice, Excel and so on -- can leak information about incremental edits and updates. Re-saving a final copy of the document might be enough to mitigate this risk, but further research is required.
- If JPEG images are stored in PDFs in their complete form without modification, EXIF data can be leaked.
- It is possible for adversaries to link 'anonymous' audio recordings to specific hardware (microphone) that is used, as well as fingerprint embedded audio acoustics associated with particular speakers -- the same operational security advice recommended for photographs must be followed.
- This is a non-exhaustive list of file format leak problems. The user should understand that file format specifications are not designed with potential adversaries in mind.
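To illustrate the JPEG item in the list above, the following is a minimal sketch of how an embedded EXIF block can be spotted in raw bytes. It assumes the JPEG was stored in the PDF unmodified and that the enclosing stream is neither compressed again nor encrypted; the function name find_exif_segments is hypothetical. It is a heuristic check, not a replacement for a metadata cleaner.

```python
EXIF_HEADER = b"Exif\x00\x00"
APP1_MARKER = b"\xff\xe1"


def find_exif_segments(data: bytes) -> list[int]:
    """Return offsets of JPEG APP1 markers whose payload starts with the Exif header."""
    offsets = []
    pos = data.find(APP1_MARKER)
    while pos != -1:
        # Layout: 2-byte marker, 2-byte big-endian length, then the payload.
        if data[pos + 4:pos + 10] == EXIF_HEADER:
            offsets.append(pos)
        pos = data.find(APP1_MARKER, pos + 2)
    return offsets


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        hits = find_exif_segments(handle.read())
    print(f"{len(hits)} embedded EXIF block(s) found")
```

An empty result does not prove the file is clean, since the JPEG stream may be wrapped in a further filter that hides the marker bytes.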
Generally speaking, the only reliable way to scrub any type of document and avoid unintended leaks is to first use ImageMagick to convert it to images, then import those images into a new PDF before distribution. This technique is reportedly used by advanced adversaries.
This recommendation comes with an important caveat: untrusted files that are downloaded cannot be sanitized in this way, since malicious data can be crafted to remain intact even if processed by a format encoder. Therefore, the best way to interact with these files is to utilize the Whonix-Workstation ™ and sanitize them with the pre-installed MAT2 program.
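The convert-to-images-then-rebuild approach for self-created documents could be sketched as follows. This is an illustration under stated assumptions, not a vetted procedure: it assumes the ImageMagick 6 convert binary is on the PATH and permitted to read the input format, the 150 DPI default and the helper names are arbitrary choices, and the rebuilt PDF will itself carry fresh metadata (for example a creation date and producer string) that should still be removed afterwards.

```python
import subprocess
import tempfile
from pathlib import Path


def split_command(source: Path, workdir: Path, dpi: int = 150) -> list[str]:
    """Command that renders every page of `source` to a numbered PNG."""
    return ["convert", "-density", str(dpi), str(source),
            "-strip", str(workdir / "page-%03d.png")]


def join_command(pages: list[Path], output: Path) -> list[str]:
    """Command that packs the rendered pages into one new PDF."""
    return ["convert", *map(str, pages), str(output)]


def rasterise_to_pdf(source: Path, output: Path, dpi: int = 150) -> None:
    """Render `source` to images, then rebuild a PDF from those images only."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        subprocess.run(split_command(source, workdir, dpi), check=True)
        # Zero-padded page numbers keep the sorted order equal to page order.
        pages = sorted(workdir.glob("page-*.png"))
        subprocess.run(join_command(pages, output), check=True)
```

Because only pixels are carried over, text in the result is no longer selectable or searchable; that is the price of discarding the original document structure.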
Failure to remove metadata does not always lead to de-anonymization, but it still may result in identity correlation to the same pseudonym. Consider the following example:
- A video is created with media software and uploaded to a popular video portal under pseudonym A.
- Another video is created using the same software and computer and uploaded under pseudonym B.
- An adversary who checks the metadata of both video files would quickly correlate both pseudonyms.
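The correlation step in this example can be sketched in a few lines. The field names and values below are invented for illustration and do not come from any real file or tool; the point is only that identical, uncommon values across two uploads link the two pseudonyms.

```python
def shared_fingerprint(meta_a: dict[str, str], meta_b: dict[str, str],
                       ignore: tuple[str, ...] = ("title", "duration")) -> dict[str, str]:
    """Return the metadata fields that carry the same value in both files."""
    return {
        key: value
        for key, value in meta_a.items()
        if key not in ignore and meta_b.get(key) == value
    }


# Hypothetical metadata extracted from the two uploaded videos.
video_a = {"title": "Clip one", "encoder": "ExampleEdit 4.2.1",
           "device_model": "CamX-100", "duration": "00:03:10"}
video_b = {"title": "Another clip", "encoder": "ExampleEdit 4.2.1",
           "device_model": "CamX-100", "duration": "00:07:45"}

if __name__ == "__main__":
    print(shared_fingerprint(video_a, video_b))
```

The more unusual the shared values are (a rare encoder build, a serial number), the stronger the link between pseudonym A and pseudonym B.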
Warning on Leaking Original Source Documents
It is highly unlikely that file cleaners will defeat these advanced fingerprinting methods. Persons who are considering leaking valuable, original source documents should adopt a far safer approach to avoid the threat of embedded signatures. Recommendations include: 
- Manually retype the relevant disclosures in a basic text editor, producing a plain file that can easily be stripped of metadata.
- Only leak short excerpts so the amount of information shared is kept to a minimum.
- At all times, avoid releasing the original documents in their raw form.
- Source the same documents from multiple leakers to confirm the content is identical byte-wise.
Specific cleaning tools do exist that strip non-whitelisted characters from the text. However, this is the least preferred approach for "safely" sharing documents if personal liberty is at stake.
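A whitelist filter of this kind could look like the sketch below. The allowed character set (printable ASCII plus basic whitespace) and the list of zero-width code points are assumptions chosen for illustration; no particular cleaning tool is being reproduced here.

```python
import string

# Assumption: only plain ASCII text is expected in the retyped disclosure.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n\t")

# Common invisible code points that can be used to watermark text.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for every zero-width character found."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def strip_non_whitelisted(text: str) -> str:
    """Drop every character that is not explicitly allowed."""
    return "".join(ch for ch in text if ch in ALLOWED)


if __name__ == "__main__":
    marked = "se\u200bcret re\u200cport"
    print(find_zero_width(marked))
    print(strip_non_whitelisted(marked))
```

Such a filter also deletes legitimate non-ASCII characters, and it does nothing against fingerprints carried by visible features such as deliberate spelling, spacing or wording variations, which is one reason this approach ranks last.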
MAT2: Metadata Anonymisation Toolkit v2
At the time of writing, the latest version of MAT2 supports the following file formats:
- Audio Video Interleave (.avi)
- Electronic Publication (.epub)
- Free Lossless Audio Codec (.flac)
- Graphics Interchange Format (.gif)
- Hypertext Markup Language (.html)
- Portable Network Graphics (.png)
- JPEG (.jpeg, .jpg, ...)
- MPEG Audio (.mp3, .mp2, .mp1, .mpa)
- MPEG-4 (.mp4)
- Office Open XML (.docx, .pptx, .xlsx, ...)
- Ogg Vorbis (.ogg)
- Open Document (.odt, .odx, .ods, ...)
- Portable Document Format (.pdf)
- Tape ARchive (.tar, .tar.bz2, .tar.gz)
- Torrent (.torrent)
- Windows Media Video (.wmv)
- ZIP (.zip)
Take careful note of MAT2's limitations: 
MAT2 only removes metadata from your files, it does not anonymise their content, nor can it handle watermarking, steganography, or any too custom metadata field/system.
If you really want to be anonymous, use file formats that do not contain any metadata, or better: use plain-text.
MAT2 does not have a GUI option and must be run from the command line. For a list of available MAT2 options, launch a terminal in Whonix-Workstation ™ and run mat2 --help.
Note: MAT2 does not clean files in-place. Instead, once 'dirty' files (with removable metadata) are cleaned, the clean files are created in the same directory with ".cleaned" inserted before the file extension. For example, "myfile.png" will lead to a new version named "myfile.cleaned.png".
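The naming pattern from the "myfile.png" example can be reproduced with a small helper, which is handy when a script needs to locate the cleaned copy afterwards. The function name cleaned_name is hypothetical, and the sketch assumes the marker is inserted before the last suffix only; check the actual output for multi-part extensions such as .tar.gz.

```python
from pathlib import Path


def cleaned_name(path: str) -> str:
    """Insert '.cleaned' before the final extension of `path`."""
    original = Path(path)
    return str(original.with_name(f"{original.stem}.cleaned{original.suffix}"))


if __name__ == "__main__":
    print(cleaned_name("myfile.png"))
```

Since the original file is left untouched, remember that the 'dirty' copy still exists next to the cleaned one and must not be the file that gets shared.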
Users also report that MAT2 is broken if bubblewrap is installed, since it is automatically used for MAT2 sandboxing, which is currently incompatible with Whonix ™ hidepid settings. If this error is encountered, it can be bypassed by disabling MAT2's sandboxing.
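For scripted use, a wrapper along the following lines may help. It is a sketch under assumptions: the --no-sandbox and --show flags are taken from the mat2 help text of recent upstream releases and may be absent in older packaged versions, so confirm them against the installed version before relying on this. The helper names are hypothetical.

```python
import shutil
import subprocess


def mat2_command(files: list[str], sandbox: bool = True,
                 show_only: bool = False) -> list[str]:
    """Build the argument list for a mat2 invocation."""
    cmd = ["mat2"]
    if not sandbox:
        # Assumed flag: skips bubblewrap sandboxing.
        cmd.append("--no-sandbox")
    if show_only:
        # Assumed flag: lists detected metadata without writing a cleaned copy.
        cmd.append("--show")
    cmd.extend(files)
    return cmd


def run_mat2(files: list[str], sandbox: bool = True, show_only: bool = False) -> int:
    """Run mat2 if it is installed and return its exit code."""
    if shutil.which("mat2") is None:
        raise FileNotFoundError("mat2 is not installed or not on PATH")
    return subprocess.run(mat2_command(files, sandbox, show_only)).returncode
```

Disabling the sandbox removes a layer of protection against malicious input files, so it is best limited to files created by the user.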
- Exiftool - a Perl application for editing metadata in a wide variety of files.
- exiv2 - a C++ application to manage image metadata.
- jhead - a JPEG header manipulation tool.
- pdfparanoia - a tool to remove watermarks from academic papers.
- pdf-redact-tools - another tool to securely redact and strip metadata from PDF documents.
Gratitude is expressed to JonDos for permission to use material from their website. The Metadata page contains content from the JonDonym documentation Anonymizing Documents and Pictures page.
- https://theintercept.com/2019/08/04/whistleblowers-surveillance-fbi-trump/
- https://speakerdeck.com/ange/an-overview-of-pdf-potential-leaks
- Refer to the Metadata anonymisation toolkit v2 website for further information.
- In the latter method, the leaker is unable to see additional zero-width or zero-width non-joiner characters which are used to fingerprint text. Even a single type of zero-width character provides enough bits of entropy to fingerprint the relevant text.
- https://www.zachaysan.com/writing/2017-12-30-zero-width-characters
- https://packages.debian.org/buster/mat2
- https://forums.whonix.org/t/install-bubblewrap-by-default-to-make-use-of-mat2s-sandboxing/8177
- https://0xacab.org/jvoisin/mat2/issues/120
- https://github.com/containers/bubblewrap/issues/198
- Broken link: https://anonymous-proxy-servers.net/forum/viewtopic.php?p=31220#p31220
Copyright (C) 2012 - 2020 ENCRYPTED SUPPORT LP. Whonix ™ is a trademark. Whonix ™ is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Freedom Software license as Whonix ™ itself.