What is Metadata?
|Warning: Office documents, pictures, videos and other files contain significant information in the meta tags that may de-anonymize the author. Before they are uploaded to the Internet, this metadata should be removed.|
For more information about metadata, refer to the Metadata Anonymisation Toolkit website. Further information can also be found on the Warning page; see "Whonix doesn't clear the metadata of your documents".
Following the guidelines in this section greatly reduces the risk that metadata is used to de-anonymize the user:
- Always think twice before uploading anything.
- Only upload files which were either created or downloaded inside the Whonix-Workstation and personally stripped of metadata.
- Before uploading photos or videos, it is safest to utilize a separate camera that is only used for anonymous purposes (unless the user is an expert).
Specific File Format Data Leakage
- Anonymous photo sharing requires the user to consider both metadata and fingerprintable camera anomalies.
- Files created by editing software - such as Microsoft Word, LibreOffice, Excel and so on - can leak information about incremental edits and updates. Re-saving a final copy of the document might be enough to mitigate this risk, but further research is required.
- If JPEG images are stored in PDFs in their complete form without modification, EXIF data can be leaked.
- This is a non-exhaustive list of file format leak problems. The user should understand that file format specifications are not designed with potential adversaries in mind.
Generally speaking, the only reliable way to scrub any type of document and avoid unintended leaks is to first use ImageMagick to convert its pages to images, then import those images into a new PDF before distribution. This technique is reportedly used by advanced adversaries. This recommendation comes with an important caveat: untrusted files that are downloaded cannot be sanitized in this way, since malicious data can be crafted to remain intact even if processed by a format encoder. Therefore, the best way to interact with these files is to utilize the Whonix-Workstation and sanitize them with the pre-installed MAT program.
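The rasterize-then-rebuild approach described above can be sketched as follows. This is a minimal illustration, assuming ImageMagick's `convert` command is installed (newer ImageMagick releases name it `magick`) and is permitted to read the input format. The function names, the 150 DPI density and the page file pattern are illustrative choices, not part of the original guidance.

```python
import subprocess
import tempfile
from pathlib import Path


def build_rasterize_cmd(source, page_pattern, density=150):
    """Build the command that renders every page of `source` to a numbered PNG."""
    # -density sets the rendering resolution and must come before the input file.
    return ["convert", "-density", str(density), str(source), str(page_pattern)]


def build_assemble_cmd(pages, output_pdf):
    """Build the command that packs the rendered page images into a new PDF."""
    return ["convert", *[str(p) for p in pages], str(output_pdf)]


def rasterize_document(source, output_pdf, density=150):
    """Render `source` to images, then build a fresh PDF from those images only."""
    with tempfile.TemporaryDirectory() as tmp:
        pattern = Path(tmp) / "page-%04d.png"
        subprocess.run(build_rasterize_cmd(source, pattern, density), check=True)
        # Zero-padded names keep the pages in order when sorted.
        pages = sorted(Path(tmp).glob("page-*.png"))
        if not pages:
            raise RuntimeError("no pages were rendered")
        subprocess.run(build_assemble_cmd(pages, output_pdf), check=True)


if __name__ == "__main__":
    # Only print the commands here; call rasterize_document() to actually run them.
    print(build_rasterize_cmd("draft.odt.pdf", "page-%04d.png"))
    print(build_assemble_cmd(["page-0000.png", "page-0001.png"], "clean.pdf"))
```

The rebuilt PDF contains only pixels from the original, but the conversion tool may write fresh metadata of its own (for example a producer string or timestamps), so the result should still be inspected with a metadata tool before upload.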
Failure to remove metadata does not always lead to de-anonymization, but it may still allow separate pseudonyms to be correlated with the same identity. Consider the following example:
- A video is created with media software and uploaded to a popular video portal under pseudonym A.
- Another video is created using the same software and computer and uploaded under pseudonym B.
- An adversary who checks the metadata of both video files would quickly correlate both pseudonyms.
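The correlation step in this example needs very little effort on the adversary's side. The sketch below compares two metadata dictionaries and reports the fields with identical values. All field names and values are hypothetical and assumed to have been extracted beforehand with a metadata tool.

```python
def shared_metadata(meta_a, meta_b, ignore=("FileName", "FileSize")):
    """Return the fields that carry the same value in both metadata dicts."""
    return {
        key: value
        for key, value in meta_a.items()
        if key not in ignore and key in meta_b and meta_b[key] == value
    }


# Hypothetical metadata from the videos uploaded under pseudonyms A and B.
video_a = {
    "FileName": "talk.mp4",
    "Encoder": "ExampleEdit 4.2",
    "HostComputer": "workstation-7",
    "Duration": "00:03:12",
}
video_b = {
    "FileName": "demo.mp4",
    "Encoder": "ExampleEdit 4.2",
    "HostComputer": "workstation-7",
    "Duration": "00:10:45",
}

if __name__ == "__main__":
    # Matching software and host fields link the two pseudonyms.
    print(shared_metadata(video_a, video_b))
```

A single common field such as a widely used encoder name is weak evidence on its own, but several matching fields together narrow the set of possible authors quickly.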
Warning on Leaking Original Source Documents
|In recent times, leakers of high-value or high-security source documents have been identified (and jailed) via embedded steganographic messages or the zero-width space (homoglyph substitution) technique. |
It is highly unlikely that file cleaners will defeat these advanced fingerprinting methods. Persons who are considering leaking valuable, original source documents should adopt a far safer approach to avoid the threat of embedded signatures. Recommendations include: 
- Manually retype the relevant disclosures in a basic text editor, producing a plain-text file that can easily be stripped of metadata.
- Only leak short excerpts so the amount of information shared is kept to a minimum.
- At all times, avoid releasing the original documents in their raw form.
- Source the same documents from multiple leakers to confirm the content is identical byte-wise.
Specific cleaning tools do exist that strip non-whitelisted characters from the text. However, this is the least preferred approach for "safely" sharing documents if a user's freedom depends upon it.
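To illustrate what such a cleaning tool does, the following minimal sketch keeps only characters from an explicit whitelist and reports everything else. The whitelist (printable ASCII plus space, tab and newline) and the function names are assumptions chosen for the example; they do not describe any particular existing tool.

```python
import string
import unicodedata

# Assumed whitelist: ASCII letters, digits, punctuation, space, tab, newline.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n\t")


def find_suspicious(text):
    """Return (index, Unicode name) for each character outside the whitelist."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch not in ALLOWED
    ]


def strip_to_whitelist(text):
    """Drop every character that is not explicitly whitelisted."""
    return "".join(ch for ch in text if ch in ALLOWED)


if __name__ == "__main__":
    sample = "secret\u200b plan\u200c"  # contains two invisible characters
    print(find_suspicious(sample))
    print(repr(strip_to_whitelist(sample)))
```

A filter like this also deletes legitimate non-ASCII text, and it does nothing against fingerprints carried by wording, spacing or layout, which is one reason this approach ranks last among the options above.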
MAT - Metadata Anonymisation Toolkit
|MAT maintenance and development is currently on hold due to the author's ill health. As a consequence, the current version does not work with Python 3 and may have unresolved bugs. Use at your own risk!|
At the time of writing, the latest version of MAT supports the following file formats:
- Portable Network Graphics (.png).
- JPEG (.jpg, .jpeg, …).
- TIFF (.tif, .tiff, …).
- Open Documents (.odt, .odx, .ods, …).
- Office Open XML (.docx, .pptx, .xlsx, …).
- Portable Document Format (.pdf).
- Tape ARchives (.tar, .tar.bz2, …).
- MPEG Audio (.mp3, .mp2, .mp1, …).
- Ogg Vorbis (.ogg, …).
- Free Lossless Audio Codec (.flac).
- Torrent (.torrent).
Users should carefully note MAT's limitations: 
Why MAT is not the ultimate solution?
MAT only removes metadata from your files; it does not anonymise their content, nor does it handle watermarking, steganography, or any overly customized metadata field or system. Also note that although MAT does its best to scrub as much metadata as possible, it is not very effective at scrubbing media embedded inside complex formats. For example, images embedded inside a PDF may not be cleaned!
After starting MAT, add the files to be cleaned to the list. The "dirty" state indicates that the file contains removable metadata. After cleaning, the cleaned files will be created in the same directory as the original files with the extension
Other related tools include:
- Exiftool - a Perl application for editing metadata in a wide variety of files.
- exiv2 - a C++ application to manage image metadata.
- jhead - a JPEG header manipulation tool.
- pdfparanoia - a tool to remove watermarks from academic papers.
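As a brief illustration of one of these tools, the sketch below wraps ExifTool's `-all=` option, which deletes all metadata tags that ExifTool is able to write. It assumes the `exiftool` command is installed and on the PATH; the Python function names are hypothetical helpers for this example.

```python
import subprocess


def build_exiftool_strip_cmd(path, overwrite=False):
    """Build the ExifTool command that removes all writable metadata tags."""
    cmd = ["exiftool", "-all="]
    if overwrite:
        # Without this option ExifTool keeps a backup named <file>_original.
        cmd.append("-overwrite_original")
    cmd.append(str(path))
    return cmd


def strip_with_exiftool(path, overwrite=False):
    """Run ExifTool on `path`; raises CalledProcessError if it fails."""
    subprocess.run(build_exiftool_strip_cmd(path, overwrite), check=True)


if __name__ == "__main__":
    # Only print the command here; call strip_with_exiftool() to run it.
    print(build_exiftool_strip_cmd("photo.jpg"))
```

If the default backup copy is kept, it still contains the original metadata and must not be uploaded by mistake. The cleaned file should be inspected again afterwards, since not every format or tag can be removed this way.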
The MAT public domain screenshot is sourced from awxcnx.de.
Gratitude is expressed to JonDos for permission to use material from their website. The Metadata page contains content from the JonDonym documentation Anonymizing Documents and Pictures page.
- Refer to the Metadata Anonymisation Toolkit website for further information.
- In the latter method, the leaker is unable to see additional zero-width or zero-width non-joiner characters which are used to fingerprint text. Even a single type of zero-width character provides enough bits of entropy to fingerprint the relevant text.
- Broken link: https://anonymous-proxy-servers.net/forum/viewtopic.php?p=31220#p31220