Metadata


MAT - Metadata Anonymisation Toolkit

What is Metadata?


For more information about metadata, refer to the Metadata Anonymisation Toolkit website. Further information can also be found on the Warning page; see "Whonix doesn't clear the metadata of your documents".

Guidelines

Following the guidelines in this section greatly reduces the risk that metadata can be used to de-anonymize the user.

General Principles

  • Always think twice before uploading anything.
  • Only upload files which were either created or downloaded inside the Whonix-Workstation and personally stripped of metadata.
  • Before uploading photos or videos, it is safest to utilize a separate camera that is only used for anonymous purposes (unless the user is an expert).


Specific File Format Data Leakage

  • Anonymous photo sharing requires the user to consider both metadata and fingerprintable camera anomalies.
  • Files created by editing software - such as Microsoft Word, LibreOffice, Excel and so on - can leak information about incremental edits and updates. Re-saving a final copy of the document might be enough to mitigate this risk, but further research is required.
  • If JPEG images are stored in PDFs in their complete form without modification, EXIF data can be leaked.
  • This list of file format leak problems is not exhaustive. The user should understand that file format specifications are not designed with potential adversaries in mind. [1]
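
The embedded-JPEG case above can be illustrated with a minimal Python sketch. It assumes that an unmodified JPEG stored inside a PDF still contains the literal EXIF header bytes, and it only scans raw bytes for that header. The function names are hypothetical and the sample data is synthetic, not a valid PDF. A result of zero does not prove a file is clean, since metadata can also live in compressed streams or in other fields.

```python
from pathlib import Path

# Byte sequence that opens an EXIF payload inside a JPEG APP1 segment.
EXIF_MARKER = b"Exif\x00\x00"


def count_exif_markers(data: bytes) -> int:
    """Count how often the EXIF header appears in a byte string."""
    return data.count(EXIF_MARKER)


def file_may_leak_exif(path: str) -> bool:
    """Return True if the raw bytes of the file contain an EXIF header."""
    return count_exif_markers(Path(path).read_bytes()) > 0


if __name__ == "__main__":
    # Synthetic stand-in for a PDF holding an unmodified JPEG stream.
    fake_pdf = (
        b"%PDF-1.4\nstream\n"
        b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00MM\x00*"
        b"\nendstream\n"
    )
    print(count_exif_markers(fake_pdf))                      # 1
    print(count_exif_markers(b"%PDF-1.4\nno images here\n"))  # 0
```

A check like this is only a quick warning signal; it does not replace cleaning the file.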


Scrubbing Metadata

Generally speaking, the only reliable way to scrub a document of any type and avoid unintended leaks is to first convert it to images with ImageMagick, then import those images into a new PDF before distribution. This technique is reportedly used by advanced adversaries. [1] This recommendation comes with an important caveat: untrusted files that have been downloaded cannot be sanitized in this way, since malicious data can be crafted to survive processing by a format encoder. Therefore, the best way to interact with such files is to open them inside the Whonix-Workstation and sanitize them with the pre-installed MAT program. [2]
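
The convert-to-images approach can be sketched roughly as follows. This is an illustration under assumptions, not a vetted procedure: it assumes ImageMagick is installed and exposes the `convert` command (newer releases use `magick` instead), that the local ImageMagick policy permits reading the input format, and that 150 DPI is acceptable. The function names and the page naming pattern are hypothetical. The resulting PDF may still carry fields written by the converter itself, so it should be inspected before distribution.

```python
import glob
import shutil
import subprocess
import tempfile
from pathlib import Path


def rasterize_command(source: str, page_pattern: str, density: int = 150) -> list:
    """Build the command that renders every page of `source` to a PNG image."""
    # -density sets the rendering resolution; -strip drops profiles and comments.
    return ["convert", "-density", str(density), source, "-strip", page_pattern]


def rebuild_command(pages: list, output_pdf: str) -> list:
    """Build the command that packs the page images into a new PDF."""
    return ["convert", *pages, output_pdf]


def flatten_document(source: str, output_pdf: str) -> None:
    """Render `source` to images, then assemble a new PDF from those images."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on PATH")
    with tempfile.TemporaryDirectory() as workdir:
        pattern = str(Path(workdir) / "page-%04d.png")
        subprocess.run(rasterize_command(source, pattern), check=True)
        pages = sorted(glob.glob(str(Path(workdir) / "page-*.png")))
        subprocess.run(rebuild_command(pages, output_pdf), check=True)


if __name__ == "__main__":
    # Show the commands without running them.
    print(rasterize_command("draft.pdf", "page-%04d.png"))
    print(rebuild_command(["page-0000.png", "page-0001.png"], "clean.pdf"))
```

As the caveat above states, this only applies to self-created documents; untrusted downloads should not be processed this way.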

Failure to remove metadata does not always lead to de-anonymization, but it may still allow separate pseudonyms to be correlated with each other. Consider the following example:

  • A video is created with media software and uploaded to a popular video portal under pseudonym A.
  • Another video is created using the same software and computer and uploaded under pseudonym B.
  • An adversary who checks the metadata of both video files would quickly correlate both pseudonyms.
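
The correlation step in this example can be sketched in Python. The metadata field names and values below are hypothetical and only stand in for whatever a given media format actually records; the sketch assumes the metadata has already been extracted into dictionaries.

```python
def shared_identifiers(meta_a: dict, meta_b: dict, ignore=("title", "duration")) -> dict:
    """Return metadata fields that carry the same non-empty value in both files."""
    return {
        key: value
        for key, value in meta_a.items()
        if key not in ignore and value and meta_b.get(key) == value
    }


# Hypothetical metadata of the video uploaded under pseudonym A.
video_a = {
    "title": "Clip one",
    "encoder": "ExampleEncoder 1.2.3",
    "creation_host": "desktop-01",
    "duration": "00:03:10",
}

# Hypothetical metadata of the video uploaded under pseudonym B.
video_b = {
    "title": "Another clip",
    "encoder": "ExampleEncoder 1.2.3",
    "creation_host": "desktop-01",
    "duration": "00:07:45",
}

common = shared_identifiers(video_a, video_b)
print(sorted(common))  # ['creation_host', 'encoder']
```

The more uncommon the shared values are, the stronger the link between the two uploads becomes.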


Warning on Leaking Original Source Documents


It is highly unlikely that file cleaners will defeat advanced fingerprinting methods, such as invisible zero-width characters embedded in text. Persons who are considering leaking valuable, original source documents should adopt a far safer approach to avoid the threat of embedded signatures. Recommendations include: [4]

  • Manually retype the relevant disclosures in a basic text editor, producing a file that can easily be stripped of metadata.
  • Only leak short excerpts so the amount of information shared is kept to a minimum.
  • At all times, avoid releasing the original documents in their raw form.
  • Source the same documents from multiple leakers to confirm the content is identical byte-wise.
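
The byte-wise comparison in the last recommendation can be sketched with a hash check in Python. This is a minimal illustration with hypothetical function names; SHA-256 is one reasonable choice of digest and is not prescribed by the recommendations above. Any difference between copies, including an invisible character, changes the digest.

```python
import hashlib
from typing import Iterable


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def all_identical(copies: Iterable[bytes]) -> bool:
    """Return True if every copy has the same digest, i.e. the same bytes."""
    return len({sha256_hex(copy) for copy in copies}) <= 1


if __name__ == "__main__":
    copy_one = "Quarterly report".encode("utf-8")
    copy_two = "Quarterly report".encode("utf-8")
    # Same visible text, but with a zero-width space hidden after the first word.
    marked = "Quarterly\u200b report".encode("utf-8")

    print(all_identical([copy_one, copy_two]))  # True
    print(all_identical([copy_one, marked]))    # False
```

Differing digests indicate that at least one copy may carry a recipient-specific mark.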


Specific cleaning tools do exist that strip non-whitelisted characters from the text. However, this is the least preferred approach for "safely" sharing documents if a user's freedom depends upon it.
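
What such a cleaning tool does can be sketched in Python. This is not the implementation of any particular tool. The whitelist (printable ASCII plus newline) and the set of zero-width code points are assumptions chosen for illustration; a whitelist this strict also destroys legitimate non-ASCII text and does nothing against fingerprints encoded in visible characters.

```python
import string

# Assumed whitelist: ASCII letters, digits, punctuation, space and newline.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")

# A few common zero-width code points (not a complete list).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def find_zero_width(text: str) -> list:
    """Return (index, code point) pairs for each zero-width character found."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def strip_non_whitelisted(text: str) -> str:
    """Drop every character that is not on the whitelist."""
    return "".join(ch for ch in text if ch in ALLOWED)


if __name__ == "__main__":
    sample = "Top\u200b secret\u200c memo"
    print(find_zero_width(sample))        # [(3, 'U+200B'), (11, 'U+200C')]
    print(strip_non_whitelisted(sample))  # Top secret memo
```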

MAT - Metadata Anonymisation Toolkit


At the time of writing, the latest version of MAT supports the following file formats: [5]

  • Portable Network Graphics (.png).
  • JPEG (.jpg, .jpeg, …).
  • TIFF (.tif, .tiff, …).
  • Open Documents (.odt, .odx, .ods, …).
  • Office Open XML (.docx, .pptx, .xlsx, …).
  • Portable Document Format (.pdf).
  • Tape ARchives (.tar, .tar.bz2, …).
  • MPEG Audio (.mp3, .mp2, .mp1, …).
  • Ogg Vorbis (.ogg, …).
  • Free Lossless Audio Codec (.flac).
  • Torrent (.torrent).


Users should carefully note MAT's limitations: [6]

Why MAT is not the ultimate solution?

Mat only removes metadata from your files, it does not anonymise their content, nor handle watermarking, steganography, or any overly customized metadata field/system. Also please note that MAT does its best to scrub as much metadata as possible, it is not really efficient at scrubbing embedded media inside complex formats. For examples, images embedded inside PDF may not be cleaned!

Use Instructions

Start MAT.

If you are using Qubes-Whonix, complete the following steps.

Qubes App Launcher (blue/grey "Q") -> Domain: Whonix-Workstation AppVM (commonly named anon-whonix) -> Add more shortcuts -> Select MAT from the Available applications list -> Press the > button -> OK

Qubes App Launcher (blue/grey "Q") -> Domain: anon-whonix -> MAT

If you are using a graphical Whonix-Workstation, complete the following steps.

Start Menu -> Applications -> Utilities -> Metadata Anonymisation Toolkit

If you are using a terminal Whonix-Workstation, complete the following steps.

On the command line, run.

mat

Or for the graphical user interface, run.

mat-gui


After starting MAT, add the files to be cleaned to the list. The "dirty" state indicates that a file contains removable metadata. After cleaning, the cleaned files are created in the same directory as the original files, with the extension .cleaned.

Other Tools

  • ExifTool - a Perl application for editing metadata in a wide variety of files.
  • exiv2 - a C++ application to manage image metadata.
  • jhead - a JPEG header manipulation tool.
  • pdfparanoia - a tool to remove watermarks from academic papers.

License

The MAT public domain screenshot is sourced from awxcnx.de.

Gratitude is expressed to JonDos for permission to use material from their website. [7] The Metadata page contains content from the JonDonym documentation Anonymizing Documents and Pictures page.


Whonix is a licensee of the Open Invention Network. Unless otherwise noted, the content of this page is copyrighted and licensed under the same Libre Software license as Whonix itself.

  1. https://speakerdeck.com/ange/an-overview-of-pdf-potential-leaks
  2. Refer to the Metadata Anonymisation Toolkit website for further information.
  3. In the latter method, the leaker is unable to see additional zero-width or zero-width non-joiner characters which are used to fingerprint text. Even a single type of zero-width character provides enough bits of entropy to fingerprint the relevant text.
  4. https://www.zachaysan.com/writing/2017-12-30-zero-width-characters
  5. https://mat.boum.org/
  6. https://mat.boum.org/
  7. Broken link: https://anonymous-proxy-servers.net/forum/viewtopic.php?p=31220#p31220