What file format(s) should I use?
This answer is one of four answers to related FAQs:
- What standard(s) should I follow?
- How should audio and video be encoded for preservation?
- What file format(s) should I use?
- How do I preserve digital media, like CD, DVD, DAT and all the different kinds of digital videotape?
The encoding answer covers a lot of ground, from compression to how to archive born-digital content. While a great deal can be said about what makes one format preferable to another, and about the general risks and characteristics of files, we don’t have to say any of that here, for two reasons:
- this answer is about what PrestoCentre recommends, not about the rationale;
- there is extensive information about the preservation considerations (the sustainability) of all common (and many uncommon) encodings and file types under the general category of formats on the US Library of Congress preservation website.
Therefore this answer should be short!
Audio: The file format for audio is WAVE (.wav), and the Broadcast version of WAVE is recommended for its extra metadata. Broadcast WAVE also uses the .wav extension but is commonly referred to as BWF. All BWF files should be usable by all applications that can use standard WAVE files. BWF metadata is supported by all standard professional audio editing software, and by all service providers who deal in professional formats. The latest and recommended version of BWF is version 2. Version 1 added a field for the SMPTE UMID identifier, and version 2 adds the metadata for ‘loudness’ standardised within broadcasting (ITU-R BS.1770, EBU R128). The problem with large files (the 4 GB size limit of WAVE) is addressed separately by the RF64 extension.
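To check which BWF version a delivered file actually carries, you can look at the Version field of its ‘bext’ chunk. The following is only a minimal sketch: the function name bwf_version is our own, the field offset is taken from the published bext layout (EBU Tech 3285), and it assumes an ordinary RIFF/WAVE file rather than an RF64 one.

```python
import struct

# Offset of the 16-bit Version field inside the bext chunk body:
# Description(256) + Originator(32) + OriginatorReference(32)
# + OriginationDate(10) + OriginationTime(8) + TimeReference(8) = 346
BEXT_VERSION_OFFSET = 346


def bwf_version(stream):
    """Return the bext Version of a WAVE stream, or None if there is no bext chunk.

    `stream` is any seekable binary file object (hypothetical helper, not a
    library API). RF64 files are not handled.
    """
    header = stream.read(12)
    if len(header) < 12 or header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        chunk_head = stream.read(8)
        if len(chunk_head) < 8:
            return None  # reached the end without finding a bext chunk
        chunk_id, size = struct.unpack("<4sI", chunk_head)
        if chunk_id == b"bext":
            body = stream.read(min(size, BEXT_VERSION_OFFSET + 2))
            if len(body) < BEXT_VERSION_OFFSET + 2:
                raise ValueError("bext chunk too short")
            (version,) = struct.unpack_from("<H", body, BEXT_VERSION_OFFSET)
            return version
        # Skip the chunk body; RIFF chunks are padded to an even length.
        stream.seek(size + (size & 1), 1)
```

A typical call would be bwf_version(open(path, "rb")). A result of None means the file is a plain WAVE file without broadcast metadata.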
Video: There are four main options:
- MXF is a professional (and maybe professional-only) standard within both broadcasting and digital cinema. It is also used by the Library of Congress and by many other institutions which have used the robotic SAMMA system for digitisation of analogue video cassettes.
- FFV1 is an open-source, licence-free form of coding using lossless compression, applied to individual frames. It is currently being developed by the Internet Engineering Task Force (IETF) to improve its specification and support tools (https://datatracker.ietf.org/wg/cellar/charter/). FFV1 is the ‘new kid on the block’ but is gaining strong support in the heritage, university and general non-commercial world, particularly as a result of the IETF work. FFV1 encoding is commonly wrapped in the Matroska wrapper file format, which is also open source and licence-free.
- MOV is the QuickTime wrapper, associated with Apple computers and also with the MPEG-4 standard. It can hold uncompressed video as well as a wide range of compressed encodings, and it supports time code.
- AVI is the wrapper from the Microsoft camp, developed in the early 1990s. It supports uncompressed video and a range of compressed encodings. It does not support time code. In the US it has been used by NARA, the National Archives and Records Administration, with no problems because their analogue originals did not contain time code.
PrestoCentre recommends MXF and MOV. The reasons for not recommending AVI are given by the Library of Congress, which quotes the Wikipedia article ‘Audio Video Interleave’, listing areas (aspect ratio coding, time code, variable frame rate, MPEG-4 encoding) where AVI does not support the full range of digital video archiving requirements.
Most people using JPEG2000 for video will put it in an MXF wrapper, so that audio, time code and metadata can all be in the one wrapper. For still images, where JPEG2000 is very widely used (in particular to hold the results of book and document scanning), it is common for the resultant JPEG2000-encoded image data to be held in a JPEG2000 file format.
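File extensions are not always reliable, so it can be useful to confirm which of the four video wrappers listed above a file really uses by looking at its first few bytes. The following is a rough sketch only: the function name sniff_wrapper and the returned labels are our own, and it assumes the signature sits at the very start of the file (an MXF file with a ‘run-in’ before its header would be reported as unknown).

```python
def sniff_wrapper(head: bytes) -> str:
    """Guess the wrapper from the first bytes of a file (pass at least 12 bytes)."""
    if head[:4] == b"\x06\x0e\x2b\x34":
        return "MXF"       # start of the SMPTE key that opens the header partition
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska"  # EBML header ID (the same signature is used by WebM)
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI"
    if head[4:8] in (b"ftyp", b"moov", b"mdat", b"wide", b"free", b"skip"):
        return "MOV/MP4"   # QuickTime-style atom; these bytes alone do not separate MOV from MP4
    return "unknown"
```

For a file on disk, something like sniff_wrapper(open(path, "rb").read(12)) is enough. This only identifies the wrapper, not the encoding inside it, so a dedicated tool is still needed to confirm, for example, that a Matroska file really contains FFV1.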
Film: While modern scanning equipment will produce many output formats, the requirement of ‘saving the best available’ dictates taking an uncompressed image output from the scanner. That can be saved in various wrappers, but in practice two are generally used:
- DPX is standard in digital cinema production and in digital restoration as the working format, but it is perfectly usable as an archive format. Audio and video are separate, and indeed each image is in a separate file, but the DPX standard includes metadata to bundle the whole collection of files in a meaningful way, understood by professional-level edit systems.
- MXF is the standard for distribution in digital cinema, wrapping JPEG2000 encodings. There are various kinds of JPEG2000 (J2K). The EDCine project recommended lossless J2K for the archive master and a slightly compressed lossy J2K encoding as a mezzanine, from which the digital cinema package (DCP) for distribution could readily be produced, as well as more heavily compressed versions for other uses. MXF is the choice when there is a need to keep a single file as ‘the object’, rather than a collection of files as is the case with DPX.
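Because a DPX ‘object’ is a collection of one file per frame, a practical worry is a frame going missing without anyone noticing. As a simple illustration, and assuming a single sequence whose file names end in a frame number followed by .dpx (a common convention, but not something the DPX standard requires), gaps can be found like this. The function name missing_frames is our own.

```python
import re

# A file name that ends in digits followed by ".dpx", e.g. "scan_0001.dpx".
FRAME_RE = re.compile(r"^.*?(?P<num>\d+)\.dpx$", re.IGNORECASE)


def missing_frames(filenames):
    """Return the frame numbers absent between the first and last frame found."""
    numbers = set()
    for name in filenames:
        match = FRAME_RE.match(name)
        if match:  # files that are not DPX frames are ignored
            numbers.add(int(match.group("num")))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

Passing it the output of os.listdir for the sequence folder returns an empty list when the numbering is continuous. It cannot detect frames missing from the very start or end of a sequence, so the expected first and last frame numbers should be recorded separately.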
DCP is the format (again, a bundle of files, not a single file) defined by Digital Cinema Initiatives (DCI) as the distribution format for digital cinema. It is now reaching archives, and so the requirement to ‘store the original artefact’ can be seen as requiring making the DCP an archive element: the ‘archive original’ if not the ‘archive master’. Unfortunately the digital rights management lock-ups associated with DCP distribution create archive problems that are really just emerging in 2013. Problems with DCP (and their solutions) may need to be a separate FAQ.
If the PrestoCentre recommended file formats don’t work for your situation, you can do two things:
- tell us why, so we can make our information more comprehensive. There should be a comment box on this page.
- consider alternatives not listed above. The Library of Congress has a comprehensive list of formats for ‘moving images’, covering codecs and wrappers.