How should audio, video and film be encoded for preservation?
There are two related answers: this one about encoding, and a separate one about file formats. The reason is: they are two separate issues, though not separate enough! The overlap causes a lot of confusion.
Audio from a microphone and video from a camera are signals which can be represented by a continuous line. The line is proportional to sound pressure for audio, and to light intensity for monochrome (black and white) video. Colour video is actually three ‘separate lines’. The variation in sound and in light is continuous (analogue), but the variation can be coded digitally by sampling: a number representing ‘the height of the line’ is measured so many times per second.
The result is a sequence of numbers, and that is the simplest form of digital encoding of an original analogue continuous signal.
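As a minimal sketch of that idea, the following Python samples a continuous signal at regular intervals and turns each measurement into a whole number. The 1 kHz test tone, the 48 kHz rate, the 16-bit depth and the function names are all assumptions chosen for illustration, not part of any recommendation here.

```python
import math

def sample_signal(signal, sample_rate, n_samples, bits):
    """Sample signal(t) (values in -1.0..1.0) and quantise to signed integers."""
    max_code = 2 ** (bits - 1) - 1   # largest positive code, e.g. 32767 for 16 bits
    return [round(signal(n / sample_rate) * max_code) for n in range(n_samples)]

def tone(t):
    # Hypothetical test signal: a full-scale 1 kHz sine wave.
    return math.sin(2 * math.pi * 1000 * t)

# One millisecond of the tone at 48 kHz gives 48 numbers (one full cycle).
samples = sample_signal(tone, sample_rate=48_000, n_samples=48, bits=16)
print(samples[:4])
```

The list `samples` is the whole encoding: nothing but the measured heights of the line, in order.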
Why complicate it then?
For video, the answer is that colour video is already complicated, being essentially three parallel phenomena being represented by numbers (red, green, blue; equivalently luminance plus two colour dimensions to represent a colour wheel). Some decision has to be made about how to put the three sets of numbers together, and that decision is part of the encoding.
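The equivalence between red, green, blue and luminance plus two colour dimensions can be sketched in a few lines of Python. This assumes the Rec. 601 luminance weights and normalised values (0.0 to 1.0 for R, G, B), and leaves out the offsets and rounding that a real digital video interface applies; the function names are illustrative only.

```python
KR, KG, KB = 0.299, 0.587, 0.114   # Rec. 601 luminance weights (assumed here)

def rgb_to_ycbcr(r, g, b):
    """Convert normalised R, G, B to luminance and two colour-difference values."""
    y = KR * r + KG * g + KB * b
    cb = 0.5 * (b - y) / (1 - KB)   # blue difference, scaled to -0.5..0.5
    cr = 0.5 * (r - y) / (1 - KR)   # red difference, scaled to -0.5..0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr: recover R and B first, then G from the luminance."""
    r = y + cr * (1 - KR) / 0.5
    b = y + cb * (1 - KB) / 0.5
    g = (y - KR * r - KB * b) / KG
    return r, g, b

y, cb, cr = rgb_to_ycbcr(0.8, 0.4, 0.2)
print(ycbcr_to_rgb(y, cb, cr))
```

Either triple carries the same information; the choice between them, and the order in which the three values are stored, is one of the decisions an encoding has to fix.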
A video signal is a raster: so many numbers per row, so many rows per image. Another complication is interlacing: doubling the number of images per second by sending half the information (the odd lines) and then sending the other half (the even lines).
In video, a sequence of numbers has to be correctly interpreted to be divided properly into rows and images, and so colour information and the shuffling of interlacing can all be decoded.
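A rough Python sketch of that interpretation step: a flat sequence of numbers is cut into rows, the rows are split into two interlaced fields, and the fields are woven back into a full image. The tiny 4-by-6 image and the function names are hypothetical; real video also has to deal with colour and timing.

```python
def to_rows(values, width):
    """Cut a flat sequence of numbers into rows of the given width."""
    if len(values) % width:
        raise ValueError("sequence length is not a whole number of rows")
    return [values[i:i + width] for i in range(0, len(values), width)]

def split_fields(frame):
    """Split a frame into two fields: lines 1, 3, 5... and lines 2, 4, 6..."""
    return frame[0::2], frame[1::2]

def weave(first_field, second_field):
    """Rebuild the full frame by alternating lines from the two fields."""
    frame = [None] * (len(first_field) + len(second_field))
    frame[0::2] = first_field
    frame[1::2] = second_field
    return frame

flat = list(range(24))            # hypothetical image: 6 rows of 4 numbers
frame = to_rows(flat, width=4)
first, second = split_fields(frame)
print(weave(first, second) == frame)
```

If the decoder assumes the wrong row width or the wrong field order, the same numbers produce a scrambled picture, which is why these choices count as part of the encoding.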
The other reason for complicating the encoding is to squeeze the data. Sampling audio at professional rates (24 bit samples, 96k samples per second) produces 4.6 megabits per second (Mb/s) for two channels (stereo), and sampling video at Rec. 601 (see FAQ on standards) produces 270 Mb/s, which can be cut to about 200 Mb/s if storing the data in a file (because the zeroes during the blanking intervals can be stripped out).
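The arithmetic behind these figures can be checked with a short Python sketch. It assumes two audio channels, and for video it assumes the 625-line variant of Rec. 601 with 10-bit samples and 4:2:2 colour sampling (on average two samples per pixel); those video parameters are assumptions made here to reproduce the figures, not details stated in this answer.

```python
def audio_bit_rate(bits, sample_rate, channels):
    """Bits per second for uncompressed audio."""
    return bits * sample_rate * channels

stereo_audio = audio_bit_rate(24, 96_000, 2)
print(stereo_audio / 1e6)          # 4.608 Mb/s

# Serial digital video: 27 million words per second, 10 bits per word,
# including the blanking intervals between lines and between images.
full_signal = 27_000_000 * 10
print(full_signal / 1e6)           # 270.0 Mb/s

# Active picture only: 720 x 576 pixels, 25 images per second,
# 2 samples per pixel on average (4:2:2), 10 bits per sample.
active_picture = 720 * 576 * 25 * 2 * 10
print(active_picture / 1e6)        # 207.36 Mb/s
```

Under these assumptions the active picture comes to a little over 200 Mb/s, in line with the rounded figure quoted for storage in a file.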
These data rates are a real challenge: for capture, for moving between devices over networks, for broadcasting. Consequently technology was developed to reduce the data rate while keeping as much as possible of the information. The inherent predictability of the information (the fact that a dark area is more likely to be next to more of the same than to something very different) can be measured. The parameters of the measurement can be kept instead of the data, and a saving can be made. Older readers will remember when it was standard to ‘zip’ files to make the most of floppy discs that only held a few hundred kilobytes. It is still standard to compress audio and video. There are many ways to do this, and each is a different kind of encoding.
Kinds of compression: the zip compression used for general computer files was completely reversible: it just saved space, and didn’t throw any information away. After uncompressing, the result would be bit-for-bit identical with the original. That is lossless compression, and can also be applied to audio and video. The problem is, it doesn’t save much space: usually the result of lossless compression is 1/3 the size of the original (3:1 compression) or maybe 1/4 at best (4:1 compression).
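The reversibility of lossless compression is easy to demonstrate with Python's standard `zlib` module, which uses the same family of compression as zip. The input below is hypothetical and deliberately repetitive, so it shrinks far more than real audio or video would; the point of the sketch is only the bit-for-bit round trip.

```python
import zlib

# Hypothetical, highly predictable data standing in for a real file.
original = b"dark dark dark dark light " * 400

compressed = zlib.compress(original, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)               # True: nothing was thrown away
```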
At a sacrifice of information, huge compression factors can be achieved. We use compressed audio every time we use a mobile phone, and the coding is roughly 10 kilobits/sec: about a 70:1 compression ratio compared to one channel of CD-quality audio, and about 230:1 compared to one channel at the full archive standard of 24-bit samples, 96k samples per second. The video seen on the Internet is typically compressed by factors ranging from 200:1 to over 1000:1. All of this reduction in data rate has a cost: information is lost, quality is reduced. For archiving, where a basic principle is to maintain quality, it is not good practice to introduce lossy compression.
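A compression ratio is simply the uncompressed data rate divided by the compressed one. The sketch below works that out for the mobile phone case, assuming a phone rate of exactly 10,000 bits per second and a single audio channel in each comparison; both are simplifying assumptions, since real phone codecs vary.

```python
def bit_rate(bits, sample_rate, channels=1):
    """Bits per second for uncompressed audio."""
    return bits * sample_rate * channels

phone = 10_000                          # assumed phone coding rate, bits per second
cd_mono = bit_rate(16, 44_100)          # one channel at CD quality
archive_mono = bit_rate(24, 96_000)     # one channel at 24 bit, 96 kHz

print(round(cd_mono / phone, 1))        # 70.6
print(round(archive_mono / phone, 1))   # 230.4
```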
Encoding and file types: It became standard practice to develop a different file type for each encoding, which is where the overlap started. Real Audio (from the company Real), Windows Media Audio, JPEG images, MPEG video and so on were in files with extensions ‘ra’, ‘wma’, ‘jpg’, ‘mpg’ and so on.
As encoding schemes proliferated, that approach (one file type per encoding method) was heading for nonsense, so gradually file types developed that could hold multiple kinds of encoded data — with metadata inside the file to self-identify the coding. A simplification — but also a complication because it was no longer obvious what encoding was actually being used. An application could read the metadata, but a person could only see the file name, and file types like AVI and MOV (and even WAV) can hold many kinds of encoding — as well as holding both audio and video and subtitles and possibly even time code.
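To illustrate what "metadata inside the file" means, here is a minimal Python sketch that reads the format tag from the `fmt ` chunk of a WAV file. The tag values mentioned in the comments (1 for linear PCM, 3 for floating point, 0x55 for MP3) are the commonly registered ones; the function name and the tiny test file are made up for the example, and the parser ignores the many extensions real-world files can carry.

```python
import io
import struct
import wave

def wav_format_tag(data):
    """Return the format tag stored in the 'fmt ' chunk of a RIFF/WAVE file."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag                 # 1 = linear PCM, 3 = float, 0x55 = MP3
        pos += 8 + size + (size & 1)   # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")

# Build a tiny PCM WAV in memory so the example is self-contained.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48_000)
    w.writeframes(b"\x00\x00" * 10)

tag = wav_format_tag(buf.getvalue())
print(tag)
```

The file name would say only ‘wav’ in every case; it takes a look inside, as above, to learn which encoding the file actually holds.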
File formats got so complicated (powerful, the developers would say) that people started to call them wrappers, to emphasize that the file could hold many things: video, multiple channels of audio, subtitles, time code, other metadata. Recommended file formats are given in the answer to What file format(s) should I use?
Now that the basics of encoding, coding types vs file types and compression have been covered, here are the recommendations for encoding of audio, video and film:
Encoding for audio: just use the sequence of numbers from the digitisation, with no further encoding or compression of any sort. This encoding is sometimes referred to as linear PCM.
The standard for digitisation of audio is in FAQ on standards: 24 bit, 48 or 96 kHz.
The file format for audio is WAV, and the Broadcast version of WAV (usable by all applications that can use standard WAV) is recommended for its extra metadata. See the FAQ on file formats.
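As a hedged sketch of what linear PCM in a WAV file amounts to, the Python below writes a handful of 24-bit samples with the standard `wave` module and reads them back unchanged. It produces a plain WAV file, not the Broadcast version (the standard module does not write the extra Broadcast metadata), and the helper names and sample values are illustrative assumptions.

```python
import io
import wave

def write_pcm24(fileobj, samples, sample_rate):
    """Write mono 24-bit linear PCM: each sample stored as 3 little-endian bytes."""
    frames = b"".join(s.to_bytes(3, "little", signed=True) for s in samples)
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(3)              # 3 bytes = 24 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(frames)

def read_pcm24(fileobj):
    """Read mono 24-bit linear PCM back into a list of integers."""
    with wave.open(fileobj, "rb") as r:
        rate = r.getframerate()
        raw = r.readframes(r.getnframes())
    samples = [int.from_bytes(raw[i:i + 3], "little", signed=True)
               for i in range(0, len(raw), 3)]
    return samples, rate

source = [0, 1, -1, 8_388_607, -8_388_608]   # includes the 24-bit extremes
buf = io.BytesIO()
write_pcm24(buf, source, 96_000)
buf.seek(0)
decoded, rate = read_pcm24(buf)
print(decoded == source, rate)
```

Because no further encoding is applied, the numbers read back are exactly the numbers written, which is the property the recommendation relies on.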
If you are taking digital audio from a carrier (CD, DAT, minidisc or even the sound from a videotape) and putting it in a file in an archive, see the FAQ on digital media. You should clone the original if you can, but there are other complexities.
Other encodings: for delivery and access you may want a compressed version, such as MP3. Whether this is made at the time of digitisation, or made on demand at time of use, depends upon each particular installation. Small collections with all the audio online uncompressed can make MP3s on demand. If your master uncompressed audio is kept offline on data tape, then make the MP3 at time of digitisation and put that online.
Encoding for video: there are three cases:
1- original media is analogue, so either code uncompressed or lossless compressed
If using lossless compression, which one? The answer is: it doesn’t matter, regarding principles, so use whatever is most practical. But — proprietary methods should always be avoided.
There is a wealth of information on the advantages and pitfalls of all common encodings available from the US Library of Congress:
If you decide to change to another lossless encoding, that doesn’t matter as the files can be converted automatically and painlessly.
Probably the most common lossless compression for video is now JPEG2000, used by the Library of Congress and many others, and also used in the Digital Cinema standard.
JPEG2000 is now widely used as an encoding for still images. Document scanning produces still images, generally using JPEG2000 — so this format is widely used and understood in the library world. See http://www.dlib.org/dlib/july08/buonora/…
2- original media is digital (but NOT a file): see the FAQ on preserving digital media.
3- born-digital files (the following also applies to born-digital audio and digital cinema files)
Why is there any problem? A file comes in, you keep it.
Actually, there are two main problem areas:
1) you also want to produce a standard version; the original could be an oddball format, or your whole approach to digital archive may rely upon making an archive format;
2) you need to keep the metadata, not just the video. Your archive won’t know anything about the technical attributes of the video, or about the descriptive metadata (if there is any) unless you can pull out and interpret that metadata, and then put it into the catalogue of the digital repository or archive.
Recommendations for born-digital files coming into a digital collection (repository, archive):
1- Keep the original encoding in its original wrapper; this is always possible, so should always be done.
2- Make a standard format if desired. The quality won’t improve and it may take up much more space, but it might be a key step to the overall operation of a digital collection. So the master could be the original (as for 1) or could be the standard version.
3- Make delivery and access versions as needed. It may also be pragmatic to make a mezzanine version; this is not the master copy, but it is the high-quality copy from which access copies in lower quality are made, in a way that is computationally efficient.
Encoding for film: (the following only covers the images, not sound on film)
Scanning of film can be done by two kinds of equipment: telecine and datacine.
The output of telecine equipment is a video signal, which is NOT what is needed for archiving. Video has only 500 to 600 scan lines, and even HD video has only 1080, so it is suitable only for preservation digitisation of reduced-quality 16mm content, such as the telerecordings (film made from a video input) in broadcast archives. The output of an older telecine machine will be an analogue video signal which has to be digitised to be digitally preserved. This case is technically equivalent to digitisation of any other analogue video, covered in the previous section.
Generally, film digitisation for preservation will use a datacine machine. The output is a file (or a whole lot of files): typically the results of scanning are available in the DPX format (with one file per frame, and a folder for the whole scanned film). The data in the DPX should be uncompressed images, and this is what should be saved. The other common archival film encoding is lossless JPEG2000.
So, two main options
1) encoding: uncompressed; file format: DPX
2) encoding: lossless compressed JPEG2000; file format: many options, see FAQ on file formats.
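To get a feel for what the uncompressed option means in storage terms, here is a rough Python estimate. Every number in it is an assumption made for illustration: a 2K scan of 2048 by 1556 pixels, 10-bit red, green and blue packed into one 32-bit word per pixel, 24 frames per second, and file headers ignored. Actual scanners and DPX settings differ.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Image data per frame, assuming one 32-bit word per pixel (10-bit RGB)."""
    return width * height * bytes_per_pixel

def film_bytes(width, height, frames_per_second, seconds):
    """Image data for a whole film stored as one uncompressed file per frame."""
    return frame_bytes(width, height) * frames_per_second * seconds

one_frame = frame_bytes(2048, 1556)
one_hour = film_bytes(2048, 1556, frames_per_second=24, seconds=3600)

print(one_frame)                  # 12746752 bytes, about 12.7 MB per frame
print(round(one_hour / 1e12, 2))  # 1.1 terabytes per hour of film
```

Lossless JPEG2000 (option 2) would reduce this by whatever factor the content allows, without changing the decoded images.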
There may be many other output possibilities from a datacine machine, and it is tempting to use one which fits with existing processes. This is almost certainly a mistake, from the archiving and preservation viewpoint. Production in broadcasting will probably be based on a video format, which will be a big drop in resolution from a 4K or even a 2K scan, will probably be lossy-compressed and could even introduce all the mess of interlaced data. Most broadcast production uses MPEG-2 encoding at 25 or 50 Mb/s for SD video and MPEG-4 encoding at 100 to 400 Mb/s for HD. These are lossy compression encodings, and so should never be used for holding the master version of the results of film scanning. Digital cinema relies on various versions of the JPEG2000 encoding format, with the distribution version being considerably compressed (lossy compression).
With production (or customers of a footage library, or researchers at an archive) using, for their general work, an encoding that is very different from the master version, there needs to be a way to satisfy both the archive requirement (save the full data out of the scanner) and the business requirement (deliver to users something they can use). The solution is multiple versions of the material — which is nothing new as film archives have always had a range of versions: master negative; prints from the master; intermediate negative (interneg) made from the master and used for making prints, to preserve the master; prints from the interneg; prints from prints and possibly several more.
There are options for efficient production of encodings that suit the users. The new versions can be made at the same time as the original scan, and held along with the DPX or JPEG2000 in the archive. Alternatively, new versions can be made on demand.
Another option is more technical. It may take a lot of time and computing to make new versions from the uncompressed DPX, so if a lot of this work has to be done it can be computationally efficient to make a high-level intermediate file format (kept with the DPX in the archive) and produce more compressed versions from that. This is the mezzanine format approach to digital archive management.
In digital cinema, the EDCine project recommended a master version in lossless JPEG2000, but that version is not computationally efficient for making lossy versions. So the archive would also hold a high-level lossy JPEG2000 mezzanine version. Any desired compression level can then be made from the mezzanine version in an efficient manner, as long as the final result is still a form of JPEG2000, and not some entirely different encoding family (such as MPEG).
There is a lot of complexity around the digital distribution of cinema productions. The industry standard is the Digital Cinema Package (DCP). It has lossy-compressed JPEG2000-encoded files for the images (using the MXF wrapper) and has separate MXF files for the sound, plus extensive digital rights protection to prevent any but the intended use. Any PrestoCentre information on DCPs will have to await a further series of FAQs.