A Compressed View of Video Compression

Digital audio and digitised film can also be compressed, but there are particular issues – and an interesting (well, for some) history – for video, so I will emphasise video. The general principles apply to any signal (including audio and scanned film), but not to files and digital data in general.

A signal is where engineers start with audio and video. A sound field (the variations in pressure in a three-dimensional space) is a complicated thing, but a microphone inside that sound field produces a voltage that varies with time: a signal. The visual field is equally complicated, but a video camera allows the pattern of light (through a lens onto some sort of receptor) to create a two-dimensional pattern which can then be scanned. Scanning converts the pattern into a voltage that varies with time: a signal. A datacine machine does much the same to film: converts an image into a signal.

Signals carry information, with greater or lesser efficiency. The data rate of the sequence of numbers representing a signal can be much higher than the rate of information carried by the signal. Because high data rates are always a problem, technology seeks methods to carry the information in concise ways.

Video: compressed from birth
Television started out by trying to produce successive images at a rate fast enough to exceed the ‘flicker fusion’ threshold of the human eye. At somewhere above 30 to 40 images per second, the pattern looks continuous and the eye is fooled into thinking it is seeing continuous motion. But television technology (in 1936) was incapable of transmitting 405 lines of information at 50 times per second, so they threw away half the data and sent half the image in one 20 millisecond time slot, and the second half in the next. The result was the needed 50% reduction in (analogue) bandwidth, the rough equivalent of (digital) datarate (bitrate). To make the ‘compression’ as visually acceptable as possible, odd numbered lines were sent as the first ‘field’ – and even ones in the next: interlace.
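
The line-splitting behind interlace can be sketched in a few lines of Python. This is a toy model only: a 'frame' here is just a list of labelled scan lines, and the function names split_into_fields and weave are my own illustrative choices, not anything from a broadcast standard.

```python
def split_into_fields(frame):
    """Split a frame (a list of scan lines) into two interlaced fields.

    Lines are counted from 1, so the first field holds lines 1, 3, 5, ...
    and the second field holds lines 2, 4, 6, ...
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for i in range(len(odd_field)):
        frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


# A toy 405-line frame: each 'line' is just a label.
frame = [f"line {n}" for n in range(1, 406)]
odd_field, even_field = split_into_fields(frame)
rebuilt = weave(odd_field, even_field)
```

Each field carries roughly half the lines, which is where the halving of bandwidth comes from. Weaving the two fields back together only recovers the original picture when nothing moved between the two time slots, which is why I call interlace a compromise rather than a free lunch.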

When colour television was developed, there were further problems. Ideally colour is two separate dimensions (the two dimensions of a ‘colour wheel’), which add colour information to the black and white pattern described by the luminance signal. The three signals form component video. For broadcasting this all had to go into one signal, so colour was jammed into the luminance signal as composite video – another kind of compression (and another compromise).

Some videotape formats (eg analog 1/2-inch, 1-inch, VHS, Betamax and U-matic; digital D2, D3) record a composite signal. In dealing with principles such as ‘keep the best’ and ‘keep the original’ it is important to know what the original actually is! It gets murkier: it was hard to get even composite video onto a videotape, so many composite analogue recorders (notably U-matic in the semi-professional area) also shifted the frequency modulation of the colour information, to get it into another place in the overall spectrum where colour information caused less interference to luminance: the colour under approach.

The conclusion is that alteration of video to squeeze it into limited bandwidth or into limited tape recorders has been with us from the beginning of video: interlace, composite, colour under.

In the UK, redundancy means losing your job. In information theory, redundancy is formally defined, and relates to the data rate of a signal being higher than the actual information. For instance, if I’m on a noisy telephone line I might start repeating key information. It takes more signal (more time), but improves the odds that the information will be transmitted.

A CD carries audio at 1.4 million bits per second. If the audio is a person speaking, they may be conveying about 3 words per second. With a 30,000 word vocabulary, that works out as about 45 bits per second (because 30,000 is about the 15th power of two). If the audio could be run through a speech recogniser, the 45 bits could be transmitted and the speech could be regenerated at the other end using a synthesizer. The compression achieved would be enormous: a factor of about 30,000 [1.4 million divided by 45]. But the synthesized sound would convey only the words, and not what the speaker sounded like or any other aspect of the original sound except the ‘meaning’. On music, nobody would be pleased. A transcript of a Janis Joplin song just doesn’t capture what matters.
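
For anyone who wants to check the arithmetic, here it is as a short Python sketch. The figures are the ones quoted above; the variable names are just for illustration, and rounding the bits per word up to a whole number is my assumption.

```python
import math

CD_BITRATE = 1_400_000    # bits per second, as quoted above
WORDS_PER_SECOND = 3      # rough speaking rate
VOCABULARY_SIZE = 30_000  # words the speaker might choose from

# Bits needed to pick one word out of the vocabulary, rounded up
# to a whole number of bits (30,000 is just under 2 to the power 15).
bits_per_word = math.ceil(math.log2(VOCABULARY_SIZE))

speech_bitrate = WORDS_PER_SECOND * bits_per_word
compression_factor = CD_BITRATE / speech_bitrate

print(bits_per_word, speech_bitrate, round(compression_factor))
```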

Which brings us to the crux: what matters, and what can be thrown away?

For images, the ‘meaning’ is undefinable, but image quality metrics have been defined. It is difficult to come up with an equation that exactly fits the judgements a person would make about degree of impairment to an image, but the metrics come close. Essentially, video compression methods attempt to maximise the reduction in datarate while minimising the estimated visual quality difference (before vs after).
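
To give a flavour of what such a metric looks like, here is a minimal Python sketch of peak signal-to-noise ratio (PSNR). PSNR is my choice of example: it is one of the simplest and most widely quoted measures, but it is only a crude stand-in for human judgement, and I am not claiming it is the metric any particular codec optimises. Pixels are assumed to be 8-bit values in a flat list.

```python
import math


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio, in dB, between two pixel sequences.

    Higher values mean the processed image is numerically closer
    to the original. Identical images give infinity.
    """
    if not original or len(original) != len(processed):
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


before = [10, 20, 30, 40]
after = [11, 19, 30, 42]   # small errors of the kind compression introduces
quality_db = psnr(before, after)
```

A compression system can then be tuned by trading bits against a number like quality_db, which is exactly the bargain described above.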

If the information is still there, and the datarate is reduced, then that’s A Good Thing, isn’t it? Not necessarily. Redundancy is useful, as in my telephone conversation where I repeated things. Redundant signals are robust signals (they have a higher probability of undergoing some sort of mishap and still carrying the information). Heavily compressed signals are fragile: they can look great, but touch them and they shatter.
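
The telephone trick of repeating things can be sketched as a repetition code with a majority vote. This is a deliberately simple illustration in Python, with function names of my own invention; it is not how any real video format protects its data.

```python
def encode_repetition(bits, n=3):
    """Send every bit n times: n times the data rate, same information."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded, n=3):
    """Recover each bit by majority vote over its n copies."""
    decoded = []
    for i in range(0, len(coded), n):
        group = coded[i:i + n]
        decoded.append(1 if sum(group) > n // 2 else 0)
    return decoded


message = [1, 0, 1, 1]

# Redundant version: one bit gets flipped in transit (the 'mishap').
received = encode_repetition(message)
received[4] ^= 1
recovered = decode_repetition(received)

# Concise version: the same mishap hits the bare message.
fragile = message.copy()
fragile[1] ^= 1

print(recovered == message)  # the redundant signal survives
print(fragile == message)    # the concise one does not
```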

Managing Compression
As with everything else about archiving and preservation, a key issue is management: knowing what you’re dealing with, having a strategy, monitoring the strategy, keeping on top of things so loss is prevented.

I think some clear principles can be stated for audiovisual archiving, and these principles can be used to manage the use of compression:

Basic principles:

  • Keep the original
  • Keep the best
  • Do no harm

What do these principles have to do with compression?

Keep the original: means that compressed signals should be in the archive, and should be preserved – because compressed signals do come into the archive. The overhead is: software will also need to be preserved so the compressed signal can be converted back to a standard video signal.

Keep the best: if there is a compressed signal, then by implication somewhere there was an uncompressed signal. For instance, many professional high definition video cameras write a compressed signal to a solid-state memory card. Compression is used to get more minutes per card, which is important. But many of these cameras also have an uncompressed output. It may be fantasy to think the uncompressed signal from the camera could ever get to the archive, but in some cases (maybe not if the compression is in the camera, but just possibly if it is in post-production) an uncompressed or less-compressed version could be obtained by the archive. It’s worth asking, and it’s worth pushing acquisition and post-production (if that is at all possible) to consider whether it’s time for them to upgrade to higher quality and lower compression.

Do no harm: this is a principle from medicine, but archives need to be just as careful. Audiovisual archives have the strange necessity of, from time to time, making a ‘new master’. Art galleries don’t repaint the Mona Lisa (though just what is acceptable as restoration is a tricky issue they do have) but audiovisual archives make new master copies when the ‘old master’ is coming to the end of its life.

While the software still works to play a compressed file, that file can be moved and replicated ad infinitum with no problems. When the software becomes obsolete, there is a problem. Unless emulation is a possibility (discussed below), the file will have to be converted to something else, either compressed or uncompressed. If compressed, it will use a new algorithm (the old one is obsolete). This will then be a cascaded compression.

Television production has been cascading compression ever since composite signals went onto videotape. The signal is played back, decoded, and then if videotaped again it is encoded and re-recorded. When the second version is played back, there is an inevitable generation loss. Video production and post-production has always lived with generational losses, but they have always been seen as a necessary evil, and as something to be managed and kept to a minimum.

The particular issue for managing cascaded digital compression systems is the unpredictability of results. Broadcasters knew how many generations of BetaSP or Digibeta could be produced before visible impairments were highly likely. The problem with cascading today’s JPEG2000 compression into tomorrow’s whoknowswhat compression is that we have no idea about the probability of visible impairment, and also no idea of the probable fragility of the result of the cascade. So the principle of do no harm is at risk when cascading disparate compression methods, and the risk increases with every repetition of the process.
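
A toy model gives a feel for why cascades of different methods are worrying. In the Python sketch below each 'codec' is nothing more than a quantiser that rounds sample values to multiples of a step size. That is a drastic simplification of my own (real codecs such as JPEG2000 are far more elaborate), and the step sizes 7 and 5 are arbitrary, but it shows errors from one lossy stage compounding with errors from the next.

```python
import math


def quantise(value, step):
    """Round a value to the nearest multiple of step (halves round up)."""
    return math.floor(value / step + 0.5) * step


def max_error(originals, processed):
    """Largest absolute difference between original and processed samples."""
    return max(abs(a - b) for a, b in zip(originals, processed))


samples = list(range(256))  # every 8-bit level

codec_a = [quantise(s, 7) for s in samples]  # 'today's' toy codec
codec_b = [quantise(s, 5) for s in samples]  # 'tomorrow's' toy codec
cascade = [quantise(a, 5) for a in codec_a]  # tomorrow's applied to today's output

error_a = max_error(samples, codec_a)
error_b = max_error(samples, codec_b)
error_cascade = max_error(samples, cascade)

print(error_a, error_b, error_cascade)
```

In this toy the worst-case error of the cascade is larger than that of either codec used alone. With real, disparate codecs the interaction is much harder to predict, which is the point.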

Best Practice for dealing with a compressed master:

  • Clone what arrives at the archive (keep the original)
  • If what arrives at an archive is lower quality than somewhere higher up the production chain, investigate access to an archive version made earlier; this step particularly applies to broadcasting – and to film archives faced with DCP files
  • Remove all encryption and copy protection constraints (if possible)
  • Make an access copy from the clone, in the current access format
  • Make a new access copy (from the clone) when a new access format becomes current
  • Eventually migrate the clone, when the original codec is obsolete. If the original is uncompressed it will NEVER need to be re-coded, though it may need to be re-wrapped to suit whatever a ‘file’ is in the future
  • Migration to another compressed version (because of obsolescence of the original codec) will be a cascade of different types of compression – as discussed, this is best avoided! My fearless prediction is that after 2023 there will be no economic incentive for such a cascade of compressions – because storage will be so cheap ($5 per terabyte max, probably under $1 per terabyte)
  • Just possibly emulation (of the original system running the codec) could be used to continue to decode the clone into the indefinite future

My conclusion: compression is not here to stay – it is here to be managed. The next migration will dispense with the issue by migrating away from compressed to lovely, stable uncompressed video.