All posts by admin

The Voice of Stephen Hawking

When I first heard the synthetic voice that Stephen Hawking used as a communication aid, I said “I know that voice”. And indeed the voice was known to anyone who worked in speech research in the 1970’s and 1980’s, because it was the voice of Dennis Klatt.

Dennis Klatt, MIT research scientist from 1965 until 1988, specialised in speech synthesis. In the 1960’s it took hours of computing to form a single sentence. The process started with data on the individual sounds of speech, which was then modified to form a synthetic utterance. This was ‘phoneme synthesis’, and the starting point was enough knowledge of speech to write down an utterance in a phonetic script, like Professor Henry Higgins in Pygmalion (My Fair Lady).

Dennis Klatt spent years making measurements on his own voice to produce, in the end, an advance on the state of the art in natural-sounding synthetic speech. Indeed, it was so natural that it sounded like Dennis himself: Klattalk. Then he added ‘text-to-speech’ rules, so the synthesiser could, at long last, be controlled by ordinary text rather than by phonetic symbols. The full system was described in an IEEE article in 1982, by which time Dennis was already suffering from thyroid cancer — and losing his voice!

I could recognise his voice because I saw him frequently at Acoustical Society of America meetings from 1967 to 1974 (when I left the USA to come to England). He took an interest in me, a very junior researcher, because we had both done graduate study in the Department of Computer and Communication Sciences at the University of Michigan, though he was there several years before me. I was struggling to learn as much as I could about speech, particularly synthesis, and so we had several discussions — certainly enough so that when I first heard Stephen Hawking’s synthetic voice in the 1990’s I knew that it was Klattalk, and indeed it was still recognisably the voice of Dennis Klatt.

I last spoke with Dennis in 1982, at a speech conference in Paris. His larynx was already affected and his voice was hoarse. He died in 1988, but Klattalk had by that time been marketed by Digital Equipment Corporation (DEC) as DecTalk. Stephen Hawking continued to use a version of DecTalk throughout his life, because DecTalk had become identified as the voice of Stephen Hawking, though to anyone who did speech research in the 1970’s and 80’s it will remain the voice of Dennis Klatt.


Added 6 April 2018: Stephen Hawking’s daughter Lucy Hawking did a BBC Radio 4 programme on speech synthesis and her father’s voice, including a detailed discussion with Dr Laura Fine, the daughter of Dennis Klatt.

The Future of Television Archives

This blog was written for International Digital Preservation Day, where it was originally published.

Not everyone knows that there even are television archives. Europe is fortunate in having a tradition of public service broadcasters. They are publicly supported in various ways (licence fee, limited advertising, direct government funding) but all have a remit to provide high-quality information and entertainment. Broadcasting can be seen as ephemera, but yesterday’s ephemera becomes today’s heritage. Of particular interest in a time of fake and false news is the role of public service broadcasters in providing quality factual material: news and current affairs.

Public service broadcasters, particularly in Europe, have also led the way in maintaining archives of their productions. While drama and entertainment programmes are kept for repeats and for sale to other countries, factual content is heavily recycled to add depth and interest to current programmes. In the BBC, about 30 to 40 percent of ‘the news’ is actually archive material. Other uses include biographies; retrospectives on people, places and political situations; cultural history; and a wide range of factual content that needs archive footage for context and historical memory. Up to 2010, about 20% of the BBC television archive was accessed each year, and 95% of that use was internal: back into the BBC for adding depth to new programmes. The other 5% was commercial use. Broadcast archives had little or no public access. In the UK, public access to BBC Archives was via copies of tapes sent from the BBC to the British Film Institute.

Broadcast archives are large: the BBC archive is about one million hours, and most national public service broadcasters across Europe have archives of several hundred thousand hours. Because of legal deposit legislation (covering national and local TV and radio), the Swedish national audiovisual collection is six million hours!

Over the last 20 years the technology of television has completely changed. Analogue transmission from huge aerials to bulky cathode-ray tube receivers has been replaced by digital transmission from aerials, satellites and fibre-optic cables, sent to flat screen LED devices. An even greater change is ‘television’ sent over the Internet and viewed on a computer monitor, tablet or mobile phone.

Archiving has also changed. It used to be that only the broadcaster could make a professional archive, because ‘off air’ recordings on VHS tape were of low quality, and required a lot of manual effort to implement. With digital technology, it is relatively simple and cheap to have an automated process to grab virtually all of a nation’s television from satellite transmissions.

The new technology, combined with a growing appreciation of the significance of broadcasting, has led to the rise of national television (or broadcasting or audiovisual) archives. Poland, Hungary and Latvia in Europe, and Colombia, Mexico, Singapore and a range of other countries, now have some form of national institution which collects broadcasting. A related change is for national libraries to expand to cover national broadcasting, possibly because of ‘legal deposit’ legislation requiring broadcasts to be collected and preserved. These new institutions are good news for the public, as they are all committed to developing wider public access, including Internet access.

BUT – these are off-air recordings. They are better than the VHS tapes of the 1980s, but not as good as the output from the camera and the quality used in producing programmes. Off-air recordings are fine for viewing copies, but have real quality limitations when it comes to re-purposing the content for new programmes. The video signal for satellite transmission is compressed by a factor of 10 to 20. This is lossy compression, meaning original quality is not recoverable. In contrast, professional TV archives try to save uncompressed or losslessly compressed signals.

Who cares? The future will care. Lossy compression today leads to ‘cascaded compression’ in the future, when material is recoded to new standards. Decades of experience show that there is a great risk when cascading: eventually there will be significant failures. We already see dropout, freezing and blocking of the image as a regular feature of digital television. Most such problems are caused by errors in reception, but transcoding errors and cascaded quality loss are the time bomb ticking in all archives containing content with lossy compression – meaning all off-air archives.

In addition, professional TV archives no longer get as much master material as they used to. In 1980 the BBC made about 90% of its output in-house, so the archive could get ‘the master tape’. Now that figure has been cut to 30%. The rest of the content is made by independent production companies, which come and go in a volatile business. The independents send in a file that can ‘go to air’ – a transmission file, with lossy compression and not significantly better than an off-air recording.

So: the future of television archives is that, having faced the hurdle of digitising their old analogue content in order to keep it alive, they now face a future of mainly holding reduced quality viewing copies. The future of master quality (production quality) video content is very much in doubt.

Preserving Files – How to make sure your files are safe and uncorrupted

There are formal standards and technology for digital preservation: OAIS and all that. The field has seen intense development over the last ten years, with many results:
• a reference model for trusted repositories (OAIS)
• open-source repository software (DSpace, Fedora, Greenstone …)
• standards for audit and risk management of repositories (TRAC)
• open-source tools for
o identifying file types (DROID, PRONOM, FITS)
o verifying that files conform to standards, so that a PDF really is a PDF (JHOVE)
o extracting embedded metadata so a new file can be added to the catalogue (ExifTool, FFmpeg and others)
• standards for describing digital files (MODS and lots and lots more)
• standards and tools for combining files and metadata into a unit (METS, BagIt … or MXF)

There are two other main approaches to maintaining a digital collection:
1. Digital Asset Management (DAM) systems. These have been around in some form for nearly 20 years. The Imagen system from Cambridge Imaging is a good example of an asset management system tailored to the needs of audiovisual collections.
2. Do it yourself: make a spreadsheet (or simple database) of information about what you have, and use manual processes to ensure you have backups and that they work.

Here we come to the key issue: do your backups work? I won’t even consider the case of a collection that doesn’t have backups, as that approach is clearly doomed. The easiest way to ensure that backups are present and usable is to have software, like an asset management system or storage management system, which automatically makes backups for every new file that enters the system – and automatically does periodical checking that the main files and the backups are viable.

The periodical checking is vital. Anybody can write files twice to storage, and walk away. The issue is: what will still be there, error-free, when you come back?


Matthew Addis (then of Southampton University) and the PrestoPRIME project developed a simplification of OAIS, as shown in the diagram. This approach is at the heart of the service provided by Arkivum – a company founded not only to offer storage, but to offer guarantees (with indemnities) that your content will be kept error-free.


The process begins (green circle) with two files (master and backup), and with a process that automatically checks the masters to make sure they are ok. A failure switches the state of the collection to amber, meaning warning. The vital issue is detection of the failure, because only then can the system enter the yellow state (failure detected, corrective action initiated). The file is then restored from backup – a process that only works if the backup file is also still ok. If it is, the system returns to green. If not, the file cannot be recovered and the status (for that file) is red: that file is lost and gone forever, except for the possibility of very expensive and time-consuming intervention to recover something from a corrupted file or storage system – or, just possibly, the re-ingest (re-digitisation) of items that were not ‘born digital’.
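
The green/amber/yellow/red cycle can be sketched as a tiny state machine. This is only a toy illustration of the logic described in the text; the state names, function and transition arguments are my own, not part of PrestoPRIME or any Arkivum product.

```python
# Toy model of the file-health cycle described above. Each file is in one
# of four states; transitions depend on whether the master and backup
# copies are intact, and whether a fixity check has run.
GREEN, AMBER, YELLOW, RED = "ok", "undetected failure", "repairing", "lost"

def next_state(state, master_ok, backup_ok, checked):
    if state == GREEN and not master_ok:
        return AMBER                         # corruption has happened...
    if state == AMBER and checked:
        return YELLOW                        # ...and the check has found it
    if state == YELLOW:
        return GREEN if backup_ok else RED   # restore works, or file is gone
    return state

# A failure that is detected and then restored from a good backup:
s = GREEN
s = next_state(s, master_ok=False, backup_ok=True, checked=False)  # AMBER
s = next_state(s, master_ok=False, backup_ok=True, checked=True)   # YELLOW
s = next_state(s, master_ok=False, backup_ok=True, checked=False)  # GREEN
assert s == GREEN
# The same failure with a corrupt backup ends in the red state:
assert next_state(YELLOW, False, backup_ok=False, checked=False) == RED
```

The point of the sketch is the middle transition: without a periodic check, a file sits in the amber state indefinitely, and the clock runs on the backup failing too.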

Several times now I’ve referred to checking that a file is ok. A simple concept, but how is it implemented? A small collection can be manually tested to see if the files open and play, but that doesn’t scale. The proper approach is to compute a “fixity check” on a file that is known to be good. Forever after, if a new fixity calculation produces the same number, the file has not changed.

A good asset management system will compute fixity checks for all new files, make two (or more) copies of all new files, and periodically recompute the fixity numbers to prove the files are still intact. In case of error, the system should replace the broken one from a backup which does pass its fixity check.
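
As a sketch of what such a system does under the hood, here is a minimal fixity routine using Python’s standard hashlib. The manifest layout and function names are my own, for illustration only; they are not the interface of any particular asset management product.

```python
# Minimal fixity calculation and checking: record a SHA-256 digest for each
# file when it is known to be good, then periodically recompute and compare.
import hashlib
import pathlib

def fixity(path: pathlib.Path) -> str:
    """Compute the SHA-256 fixity value of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large media files need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest: dict[str, str]) -> list[str]:
    """Return the files whose current fixity no longer matches the record."""
    return [name for name, digest in manifest.items()
            if fixity(pathlib.Path(name)) != digest]
```

In use, `manifest` would be built once at ingest and `verify` run on a schedule (monthly, say); any file it returns should be restored from a backup that still passes its own fixity check.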

However, not all asset management systems manage backups, not all systems check fixity – and not all people have a comprehensive asset management system. There is now a simple, free tool that anyone can use for fixity calculation and checking. I use it on my personal computer for the inventory and monthly verification of a collection of about 25,000 photos. After getting the software set up, it now recomputes all 25,000 fixity codes every month, and informs me of any problems.

The software is called Fixity and comes from AVPreserve, a leading consultancy in audiovisual preservation. I strongly recommend that anyone with responsibility for large collections of files move immediately to fixity calculation and regular (at least monthly) fixity checking, to provide clear proof that their files are intact.

So how big is a ‘large collection’? I think the tipping point is around 1000 files. More than that, and manual approaches just run out of steam — and lead to loss. You forget where things are, you forget to do backups, the backups are disorganised or incomplete or out of date — and the files may be corrupt without you ever noticing. Using a tool, like Fixity, puts you on the road to preservation instead of the road to loss.

Guidance for the Digitisation of Audio Archives

Frequency response, dynamic range, bit depth, decibels, aliasing, and even guidance for low-quality recordings: all explained in one page. I wrote this 15 years ago as an EBU guideline, but it’s still relevant. If you have always wanted to know why every extra bit is worth 6 dB, read on!

1 Scope
This document provides very basic information to non-technical people about audio signals, with specific reference to their digitisation. It is background information for the EBU Recommendation [1] on the Digitisation of Radio Archives.

2 Bandwidth and dynamic range
The technical characteristics of an audio signal can be pictured as a rectangle: it has height and width. The height is how powerful the signal is compared to the background noise; this dimension is related to the dynamic range of the signal, or signal-to-noise ratio, and is measured in decibels, dB. The width is the range of frequencies, called bandwidth, and is measured in Hertz, Hz.

3 Digitisation
A sound, an acoustical signal, is a variation in pressure. This variation is continuous: a waveform.
Digitisation means assigning numbers to this waveform. A signal with a wide dynamic range needs an equivalently wide range of assigned numbers. It is conventional practice to specify this range in terms of how many binary digits (bits) are allowed. Eight-bit sampling uses the numbers from 0 to 255 (because 255 is one below the 8th power of 2); 16-bit sampling allows numbers up to 65,535. Every additional bit doubles the range of numbers, and so increases the dynamic range by 6 dB [2]. Eight-bit sampling has a maximum dynamic range of 48 dB [3] (20 log 256 = 48.16). Sixteen-bit sampling has twice the dynamic range: 96 dB. The human ear has, at its best, a dynamic range of 130 dB. However, the background noise present in even the quietest situations means that the human ear will almost never hear a sound more than 100 dB above the background noise. For sound reproduction, a 40 dB range is low quality, like a pocket radio or the cheapest possible cassette recorder. Studio-quality recordings on tape achieve a 70 dB range, and CD has a 96 dB maximum (because it uses 16-bit sampling).
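
The 6 dB-per-bit rule is quick to check numerically. This small calculation is mine, added here to make the arithmetic concrete; the figures it reproduces are the ones in the text and footnote [3].

```python
# Dynamic range in dB for a given bit depth: 20 * log10(2^bits),
# i.e. roughly 6 dB per bit.
import math

def dynamic_range_db(bits: int) -> float:
    return 20 * math.log10(2 ** bits)

assert round(dynamic_range_db(8), 2) == 48.16      # 8-bit, as in footnote [3]
assert round(dynamic_range_db(16)) == 96           # 16-bit: CD quality
assert abs(dynamic_range_db(1) - 6.0206) < 1e-3    # one bit is 6.0206 dB
```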

3.1 Digitisation and bandwidth
The range of human hearing varies significantly with age, and begins to deteriorate at puberty.
Young adults (age 18 to 20) can be expected to hear sounds up to 18 kHz. At age 70, the upper frequency limit would be nearer 10 kHz. When digitising, samples must be taken at twice the highest audio frequency in the signal, at a minimum. Usually a factor of 2.3 or 2.4 is used. This means that an 18 kHz sound requires (theoretically) at least 36 kHz sampling, and in practice requires more like 43 kHz.

3.2 Aliasing
When digitising, it is vital that the incoming sound is band-limited to the bandwidth of the digitising process. If not, high-frequency sounds will be mirrored (aliased) as though they were within band. If a high-frequency sound is present, such as 20 kHz, and the sampling rate is too low, say 30 kHz, the 20 kHz signal will mirror around half the sampling rate (15 kHz) and appear at 15 − 5 = 10 kHz, instead of where it belongs at 15 + 5 = 20 kHz. An inaudible sound at 20 kHz can thus be turned by aliasing into a very audible sound. One way to minimise aliasing is to sample at a rate much higher than the usual 2.4 times the maximum frequency (oversampling). Such oversampling allows much simpler anti-aliasing filters to be used, which in turn means that bulky hardware associated with digitisation can be eliminated.
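
The mirroring can be demonstrated in a few lines. This is my own illustration of the 20 kHz / 30 kHz example in the text: sampled at 30 kHz, a 20 kHz sine produces exactly the same sample values as a 10 kHz sine (with opposite sign), so the two are indistinguishable after digitisation.

```python
# Sample a 20 kHz sine at 30 kHz and compare with a 10 kHz sine:
# the sample sequences coincide (up to sign), which is aliasing.
import math

fs = 30_000                      # sampling rate, Hz
f_high, f_alias = 20_000, 10_000 # input tone and its alias
x_high  = [math.sin(2 * math.pi * f_high  * k / fs) for k in range(16)]
x_alias = [math.sin(2 * math.pi * f_alias * k / fs) for k in range(16)]

# Each sample of the 20 kHz tone equals minus the 10 kHz sample:
assert all(abs(a + b) < 1e-9 for a, b in zip(x_high, x_alias))
```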

4 Very low quality audio
Very low-quality audio (which has limited dynamic range and bandwidth, as in 78 rpm gramophone recordings) could in principle be adequately captured with 8-bit sampling. This is not done in practice, as the factor-of-two storage saving is shortsighted. Because of the continuous drop in the cost of data storage media, this ‘saving’ represents the cost drop over about 18 months. In effect, the process risks permanent content loss for a saving that would disappear within 18 months.

4.1 Reduced Sample sizes
Sample sizes between 8 and 16 bits would probably still occupy 2 bytes of storage per sample on the carrier, so there is no advantage in using anything less than 16-bit sampling.

4.2 Reduced bandwidth
The bandwidth of signals on the oldest carriers in sound archives (e.g. 78 rpm gramophone recordings) is usually less than 8 kHz. The use of 24 kHz sampling instead of the EBU standard 48 kHz could in principle capture this bandwidth. Again, ensuring that the bandwidth definitely did not exceed 8 kHz would probably cost more than would be saved by the reduction in resultant digital data. And, again, the savings in space/cost would effectively disappear in 18 months. Therefore there is no substantial justification for use of less than 48 kHz sampling. The only exception is in the production of audio CDs, which have a standard sampling frequency of 44.1 kHz.
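
The size of the saving at stake is easy to quantify. This back-of-envelope calculation is mine (mono audio assumed, for simplicity); it shows that halving the sampling rate saves exactly the factor of two that, per the argument above, the falling cost of storage erases within 18 months.

```python
# Storage per hour of uncompressed mono audio at a given rate and bit depth.
def bytes_per_hour(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    return sample_rate_hz * (bits // 8) * channels * 3600

full = bytes_per_hour(48_000, 16)   # EBU-standard 48 kHz / 16-bit
half = bytes_per_hour(24_000, 16)   # the proposed 24 kHz alternative

assert full == 345_600_000          # about 345.6 MB per hour
assert full == 2 * half             # the entire saving is a factor of two
```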

[1] EBU Recommendation R105 – 2008 Digitisation of programme material in Audio Archives (a revision of the 2001 recommendation)

[2] Doubling the range of the numbers means doubling the amplitude of the signal represented by the numbers. Decibels are measured in power, and power is, by definition, proportional to the square of the amplitude. Decibels are defined on a log scale: specifically, they are 10 times the log of the power, which means 20 times the log of the amplitude, because 10·log(x²) = 20·log(x).
The log of 2 is very close to 0.3; to five decimal places, log(2) = 0.30103. Ten times that (because decibels are defined as ten times the log, to expand the range of numbers) is 3.0103, which engineers always round to just 3. Doubling the signal power is thus an increase of 3 dB. Doubling the signal amplitude multiplies the power by four (the square of two), doubling the increase in decibels. So every extra bit in digitisation increases the dynamic range by a number very close to 6 dB (6.0206 to four decimal places).

[3] 20 log 256 = 48.16 (rounded to two decimal places) – or you can just multiply the number of bits by 6 and immediately say that 8-bit digitisation has a 48 dB dynamic range.

Coping with errors in audio and video files

My first encounter with technology for coping with errors in audiovisual files was at the AES conference in Budapest (1) in 2001 – where Jorg Houpert of Cube-Tec had his laptop stolen. That conference had a paper on minimising the effects of losing an entire block of audio data by spreading the data across the whole file: loss of a block then produces a reduction in the signal-to-noise ratio of the whole file, rather than a gap in the audio. The process was considered by NOA of Vienna, though never commercially implemented.

But that wasn’t the first audio technology to reduce effects of errors. Sony Walkman CD players used buffering to allow re-reading of tracks. The more expensive the player, the more buffering and the greater the likelihood of uninterrupted audio even when jogging.

Similar real-time or near real-time corrections are built into most video playback decks, though often the technology is proprietary, undocumented and inaccessible. One of the main achievements of the DAVID project is the production of open and well-documented error-recovery processes.

The error indications produced by playback equipment for the DV video format are documented, and much has been gained from examining and using DV error information. The whole issue of how to get the best DV playback has been the subject of useful work by Dave Rice (2), independent consultant, and Chris Lacinak of AVPreserve (3), both in New York City.

But coping with playback errors in video was already part of videotape playback technology in analogue equipment, where methods of storage of one line of video would allow ‘concealment’ of a missing line (or part of a line). With digital videotape these methods just became easier and more powerful, leading to the ability to conceal whole frames of bad data.

The fact that videotape players could conceal errors – many errors – while a video file with a single error could be unreadable was brought to my attention by Jean-Hugues Chenot of INA around ten years ago. INA had tried to interest technology companies in doing something to make video files more resilient, but there had been no take-up.

Some years after that, I was invited to give a keynote talk at a conference I wouldn’t usually attend, specialising in still images: the Image Science and Technology conference Archiving 2008, held in Bern, Switzerland. There was a paper by Prof Manfred Thaller of Cologne, reporting on the striking difference in the effects on an image of one or a small number of file errors, depending upon whether the encoding was compressed or uncompressed. For any kind of compression, lossless or lossy, a single error was magnified by the computations used to turn the data back into an image. For an uncompressed file, one error basically affected one pixel. For a compressed file, large parts of an image were often destroyed. Volker Heydegger (4), a student of Prof Thaller, had subjected hundreds of thousands of files to various levels of readback error, and compiled statistics on the average number of pixels affected. In general compression magnified the effects of errors by 100 to 1000 times, or even more – depending upon the type of file and the type of compression. A one-byte error would, on average, corrupt 17% of a lossless JPEG2000 file, and 33% of a lossy JPEG2000 file.

This paper caused quite a stir. The head of technology at Adobe said it was ‘bad science’, because it would be better to store multiple compressed copies than to abandon all use of compression. Of course Adobe was a leading producer of PDF files, which are uncompressed yet still have the property of magnifying errors when compared to .txt files. One could say .txt files were too primitive to be a fair comparison, as they have no structure, no markup. An obvious way to introduce markup without also introducing error magnification is to use XML files, which are just as resilient as .txt files. In fact, they really are .txt files. And now, years later, Microsoft does use XML for its Word files, and then re-introduces error magnification by packaging the XML files in a ZIP file – which it renames as DOCX.
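
The error-magnification effect is easy to reproduce on text. This is my own demonstration, not Heydegger’s experiment: it flips one byte in a plain text and in a zlib-compressed copy of the same text. In the plain text exactly one character is wrong; in the compressed copy, the damage spreads far beyond one byte (typically the whole file is unreadable, since the stream checksum fails).

```python
# One-byte error: plain text vs compressed text.
import zlib

text = ("the quick brown fox jumps over the lazy dog. " * 200).encode()

# Flip one byte in the raw text: exactly one character is damaged.
raw = bytearray(text)
raw[1000] ^= 0xFF
assert sum(a != b for a, b in zip(raw, text)) == 1

# Flip one byte in the middle of a compressed copy.
comp = bytearray(zlib.compress(text))
comp[len(comp) // 2] ^= 0xFF
try:
    out = zlib.decompress(bytes(comp))
    damaged = sum(a != b for a, b in zip(out, text)) + abs(len(out) - len(text))
except zlib.error:
    damaged = len(text)   # decompression failed: nothing recoverable

assert damaged > 1        # the single-byte error is magnified
```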

Regarding whether it was better to have one resilient copy or three compressed copies – taking up the same space but without readback-error resilience – it wasn’t until PrestoPRIME and the tools for modelling storage and file management (5) developed by IT Innovation that we were able to get statistical answers to such questions.

Meanwhile, Joanneum had been working with INA – and for a while also with the BBC – on image, video and film restoration. There is a whole range of technology available for restoration, and the DIAMANT system developed by Joanneum and marketed by HS-ART (6) is the market leader.

Back in Vienna, Christoph Bauer at ORF was encountering that archive’s first (and let’s hope only) major outbreak of file errors. An encoding problem had produced incorrect IMX (D10) files (7). Previously, Laurent Boch of the R&D department of the Italian broadcaster RAI had already identified a range of problems with the IMX format, so the ORF people were able to get help, and sympathy, from partners in EU projects and in FIAT/IFTA, the professional organisation for television archives. ORF had problems with fully half of the files from 23,000 hours of video. ORF needed more than sympathy, and one of the first successes of the DAVID project was to develop software to re-wrap the video in a standards-compliant fashion. This solution has become the Cube-Tec MXF Legalizer (8).

Related work on transferring 300,000 hours of video at ORF had a quality-control problem. The BBC approach of “completely manual, watch every frame and listen to every second” quality control was expensive and slow. Joanneum Research had the technology to automatically detect defects, because that is the starting point for image restoration systems such as DIAMANT. However there is a long route from fault detection to a working quality-control system, as evidenced by the BBC’s own struggle to introduce automation into their workflow. A principal problem is generating too many false alarms, each of which then has to be manually checked – defeating the whole purpose of the automation. The only solution to false alarms is better detection, which in turn requires hundreds and eventually thousands of hours of manually-corrected training and test data.

ORF has worked over many years with Joanneum, but the DAVID project brought the technology and the practical application close enough together, for sufficient time and with sufficient training and test material, to finally develop an effective process, which is now the Vidicert (9) product.

This has been my “view from a distance” of the DAVID project, and there is much more to the project that I haven’t mentioned. I hope interested parties will take the time to find out more, because I’ve only mentioned two of the seven areas that the DAVID website lists as ‘spotlights’. What I have most enjoyed seeing from DAVID is solutions, at last, to problems that many interrelated projects have been working on since, originally, the Aurora video restoration project of nearly 20 years ago.

My congratulations to the DAVID partners.

(1) Stocker, Daniel. Protective Nonlinear File Structure. In Proceedings of the AES 20th International Conference: Archiving, Restoration, and New Methods of Recording, Budapest, Hungary, 2001 October 5-7
(2) Dave Rice:
(3) Digital Tape Preservation Strategy: Preserving Data or Video? By David Rice and Chris Lacinak – December 2, 2009
(4) Heydegger, Volker; Analysing the Impact of File Formats on Data Integrity. Archiving 2008, Bern
(5) Digital Preservation Tools from IT Innovation
(6) Diamant and DustBuster
(7) D6.2: Longitudinal CoP Impact Analysis.
(9) VidiCert

A Compressed View of Video Compression

Digital audio and digitised film can also be compressed, but there are particular issues – and an interesting (well, for some) history – for video, so I will emphasise video. The general principles apply to any signal (including audio and scanned film), but not to files and digital data in general.

A signal is where engineers start with audio and video. A sound field (the variations in pressure in a three-dimensional space) is a complicated thing, but a microphone inside that sound field produces a voltage that varies with time: a signal. The visual field is equally complicated, but a video camera allows the pattern of light (through a lens onto some sort of receptor) to create a two-dimensional pattern which can then be scanned. Scanning converts the pattern into a voltage that varies with time: a signal. A datacine machine does much the same to film: converts an image into a signal.

Signals carry information, with greater or lesser efficiency. The data rate of the sequence of numbers representing a signal can be much higher than the rate of information carried by the signal. Because high data rates are always a problem, technology seeks methods to carry the information in concise ways.

Video: compressed from birth
Television started out by trying to produce successive images at a rate fast enough to exceed the ‘flicker fusion’ threshold of the human eye. At somewhere above 30 to 40 images per second, the pattern looks continuous and the eye is fooled into thinking it is seeing continuous motion. But television technology (in 1936) was incapable of transmitting 405 lines of information 50 times per second, so they threw away half the data and sent half the image in one 20-millisecond time slot, and the second half in the next. The result was the needed 50% reduction in (analogue) bandwidth, the rough equivalent of (digital) data rate (bitrate). To make the ‘compression’ as visually acceptable as possible, odd-numbered lines were sent as the first ‘field’ – and even ones in the next: interlace.

When colour television was developed, there were further problems. Ideally colour is two separate dimensions (the two dimensions of a ‘colour wheel’), which add colour information to the black and white pattern described by the luminance signal. The three signals form component video. For broadcasting this all had to go into one signal, so colour was jammed into the luminance signal as composite video – another kind of compression (and another compromise).

Some videotape formats (e.g. analogue ½-inch, 1-inch, VHS, Betamax and U-matic; digital D2, D3) record a composite signal. In dealing with principles such as ‘keep the best’ and ‘keep the original’ it is important to know what the original actually is! It gets murkier: it was hard to get even composite video onto a videotape, so many composite analogue recorders (notably U-matic in the semi-professional area) also shifted the frequency modulation of the colour information, to get it into another place in the overall spectrum where colour information caused less interference to luminance: the ‘colour under’ approach.

The conclusion is that alteration of video to squeeze it into limited bandwidth or into limited tape recorders has been with us from the beginning of video: interlace, composite, colour under.

In the UK, redundancy means losing your job. In information theory, redundancy is formally defined, and relates to the data rate of a signal being higher than the actual information. For instance, if I’m on a noisy telephone line I might start repeating key information. It takes more signal (more time), but improves the odds that the information will be transmitted.

A CD carries audio at 1.4 million bits per second. If the audio is a person speaking, they may be conveying about 3 words per second. With a 30,000-word vocabulary, that works out as about 45 bits per second (because 30,000 is about the 15th power of two). If the audio could be run through a speech recogniser, the 45 bits could be transmitted and the speech could be regenerated at the other end using a synthesiser. The compression achieved would be enormous: a factor of about 30,000 [1.4 million divided by 45]. But the synthesised sound would convey only the words, and not what the speaker sounded like or any other aspect of the original sound except the ‘meaning’. For music, nobody would be pleased. A transcript of a Janis Joplin song just doesn’t capture what matters.
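
The arithmetic behind that factor of 30,000 can be checked in a few lines. This is my own working of the numbers in the text, assuming 3 words per second drawn from a 30,000-word vocabulary and stereo 16-bit CD audio at 44.1 kHz.

```python
# Information rate of speech-as-words vs the data rate of CD audio.
import math

bits_per_word = math.log2(30_000)        # about 14.9, i.e. roughly 15 bits
info_rate = 3 * bits_per_word            # about 45 bits per second
cd_rate = 44_100 * 16 * 2                # 1,411,200 bits per second

assert round(info_rate) == 45
assert 30_000 < cd_rate / info_rate < 33_000   # a factor of about 30,000
```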

Which brings us to the crux: what matters, and what can be thrown away?

For images, the ‘meaning’ is undefinable, but image quality metrics have been defined. It is difficult to come up with an equation that exactly fits the judgements a person would make about degree of impairment to an image, but the metrics come close. Essentially, video compression methods attempt to maximise the reduction in datarate while minimising the estimated visual quality difference (before vs after).

If the information is still there, and the data rate is reduced, then that’s A Good Thing, isn’t it? Not necessarily. Redundancy is useful, as in my telephone conversation where I repeated things. Redundant signals are robust signals: they have a higher probability of surviving some sort of mishap with the information intact. Heavily compressed signals are fragile: they can look great, but touch them and they shatter.

Managing Compression
As with everything else about archiving and preservation, a key issue is management: knowing what you’re dealing with, having a strategy, monitoring the strategy, keeping on top of things so loss is prevented.

I think some clear principles can be stated for audiovisual archiving, and these principles can be used to manage the use of compression:

Basic principles:

  • Keep the original
  • Keep the best
  • Do no harm

What do these principles have to do with compression?

Keep the original: compressed signals do come into the archive, so compressed signals should be in the archive, and should be preserved. The overhead is that the decoding software will also need to be preserved, so the compressed signal can be converted back to a standard video signal.

Keep the best: if there is a compressed signal, then by implication somewhere there was an uncompressed signal. For instance, many professional high definition video cameras write a compressed signal to a solid-state memory card. Compression is used to get more minutes per card, which is important. But many of these cameras also have an uncompressed output. It may be fantasy to think the uncompressed signal from the camera could ever get to the archive, but in some cases (maybe not if the compression is in the camera, but just possibly if it is in post-production) an uncompressed or less-compressed version could be obtained by the archive. It’s worth asking, and it’s worth pushing acquisition and post-production (if that is at all possible) to consider whether it’s time for them to upgrade to higher quality and lower compression.

Do no harm: this is a principle from medicine, but archives need to be just as careful. Audiovisual archives have the strange necessity of, from time to time, making a ‘new master’. Art galleries don’t repaint the Mona Lisa (though just what is acceptable as restoration is a tricky issue they do have), but audiovisual archives make new master copies when the ‘old master’ is coming to the end of its life.

While the software still works to play a compressed file, that file can be moved and replicated ad infinitum with no problems. When the software becomes obsolete, there is a problem. Unless emulation is a possibility (discussed below), the file will have to be converted to something else, either compressed or uncompressed. If compressed, it will use a new algorithm (the old one is obsolete). This will then be a cascaded compression.

Television production has been cascading compression ever since composite signals went onto videotape. The signal is played back, decoded, and then if videotaped again it is encoded and re-recorded. When the second version is played back, there is an inevitable generation loss. Video production and post-production has always lived with generational losses, but they have always been seen as a necessary evil, and as something to be managed and kept to a minimum.

The particular issue in managing cascaded digital compression systems is the unpredictability of the results. Broadcasters knew how many generations of BetaSP or Digibeta could be produced before visible impairments were highly likely. The problem with cascading today’s JPEG2000 compression into tomorrow’s who-knows-what compression is that we have no idea of the probability of visible impairment, and no idea of the probable fragility of the result of the cascade. So the principle of do no harm is at risk when cascading disparate compression methods, and the risk increases with every repetition of the process.
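A toy model makes the cascade effect concrete. Suppose each ‘codec’ is just quantisation to a different grid (a gross simplification, purely illustrative): any single generation loses at most half a grid step, but alternating between dissimilar grids lets the errors accumulate:

```python
# Toy cascade: each generation snaps samples to a different quantisation grid,
# standing in for re-encoding with a different codec.

def requantise(samples, step):
    """Lossy round trip: snap each sample to the nearest multiple of 'step'."""
    return [int(s / step + 0.5) * step for s in samples]   # half-up rounding

original = [13, 27, 41, 55, 69, 83]   # made-up sample values
signal = original
for step in (4, 5, 7, 4, 5, 7):       # six generations, three dissimilar 'codecs'
    signal = requantise(signal, step)

max_error = max(abs(a - b) for a, b in zip(original, signal))
print(max_error)   # larger than any single generation's worst case
```

Any one of these steps on its own shifts a sample by at most half its grid (2, 2.5 or 3.5 units); after the cascade the worst sample is off by 6. Real codecs are far more complex, but the lesson carries over: errors from dissimilar compressions do not politely cancel.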

Best Practice for dealing with a compressed master:

  • Clone what arrives at the archive (keep the original)
  • If what arrives at an archive is lower quality than what exists somewhere higher up the production chain, investigate access to an archival version made earlier; this step particularly applies to broadcasting – and to film archives faced with DCP files
  • Remove all encryption and copy protection constraints (if possible)
  • Make an access copy from the clone, in the current access format
  • Make a new access copy (from the clone) when a new access format becomes current
  • Eventually migrate the clone, when the original codec is obsolete. If the original is uncompressed it will NEVER need to be re-coded, though it may need to be re-wrapped to suit whatever a ‘file’ is in the future
  • Migration to another compressed version (because of obsolescence of the original codec) will be a cascade of different types of compression – as discussed, this is best avoided! My fearless prediction is that after 2023 there will be no economic incentive for such a cascade of compressions – because storage will be so cheap ($5 per terabyte max, probably under $1 per terabyte)
  • Just possibly emulation (of the original system running the codec) could be used to continue to decode the clone into the indefinite future

My conclusion: compression is not here to stay – it is here to be managed. The next migration will dispense with the issue by migrating away from compressed to lovely, stable uncompressed video.