Guidance for the Digitisation of Audio Archives

Frequency response, dynamic range, bit depth, decibels, aliasing, and even guidance for low-quality recordings: all explained in one page. I wrote this 15 years ago as an EBU guideline, but it’s still relevant. If you have always wanted to know why every extra bit is worth 6 dB, read on!

1 Scope
This document provides very basic information to non-technical people about audio signals, with specific reference to their digitisation. It is background information for the EBU Recommendation [1] on the Digitisation of Radio Archives.

2 Bandwidth and dynamic range
The technical characteristics of an audio signal can be pictured as a rectangle: it has height and width. The height is how powerful the signal is compared to the background noise; this dimension is related to the dynamic range of the signal, or signal-to-noise ratio, and is measured in decibels, dB. The width is the range of frequencies, called bandwidth, and is measured in Hertz, Hz.

3 Digitisation
A sound, an acoustical signal, is a variation in pressure. This variation is continuous: a waveform.
Digitisation means assigning numbers to this waveform. A signal with a wide dynamic range needs an equivalently wide range of assigned numbers. It is conventional practice to specify this range in terms of how many binary digits (bits) are allowed. Eight-bit sampling uses the numbers from 0 to 255 (because 255 is one below the 8th power of 2); 16-bit sampling allows numbers up to 65535. Every additional bit doubles the range of numbers, and so increases the dynamic range by 6 dB [2]. Eight-bit sampling has a maximum dynamic range of 48 dB [3] (20 log 256 = 48.16). Sixteen-bit sampling has twice that figure in decibels: 96 dB.
The human ear has, at its best, a dynamic range of 130 dB. However, the background noise present in even the quietest situations means that the human ear would almost never hear a sound more than 100 dB above the background noise. For sound reproduction, a 40 dB range is low quality, like a pocket radio or the cheapest possible cassette recorder. Studio-quality recordings on tape achieve a 70 dB range, and CD has a 96 dB maximum (because it uses 16-bit sampling).
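The 6 dB per bit rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `dynamic_range_db` is invented here, and it computes the theoretical maximum (20 times the log of the number of levels), not the performance of any real converter.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical maximum dynamic range, in dB, for a given number of bits."""
    levels = 2 ** bits              # each extra bit doubles the number of levels
    return 20 * math.log10(levels)  # 20 log(amplitude ratio)

for bits in (8, 16):
    print(f"{bits}-bit: {2 ** bits} levels, {dynamic_range_db(bits):.2f} dB")
# 8-bit: 256 levels, 48.16 dB
# 16-bit: 65536 levels, 96.33 dB
```

The difference between any two consecutive bit depths comes out at about 6.02 dB, which is the figure rounded to 6 dB in the text.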

3.1 Digitisation and bandwidth
The range of human hearing varies significantly with age, and begins to deteriorate at puberty.
Young adults (age 18 to 20) can be expected to hear sounds up to 18 kHz. At age 70, the upper frequency limit would be nearer 10 kHz. When digitising, the sampling rate must be at least twice the highest audio frequency in the signal. In practice a factor of 2.3 or 2.4 is usually used. This means that an 18 kHz sound requires (theoretically) at least 36 kHz sampling, and in practice requires more like 43 kHz.
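The arithmetic in this section can be sketched as follows. The helper name `min_sample_rate_khz` is hypothetical, and the factors 2.3 and 2.4 are simply the practical margins quoted above, not values taken from any standard.

```python
def min_sample_rate_khz(max_freq_khz: float, factor: float = 2.0) -> float:
    """Sampling rate (kHz) needed to capture frequencies up to max_freq_khz.

    factor=2.0 is the theoretical minimum; 2.3 to 2.4 is the practical
    margin mentioned in the text.
    """
    if factor < 2.0:
        raise ValueError("factor must be at least 2")
    return factor * max_freq_khz

for factor in (2.0, 2.3, 2.4):
    rate = min_sample_rate_khz(18, factor)
    print(f"18 kHz signal, factor {factor}: {rate:.1f} kHz sampling")
# 18 kHz signal, factor 2.0: 36.0 kHz sampling
# 18 kHz signal, factor 2.3: 41.4 kHz sampling
# 18 kHz signal, factor 2.4: 43.2 kHz sampling
```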

3.2 Aliasing
When digitising, it is vital that the incoming sound is band-limited to the bandwidth of the digitising process. If not, high-frequency sounds will be mirrored (aliased) as though they were within band. If a high-frequency sound is present, such as 20 kHz, and the sound is sampled at too low a rate, say 30 kHz, the signal at 20 kHz will be mirrored around half of 30 (namely 15), and appear at 15 - 5 = 10 kHz, instead of where it belongs at 15 + 5 = 20 kHz. An inaudible sound at 20 kHz can be turned by aliasing into a very audible sound. One way to minimise aliasing is to sample at a rate much higher than the standard 2.4 times the maximum frequency (oversampling). Such oversampling allows much simpler anti-aliasing filters to be used, which in turn means that bulky hardware associated with digitisation can be eliminated.
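The mirroring described above can be reproduced numerically. The following is a minimal sketch for a single pure tone sampled with no anti-aliasing filter at all; the function name `alias_frequency_khz` is made up for this illustration.

```python
def alias_frequency_khz(freq_khz: float, sample_rate_khz: float) -> float:
    """Apparent frequency of a pure tone after sampling without band-limiting."""
    half_rate = sample_rate_khz / 2        # highest frequency that can be represented
    folded = freq_khz % sample_rate_khz    # sampling cannot tell f from f + n * rate
    # Anything above half the sampling rate is mirrored back below it.
    return folded if folded <= half_rate else sample_rate_khz - folded

print(alias_frequency_khz(20, 30))  # 10: the 20 kHz tone appears at 10 kHz
print(alias_frequency_khz(20, 48))  # 20: at 48 kHz sampling it stays where it belongs
```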

4 Very low quality audio
Very low quality audio (which has limited dynamic range and bandwidth, as in 78 rpm gramophone recordings) could potentially be adequately captured with 8-bit sampling. This is not done in practice, because a storage saving of a factor of two is shortsighted. Because the cost of data storage media drops continuously, this ‘saving’ is equivalent to the cost drop over 18 months. In effect, the process risks permanent content loss for a cost saving that would disappear in 18 months.

4.1 Reduced sample sizes
A sample size between 8 and 16 bits will probably still occupy 2 bytes of storage on the carrier, so there is no advantage in using anything less than 16-bit sampling.
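As a small illustration of this point, the sketch below assumes that each sample is stored in a whole number of bytes, which is an assumption about the storage format rather than something the recommendation specifies. The name `bytes_per_sample` is hypothetical.

```python
import math

def bytes_per_sample(bits: int) -> int:
    """Whole bytes needed per sample, assuming byte-aligned storage."""
    return math.ceil(bits / 8)

for bits in (8, 10, 12, 14, 16):
    print(f"{bits}-bit samples: {bytes_per_sample(bits)} byte(s) each")
```

Under that assumption, every size from 9 to 16 bits occupies the same 2 bytes.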

4.2 Reduced bandwidth
The bandwidth of signals on the oldest carriers in sound archives (e.g. 78 rpm gramophone recordings) is usually less than 8 kHz. The use of 24 kHz sampling instead of the EBU standard 48 kHz could in principle capture this bandwidth. Again, ensuring that the bandwidth definitely did not exceed 8 kHz would probably cost more than would be saved by the reduction in resultant digital data. And, again, the savings in space/cost would effectively disappear in 18 months. Therefore there is no substantial justification for use of less than 48 kHz sampling. The only exception is in the production of audio CDs, which have a standard sampling frequency of 44.1 kHz.
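The storage argument can be put into rough numbers. This sketch assumes uncompressed 16-bit mono audio and treats the 18-month figure from the text as a halving period for storage cost; the function names and the mono assumption are illustrative only.

```python
def bytes_per_hour(sample_rate_hz: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Uncompressed data volume for one hour of audio."""
    return sample_rate_hz * bytes_per_sample * channels * 3600

def relative_storage_cost(months: float, halving_months: float = 18.0) -> float:
    """Cost of storing a fixed amount of data, relative to today's cost."""
    return 0.5 ** (months / halving_months)

full = bytes_per_hour(48_000)   # 48 kHz sampling
half = bytes_per_hour(24_000)   # 24 kHz sampling
print(f"48 kHz: {full / 1e6:.1f} MB per hour")   # 345.6 MB per hour
print(f"24 kHz: {half / 1e6:.1f} MB per hour")   # 172.8 MB per hour

# After 18 months, storing the 48 kHz file costs what the 24 kHz file costs today.
print(full * relative_storage_cost(18) == half)  # True
```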

[1] EBU Recommendation R105 – 2008 Digitisation of programme material in Audio Archives https://tech.ebu.ch/docs/r/r105.pdf (a revision of the 2001 recommendation)

[2] Doubling the range of the numbers means doubling the amplitude of the signal represented by the numbers. Decibels are measured in power, and power is, by definition, proportional to the square of the amplitude. Decibels are defined on a log scale: specifically, they are 10 times the log of the power, which means 20 times the log of the amplitude, because 10 log(x²) = 20 log(x).
The log of 2 is very close to 0.3; to five decimal places, log(2) = 0.30103. Ten times that (because decibels are defined as ten times the log, to expand the range of numbers) is 3.0103, which engineers always round to just 3. Doubling the signal power is thus an increase of 3 dB. Doubling the signal amplitude multiplies the power by four (two squared), which means including another factor of two after taking the log. So every extra bit in digitisation increases the dynamic range by a number very close to 6 dB (6.0206 to four decimal places).

[3] 20 log 256 = 48.16 (rounded to two decimal places), or you can just multiply the number of bits by 6 and immediately say that 8-bit digitisation has a 48 dB dynamic range.
