How much will it cost to digitise my content?
There are people who use the word preservation interchangeably with digitisation. This is a mistake. Preservation never ends: to paraphrase the old saying, eternal vigilance is its price. Digitisation, by contrast, is one of a range of preservation actions (one-off interventions) over the life of whatever is being preserved.
So this answer is for the narrow question about the cost of converting analogue audio or video into a digital form.
There are various ways to run a digitisation project, and there are various ways to measure cost. All this leads to confusion.
In the interest of clarity, the following answer will make simple statements. Almost any of them could be qualified with a range of ‘yes, but’ considerations of varying importance, relevance and complexity. These statements come from a twenty-year background in audiovisual digitisation, so they should be worth something, provided the reader also remembers two things:
- the answers are simplifications;
- things change; some things will get cheaper, others more expensive. This answer is written in February 2013.
Major factors affecting cost of a digitisation project include:
- what kind of audio, video or film is being digitised: cassettes are cheaper than open-reel; gramophone recordings take a lot of manual handling; film digitisation takes very expensive equipment. As gross simplifications, audio costs at least €20 to €50 per hour for open-reel tapes (with no problems); video costs twice as much, and film costs ten times as much. [2019: film digitisation has gotten cheaper; now perhaps ‘only’ five times the price of video digitisation.]
- condition of the material: problem material takes more operator time, to reset things and try again. The cost can be limited (at the price of a higher failure rate) by limiting the time spent trying to play problem material. One common method is to limit operator time to twice the duration of the actual material. This will limit the time spent on each item, but says nothing about the overall time. One reason most projects of 1000 items or more begin with a test run or pilot sample is to estimate the percentage of problem material. Condition matters! See bottom of page.
- amount of material: this is rather obvious, but the relationship is not strictly linear, as there are economies of scale in larger projects (over 1000 items)
- workflow: this factor relates to amount of material, because an efficient workflow doesn’t make much difference on small projects. On large projects (over 1000 items) it is quite possible to save 50%, and some projects have quoted savings of 70%, on the base price per item.
- quality of the work: cheap equipment, untrained operators, no calibration or checking of equipment, cheapest digital storage for the result — can all reduce the cost, but if the digitisation is not of preservation quality then the project is a waste of money, and may destroy, forever, the chance of doing a proper digitisation. It will be hard to get funding to ‘do it right the second time’.
- checking the quality: some projects quote 30% of the total cost dedicated to quality control. That is at the high end, but somewhere between 15 and 30% extra has to be allocated for quality checking.
- metadata: the database or catalogue for a collection has to be updated — or in the worst case there is no documentation and something has to be created as part of the project. Bar codes and automation can cut the cost of logging new digital items into a database, but only for quite large projects where it is worth creating software to connect to an existing database and automatically update it for completed items. At least 10% should be added for metadata. If there is no catalogue and part of the project is to make one, the cost is probably double: the person doing the transfer is not a cataloguer so you need two people. Even if the person can do both jobs, they can’t be done at the same time (it has been tried; it can’t be done).
- method of measuring costs: the cost that matters (for getting a project funded) is the cost an institution has to pay. If computing resources, transport or cataloguing are provided by existing staff and systems, they may be invisible for the purposes of the project. In the limit, if there is a technical person already on staff who can do a few items per day, digitisation may be seen as free. Formally, it is. The only problem is that the workflow for such projects is usually as inefficient as possible (no division of labour, because there is only one person doing everything), so the project will proceed very slowly. As with any project, the throughput (items per year, basically) has to be assessed and compared with the requirements of the project. Anything that will take more than five years is risky, because nobody knows the availability of equipment and operators even five years from now. The situation in video is desperate, audio is getting desperate, and film has a whole range of difficulties. The ‘free’ digitisation could be the road to ruin if it leaves material undone five years from now.
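To give a feel for how these factors combine, here is a back-of-the-envelope estimator. It is a sketch, not a quoting tool: the default overheads are the ranges quoted above (15–30% for quality checking, at least 10% for metadata), the 4× slowdown for problem material is the rule of thumb at the bottom of the page, and the example rate of €35/hour is an assumption drawn from the €20–€50 range given for open-reel audio.

```python
def estimate_cost(hours, rate_per_hour, problem_fraction=0.0,
                  qc_overhead=0.20, metadata_overhead=0.10):
    """Back-of-the-envelope digitisation cost (illustrative only).

    rate_per_hour     -- base transfer cost, e.g. EUR 20-50/hour for open-reel audio
    problem_fraction  -- share of problem items, each taking ~4x the normal time
    qc_overhead       -- quality checking, 15-30% extra in the text
    metadata_overhead -- catalogue updates, at least 10% extra in the text
    """
    base = hours * rate_per_hour
    base *= 1 + 3 * problem_fraction  # a 4x-slow problem item adds 3x extra work
    return base * (1 + qc_overhead + metadata_overhead)

# 100 hours of open-reel audio at EUR 35/hour with 10% problem material:
print(round(estimate_cost(100, 35, problem_fraction=0.10)))  # -> 5915
```

Note how quickly the ‘extras’ pile up: the base transfer here is €3 500, but condition, quality control and metadata push the total past €5 900.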
The costs so far are for making a digital file from analogue content. Where does that file go? This used to be daunting, but digital storage costs have come down by about a factor of 100 in the last decade, to the point where 1000 hours of high quality audio (24 bit, 96 kHz) can be stored on €150 of hard drives (for four terabytes) — and so three copies are under €500. [2019: €80 for 4TB, €250 for three copies.] Video will take ten times more storage (for lossless compressed standard definition video) and 25 times more for uncompressed. High definition video is currently a jungle of formats, but lossy compression at an ‘archive quality’ of 400 megabits/sec translates into five hours per terabyte of storage. If more than roughly ten terabytes of storage are needed (30 with double backups), then LTO datatape is the preferred storage medium. There is an overhead of a few hundred euros to buy a tape drive, but after that the cost of datatape is less than the cost of hard drives, the reliability is considerably better, and the energy cost of datatapes on shelves is of course much less than that of spinning discs (though discs on shelves also consume no energy).
[2019: many public institutions are joining forces for shared storage services. ‘The cloud’, which simply means using Internet to access rented storage, is now the future of storage: letting somebody else worry about hardware, backups, redundancy and maintenance — providing it is somebody you can trust because of a demonstrable track record, transparent costs and guaranteed security.]
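The storage arithmetic above is easy to verify. A minimal sketch — the stereo channel count is my assumption, since the text does not specify it:

```python
def audio_storage_tb(hours, bit_depth=24, sample_rate=96_000, channels=2):
    """Uncompressed PCM audio storage in terabytes (decimal TB, 1e12 bytes)."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return hours * 3600 * bytes_per_second / 1e12

def video_hours_per_tb(megabits_per_second=400):
    """Hours of video per terabyte at a given bitrate (400 Mbit/s 'archive quality')."""
    bytes_per_hour = megabits_per_second * 1e6 / 8 * 3600
    return 1e12 / bytes_per_hour

print(f"{audio_storage_tb(1000):.1f} TB")  # -> 2.1 TB: 1000 hours fits a 4 TB drive
print(f"{video_hours_per_tb():.1f} h/TB")  # -> 5.6, i.e. roughly five hours per TB
```

Both results line up with the figures in the text: 1000 hours of 24 bit/96 kHz stereo audio comes to about 2 TB, well within a single 4 TB drive, and 400 Mbit/s video works out to roughly five hours per terabyte.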
Condition Matters! Rule of thumb: problem material takes four times as long as problem-free material, leading to these results:
- 10% problems means 30% more work;
- 20% means 60% more: a $10 000 job will instead cost $16 000;
- 30% means 90% more: nearly doubling the cost of the whole project;
- 40% means 120% more and so on.
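The rule of thumb above reduces to a one-line formula: if problem material takes four times as long, each problem item adds three units of extra work, so the extra work on the whole job is 3 × (problem fraction). A minimal check of the figures in the list:

```python
def extra_work(problem_fraction, slowdown=4):
    """Extra work, as a fraction of the base job, when problem items take
    `slowdown` times as long as problem-free ones (4x is the rule of thumb)."""
    return (slowdown - 1) * problem_fraction

for p in (0.10, 0.20, 0.30, 0.40):
    print(f"{p:.0%} problems -> {extra_work(p):.0%} more work")
# 10% problems -> 30% more work
# 20% problems -> 60% more work
# 30% problems -> 90% more work
# 40% problems -> 120% more work
```

The 20% line also confirms the dollar example: a $10 000 job times 1.6 is $16 000.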