How to Choose a Digital Format for Storing Video Archive Masters

Codecs and containers for video

The different data streams in a video file are held together by a container, also called a wrapper. Codecs are used to encode the video and audio signals. A codec is a device or piece of software that encodes a data stream or signal for transmission, storage, or encryption and decodes it for playback or other uses, such as editing. The word "codec" is a blend of "coder" and "decoder." In practice, "codec" often refers to the coding or compression format itself. Different codecs, with or without compression, can encode the video and audio essences (the bit streams).

Video codecs include H.264, MPEG-2, JPEG 2000, IV41, Cinepak, and Sorenson, among others.

So that software can read a video file, the encoded video and audio streams are placed in a container together with other data streams, such as metadata and subtitles. The container format determines how many data streams a container can hold and which types of streams it accepts.

AVI, MOV, MP4, WMV, and MXF are all types of video containers.
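To make the container and stream relationship concrete, here is a minimal Python sketch that summarizes which streams a container holds. It assumes you have already produced a report with a command such as `ffprobe -v error -show_format -show_streams -of json FILE`. The sample report below is hand-written for illustration and is not taken from a real file, and `summarize_container` is a hypothetical helper name.

```python
import json

# Hand-written sample shaped like the JSON printed by
#   ffprobe -v error -show_format -show_streams -of json FILE
# The values are illustrative only, not output from a real file.
SAMPLE_PROBE = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"},
    {"index": 2, "codec_type": "subtitle", "codec_name": "mov_text"}
  ],
  "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"}
}
"""


def summarize_container(probe_json):
    """Group the codec names in a probe report by stream type."""
    report = json.loads(probe_json)
    by_type = {}
    for stream in report.get("streams", []):
        kind = stream.get("codec_type", "unknown")
        by_type.setdefault(kind, []).append(stream.get("codec_name", "unknown"))
    return {
        "container": report.get("format", {}).get("format_name", "unknown"),
        "streams": by_type,
    }


summary = summarize_container(SAMPLE_PROBE)
print(summary["container"])
for kind, codecs in summary["streams"].items():
    print(kind, codecs)
```

The point of the sketch is that the container name and the codec names are separate pieces of information: the same container can carry differently encoded streams, and the same codec can appear in different containers.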

Lossless and lossy compression

As noted above, audio and video can be encoded with or without compression. In an uncompressed video file, all the information from the digitized source is captured and encoded without any reduction. Uncompressed video files are very large, so they take up a great deal of storage when a lot of content needs to be digitized. Video compression re-encodes the original content in a different way so that file sizes and bit rates become smaller.

Lossless and lossy are the two classes of compression codecs. With a lossless codec, you can reconstruct an exact copy of the original data, just as with an uncompressed file. With a lossy codec, not all of the information is kept. Different methods and algorithms can be used to reduce the size of a video, such as wavelet transforms, motion compensation, and the discrete cosine transform (DCT). Most compression methods fall into the following main groups:

        Visually lossless compression

        Mathematically lossless compression

With lossy compression, some information is discarded from the video to make the file smaller. Most often, this is done by reducing the amount of color information. As a result, part of the image information, and sometimes detail in its luminance and chrominance (through chroma subsampling and a reduced color bit depth), is lost for good. Codecs such as MPEG-2/D10, Apple ProRes, DVCPro, and H.264 use lossy compression.
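The difference between the two classes can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: `zlib` is a general-purpose lossless compressor, not a video codec, and the lossy step simply reduces bit depth from 10 to 8 bits, which is one of the reductions mentioned above rather than how any particular codec works. The sample values are made up.

```python
import zlib

# Hypothetical 10-bit sample values (0..1023) standing in for one row of pixels.
samples = [64, 65, 67, 70, 512, 513, 515, 940, 941, 943]

# Lossless round trip: pack the samples into bytes, compress, decompress.
raw = b"".join(s.to_bytes(2, "big") for s in samples)
restored = zlib.decompress(zlib.compress(raw))
lossless_is_exact = restored == raw

# Lossy round trip: drop the 2 least significant bits (10-bit to 8-bit),
# then scale back up. The dropped bits cannot be recovered.
reduced = [s >> 2 for s in samples]
rebuilt = [r << 2 for r in reduced]
lossy_is_exact = rebuilt == samples
errors = [original - approx for original, approx in zip(samples, rebuilt)]

print("lossless round trip exact:", lossless_is_exact)
print("lossy round trip exact:", lossy_is_exact)
print("per-sample error after lossy round trip:", errors)
```

The lossless round trip returns every byte unchanged, while the lossy round trip leaves small per-sample errors that no later processing can undo.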

Most digital cameras record video with built-in lossy compression codecs. Lossy compressed formats are typically used for production and access (among others, web, TV, and DVD).

Manufacturers sometimes call technically "lossy" compression schemes "visually lossless" because the average human eye should not be able to tell the difference between the compressed video and the original. Despite its name, "visually lossless" compression still discards some of the data permanently. For this reason, "visually lossless" is sometimes better described as "near-lossless compression." In the rest of this document, the word "lossy" also covers "visually lossless compression."

Compression ratios

The data compression ratio is the ratio between how big a file is without compression and how big it is after compression. There are different compression ratios for different compression algorithms and methods. The examples below show how lossy, lossless, and uncompressed video codecs need different amounts of space to store:

        Uncompressed (e.g., v210), 10-bit: about 100 GB per hour of video

        Lossless compression (FFV1 and JPEG 2000), 10-bit: about 45–50 GB per hour of video

        Lossy compression, MPEG-2 (50 Mbps): about 25 GB per hour of video

        Lossy compression, DV (DV25): about 12 GB per hour of video

        Lossy compression, MPEG-2 (DVD quality): about 3.6 GB per hour of video
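The figures above can be roughly checked with some arithmetic. The Python sketch below rests on assumptions that the text does not state: standard-definition PAL video (720 x 576 at 25 frames per second) for the uncompressed case, constant bit rates, video only, and decimal gigabytes. The function names are hypothetical. v210 packs 6 pixels into 16 bytes and pads each row to a multiple of 48 pixels.

```python
def gb_per_hour_from_bitrate(mbps):
    """Storage for one hour at a constant bit rate, in decimal GB."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * 3600 / 1e9


def v210_gb_per_hour(width, height, fps):
    """Storage for one hour of v210 video, in decimal GB."""
    # v210 pads each row to a multiple of 48 pixels (128 bytes per 48 pixels).
    padded_width = -(-width // 48) * 48
    # 6 pixels are packed into 16 bytes.
    bytes_per_frame = padded_width * height * 16 // 6
    return bytes_per_frame * fps * 3600 / 1e9


def compression_ratio(uncompressed_gb, compressed_gb):
    """Uncompressed size divided by compressed size."""
    return uncompressed_gb / compressed_gb


# Assumed SD PAL raster; the article does not state the resolution.
uncompressed = v210_gb_per_hour(720, 576, 25)
mpeg2_50 = gb_per_hour_from_bitrate(50)
dv25 = gb_per_hour_from_bitrate(25)

print(f"v210 SD PAL: {uncompressed:.1f} GB/hour")
print(f"50 Mbps video: {mpeg2_50:.2f} GB/hour")
print(f"25 Mbps video: {dv25:.2f} GB/hour")
print(f"ratio, 100 GB vs 25 GB: {compression_ratio(100, 25):.1f}:1")
```

Under these assumptions the uncompressed result lands close to the quoted 100 GB per hour. The bit-rate results (22.5 GB and 11.25 GB) come out somewhat below the quoted 25 GB and 12 GB, plausibly because the quoted figures also include audio and container overhead.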

Choosing a format for long-term preservation

What is the difference between broadcast archives and heritage archives?

The broadcasting and cultural heritage sectors often have different ideas about keeping audio-visual material safe. Both have the right to keep and share audio-visual history, but they do so in different ways and with different amounts of material. As a result, the two sectors have different ideas about what it means to preserve something and how to do it. In the context of VIAA, the content that needs to be digitized comes from a wide range of institutions. About 70% of the content comes from the broadcast sector, and the rest comes from various heritage institutions.

Because they store a lot of material, broadcast archives often need speed, efficiency, and a format that works with their technical toolchain and workflow. Their use cases are straightforward, such as reusing content they produced in the past in their own broadcasts or making it available to others. Usually, the message the content conveys is more important than the quality of the image. On the other hand, organizations that care for cultural heritage see themselves more as caretakers than as owners of audiovisual heritage.

Most of the time, they did not create the material they keep, so they are accountable to the people who entrusted it to them and must keep it in the best way possible. Access is also essential for heritage institutions, but they work from conservation principles such as authenticity, integrity, and long-term sustainability rather than short-term efficiency.
