
    Media archiving at the Library of Congress

    October 24, 2014

    James Snyder, Senior Systems Administrator for the Library of Congress' National Audio-Visual Conservation Center (NAVCC), delivered a presentation on "Media Archiving, Standards & the Library of Congress." The Library of Congress holds one of the world's largest media collections and is digitizing all of it, including over a million video recordings, three million audio recordings, 255 million feet of film, and hundreds of thousands of video games. Over 100,000 new items are added every year, and nearly every known format, both common and rare, must be accommodated. Snyder described the challenges the Library faced in planning and building its media migration plant, how it chose its preservation and access file formats, and how it uses international standards. "We need to keep the media safe and verified until we hand it over to our successors, and make sure they are properly trained," he said. "It's a relay marathon, not a sprint." Snyder said they also need to pass on proper documentation on how the collection was created, as well as how to maintain migration equipment such as 2-inch Quad. "As we hand over the collection to people who grew up in digital, we have to teach them analog," he noted.

    The audio lab has been in production since February 2009 and the video lab since October 2009. In that time, they have produced over 125,000 archive files and over 5.25 PiB of content in the archive. With regard to film, they have a physical film lab, but electronic preservation is in early roll-out. QC is currently manual, but the first automated file QC is being tested and installed. Archive and proxy file formats include Broadcast WAV for audio and JPEG2000 lossless MXF OP1a for video. Film is scanned and recorded in 4K; they are currently testing HD, 2K and 4K JPEG2000 lossless MXF OP1a encoders. Proxies are maintained on servers but also stored in the archive. The archive includes dual copies, geographically dispersed. The Library uses Oracle StorageTek T10000-C data tapes. Metadata is maintained in databases, with a copy to be inserted in each archive file as well.
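
    To put those production figures in perspective, the short Python sketch below works out a rough average archive file size. It assumes, purely for illustration, that the 5.25 PiB total is spread evenly across the 125,000 archive files; the article does not say whether that total also covers proxies or the second archive copy, and both figures are quoted as lower bounds, so the result is only an order-of-magnitude estimate. The function name is hypothetical.

```python
# Back-of-the-envelope estimate from the figures quoted in the article.
# Assumption: the 5.25 PiB total is divided evenly over the 125,000 archive
# files; the real per-file sizes are not given in the source.

PIB_IN_BYTES = 2**50  # one pebibyte
GIB_IN_BYTES = 2**30  # one gibibyte


def average_file_size_gib(total_pib: float, file_count: int) -> float:
    """Return the mean file size in GiB for total_pib spread over file_count files."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    total_bytes = total_pib * PIB_IN_BYTES
    return total_bytes / file_count / GIB_IN_BYTES


avg = average_file_size_gib(5.25, 125_000)
print(f"{avg:.1f} GiB per archive file on average")  # prints "44.0 GiB per archive file on average"
```

    Under that assumption, a typical archive file is on the order of tens of gibibytes, which helps explain the emphasis on scalable storage and very low bit error rates.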

    "Our data set is effectively permanent, although we only have a 150-year retention period designated in Copyright law," he said. "I don't think our successors will want to throw away things. We have to treat it as permanent, so we have to think about how this content will survive. We must think about archive contents that stand on their own, with no external databases required to access information about essence within a file. It must be file format agnostic, scalable to a very large size (larger than an exabyte), with very low bit error rates."

    The Packard Campus Workflow Application (PCWA) software is the glue that holds it all together, tying commercial off-the-shelf (COTS) products together and offering custom ordering, media pulling, scheduling and reporting functions specific to the needs of MBRS, the Library's Motion Picture, Broadcasting and Recorded Sound Division. Off-the-shelf file movement software is provided by Signiant and Aspera.

    "If you boil it down, the real goal is sustainability -- for at least 150 years," he said. "How do we sustain the effort and the technology that we're using to do it? We have to be careful not to go down any cul-de-sac, and to design the archive for the highest chance of long-term survival. From our perspective, traditional media has been about essence migration. We're moving from that to data migration, ensuring survival through multiple data migrations."

    "We are not an island," he said. "We get content from many people sitting in this room, via Copyright submissions. So we have to design workflows that fit in a larger life cycle of media content. So one of the key aspects of this is using standards." MXF was designed to deliver the kind of interoperability that standardized tape formats once provided, while offering features not available in physical media. "But as we got into that first implementation, we realized it didn't go far enough," he said. "MXF AS-07 was developed for audiovisual archiving." MXF AS-07 offers the types of metadata that librarians and archivists use to make these collections searchable. "It's an application specification aimed at the media archiving field, with some features unique to the needs of this field," he said. "The goals are interoperability. Playback system validation is a key component." What about AXF? "I've worked with these gentlemen for a number of different reasons," he said. "We need to have movement of materials between different archive systems of any size and between operations of the same organization. We also need flexibility in changing archive management vendors, with no loss of data." One use for AXF at the Library is as a single container for multiple essences. "It solves many long-term data management technical challenges and potentially eases multiple migrations over the decades," he said.

    Metadata standards are another key component. "There's a lot inside that one word," he said. "Descriptive, technical, provenance, temporal, and there are precious few standards." How do we carry this through the production workflow? PBCore 2.0 is a good descriptive metadata standard for TV programming and XML is a good metadata format, but there is still plenty of work to be done. Snyder left the audience with two questions: "Will your content survive until 2164? And if it does, will it be somewhere other than the Library of Congress?"

    Tag(s): HD, AXF, 4K, XML, QC, 2K, NAVCC

    Debra Kaufman
