Digital Developments at the Wellcome Library
Posted On December 1, 2015
With an average of nearly a terabyte of data preserved every month, the Wellcome Library, a library and archive devoted to medicine and its history (part of the Wellcome Trust charitable foundation), pays particular attention to the issues of data growth and format obsolescence. Located just a stone's throw from the British Library, the Wellcome Library holds a huge collection of material, much of it in digital form, that provides insight into health and social issues in the U.K. and beyond from the medieval period to the present. Ironically for a medical library, it sits on one of the most polluted roads in the region.

The library’s collection includes records of the health effects of airship raids on London 100 years ago and the effect of World War II bombings on household cleanliness in the city. Recently, it began digitizing the health reports of medical officers from every borough in London from 1850 to 1974. These reports include details on tuberculosis testing from 1914 to 1916, the number of houses destroyed in bombing raids in World War II, the cost of building public toilets, and the number of rabid dogs by borough.

Preserving the History of DNA Research

The Wellcome Library also recently digitized documents by genetics pioneers James Watson, Francis Crick, Maurice Wilkins, and Rosalind Franklin, which are held at the University of Cambridge's Churchill College. Their collective work on the structure of DNA led to the 1962 Nobel Prize in Physiology or Medicine for Watson, Crick, and Wilkins; Franklin had died in 1958 and so could not be honored (Nobel Prizes are not awarded posthumously).

The collection comprises more than 1 million pages of original notes, letters, sketches, essays, and photographs. It also includes a digitized version of Photograph 51, Franklin's X-ray diffraction image of DNA, which contributed to Watson and Crick's discovery of the molecule's double helix shape.

Franklin was an expert X-ray crystallographer. Her work is also the subject of a play, Photograph 51, which recently ran in London's West End with Nicole Kidman in the lead role.

Help From Preservica

The DNA-related material adds substantially to the volume of data being loaded into the Wellcome Library's digital archive, which runs largely on systems from specialist provider Preservica, with which the library has a commercial relationship. According to the Wellcome Library, Preservica's production and development software is integrated into systems operated by the Wellcome Trust.

The Wellcome Library says its collection is held on a Preservica Enterprise Edition digital preservation platform. It includes 85,000 items, such as books, posters, paintings, and videos, and an average of 11,000 users a month view items from it. Preservica's software is used to manage and store the library's digitized and born-digital collections. Dave Thompson, the Wellcome Library's digital curator, says the digital content is stored locally. As of November 2015, it totals 21TB and "contains approximately 14 million Jpeg2000 images and about 1,000 born digital collections."

Thompson explains, "The choice of Jpeg2000 (Part 1) as a master format for digitization was partly made on the basis that this format is perceived to have a long and stable life. When that format becomes obsolete, as it inevitably will, Preservica will assist the library in migrating that content into another format. The same applies to the diverse range of formats that form the born digital collection."

He adds that the platform housing the archive has three core functions. “It provides a secure managed environment within which we can store our digital assets. It also provides a set of decision support tools that allow the library to fully understand what is held and how to manage that content. And it also provides a platform out of which content can be disseminated.” Library users have no direct access to the platform or its content, “which is good from a data security perspective,” says Thompson.

‘Future-Proofing’ the Wellcome Library

The Wellcome Library also uses the open source Goobi software to track and manage digitized content, while metadata, page layout, formatting, and tagging tools are used to provide access to digital content and make it searchable. But as every old IT hand knows, backup or replication is the key to preservation.

According to Thompson, the Wellcome Library does not back up digital content held in the Preservica system. "In reality, the 1 terabyte of data is too large a body to back up on a nightly basis." Instead, he says, the library works with the Wellcome Trust's IT department "to ensure that data is included in the replication strategy that is part of the Trust's overall IT strategy. This means that, in real time, live data is replicated to two offsite storage nodes. Thus we have data security and the comfort of being able to restore content in the event of bad things happening."
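Thompson does not describe how the Trust's replication works internally. As a rough illustration of the general idea of keeping verified copies on two offsite nodes and restoring from them, here is a minimal Python sketch. The directory layout, the function names `replicate` and `restore`, and the use of SHA-256 checksums as fixity checks are all assumptions, and unlike the Trust's system it copies files in a batch rather than replicating live data in real time.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(src: Path, nodes: list[Path]) -> str:
    """Copy src into every (hypothetical) offsite node and verify each copy."""
    expected = sha256(src)
    for node in nodes:
        node.mkdir(parents=True, exist_ok=True)
        copy = node / src.name
        shutil.copy2(src, copy)
        if sha256(copy) != expected:
            raise OSError(f"checksum mismatch on {copy}")
    return expected


def restore(name: str, expected: str, primary: Path, nodes: list[Path]) -> Path:
    """Replace a missing or damaged primary copy with the first intact replica."""
    target = primary / name
    if target.exists() and sha256(target) == expected:
        return target  # primary copy is still good
    for node in nodes:
        candidate = node / name
        if candidate.exists() and sha256(candidate) == expected:
            shutil.copy2(candidate, target)
            return target
    raise FileNotFoundError(f"no intact copy of {name}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        primary = root / "primary"
        primary.mkdir()
        item = primary / "page0001.jp2"
        item.write_bytes(b"example image bytes")
        nodes = [root / "offsite-a", root / "offsite-b"]
        digest = replicate(item, nodes)
        item.write_bytes(b"corrupted")  # simulate damage to the live copy
        restore(item.name, digest, primary, nodes)
        print(sha256(item) == digest)
```

Storing the checksum at replication time is the design choice that matters here: it lets a restore distinguish an intact replica from one that has silently degraded.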

As for dealing with potential obsolescence, the Wellcome Library uses lifecycle management to help address digital preservation issues. For example, lifecycle management allows every individual file format to be identified, says Thompson. "This supports decision making around what formats are current and viable, and decision-making around which formats may be obsolete. Definitions of obsolescence vary, but preservation interventions are designed to ensure that data remains in a form that is accessible." Support tools are available that can migrate obsolete formats to more current ones, a process that may be automated using workflow software.
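Format identification of this kind is commonly done by matching each file's opening bytes against known signatures, the approach behind registries such as The National Archives' PRONOM. The following minimal Python sketch assumes that approach. The tiny `SIGNATURES` table and the `identify`, `identify_file`, and `format_profile` functions are hypothetical illustrations, not Preservica's implementation.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny signature table. Real registries such as
# PRONOM hold thousands of signatures, many with offsets and wildcards.
SIGNATURES = {
    "JPEG 2000 (JP2)": bytes.fromhex("0000000C6A5020200D0A870A"),
    "JPEG 2000 codestream (J2K)": bytes.fromhex("FF4FFF51"),
    "TIFF (little-endian)": b"II*\x00",
    "TIFF (big-endian)": b"MM\x00*",
    "PDF": b"%PDF-",
    "JPEG": b"\xff\xd8\xff",
}


def identify(header: bytes) -> str:
    """Name the first format whose signature starts the header, else 'unknown'."""
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: Path) -> str:
    """Read only the first 16 bytes, enough to cover every signature above."""
    with path.open("rb") as f:
        return identify(f.read(16))


def format_profile(paths) -> Counter:
    """Count files per identified format: raw input for obsolescence decisions."""
    return Counter(identify_file(Path(p)) for p in paths)


if __name__ == "__main__":
    import sys

    for fmt, count in format_profile(sys.argv[1:]).most_common():
        print(f"{count:>8}  {fmt}")
```

Run it with file paths as arguments to print a count per format. In a fuller workflow, the resulting profile would be checked against a preservation policy, and files in formats judged at risk would be queued for migration.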

These and other software tools and techniques are being used to, as the Wellcome Library puts it, "future-proof" the digital archive. It's certainly clear that the library is churning out large amounts of digitized data. "Figures for the production of digitized content can vary over time," says Thompson. "Over the summer of 2015 we were ingesting over a terabyte per month [and] peaking in July at 1.43 terabytes." The average volume of data ingested per month over the past six months is 0.88TB. According to Thompson, this equates to 1,671,200 individual files. However, he says, "the key measure of success for the library's overall digitization strategy is not the volume of content ingested but the number of items that can be made available online to library users."

John Charlton writes about technology, law, and education for several publications.

Related Articles

12/19/2023: Preservica's Preserve365 Comes to the Microsoft Azure Marketplace
9/26/2023: AM and Preservica Enable Quartex Users to Publish and Preserve Their Collections
1/8/2019: Virtually Unraveling the Damaged Past at Cardiff University
10/18/2018: Study Reveals That DNA May Not Be Anonymous
7/19/2018: Wellcome and Springer Nature Introduce Pilot for Making Research Datasets Available
6/13/2017: New York Academy of Medicine Library Debuts New Website
2/7/2017: Preservica Commits to Digital Preservation Group
9/6/2016: Webrecorder Makes Web Preservation Personal
6/14/2016: ProQuest Completes Digitization Project With Wellcome Library
9/8/2015: ProQuest Makes More Early European Books Available
8/5/2014: Wellcome Library and Jisc Will Digitize Historic Medical Books
3/4/2013: History of Science, Technology and Medicine Now Available from EBSCO Publishing
8/15/2011: ProQuest Uncovers More Treasures from European Rare Book Libraries
7/28/2011: Wellcome Library Partners With ProQuest to Digitize Early European Books
