CD-ROM Preservation (Geoffrey Brown)
For the past 20 years, CD-ROMs have been a primary medium for distributing key economic, scientific, environmental, and societal data as well as educational and scholarly work. More than 150,000 titles have been published, including thousands distributed by the United States and other governments. Yet no viable strategy has been developed to ensure that these materials will be accessible to future generations of scholars. In the short term, these materials are subject to physical degradation, which will ultimately make them unreadable; in the long term, technological obsolescence will make their contents unusable.
VIRTUAL CD-ROM COLLECTIONS
The objective of this project is to develop web-accessible collections of CD-ROMs that support browsing and searching within images of the original material, with document migration to give scholars easy access. The prototype work includes a website providing access to nearly 3,000 CD-ROMs published by the United States Government Printing Office (GPO). We are currently planning to move the key components of this work to the Indiana Digital Library Program in partnership with the GPO.
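As a small illustration of what working directly with CD-ROM images can involve, the sketch below reads the volume identifier and size from an ISO 9660 image file. It is not the project's code: the function name `read_volume_info` and the returned dictionary keys are hypothetical, and it assumes a plain ISO 9660 image with 2048-byte sectors.

```python
import struct

SECTOR_SIZE = 2048            # logical sector size assumed for the image
FIRST_DESCRIPTOR_SECTOR = 16  # ISO 9660 volume descriptors start at sector 16


def read_volume_info(image_path):
    """Return the volume identifier and block count of an ISO 9660 image.

    Hypothetical helper for illustration only.
    """
    with open(image_path, "rb") as f:
        sector = FIRST_DESCRIPTOR_SECTOR
        while True:
            f.seek(sector * SECTOR_SIZE)
            desc = f.read(SECTOR_SIZE)
            # Every volume descriptor carries the standard identifier "CD001".
            if len(desc) < SECTOR_SIZE or desc[1:6] != b"CD001":
                raise ValueError("not an ISO 9660 image")
            if desc[0] == 1:      # type 1: primary volume descriptor
                break
            if desc[0] == 255:    # type 255: end of the descriptor set
                raise ValueError("no primary volume descriptor found")
            sector += 1
    # Bytes 40-71 hold the space-padded volume identifier.
    volume_id = desc[40:72].decode("ascii", errors="replace").strip()
    # Bytes 80-87 hold the volume size in both byte orders; little-endian first.
    (block_count,) = struct.unpack_from("<I", desc, 80)
    return {"volume_id": volume_id, "block_count": block_count}
```

Metadata of this kind could feed a catalog or search index for a collection of images; browsing the files inside an image would additionally require walking its directory records, which this sketch does not attempt.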
- K. Woods and G. Brown. 2009. From Imaging to Access - Effective Preservation of Legacy Removable Media. Proceedings of Archiving 2009.
- K. Woods and G. Brown. 2009. Creating Virtual CD-ROM Collections. International Journal of Digital Curation, Vol. 4, No. 2.
ASSISTED EMULATION FOR DIGITAL PRESERVATION
Supported by the NSF
This project is developing practical techniques that use off-the-shelf emulators (virtualization software) to ensure the long-term viability of CD-ROM materials. Although emulation has been widely discussed as a preservation strategy, it suffers from a fundamental flaw: future users are unlikely to be familiar with legacy software environments and will find such software increasingly difficult to use. Furthermore, the user communities of many such materials are sparse and distributed, so the necessary technical knowledge is unlikely to be available to library patrons. The key objective of this project is to develop the technology and processes needed to mitigate these flaws and to enable large-scale deployment of emulation by libraries and archives.
- K. Woods and G. Brown. 2010. Assisted Emulation for Legacy Executables. International Journal of Digital Curation, Vol. 5, No. 1.
HIGH-QUALITY FORMAT MIGRATION OF SCIENTIFIC DATA (GEOFFREY BROWN)
Supported by the IU Data to Insight Center
The objective of this project is to develop tools and processes that enable high-quality (low-risk) migration of scientific data from formats with poor long-term viability, owing to their dependence on legacy software and hardware, to more viable formats. While the majority of information about science, culture, society, the economy, and the environment is now born digital, the underlying technology is subject to rapid obsolescence. For example, Lotus 1-2-3, the dominant spreadsheet format of the 1980s and 1990s, is no longer supported in Microsoft Excel, the current leader in spreadsheet software. Even when Excel supported migration from Lotus 1-2-3, significant differences in areas such as formula calculation and supported features meant that some files could not be faithfully translated. Although format migration is widely practiced and supported by many software packages, there are currently no tools or processes that can ensure that data are migrated without introducing errors or losing information. A significant problem is that newer formats, even where similar in function, do not generally support all of the features of their predecessors, and where similar features exist, there may be significant differences of interpretation. For example, netCDF and CDF are important data formats with common roots, yet netCDF doesn't support native-mode representation while CDF doesn't support named dimensions.
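To make the idea of checking a migration concrete, the following sketch compares the values of a table before and after migration and reports every cell that differs. It illustrates only one kind of check and is not the project's tooling: the names `compare_tables` and `_same` are hypothetical, and it assumes both versions have already been read into lists of rows.

```python
import math


def _same(a, b, rel_tol):
    """Compare two cell values, allowing tiny floating-point differences."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def compare_tables(original, migrated, rel_tol=1e-9):
    """Return (row, column, original_value, migrated_value) for each mismatch.

    Cells present in only one table are reported with None on the other side.
    """
    mismatches = []
    for r in range(max(len(original), len(migrated))):
        row_a = original[r] if r < len(original) else []
        row_b = migrated[r] if r < len(migrated) else []
        for c in range(max(len(row_a), len(row_b))):
            a = row_a[c] if c < len(row_a) else None
            b = row_b[c] if c < len(row_b) else None
            if not _same(a, b, rel_tol):
                mismatches.append((r, c, a, b))
    return mismatches


original = [["year", "total"], [1989, 10.5], [1990, 12.25]]
migrated = [["year", "total"], [1989, 10.5], [1990, 12.0]]
print(compare_tables(original, migrated))  # [(2, 1, 12.25, 12.0)]
```

A value-level comparison like this can catch altered or dropped cells, but it cannot detect differences in how formulas are calculated or features that have no counterpart in the target format, which are the harder problems described above.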