Beyond the Shelf:
Serving Historic Kentuckiana Through Virtual Access
A Proposal to the
Institute of Museum and Library Services
National Leadership Grants
Preservation or Digitization
Feb. 1, 2002
University of Kentucky
Lexington, KY 40506-0456
Kentucky Virtual Library
Kentucky’s early published history is inextricably interwoven with that of the opening of the great western movement in the United States. One is all but dependent on the other. As adventurers, pioneers, explorers, and eventually families settled in Kentucky, they recorded its rapidly changing history and geography. The mid-eighteenth century exploratory journals of Thomas Walker and Christopher Gist (later to be published in book form) marked a beginning of the recorded history of Kentucky, as did the journals of William Caulk, Richard Henderson, and Captain George Rogers Clark. The 1787 attempt to draw together as much information as could be gathered in a locally printed volume made John Filson’s Discovery, Settlement and State of Kentucky a seminal historical record of the beginnings of Kentucky. Between 1787 and 2002, the expanding bibliographic body of Kentucky materials has garnered increased attention.
In 1945, Kentucky historians (including Thomas D. Clark, Distinguished Professor Emeritus of History at both the University of Kentucky and Indiana University) persuaded bibliophile and historian John Winston Coleman to assemble a bibliography of Kentucky history. This was a landmark date in the orderly consideration of Kentucky non-fiction literature. At the beginning, neither Coleman nor anybody else had a clear concept of the volume or variety of writings about the Commonwealth of Kentucky. Coleman, an indefatigable collector of Kentuckiana, possessed an appreciable knowledge of the location and identification of titles. Within three years, he produced an impressive bibliography with proper annotation and classification. There is but a slender list of titles that were either overlooked by Coleman or were unknown to exist in 1948.
A Bibliography of Kentucky History, published in 1949 by the University of Kentucky Press, includes 3,571 items divided into 76 categories, including county histories; early explorations and settlements; military expeditions, battles, and campaigns; reminiscences, recollections, and memoirs; and speeches and debates. For researchers into the history of Kentucky as a border state in the U.S. Civil War, a westward frontier, an agrarian economy, and a socially and politically diverse commonwealth, Coleman’s work is the indispensable starting point. Appendix Item #1 includes the Table of Contents from Coleman’s Bibliography.
Since 1992, the University of Kentucky Libraries has participated in the NEH-funded SOLINET/ASERL Cooperative Preservation Microfilming Projects (CPMP2-5, 1992-2002). By June 30, 2002, the CPMPs will have microfilmed approximately 3,000 titles in 6,000 volumes from the Libraries’ comprehensive Kentuckiana collection. During the CPMPs, the University of Kentucky selected approximately 1,500 titles from Coleman’s Bibliography. This corpus of microfilmed titles will form the target collection for Beyond the Shelf.
National Impact: Description of Target Collection
The target collection comprises 1,500 monographic titles on 35mm silver halide, polyester base, second-generation negative microfilm (print masters). The first generation camera masters are stored by SOLINET in climate-controlled, underground storage. The second generation print masters are stored in the UK Libraries climate-controlled storage facility in Special Collections.
The Kentucky Association of Teachers of History (KATH) reviewed the list of CPMP microfilm titles from Coleman’s Bibliography and enthusiastically supported the goal to convert 950 titles into digital format. A sample of the selected titles is included in Appendix Item #2.
Prior to participation in the CPMPs, the UK Libraries’ Preservation Department conducted a condition survey of the pre-1950 published Kentuckiana. The results of the survey supported the need for immediate preservation action: 88% of the history titles and 94% of the literature titles were either brittle or printed on imperiled, acidic stock. The survey also showed that no filming project had focused specifically or comprehensively on Kentuckiana. Before inclusion in the CPMP projects, each title was searched in the OCLC and RLIN databases to determine if acceptable microformats or reprints existed. If an acceptable replacement could not be located, the title became a candidate for microfilming, and an “intent to film” MARC bibliographic record was created to alert the scholarly community of the plans for filming by CPMP. These two steps ensured that NEH did not fund duplicative or redundant preservation efforts. To date, no project to select and preserve Kentuckiana has been as systematic and inclusive as the efforts of the SOLINET/ASERL CPMPs.
Kentuckiana titles included in the CPMPs were microfilmed by one of three microfilming services: Preservation Resources (1992-present), Northeast Document Conservation Center (1992-present), or University Microfilms, Inc. (1992-95). All three vendors are well known for their capacity to produce 35mm film according to the ANSI/AIIM standards and RLG Guidelines.
At the filming agent, three generations of microfilm were created. The master negative was used to generate a second-generation print master. From the print master, the positive third generation was created. The print masters and positive copies were returned with the source documents to the UK Libraries.
Few, if any, source documents were withdrawn after microfilming. Instead, many volumes were mended, cleaned or released from corroded stapled bindings before image capture. Many volumes were re-housed in phase boxes, taken out of circulation, and transferred to Special Collections. Some materials, particularly state publications, were removed from the University of Kentucky’s collection and placed at the Kentucky Department for Libraries and Archives (KDLA) to create a complete paper archive of formerly disparate and scarce resources.
The CPMP microfilm as the target collection for the KYVL Kentuckiana Digital Library logically extends the University’s and KYVL’s missions to improve or enhance access to specialized materials for a diverse learning environment. Clearly, the process used to select, produce and store the CPMP microfilm conforms to well-established standards for quality and well-embraced principles for selection of significant collections for preservation funding. With the source documents secured in UK Libraries Special Collections or at KDLA, and the informational content preserved in high quality film, the next logical step is to broaden access by converting the microfilm to digital formats.
To develop Beyond the Shelf’s goals and workflow, the UK Libraries Preservation Department, the UK Electronic Information Access Management Center (EIAMC) and KYVL successfully completed a pilot demonstration project (http://digilib.kyvl.org:8080/dynaweb/ebind/kukebooks/) comprised of a sampling (100 titles) of the CPMP microfilm targeted for digitization. During the pilot, we determined that second generation print masters produced superior results to third generation positive service copies. We also determined that pre-image capture inspection of the print master was unnecessary. After experimenting with manual and automatic modes on the microfilm scanner, we determined that an automatic image capture with post-image review was the most efficient method of production scanning. We worked with a variety of topics, titles and text presentations. Text that lacked illustrations proved easier, faster and cleaner to scan. This high production and high quality image capture could best help us meet the goal to create 950 digital volumes in two years. Additionally, we determined how to manage irregularities in the texts, e.g., misnumbered pages, property stamps that intrude on the text, uneven densities, etc. See Appendix Item #3 for Pilot Project Statistics. Finally, we worked with KYVL to develop the encoding workflow and the storage/service routines via the KYVL server and the Z39.50 interface. This collaborative effort demonstrated the feasibility of the plan of work. See Appendix Item #4 for Sample Printouts from the Pilot Project and Appendix #5 for the Beyond the Shelf Image Creation Procedures.
Key criteria for selection for digital projects involve current and anticipated use, significance of the material, copyright, and suitability of the technology to capture the significant detail of the source documents.
When considering the significance of the material, UK Curators relied heavily on Coleman’s comprehensive coverage. Along with the heavily used state and county histories, one finds historical overviews of Kentucky’s agrarian economy, the coal mining and railroad industries, the thoroughbred breeding and racing economy, and the welfare and education movements. Materials selected provide a complete representation of the state’s history.
In many cases, the materials were scarce, rare and/or embrittled. Heavy use of some materials could be demonstrated simply by the wear and tear on the artifact. Other materials survived in much better condition because they were not cataloged, and thus not available outside the UK Libraries. In other cases, the holdings were incomplete or scattered among several libraries. The CPMPs brought together the disparate holdings, located good condition copies, and procured missing issues to create a more researchable compilation for scholarly use. In all cases, the goals of the CPMPs were to preserve significant informational content about the history of the state on microfilm, to stabilize the artifact for specialized usage, and to make the service copies of the film widely available at the William T. Young Library (a 24-hr facility) or through the UK Libraries’ extensive Interlibrary Loan and Document Delivery consortia.
Our target collection clearly possesses significance and user interest. To ensure that we are in compliance with current copyright practices, we will select titles that have already passed out of copyright. The pre-1924 titles will permit us to work with a critical mass of material that will not require copyright research. This will allow us to meet our project goal to digitize a significant body of material, 950 volumes, within the proposed project timeline.
Another key criterion is that of suitability of the technology to capture the significant detail of the artifact or source document. We will select titles that emphasize text rather than illustration. The technology that we will employ succeeds highly in capturing text in bitonal or grayscale images. Black and white line art is also well represented, but half tones, lithographs, photographs, and color do not translate particularly well. Therefore, we will choose titles whose informational content is not dependent on the illustrative materials, if indeed, they are present. Our goal is to faithfully render the text so that it is readable via the Internet on the average user’s PC monitor and so that it prints out legibly on a printer. We aim to capture all text: front matter, main text, indices, footnotes, captions, labels on maps, headers and footers. If illustrative material is included, we will capture the image at several settings to bring out the tonal aspects; however, we understand the shortcomings of bitonal rendering of certain illustrative types, so we will not choose to feature these highly illustrated materials in this project. We anticipate that 950 titles out of the target collection of 1,500 will render faithfully from film format to digital.
By embracing these criteria and by asserting the suitability of the hybrid approach, the collaborators in Beyond the Shelf will clearly demonstrate the capacity to build a digital collection of considerable value, one that serves a varied constituency and that merits long-term maintenance.
Adaptability: Hybrid Strategy
The authors of Digital Imaging and Preservation Microfilm: the Future of the Hybrid Approach for the Preservation of Brittle Books point out that microfilm remains the preferred preservation choice for embrittled published material (Chapman, Conway, Kenney, 1999). At the same time, the best access option is clearly found in the digital arena. Instead of choosing between the microfilm and digital options, using both in a hybrid model allows for minimal risk in terms of effective long-term preservation of the resources along with improved access to digital versions of the microfilm frames. In Fall 2001, OCLC created the “Digital and Preservation Resource Centers” and the “Digital Co-op”. Both of these services are based on the fundamental ideology of the hybrid approach. Digital access saves originals from unnecessary deterioration via repeated use. Relying on high quality 35mm microfilm as the preservation master and scanning books from existing film assures that the digital conversion process has no negative impact on imperiled source documents.
In short, Beyond the Shelf provides an opportunity to achieve the following preservation and access objectives by employing economies of scale and reduction of multiple handlings of source documents:
· Fragile source documents remain protected.
· Completeness is ensured before scanning.
· High quality page images are derived from high-resolution, high-contrast film.
· Microfilm master negatives remain in secure cold storage.
· Print master is utilized to create a digital access master.
· Print master returns to vault.
· Migration plan is developed for images, text and metadata.
· Back up CD Rs for all images, text and metadata are created.
· Web accessible pages are served equitably through the KYVL using an easy-to-navigate gateway.
· Access is free.
· Training and searching tips are provided on the website.
· Printouts are legible.
· Screen images are legible.
· Keyword searchability enables full-text access.
· Frames and non-frames interface are options.
Beyond the Shelf depends on the collaboration of five well-established entities:
· KYVL’s Kentuckiana Digital Library
· UK Libraries Electronic Information Access and Management Center (EIAMC)
· UK Libraries Preservation Department
· SOLINET/ASERL CPMPs 2-5
· Kentucky Association of Teachers of History (KATH)
Over the last three years, the UK Libraries EIAMC has worked with KYVL to establish a state-of-the-art digital conversion laboratory. By working with several national consultants, UK and KYVL developed guidelines for digital library production (Appendix Item #6) and created a digital library systems infrastructure. The lab now serves as the central conversion facility for the creation of digital collections served by KYVL. This partnership has resulted in the Kentuckiana Digital Library. This digital library provides enhanced access to special collections material, some extraordinarily fragile, housed throughout the state. The lab includes 4 Pentium III scanning workstations, a PhaseOne planetary digital camera with a MAC workstation, a MEKEL 525s microfilm scanner, and fireproof storage cabinets for work in progress. To manage the lab, the University of Kentucky has one full-time librarian. Through partnership with KYVL, two part-time students and a full-time scanning/encoding technician are funded. Currently, the digital library represents 15 Kentucky repositories and offers online access to over 3,500 archival finding aids and 15,000 digital images.
Two other players in this collaboration include the UK Libraries Preservation Department, particularly the Reprographics Unit, and the SOLINET/ASERL CPMPs. As described earlier, UK has participated in the CPMPs since 1992. Now, the results of this collaboration will jumpstart Beyond the Shelf.
By including the UK Libraries Preservation Department in the development and management of Beyond the Shelf, the project aligns itself with the philosophy of creating digital collections of value that are worth preserving. The Project Co-Managers will strive for the highest quality that meets the functional requirements of the target audience. They will develop and implement a long-term management strategy for the endurance of this digital collection. A migration plan for Beyond the Shelf will feature as one of the project’s end products and will serve as a model for other KYVL digital collections. Additional information about the history of the Preservation Department, particularly of the Reprographics Unit that will oversee the image capture on a second MEKEL 525s, can be found in the ORGANIZATIONAL PROFILES.
In the final analysis, University of Kentucky and KYVL structured Beyond the Shelf on prior and current collaborations. The University agreed to provide the print masters, the scanning environment, and the project oversight, while the KYVL agreed to provide the server space, some salary cost share, and the interface for the web-based deliverables. Beyond the Shelf clearly provides an opportunity for several groups to combine their technical expertise in the creation of a digital collection of enduring value.
Design: Plan of Work
During the CPMPs, the RLG Guidelines and the ANSI/AIIM standards guided the production, inspection and storage of the microfilm. By the completion of the project, the film had been inspected for technical quality (resolution, density, skew, etc.) and for bibliographic integrity by the filmer, by SOLINET and by UK Libraries. To date, the print masters have not been heavily used for film duplication or for digital conversion. During pilot testing, pre-image capture inspection was determined to be unnecessary. Physical defects were not found before scanning. Any minor defects in the film, such as fingerprints, can be managed with the scanning software.
Cataloging of the digital titles will occur in two phases. Drafts of the MARC bibliographic records for the digital titles will be constructed using the MARC bibliographic record for the microfilm version of the title. Cataloging will progress concurrently with the scanning process. The draft records will be saved until the administrative metadata is established during the scanning and encoding process. Before the digital titles roll over to the KYVL “live” database, the final cataloging will occur. This includes final editing of the bibliographic record, submission to OCLC and introduction to the UK Libraries Voyager database. A Cataloging/Encoding Technician using a pre-determined template to create the draft records can complete the first phase of the cataloging as an “assembly line” process. The Special Projects Cataloger will design the template and the process. The second phase will be closely coordinated between the Special Projects Cataloger and the Co-Project Managers. The goal is to introduce, through the KYVL server, 50 titles per month beginning in Feb. 2003. Just before each batch goes “live,” the final cataloging will be completed.
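As a quick arithmetic check on the release rate stated above, the following minimal sketch lays out the monthly batches. It assumes that every batch is full (50 titles) and that no month is skipped; the function name is hypothetical and is not part of the project's software.

```python
from datetime import date

def release_schedule(total_titles, per_month, start):
    """Return a list of (first day of month, titles released) batches."""
    batches = []
    remaining = total_titles
    year, month = start.year, start.month
    while remaining > 0:
        count = min(per_month, remaining)
        batches.append((date(year, month, 1), count))
        remaining -= count
        # Advance one calendar month.
        month += 1
        if month > 12:
            month = 1
            year += 1
    return batches

# 950 titles at 50 per month, beginning Feb. 2003 (figures from the text).
schedule = release_schedule(950, 50, date(2003, 2, 1))
print(len(schedule), "batches; last batch in", schedule[-1][0].strftime("%b %Y"))
```

Under these assumptions, 950 titles require 19 monthly batches, so the final batch would go live in August 2004.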
High-resolution digital masters will be created during the project. Derivative files will include web-friendly GIF and PDF formats and an XML (eXtensible Markup Language) encoded transcription that facilitates full-text searching. Two Scanning Technicians will perform these operations.
Page images will be scanned from the microfilm at specified reduction, as true or interpolated 400 dpi bitonal (one-bit) images. All images will be saved in uncompressed TIFF 5.0 format. The scanner will be set to number page scans sequentially starting with 0001. The scanner will be set to batch mode. Utilizing this setting, the scanner will automatically move from one microfilm frame to the next, identifying page edges, then cropping, aligning and saving the page images. Additionally, when set to duplex mode, the MEKEL scanner will capture a 2A orientation film frame and then separate the scanned image into two equal halves while increasing the image count and the file naming by two instead of one.
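The numbering rule described above can be illustrated with a short sketch. It is not the MEKEL scanner software; the function name and the ".tif" extension are assumptions made for illustration. It shows only how a four-digit counter starting at 0001 advances by one per frame in normal mode and by two per frame in duplex mode.

```python
def page_filenames(frame_count, duplex=False, start=1, ext="tif"):
    """Return, for each microfilm frame, the list of image files it yields.

    In normal mode each frame yields one file. In duplex mode a 2A frame
    is split into two page halves, so the counter advances by two.
    """
    frames = []
    counter = start
    pages_per_frame = 2 if duplex else 1
    for _ in range(frame_count):
        files = [f"{counter + i:04d}.{ext}" for i in range(pages_per_frame)]
        frames.append(files)
        counter += pages_per_frame
    return frames

# Three 2A frames scanned in duplex mode yield six sequential page files.
for files in page_filenames(3, duplex=True):
    print(files)
```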
If microfilm frames are scanned at 2A orientation without duplex mode activated, each resulting image will contain two pages, which will be separated after scanning. This strategy is used when narrow inner margins prevent the use of duplex mode. If the microfilm includes intentional second exposures, both frames will be scanned and then judged later for best instance. Scanning will include some of the targets from the microfilm. These include:
· the eye-legible title target
· the bibliographic record for the source document which also identifies the reduction ratio, filming orientation and the filming date.
· CPMP description and NEH grant award number
· Lists of Irregularities
Resulting TIFF images will be stored on CD Rs using the EasyCD Creator PC software program and following the Joliet extension of the ISO 9660 format standard. The Joliet extension allows CDs to be created using long filenames up to 64 characters, including spaces. Joliet is readable by PCs running Windows 95 or later operating systems and by Macintosh computers running the Joliet Volume Access extension. As images are captured, they will be viewed by a technician for legibility. Legibility tests will be a subjective assessment of readability of the smallest lower case “e” found in all text including the main content, front matter, indices, footnotes, captions and map labels. The legibility of the lower case “e” at 400 dpi TIFF must translate to a totally human eye-legible (can be read without magnification) image in the web presentation files and on paper printouts. At this step in the process, the Scanning Technician can validate the image integrity of the digital master by selecting a group of sample pages and creating screen size derivatives and paper printouts for eye-legibility analysis. A minimum of 10% of the images will be tested for eye-legibility analysis. The image filename, the text page number, and the results of the analysis will be recorded in a Project Database. See Appendix Item #7 for Project Database Description.
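The sampling and filename rules above lend themselves to a small sketch. The legibility judgment itself remains a human assessment; only the selection of at least 10% of the images and the 64-character Joliet filename check are illustrated. The function names and the use of a random sample are assumptions, since the text does not say how sample pages are chosen.

```python
import random

JOLIET_MAX_FILENAME = 64  # filename length limit stated in the text

def select_qc_sample(filenames, percent=10, seed=None):
    """Pick at least `percent` percent of the images for eye-legibility review.

    Integer arithmetic rounds the sample size up, so the sample never falls
    below the stated minimum. Random selection is an assumption.
    """
    rng = random.Random(seed)
    sample_size = (len(filenames) * percent + 99) // 100
    return sorted(rng.sample(filenames, sample_size))

def too_long_for_joliet(filenames):
    """Return the filenames that exceed the 64-character Joliet limit."""
    return [name for name in filenames if len(name) > JOLIET_MAX_FILENAME]

# Example: a 250-page volume yields a review sample of 25 images.
images = [f"{n:04d}.tif" for n in range(1, 251)]
sample = select_qc_sample(images, seed=1)
print(len(sample), "images selected for review")
print(too_long_for_joliet(images))
```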
After scanning, the print master will be removed from the MEKEL 525s and rewound on Neumade manual rewinders. This will obviate the potential damage to the print master during the MEKEL’s high speed rewind mechanism.
Quality Control and Document Structuring
Texts will be encoded using an XML structural markup language adhering to the Digital Library Federation specifications outlined in TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices. These guidelines outline five increasingly granular levels of encoding. At higher levels of encoding, more content analysis and human intervention is required. The first three levels establish encoding without content analysis. These levels are established so that libraries can match encoding guidelines to specific project goals.
One of our goals is to build a sustainable, high-production workflow for producing electronic texts utilizing the microfilm to digital page image approach. Our goal is to establish an approach that will be manageable with reduced staff after the conclusion of the project, and therefore must include processes of automation wherever possible. At the same time, we want to use a non-proprietary encoding format that is both extensible and interoperable. The encoding format must facilitate full-text searching capabilities as well as basic document hierarchy structure for table of contents presentation. As specified in TEI Text Encoding in Libraries: Guidelines, the Minimal Encoding (Level 2) approach is utilized “to create electronic text for keyword searching, linking to page images, and identifying simple structural hierarchy to improve navigation,” (Friedland, Kushigian, Powell, Seaman, Smith, Willet, 1999) for projects with the following characteristics:
· A large volume of material is to be made available online quickly.
· A digital image of each page is desired
· The material is of interest to a large community of users who wish to read texts that allow keyword searching
· Rudimentary search and display capabilities based on the large structures of the text are desired
· Each text will be checked to ensure that divisions and headers are properly identified
· Extensibility is desired; that is, one desires to keep open the option for a higher level of tagging to be added at a later date
The Ebind encoding scheme, developed at UC Berkeley and based on TEI (Text Encoding Initiative), will be used as the markup language. Although Ebind is similar to and contains the same full header element as TEI, there are two fundamental differences between Ebind and TEI. One difference is that “Ebind privileges the physical structure of a document while TEI privileges the intellectual structure” (Pitti, 1996). Another difference is that “Ebind is simpler to use” and is therefore “more appropriate for use in a high-volume production environment” (Pitti, 1996). Ebind allows for the presentation of page images accompanied by underlying text for full-text searching and compliance with the TEI Text Encoding in Libraries (Minimal Encoding). Using Ebind, which conforms to the XML standard, permits us to meet our project goal to present a good representation of the page image along with a dynamic, extensible, text augmentation for searching, both supported by a navigable structural hierarchy.
Several encoding schemas were reviewed during the pilot project. New structural markup schemas such as METS (Metadata Encoding & Transmission Standard) and MOAII (Making of America II) are currently being developed through national digital library efforts. We plan to monitor these developments very closely, as our use of a non-proprietary XML encoding format assures that migration to these formats will be possible once they have been fully developed.
Skeletal texts will first be automatically created from saved drafts of the MARC records for the digital titles. These texts will have tags indicating the start and end of the document instance as well as a completed TEI header. See Appendix Item #8 for Sample XML Encoded Document Including Header Element. Individual pages will be automatically given paragraph tags and end of page markers (ASCII character 12) after the optical character recognition process has been completed. The individual OCR pages will then be merged into one file using a Perl script. The merged files will then be joined with their skeletal text counterparts, producing a complete document instance for each book.
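The merge step can be sketched as follows. The project uses a Perl script for this step; Python is used here only for illustration. The placeholder comment in the skeletal text and the rule that a blank line separates paragraphs are assumptions, as are all function names.

```python
from xml.sax.saxutils import escape

FORM_FEED = "\x0c"  # ASCII character 12, the end-of-page marker named above

def tag_page(ocr_text):
    """Wrap each blank-line-separated block of one OCR page in <p> tags."""
    blocks = [block.strip() for block in ocr_text.split("\n\n") if block.strip()]
    return "\n".join(f"<p>{escape(block)}</p>" for block in blocks)

def merge_pages(pages):
    """Join the tagged pages into one body, ending each page with a form feed."""
    return "".join(tag_page(page) + FORM_FEED for page in pages)

def build_document(skeleton, pages, placeholder="<!--BODY-->"):
    """Insert the merged pages into the skeletal text at a placeholder.

    The placeholder convention is hypothetical.
    """
    return skeleton.replace(placeholder, merge_pages(pages))

pages = ["First paragraph.\n\nSecond & last paragraph.", "Page two text."]
merged = merge_pages(pages)
print(merged.count(FORM_FEED), "end-of-page markers")
```

Note that the form feed character is not a legal character in XML 1.0, so in a sketch like this the end-of-page markers would have to be converted to page-break markup before the document is parsed.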
After validation by an XML parser (James Clark’s SP software) to ensure correct encoding structure, each XML document will be indexed on a test server. The XML document will be viewed through the computer interface, checked for correct page image and text pairing, and given correct pagination references, document divisions, and division headings. These additions will be added to the markup by the Cataloging/Encoding Technician who will then use the SP software to check for correct encoding one last time upon completion of the markup.
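Part of the pairing check described above could be automated along the following lines. This sketch does not use the SP software and does not validate against a DTD; it tests only for well-formedness with the Python standard library. The `page` element and its `image` attribute are hypothetical names chosen for illustration, not the actual Ebind markup.

```python
import xml.etree.ElementTree as ET

def check_document(xml_text, image_names):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # Collect the image referenced by each (hypothetical) page element.
    referenced = [page.get("image") for page in root.iter("page")]
    problems = []
    for name in referenced:
        if name not in image_names:
            problems.append(f"page refers to missing image: {name}")
    for name in image_names:
        if name not in referenced:
            problems.append(f"image has no page in the text: {name}")
    return problems

sample = '<doc><page image="0001.gif">a</page><page image="0002.gif">b</page></doc>'
print(check_document(sample, ["0001.gif", "0002.gif"]))
```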
At this point, each individual page will be reviewed for accuracy and numbered sequentially starting with ‘0001’. If the volume, filmed in 2A orientation, was scanned without the duplex mode, the digital page images will be separated and collated. Any blank pages will be replaced with a blank page target. Batch software will be utilized to perform image cleanup including crop, de-speckle, align, de-blob, and de-skew. Intentional second exposures will be evaluated for best quality, and the duplicate image will be deleted. The page images will then be burned onto CD Rs and used to perform optical character recognition for generation of searchable text. To assure a high level of accuracy in the recognition process, PrimeRecognition OCR software utilizing 3 recognition engines and voting technology will be used.
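The general idea behind voting among several recognition engines can be shown with a deliberately simplified sketch. This is not the PrimeRecognition algorithm, which is not described here. The sketch assumes that every engine returns the same number of words for a line, so that words can be compared position by position; real OCR output would first need to be aligned.

```python
from collections import Counter

def vote(readings):
    """Return the majority reading, word by word, across engine outputs.

    Assumes the outputs are already aligned (same number of words).
    When no word has a majority, the earliest engine's word is kept.
    """
    tokenized = [reading.split() for reading in readings]
    if len({len(tokens) for tokens in tokenized}) != 1:
        raise ValueError("engine outputs are not aligned")
    result = []
    for candidates in zip(*tokenized):
        counts = Counter(candidates)
        # max() returns the first maximal item, so ties favor earlier engines.
        result.append(max(candidates, key=lambda word: counts[word]))
    return " ".join(result)

# Three hypothetical engine readings of the same line of text.
print(vote(["Kentucky hist0ry", "Kentucky history", "Kentncky history"]))
```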
The CD Rs holding the scanned page images will be used to create a number of derivative access files including a 1-bit PDF version for printing, an 8-bit, 640 pixel wide GIF for screen presentation, an 8-bit, 150 pixel wide GIF for thumbnail access, and an 8-bit 1000 pixel wide GIF for high-resolution access. These derivative images will be produced with Adobe Photoshop’s batch macro function. A script for converting to each derivative format will be created and run on each group of page images. The resulting images will be stored on the KYVL digital image server as well as on CD Rs for backup.
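To give a sense of scale for the derivative files, the sketch below computes the pixel dimensions of each GIF derivative from a 400 dpi master. The three target widths come from the paragraph above; the 6 by 9 inch page size is a hypothetical example, and the function names are not part of the Photoshop batch process.

```python
# Target widths in pixels, as listed in the text.
DERIVATIVE_WIDTHS = {"thumbnail": 150, "screen": 640, "high_resolution": 1000}

def master_pixels(width_in, height_in, dpi=400):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def derivative_size(master_width, master_height, target_width):
    """Scale to the target width while preserving the aspect ratio."""
    return target_width, round(master_height * target_width / master_width)

# Hypothetical 6 x 9 inch page captured at 400 dpi.
width, height = master_pixels(6, 9)
print("master:", (width, height))
for name, target in DERIVATIVE_WIDTHS.items():
    print(name, derivative_size(width, height, target))
```

For this example page, the 2400 by 3600 pixel master reduces to 150 by 225, 640 by 960, and 1000 by 1500 pixels.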
During the image conversion process, some titles on the print masters will not meet the lower case “e” quality index. These titles will be recorded in the Database for evaluation at a later date. The Co-Project Managers will assess the ultimate readability of the film, determine if re-filming is necessary, and establish if the title is critical for inclusion in Beyond the Shelf. If the title is critical to the project, scanning from the source document or re-filming may be considered. These efforts will be auxiliary to the project and are not reflected as cost share or requested funds. However, we will report the number of titles rejected as a value-added end product of this project as it further identifies for us a body of titles for which another access medium or process may be necessary.
Management Plan and Personnel
Paul Willis, Director of Libraries at the University of Kentucky will serve as the Project Administrator. He will oversee the budgets, convene quarterly meetings of the grant staff and assist with the promotion of the project to the public. Mr. Willis will devote 3% of his time to the project.
Becky Ryder, Head of Preservation Services at UK Libraries, will serve as Co-Project Manager and devote 10% of her time to Beyond the Shelf. Ms. Ryder will oversee the image capture and quality control using the MEKEL scanner in the UK Libraries Reprographics Unit. She will assist with the evaluation of all end products: images, encoded text, MARC records and Project documentation. Ms. Ryder has served as Project Manager for the CPMP projects at the University of Kentucky since 1992. Ms. Ryder has participated as faculty for NEDCC’s workshop series, Preservation in a Digital World: To Film or To Scan. She is a member of SOLINET’s Preservation Advisory Council, ProQuest/University Microfilm, Inc. Advisory Board, and the Kentucky State Historical Records Advisory Board. Ms. Ryder oversees the work of the Reprographics Unit which annually microfilms approximately 175 KY newspapers and books, journals, manuscripts, and scrapbooks from the Libraries’ collections.
Eric Weig, Digital Initiatives Librarian at the University of Kentucky and Director of the KYVL Kentuckiana Digital Library, will serve as Co-Project Manager and devote 25% of his time to the project. Mr. Weig will manage day-to-day project operations involving digital conversion via one microfilm scanner, as well as image quality control, OCR, text encoding, and full-text interface design and refinement. Mr. Weig manages the digital lab at the Libraries and also manages the KYVL Kentuckiana Digital Library Project. As a part of the Kentuckiana Digital Library Project, Mr. Weig has recently completed management of a state-wide cooperative EAD project resulting in a union database of over 3,500 SGML finding aids representing 15 Kentucky repositories. He will also handle the Dynaweb XML/SGML server administration and Perl programming for the full-text interface.
Beth Kraemer, Lead Systems and Web Design Librarian for UK Libraries, will devote 5% of her time to the project. Ms. Kraemer has extensive knowledge and experience with database and web interface design. She will be called upon to assist with interface design as well as establishing Z39.50 access to the Beyond the Shelf collection via the KYVL gateway.
Nancy Lewis, Special Projects Cataloger, will contribute 5% of her time to set up and oversee a special cataloging project to create TEI-header-compatible MARC bibliographic records for the digital versions. Ms. Lewis will coordinate the final shipment of the cataloging records to OCLC and to the Libraries’ INFOKAT databases, and she will oversee a 0.5 FTE staff assistant (to be hired) who will complete most of the cataloging records and will perform the quality control and editing of the encoded texts.
Margie Plarr, Encoding Technician for UK Libraries, will devote 15% of her time to performing the XML encoding and will oversee a 0.5 FTE student worker who will perform manual markup.
Two Scanning Technicians (to be hired on grant funding) will devote 100% of their time to scanning and quality control for the first 20 months of the project. They will assist with problem solving, manual markup, and encoding editing during the final four months of the grant.
Budget & Contributions
The Budget is presented in detail on the accompanying sheets. Contributions from the KYVL, UK Libraries and IMLS are described in the Budget Justification.
This project will create world-wide access to 950 scarce works of Kentucky history and literature. Project statistics concerning the number of volumes and page images scanned, OCR’d and encoded will be tracked in a FileMaker Pro Project Database. The Project Database currently includes the basic bibliographic information for each title in the target collection. Quality control data, time spent on the conversion and encoding processes, and notes about irregularities and problems will be recorded. The data will provide information about costs and labor for reports. A key observation will be the comparison between estimated costs and actual costs.
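The comparison between estimated and actual costs can be illustrated with a short calculation. The following is a minimal sketch only: the record fields, the function name, and the sample figures are hypothetical and are not taken from the FileMaker Pro Project Database, which will hold the real data.

```python
from dataclasses import dataclass


@dataclass
class TitleRecord:
    """One converted title; all field names are hypothetical."""
    title_id: str
    pages: int
    estimated_cost: float  # dollars budgeted for this title
    actual_cost: float     # dollars actually spent on this title


def cost_summary(records):
    """Compare estimated and actual conversion costs across titles."""
    if not records:
        raise ValueError("at least one record is required")
    estimated = sum(r.estimated_cost for r in records)
    actual = sum(r.actual_cost for r in records)
    pages = sum(r.pages for r in records)
    return {
        "estimated": estimated,
        "actual": actual,
        "variance": actual - estimated,
        "variance_pct": 100 * (actual - estimated) / estimated,
        "actual_per_page": actual / pages,
    }


# Illustrative figures only, not project data.
sample = [
    TitleRecord("title-001", pages=200, estimated_cost=100.0, actual_cost=120.0),
    TitleRecord("title-002", pages=400, estimated_cost=200.0, actual_cost=210.0),
]
summary = cost_summary(sample)
print(summary)
```

A summary of this kind, produced per reporting period, would show whether per-page costs are drifting from the original estimates.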
Usage statistics for Beyond the Shelf will be tracked by KYVL. An online survey will gather feedback from users in many educational environments and libraries. Feedback from Kentucky history teachers will help determine whether the availability and quality of Beyond the Shelf have improved students’ ability to perform research and to think critically about historical documentation. To track these desired outcomes, KATH will assist with focus groups to be held at their annual conferences. Critical questions will be asked to gauge the effectiveness of digital collections and their effect on student performance.
Dissemination: Delivery Systems
Access will be provided through the Kentucky Virtual Library Z39.50 gateway (http://www.kyvl.org), through the Kentuckiana Digital Library Project site (http://www.kyvl.org/kentuckiana/digilibcoll/digilibcoll.shtml), and through the University of Kentucky Online Web Catalog (http://infokat.uky.edu/).
Goals of Delivery System
Bibliographic records for the digital titles will be accessed through Endeavor Voyager (INFOKAT) at UK or OCLC SiteSearch Suite for KYVL. In SiteSearch, searches can be limited to electronic texts only, and access points are Dublin Core elements. In INFOKAT, access points include keyword, author, title, and subject. For full-text searching and page image presentation to users, Enigma’s Dynaweb 4.3 XML/SGML Publishing System will be used with added stylesheets and a custom Perl interface for electronic text navigation and printing.
Dissemination: Web Content Accessibility
Best efforts will be made to refine our existing electronic text interface to meet all applicable priority 1 and priority 2 guidelines outlined in the Web Content Accessibility Guidelines issued by the World Wide Web Consortium (W3C). This publication identifies 14 major guidelines for web content accessibility. Under each guideline, checkpoints are identified, along with specific techniques for establishing conformance. See Appendix Item #10 Web Accessibility Chart.
This collection of Kentuckiana books will be publicly accessible to anyone with access to the World Wide Web. The completed digital archive mounted on the web by the KYVL’s Z39.50 gateway will offer a user-friendly interface for database searching of records. Full-text searching will be offered for the archive, using the Dynaweb SGML/XML search engine and custom Perl scripts. This interface will allow for searching of all the digital texts at once or individually, showing hit results by individual titles and individual pages within the text.
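The search behavior described above, with hits reported by title and by page, can be sketched in a few lines. This is an illustration only and does not represent the Dynaweb search engine or the project's Perl scripts: the function name, the in-memory corpus, and the placeholder page text are all hypothetical.

```python
def search_pages(corpus, term, titles=None):
    """Return {title: [page numbers]} for pages containing `term`.

    corpus maps each title to a list of page texts (page 1 first).
    If `titles` is given, only those titles are searched; otherwise
    every title in the corpus is searched at once.
    """
    needle = term.lower()
    hits = {}
    for title, pages in corpus.items():
        if titles is not None and title not in titles:
            continue
        matching = [
            number
            for number, text in enumerate(pages, start=1)
            if needle in text.lower()
        ]
        if matching:
            hits[title] = matching
    return hits


# Placeholder text, not transcriptions of any work in the collection.
corpus = {
    "Title A": ["The river was crossed in spring.", "A fort was built.", "The river rose."],
    "Title B": ["Settlers arrived.", "They followed the River north."],
}

all_hits = search_pages(corpus, "river")
one_title = search_pages(corpus, "river", titles={"Title B"})
print(all_hits)
print(one_title)
```

The same function covers both modes named above: omitting `titles` searches every text at once, while passing a single title restricts the search to that work.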
In addition to conducting the focus groups at the KATH annual meetings, we plan to demonstrate the project and discuss its design there. We plan to solicit opinions and ideas for future projects from Kentucky’s history teachers. We also plan to present the project at regional and national conferences and to write an article about our experience with the microfilm-to-digital hybrid approach. In particular, we hope to present this project as a model for other NEH-funded SOLINET/ASERL CPMP participants to follow. This can be accomplished at the SOLINET Preservation Advisory Council semi-annual meetings and at the SOLINET/ASERL Annual Meeting.
Sustainability: Long Term Maintenance
The project’s use of XML will contribute substantially to the long-term sustainability of the digital collection, mainly because XML is a non-proprietary encoding standard and is stored as plain ASCII text.
The project is built on a master file concept. The first digital generation will be a high quality, bit-rich 400 dpi bitonal, uncompressed TIFF image. The digital access master will support future migration. Meanwhile, the master negative and the print master microfilms are retained in optimal conditions to serve as the ultimate preservation master should the digital access master suffer any form of unexpected corruption. The hybrid approach provides several avenues for preservation success.
We estimate that between 150,000 and 200,000 page images will be created through the life cycle of this project. Keeping these access master and derivative files viable and useful for as long as possible will be essential. Toward this end, these high-quality images will be saved onto KODAK CD-R Gold Ultima media, which “has a projected lifetime of 100 years or more” (Kodak, 2000). Since each 400 dpi bitonal image is roughly 800K, and each XML-encoded page roughly 1K, the project will require approximately 153 gigabytes (500-600 blank CD-R media) of storage space for two copies of the access master image files and XML-encoded texts. Each copy of the access masters and XML-encoded texts will be saved on two separate CD-R media. One of the CD-Rs will be placed in off-site temperature-controlled storage while the other will be saved in temperature-controlled storage onsite. The web-deliverable derivative images will also be backed up to CD-R media and stored onsite.
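The storage estimate can be checked with a short calculation. The following is a minimal sketch under two assumptions that are not stated above: binary units (1 GB = 1,024 MB = 1,048,576 KB) and a capacity of 650 MB per CD-R. Under those assumptions, the upper bound of 200,000 pages works out to roughly 153 GB per copy, and the disc count scales with the number of copies written.

```python
import math
from dataclasses import dataclass

IMAGE_KB = 800         # per-page access master size, from the estimate above
XML_KB = 1             # per-page encoded text size, from the estimate above
CDR_CAPACITY_MB = 650  # assumption: capacity of one CD-R, not stated in the text


@dataclass
class StorageEstimate:
    pages: int
    gb_per_copy: float
    discs_per_copy: int


def estimate_storage(pages):
    """Estimate storage for one copy of the images and encoded texts."""
    total_mb = pages * (IMAGE_KB + XML_KB) / 1024
    return StorageEstimate(
        pages=pages,
        gb_per_copy=total_mb / 1024,
        # Lower bound on discs: assumes files pack perfectly onto each disc.
        discs_per_copy=math.ceil(total_mb / CDR_CAPACITY_MB),
    )


low = estimate_storage(150_000)
high = estimate_storage(200_000)
for est in (low, high):
    print(f"{est.pages:,} pages: {est.gb_per_copy:.1f} GB and "
          f"at least {est.discs_per_copy} CD-Rs per copy")
```

In practice the disc count will run higher than this lower bound, because a volume's files would normally be kept together rather than split across discs.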
This is a cooperative project between the University of Kentucky and the Kentucky Virtual Library. A major goal is the establishment of enhanced digital access to out-of-copyright, published Kentuckiana. Both the University of Kentucky and the Kentucky Virtual Library are committed to maintaining these resources once they are created and made available to the public. This project is the first phase of an ongoing conversion activity involving the production of digital images from preservation microfilm. Accordingly, as a part of Beyond the Shelf, we plan to focus research efforts on developing a detailed and sustainable migration plan that will include quality control processes and will follow a 5-year schedule for moving data, with an inspection of the access masters every 2 years (Puglia, 1999). The completion of this research will allow us to enact a maintenance program for securing the long-term integrity and authenticity of our digital access masters. The Final Report to IMLS will include a copy of our migration plan.
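The 5-year migration and 2-year inspection cycle can be laid out as a simple calendar. The following is a minimal sketch, not the migration plan itself: the function name, the start year of 2004, and the 10-year horizon are hypothetical choices made for illustration.

```python
def maintenance_schedule(start_year, horizon_years, inspect_every=2, migrate_every=5):
    """List the years in which access masters are inspected or migrated.

    Years are counted from start_year; a year with no scheduled
    action is omitted from the result.
    """
    schedule = []
    for offset in range(1, horizon_years + 1):
        actions = []
        if offset % inspect_every == 0:
            actions.append("inspect")
        if offset % migrate_every == 0:
            actions.append("migrate")
        if actions:
            schedule.append((start_year + offset, actions))
    return schedule


# Hypothetical start year and horizon, for illustration only.
plan = maintenance_schedule(start_year=2004, horizon_years=10)
for year, actions in plan:
    print(year, ", ".join(actions))
```

Every tenth year the two cycles coincide, so an inspection could be folded into the migration itself.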
The University of Kentucky and the Kentucky Virtual Library request matching funds from IMLS to establish a digital Kentuckiana collection of significant content, scope, and quality. The preceding Narrative describes the goals and plan of work. The following appendices, budget forms, Curriculum Vitae, and examples augment the project’s goals and structure.