
From Investigation to Implementation

Building a Program for the Large-Scale Digitization of Manuscripts

Metadata

The information in this section describes the metadata gathered and encoded for the Thomas E. Watson Papers Digital Collection. Although these workflows and guidelines informed decisions made over the course of the Extending the Reach project, they have not been explicitly adopted for the large-scale digitization program of the Southern Historical Collection (SHC).

Metadata for materials in the Thomas E. Watson Papers Digital Collection was captured at multiple levels (collection, series, folder, and item) using a variety of XML-based schemas.

With the exception of the Encoded Archival Description (EAD) finding aid, metadata records for collection materials were created in spreadsheets by graduate research assistants working on the project. The spreadsheet data was then transformed into XML using XSLT.
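The project performed this transformation with XSLT, and its stylesheets are not reproduced here. As a rough illustration of the first step only, the Python sketch below (a swapped-in technique, not the project's code) turns a spreadsheet exported as CSV into a simple intermediate XML document, one `<record>` per row, that a stylesheet could then map onto TEI or MODS. The column names and sample values are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export (CSV); the project's real column names are not documented here.
SAMPLE_CSV = """identifier,title,sender,date
example-0001,Letter to Thomas E. Watson,"Doe, Jane",1896-05-01
example-0002,Letter from Thomas E. Watson,,1897
"""


def tag_name(header: str) -> str:
    """Turn a spreadsheet column header into a usable XML element name."""
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", header.strip())
    # XML names cannot start with a digit, hyphen, or period.
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name


def csv_to_xml(text: str) -> ET.Element:
    """Convert each spreadsheet row into a <record> with one child per non-empty cell."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, "record")
        for header, value in row.items():
            if header is None:  # extra cells beyond the header row; ignored here
                continue
            value = (value or "").strip()
            if value:  # skip empty cells instead of writing empty elements
                ET.SubElement(record, tag_name(header)).text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(csv_to_xml(SAMPLE_CSV), encoding="unicode"))
```

Dropping empty cells keeps the intermediate XML sparse, so a downstream stylesheet only has to test for the presence of an element rather than for empty text.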

Descriptive Metadata
Encoded Archival Description (EAD)

All materials in the Thomas E. Watson Papers Digital Collection are described in the aggregate through the collection's Encoded Archival Description (EAD) finding aid, which was created according to the local EAD implementation guidelines of the Southern Historical Collection at UNC-Chapel Hill. For materials in all series except Series 1. Correspondence and Series 8. Pictures, the finding aid is the sole point of discovery for digitized materials. All digitized materials in the Thomas E. Watson Papers Digital Collection can be accessed through hyperlinked container titles in the finding aid.
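Because hyperlinked container titles are the access point, a script that inventories which containers carry links can help when checking a finding aid. The following is a minimal sketch, assuming an EAD 2002 finding aid in which each digitized container's `<did>` holds a `<dao>` link. The fragment, titles, and URLs are hypothetical, and the SHC's local markup may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD 2002 fragment; the SHC's actual markup, titles, and URLs will differ.
EAD_SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9" xmlns:xlink="http://www.w3.org/1999/xlink">
  <archdesc level="collection"><dsc>
    <c01 level="series">
      <did><unittitle>Example series</unittitle></did>
      <c02 level="file"><did>
        <container type="folder">1</container>
        <unittitle>Example folder A</unittitle>
        <dao xlink:href="https://example.org/folder-a"/>
      </did></c02>
      <c02 level="file"><did>
        <container type="folder">2</container>
        <unittitle>Example folder B (not digitized)</unittitle>
      </did></c02>
    </c01>
  </dsc></archdesc>
</ead>"""

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def digitized_containers(ead_xml: str) -> list[tuple[str, str]]:
    """Return (container title, link) for every <did> that contains a <dao>."""
    links = []
    for did in ET.fromstring(ead_xml).iter():
        if local_name(did.tag) != "did":
            continue
        title = href = None
        for child in did:
            if local_name(child.tag) == "unittitle":
                title = "".join(child.itertext()).strip()
            elif local_name(child.tag) == "dao":
                # Accept either a namespaced xlink:href or a plain href attribute.
                href = child.get(XLINK_HREF) or child.get("href")
        if title and href:
            links.append((title, href))
    return links


if __name__ == "__main__":
    for title, href in digitized_containers(EAD_SAMPLE):
        print(f"{title} -> {href}")
```

Matching on local names rather than a fixed namespace lets the same check run against namespaced or un-namespaced EAD.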

For digitized materials from Series 1. Correspondence and Series 8. Pictures, we created item-level metadata records in addition to the description found in the finding aid. These records were encoded in TEI and MODS, respectively, and used for searching and browsing in the digital collection interface.

Text Encoding Initiative (TEI)

Descriptive metadata for the materials in Series 1. Correspondence was gathered for two purposes: to help identify copyright holders for the letters (see Copyright and the Thomas E. Watson Papers Digitization Project: A Case Study for a description of this process), and to build a searchable, browsable index of the letters.

We chose the Text Encoding Initiative (TEI) header to encode these records because TEI offers a Correspondence Description module, which gave us greater flexibility in manipulating our data. The following fields were captured and encoded using TEI:

Field Label | TEI Element(s)
Title | <titleStmt><title>
Identifier | <publicationStmt><idno>
Sender | <sender><persname> or <corpname>
Recipient | <recipient><persname> or <corpname>
Location | <placeSender><location><settlement> or <region> or <country>
Date | <dateSender><date>
Note | <note>
Repository | <msIdentifier><repository>
Language | <msContents><textLang>
Material | <support><material>
Extent | <extent>
Letterhead | <letterhead><p>

Download a detailed TEI project data dictionary.
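The mapping in the table above can be expressed as a small routine. The Python sketch below stands in for the project's XSLT and covers only a subset of the fields. It follows the table's element paths, with one change: it writes TEI's camel-cased `<persName>` and `<orgName>`, since TEI has no `<corpname>` element (that name comes from EAD). Current TEI P5 groups sender and recipient information under `<correspAction>`, so the target schema version should be checked before reuse. The dictionary keys and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace

# Hypothetical spreadsheet row for one letter.
SAMPLE_LETTER = {
    "identifier": "example-0001",
    "title": "Letter, Example Sender to Thomas E. Watson",
    "sender": "Example Sender",
    "recipient": "Thomas E. Watson",
    "place": "Example Town",
    "date": "1896-05-01",
    "repository": "Southern Historical Collection",
}


def t(tag: str) -> str:
    """Qualify a tag with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def add_path(parent: ET.Element, *tags: str) -> ET.Element:
    """Create parent/tags[0]/tags[1]/... and return the innermost element."""
    for tag in tags:
        parent = ET.SubElement(parent, t(tag))
    return parent


def add_name(parent: ET.Element, name: str, is_org: bool) -> None:
    """TEI puts personal names in <persName> and organization names in <orgName>."""
    ET.SubElement(parent, t("orgName" if is_org else "persName")).text = name


def tei_header(rec: dict) -> ET.Element:
    """Map one spreadsheet row onto a TEI header, following the field table."""
    header = ET.Element(t("teiHeader"))
    file_desc = ET.SubElement(header, t("fileDesc"))
    add_path(file_desc, "titleStmt", "title").text = rec["title"]
    add_path(file_desc, "publicationStmt", "idno").text = rec["identifier"]
    add_path(file_desc, "sourceDesc", "msDesc", "msIdentifier", "repository").text = rec["repository"]

    corresp = add_path(header, "profileDesc", "correspDesc")
    add_name(ET.SubElement(corresp, t("sender")), rec["sender"], rec.get("sender_is_org", False))
    add_name(ET.SubElement(corresp, t("recipient")), rec["recipient"], rec.get("recipient_is_org", False))
    if rec.get("place"):
        add_path(corresp, "placeSender", "location", "settlement").text = rec["place"]
    if rec.get("date"):
        add_path(corresp, "dateSender", "date").text = rec["date"]
    return header


if __name__ == "__main__":
    print(ET.tostring(tei_header(SAMPLE_LETTER), encoding="unicode"))
```

A spreadsheet flag such as the hypothetical `sender_is_org` column is one simple way to carry the "persname or corpname" choice through to the XML.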


Metadata Object Description Schema (MODS)

Descriptive metadata for the materials in Series 8. Pictures was gathered and encoded in the Metadata Object Description Schema (MODS) to build a searchable, browsable index of the images.

The following fields were captured and encoded using MODS:

Field Label | MODS Element(s)
Identifier | <identifier>
Resource Type | <typeOfResource>
Genre | <genre>
Title | <titleInfo><title>
Publisher | <originInfo><publisher>
Place of Publication | <originInfo><place><placeTerm>
Date | <originInfo><dateIssued>
Abstract | <abstract>
Subject (Topical) | <subject><topic>
Subject (Name) | <subject><name>
Subject (Geographic) | <subject><hierarchicalGeographic>
Bibliographic Note | <note>
Extent | <extent>
Transcription | <physicalDescription><note>
Repository | <location><physicalLocation>

Download a detailed MODS project data dictionary.
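The MODS mapping can also be sketched in Python in place of the project's XSLT. This covers a subset of the fields in the table above; the sample values, the `type="local"` identifier attribute, and the choice of "still image" as the resource type are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical spreadsheet row for one picture.
SAMPLE_PICTURE = {
    "identifier": "example-p001",
    "title": "Example photograph",
    "genre": "photographs",
    "date": "circa 1900",
    "topics": ["Example topic"],
    "names": ["Example, Person"],
    "geographic": [("country", "United States"), ("state", "Example State")],
    "repository": "Southern Historical Collection",
}


def sub(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a MODS child element, optionally with text and attributes."""
    el = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def mods_record(rec: dict) -> ET.Element:
    """Map one spreadsheet row onto a MODS record, following the field table."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    sub(mods, "identifier", rec["identifier"], type="local")
    sub(mods, "typeOfResource", "still image")
    if rec.get("genre"):
        sub(mods, "genre", rec["genre"])
    sub(sub(mods, "titleInfo"), "title", rec["title"])
    if rec.get("date"):
        sub(sub(mods, "originInfo"), "dateIssued", rec["date"])
    for topic in rec.get("topics", []):
        sub(sub(mods, "subject"), "topic", topic)
    for name in rec.get("names", []):
        # MODS carries the name string in <namePart>.
        sub(sub(sub(mods, "subject"), "name"), "namePart", name)
    if rec.get("geographic"):
        geo = sub(sub(mods, "subject"), "hierarchicalGeographic")
        for level, value in rec["geographic"]:  # broadest level first
            sub(geo, level, value)
    sub(sub(mods, "location"), "physicalLocation", rec["repository"])
    return mods


if __name__ == "__main__":
    print(ET.tostring(mods_record(SAMPLE_PICTURE), encoding="unicode"))
```

Each topic and name gets its own `<subject>` wrapper here, which keeps subjects independently searchable in an index.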


Technical Metadata
Metadata for Images in XML (MIX)

Technical metadata captured during the digitization process was encoded using the Metadata for Images in XML schema:

Field Label | MIX Element(s)
Identifier | <BasicDigitalObjectInformation><ObjectIdentifier><objectIdentifierType/><objectIdentifierValue/>
Format | <BasicDigitalObjectInformation><FormatDesignation><formatName>
Compression | <BasicDigitalObjectInformation><Compression><compressionScheme>
Image Width | <BasicImageCharacteristics><BasicImageInformation><imageWidth>
Image Height | <BasicImageCharacteristics><BasicImageInformation><imageHeight>
Color Space | <BasicImageCharacteristics><PhotometricInterpretation><colorSpace>
Date Created | <ImageCaptureMetadata><GeneralCaptureInformation><dateTimeCreated>
Creator Information | <ImageCaptureMetadata><GeneralCaptureInformation><imageProducer>
Capture Device | <ImageCaptureMetadata><GeneralCaptureInformation><captureDevice>
Hardware | <ScannerCapture><ScannerModel><scannerModelName>
Optical Resolution | <ScannerCapture><maximumOpticalResolution>
Software | <ScannerCapture><ScannerModel><scanningSoftwareName>
Bit Depth | <ImageAssessmentMetadata><ImageColorEncoding><bitsPerSample><bitsPerSampleValue/><bitsPerSampleUnit/>

Download a detailed MIX project data dictionary.
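Because the MIX table is essentially a list of element paths, a MIX record can be generated directly from it. The following is a minimal Python sketch, assuming the MIX 2.0 namespace. The paths are copied from the table as given, with the Identifier and Bit Depth rows split into one entry per leaf element, so they should be checked against the MIX schema version in use before relying on the output. The field labels for the split rows and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/v20"  # assumption: MIX 2.0
ET.register_namespace("mix", MIX_NS)

# Field label -> element path, transcribed from the MIX table (one entry per leaf element).
MIX_PATHS = {
    "Identifier Type": "BasicDigitalObjectInformation/ObjectIdentifier/objectIdentifierType",
    "Identifier Value": "BasicDigitalObjectInformation/ObjectIdentifier/objectIdentifierValue",
    "Format": "BasicDigitalObjectInformation/FormatDesignation/formatName",
    "Compression": "BasicDigitalObjectInformation/Compression/compressionScheme",
    "Image Width": "BasicImageCharacteristics/BasicImageInformation/imageWidth",
    "Image Height": "BasicImageCharacteristics/BasicImageInformation/imageHeight",
    "Color Space": "BasicImageCharacteristics/PhotometricInterpretation/colorSpace",
    "Date Created": "ImageCaptureMetadata/GeneralCaptureInformation/dateTimeCreated",
    "Creator Information": "ImageCaptureMetadata/GeneralCaptureInformation/imageProducer",
    "Capture Device": "ImageCaptureMetadata/GeneralCaptureInformation/captureDevice",
    "Hardware": "ScannerCapture/ScannerModel/scannerModelName",
    "Optical Resolution": "ScannerCapture/maximumOpticalResolution",
    "Software": "ScannerCapture/ScannerModel/scanningSoftwareName",
    "Bit Depth Value": "ImageAssessmentMetadata/ImageColorEncoding/bitsPerSample/bitsPerSampleValue",
    "Bit Depth Unit": "ImageAssessmentMetadata/ImageColorEncoding/bitsPerSample/bitsPerSampleUnit",
}


def mix_record(values: dict) -> ET.Element:
    """Build a MIX document, reusing shared parent elements instead of repeating them."""
    root = ET.Element(f"{{{MIX_NS}}}mix")
    for label, value in values.items():
        node = root
        for tag in MIX_PATHS[label].split("/"):
            qname = f"{{{MIX_NS}}}{tag}"
            child = node.find(qname)  # reuse the element if an earlier field created it
            node = child if child is not None else ET.SubElement(node, qname)
        node.text = str(value)
    return root


if __name__ == "__main__":
    sample = {"Identifier Type": "local", "Identifier Value": "example-img-0001",
              "Format": "image/tiff", "Image Width": 4000, "Image Height": 3000}
    print(ET.tostring(mix_record(sample), encoding="unicode"))
```

Reusing existing parents keeps, for example, the Identifier and Format fields inside a single `<BasicDigitalObjectInformation>` element rather than one per field.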


Tying It All Together: Metadata Encoding and Transmission Standard (METS)

The Metadata Encoding and Transmission Standard (METS) schema was used to tie together the metadata and digital files associated with this digital collection. A METS document can bundle descriptive, administrative, and technical metadata about a digital object in a single file, and it can either embed that metadata directly or point to metadata stored externally.
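The difference between bundling and pointing can be seen in a single METS record. In this hypothetical Python sketch, descriptive metadata (a stub MODS record) is embedded with `<mdWrap>`, while technical metadata is referenced with `<mdRef>` as a link to a separate MIX file. The project's actual split between embedded and referenced metadata is not stated here, so this arrangement is an assumption, as are the IDs and URL. `MDTYPE="NISOIMG"` is the METS value for NISO still-image (MIX) metadata.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
MODS_NS = "http://www.loc.gov/mods/v3"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("mods", MODS_NS)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified tag or attribute name."""
    return f"{{{ns}}}{tag}"


def mets_for_item(item_id: str, title: str, mix_url: str) -> ET.Element:
    """Embed one item's descriptive metadata and reference its external technical metadata."""
    mets = ET.Element(q(METS_NS, "mets"))

    # Embedded: the MODS record is copied into the METS file inside <mdWrap><xmlData>.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": f"dmd-{item_id}"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    mods = ET.SubElement(ET.SubElement(wrap, q(METS_NS, "xmlData")), q(MODS_NS, "mods"))
    ET.SubElement(ET.SubElement(mods, q(MODS_NS, "titleInfo")), q(MODS_NS, "title")).text = title

    # Referenced: the METS file only points at a MIX record stored elsewhere.
    amd = ET.SubElement(mets, q(METS_NS, "amdSec"))
    tech = ET.SubElement(amd, q(METS_NS, "techMD"), {"ID": f"tech-{item_id}"})
    ET.SubElement(tech, q(METS_NS, "mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "NISOIMG", q(XLINK_NS, "href"): mix_url})

    # The structural map ties the item to both sections by ID.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"))
    ET.SubElement(struct_map, q(METS_NS, "div"),
                  {"DMDID": f"dmd-{item_id}", "ADMID": f"tech-{item_id}"})
    return mets


if __name__ == "__main__":
    doc = mets_for_item("example-0001", "Example photograph",
                        "https://example.org/mix/example-0001.xml")
    print(ET.tostring(doc, encoding="unicode"))
```

Embedding makes the METS file self-contained; referencing keeps it small and lets the external record be updated independently.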

For materials in the digital collection that have item-level metadata (that is, the correspondence and the photographs), METS records were created at the folder level, and each folder's METS file contains one descriptive metadata record (in TEI or MODS) per item. For materials with no descriptive metadata beyond what exists in the collection finding aid, METS records were created at the series level only.
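The choice of METS granularity follows mechanically from which series have item-level records. The following is a minimal sketch, assuming each digitized item is known by series number, folder, and identifier. The `Item` type, the grouping keys, and the sample data are hypothetical; only the series-to-schema mapping (Series 1 in TEI, Series 8 in MODS) comes from this section.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Series with item-level records and the schema each uses.
ITEM_LEVEL_SCHEMAS = {1: "TEI", 8: "MODS"}


class Item(NamedTuple):
    series: int
    folder: str
    item_id: str


def plan_mets_files(items: list[Item]) -> dict[tuple[int, Optional[str], Optional[str]], list[str]]:
    """Group items into METS files: one per folder for series with item-level
    metadata (one descriptive record per item), one per series otherwise."""
    plan: dict = defaultdict(list)
    for item in items:
        schema = ITEM_LEVEL_SCHEMAS.get(item.series)
        if schema:
            key = (item.series, item.folder, schema)
        else:
            # Series-level METS: description lives only in the finding aid.
            key = (item.series, None, None)
        plan[key].append(item.item_id)
    return dict(plan)


if __name__ == "__main__":
    sample = [Item(1, "f1", "a"), Item(1, "f1", "b"), Item(8, "p1", "c"), Item(3, "x1", "d")]
    for key, ids in plan_mets_files(sample).items():
        print(key, ids)
```

Keying on `(series, folder, schema)` means each folder-level METS file knows which descriptive schema its item records use.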