That was what Dorothy and her traveling companions were told when they arrived at Oz. I was telling someone yesterday about a project we are working on, and it occurred to me that I was actually doing the opposite of the Wizard of Oz. I was trying to pull back the curtain and explain some of the behind-the-scenes work that archivists do to make digital projects happen. I was also trying to demonstrate the fragility of digital records, something else many people don’t recognize.
Only seven years ago, in 2010, a library team completed our first digital project: a collection of digitized postcards of Kentucky, part of the Gilliam Family Collection. A project like this is much more than simply pressing the button on a scanner, but unless you’ve participated in one, you don’t know the behind-the-scenes work it takes to produce the wizardry of a digital collection. It is much more than a wave of the magic wand. A description standard (Dublin Core) was agreed upon, including where the information for each field would come from. Since these are archival materials, they don’t have call numbers like published books. A field called “Is Part of” was used to connect the postcards back to the archival collection of which they are a part. (The Gilliam Family Collection actually holds many more postcards of other parts of the US.) Our Intellectual Property Librarian, John Schlipp, researched the copyright of the postcard images. The nine for which he identified a publisher have an entry in that field. And of course, the postcards were scanned. All the components were brought together with a custom-built search box and put on the library’s website for the public to use.
Jump forward seven years to 2017. The university’s website is being redesigned, and the redesign includes the library’s site and our digital collection. Like many small archives, we lack the staff to recreate work that has already been done simply because the look of the website has changed. If we constantly have to do this, we can’t process other records for use. For this reason, the Gilliam Postcards are being migrated to the new digital repository the library is building. Here is where the fragility of digital information is revealed. Our department’s scanner was used to create the images, so we have all the images. Another department created all the Dublin Core information. Because of how the description was created and changes in personnel, it appears that the only complete copy of the metadata for 200+ postcards exists on the individual pages of the website.
Latonia Race Track, near Cincinnati, Ohio
- Description: Postcard of a picture of the beautiful and popular Latonia Race Course. It is six miles from downtown Cincinnati, Ohio
- Subject: Gilliam Family
- Subject: Latonia (Covington, Ky.)
- Subject: Racetracks (Horse racing)–Kentucky
- Format: jpeg
- Source: Digital copy of postcard: Latonia Race Track, near Cincinnati, Ohio
- Language: English
- Rights: This postcard is in the public domain, so this image may be freely used. Please cite as follows: Gilliam Collection, Eva G. Farris Special Collections, W. Frank Steely Library, and Northern Kentucky University.
- Publisher: Wittenborg Toy Co.
- Resource Identifier: gilliam_ky_latonia_01
- Provenance: Donated by Dr. Katherine Kurk in 1991 to the Eva G. Farris Special Collections, W. Frank Steely Library, Northern Kentucky University.
- Type: Image, postcard
- Is part of – Gilliam Collection
- Is part of – Kentucky Postcards Series
- Is part of – Latonia File
So, what is the most efficient and error-free way to proceed? A staff member copied and pasted the title, image and metadata for all 200+ postcards into a single Word document to capture a complete backup copy of the project. Next he created a second document with just the metadata (the field labels and field information). Now the metadata has to be “cleaned up” and reformatted to meet the requirements of the new digital repository. Thankfully both projects use Dublin Core, which makes this easier than it would otherwise be, although the specific DC fields used by one are not identical to those used by the other. The final product is a spreadsheet with the field labels in a horizontal row across the top and the field data for each postcard in its own row below, each value under the appropriate label. To reach that stage, each field label and its data are converted into delimiter-separated values (CSV-style, here with a tab as the delimiter), the label being one value and the data a separate value. This means placing the cursor just to the right of the colon and pressing the Tab key once for every field on every postcard. The word “relation” is inserted in front of every “Is Part of” label. Once this is done in the Word document, all the metadata is copied and pasted into the spreadsheet.
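For readers curious how the label-and-data split could be scripted rather than done by hand, here is a minimal Python sketch. It is not the method we used. It assumes each field sits on its own line starting with a dash, with the label separated from the data by a colon or an en dash, as in the sample record above. The names `SAMPLE`, `LINE` and `parse_fields` are hypothetical, and the exact form of the “relation” prefix is a guess.

```python
import re

# Hypothetical sample: a title line plus three fields copied from one postcard page.
SAMPLE = """\
Latonia Race Track, near Cincinnati, Ohio
- Subject: Racetracks (Horse racing)–Kentucky
- Format: jpeg
- Is part of – Gilliam Collection
"""

# The label is the text before the first colon or en dash; the data is the rest.
LINE = re.compile(r"^-\s*(?P<label>[^:–]+?)\s*[:–]\s+(?P<value>.+)$")


def parse_fields(text):
    """Return (label, data) pairs for every '- Label: data' line in text."""
    pairs = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match is None:
            continue  # title lines and blank lines carry no field label
        label = match.group("label")
        value = match.group("value").strip()
        if label.lower() == "is part of":
            # Assumption: "relation" is simply prefixed to the existing label.
            label = "relation " + label
        pairs.append((label, value))
    return pairs


if __name__ == "__main__":
    # One tab between label and data, like pressing Tab after the colon.
    for label, value in parse_fields(SAMPLE):
        print(label + "\t" + value)
```

Only the first colon or en dash on a line is treated as the separator, so data that contains its own colon or dash (such as the Source field or a subject heading) stays intact.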
Here is the first spreadsheet, illustrating the cleaned-up metadata for the first postcard. The number in column A is there to track the postcards so that none are skipped, but it won’t go into the final product.
After a little more cleanup, the metadata is copied again and transposed into a new spreadsheet, converting the information from a column to a row. After the first postcard, only the field data needs to be copied into the second, final spreadsheet.
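The transposing step can also be sketched in Python, again as an illustration rather than a description of our workflow. The sketch assumes each postcard's metadata is already a list of (label, data) pairs, and it assumes that repeated fields such as Subject are joined into one cell with a semicolon; the new repository may want repeated fields handled differently. The function names `to_row` and `write_sheet` are hypothetical.

```python
import csv
import io


def to_row(pairs, sep="; "):
    """Collapse (label, data) pairs into one dict, joining repeated labels."""
    row = {}
    for label, value in pairs:
        if label in row:
            row[label] = row[label] + sep + value  # assumption: join repeats
        else:
            row[label] = value
    return row


def write_sheet(rows):
    """Return CSV text: one header row of labels, then one row per postcard."""
    header = []
    for row in rows:
        for label in row:
            if label not in header:
                header.append(label)  # keep labels in first-seen order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields missing from a postcard are left empty
    return buffer.getvalue()


if __name__ == "__main__":
    postcard = [
        ("Title", "Latonia Race Track"),
        ("Subject", "Gilliam Family"),
        ("Subject", "Latonia (Covington, Ky.)"),
        ("Format", "jpeg"),
    ]
    print(write_sheet([to_row(postcard)]))
```

Because the header row is built once from all postcards, only the field data is written for each postcard after that, which mirrors the manual step described above.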
Migrating from one software system to another is a time-consuming process, but the intention is to reduce the need for future migrations and so get the greatest benefit from the time invested. Hopefully I have also pulled back the curtain to show how complex a digital project can be, the value of using a standard (Dublin Core), and how quickly a lot of work can be lost if time is not invested up front in planning to preserve a project like this.
See our January 7, 2014 blog post “Our ten minutes of NEH fame” for more about the postcard collecting craze in Steve Moyer’s article.