Back Issue Digitizing Study Group

From Augustan Society Staff Wiki
Revision as of 17:11, 8 August 2016 by Bruce

As part of the mission of The Augustan Society, Inc., to provide educational offerings, the Society sells back issues of our publications. When stock is on hand, we ship that stock, but issues that are out of stock must be reproduced by other means. The Back Issue Digitizing Study Group was created for this purpose. It is headed by a Coordinator who is named by, and serves at the pleasure of, the Dean of Studies.

The process of digitizing involves a number of distinct steps:

  1. The first step is to obtain a decent copy of the original. (As some of the issues are over fifty years old, the best available copy may not be very good, and in some cases it will be only a photocopy.) If a better copy is available to keep for the files, the magazine is taken apart into individual sheets, letter-size or smaller, and these are passed through the sheet-feed scanner. When only one original exists, the staples should be removed and the individual pages passed through the scanner using the transparent carrier guide. Either way, the result is a PDF file, which is to be named according to the prevailing standard. Note that this PDF file can be printed out to fill orders before the rest of the process is complete; given the slow rate of progress, most copies supplied to date have been produced this way.
  2. The next step is to process the PDF file through OCR (Optical Character Recognition) software. This can be done from the same interface that is used for the scanner. The resulting file shall be given the same name, changing the PDF suffix to RTF.
  3. These two files are then uploaded to the AugustanConvert YahooGroup. Any file larger than 5 MB must be broken into pieces no larger than that limit. Note: One of the folders used to store these files is named "Urgent"; the others are named after the magazine titles. At present, the Urgent folder is used for magazines containing articles by Arthur Germond, as it is planned to combine these articles into a book.
  4. Members of the Back Issue Digitizing Study Group then select the files they wish to work on. Starting with the RTF file, they compare it to the PDF images and make whatever edits are needed, producing a file preferably in ODT format; if that isn't possible, DOC format will do. At this stage no reformatting is done beyond stripping odd type styles, indents, frames, and spacing, and even that isn't strictly required. Converting multiple columns into one will generally be found easier, and articles that jump to distant pages may be combined. When work is complete, the ODT (or DOC) file is uploaded back to the AugustanConvert YahooGroup.
  5. The Editor or Associate Editor assigned to the project will then remove all three files for an issue from the AugustanConvert YahooGroup and begin work on the ODT file, first converting it from DOC format if needed. It is at this stage that formatting is imposed. Where possible, a style close to that of The Augustan Omnibus is used, though it is often necessary to reduce the font size so that each article begins on the same page as in the original.
  6. Some things will need to be redacted in these copies. Obsolete addresses and prices are the primary targets, and wherever an old Society address appears, the current address should be inserted nearby. No attempt should be made to add current prices.
  7. Photographs in the ODT file are generally replaced, as the OCR process is unkind to images. The replacement may be the same image taken from the PDF file, or a different image. As many images are in the public domain, it is often possible to replace a black-and-white photo with a color one. Depictions of arms should be colorized, provided a blazon can be found without undue effort. While replacing images, it may help to reformat some pages or resize some images. This is left to the discretion of the Editor, with the understanding that the goal is to recreate the original, particularly the page on which each item originally appeared. This matters because it allows indexes to be built that apply equally to the original and to this reproduction.
  8. The original PDF file and the corrected ODT file (which may also be converted to PDF for this purpose) are then sent to a Proofreader to confirm that no errors were introduced.
  9. Once proofread and corrected, the file is placed with the other back issues, and this new file is used to generate copies for sale or shelving.
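
Some of the file-handling rules in the steps above are mechanical enough to check automatically. The Python sketch below is a hypothetical helper and not an existing Society tool. It assumes that each issue's files share one base name, since the actual naming standard is not given here. It also reads "5M" as 5 MiB, which is an assumption. The helper derives the RTF name from the PDF name (step 2), flags files over the upload limit (step 3), and reports whether an issue has the three files the Editor expects (step 5): a PDF, an RTF, and an ODT or DOC.

```python
from pathlib import Path

# Assumption: the "5M" upload limit is read as 5 MiB; adjust if it differs.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024


def rtf_name_for(pdf_name: str) -> str:
    """Step 2: keep the same base name, changing the PDF suffix to RTF."""
    path = Path(pdf_name)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file name: {pdf_name}")
    return path.with_suffix(".rtf").name


def needs_splitting(size_in_bytes: int) -> bool:
    """Step 3: files larger than the limit must be broken into pieces."""
    return size_in_bytes > UPLOAD_LIMIT_BYTES


def issue_is_ready_for_editor(file_names: list, base_name: str) -> bool:
    """Step 5: an issue needs a PDF, an RTF, and an ODT (or DOC) file.

    Assumes all three files share base_name; split pieces, whose
    naming is not specified, are ignored.
    """
    suffixes = {
        Path(name).suffix.lower()
        for name in file_names
        if Path(name).stem == base_name
    }
    has_edited_text = ".odt" in suffixes or ".doc" in suffixes
    return ".pdf" in suffixes and ".rtf" in suffixes and has_edited_text


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(rtf_name_for("Omnibus-Example.pdf"))  # Omnibus-Example.rtf
    print(needs_splitting(6_000_000))  # True
    files = ["Omnibus-Example.pdf", "Omnibus-Example.rtf", "Omnibus-Example.odt"]
    print(issue_is_ready_for_editor(files, "Omnibus-Example"))  # True
```

The check deliberately accepts either ODT or DOC, matching step 4's allowance for DOC when ODT isn't possible.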

It must be admitted that only a few small issues have gone through this entire process, and some of those skipped the scanner and were transcribed by hand. The procedure above may well need to change.

One obvious change would be for the Coordinator to assign magazines for work, or at least to track them, so that two volunteers don't spend time on the same issue. This is expected to be addressed as soon as a second transcriber joins the Study Group.
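
Tracking assignments need not be elaborate. The following is a minimal sketch that assumes the Coordinator keeps a single record of which volunteer holds which issue. The class and method names are hypothetical and do not describe an existing tool. A claim succeeds only if no other volunteer already holds the issue.

```python
from typing import Dict, Optional


class IssueTracker:
    """Hypothetical record of which volunteer is working on which issue."""

    def __init__(self) -> None:
        # Maps an issue's file base name to the volunteer who claimed it.
        self._claims: Dict[str, str] = {}

    def claim(self, issue: str, volunteer: str) -> bool:
        """Record the claim and return True, unless another volunteer holds it."""
        current = self._claims.get(issue)
        if current is not None and current != volunteer:
            return False
        self._claims[issue] = volunteer
        return True

    def release(self, issue: str) -> None:
        """Free the issue, e.g. once the edited file has been uploaded."""
        self._claims.pop(issue, None)

    def holder(self, issue: str) -> Optional[str]:
        """Return the volunteer currently working on the issue, if any."""
        return self._claims.get(issue)


if __name__ == "__main__":
    tracker = IssueTracker()
    print(tracker.claim("Omnibus-Example", "Volunteer A"))  # True
    print(tracker.claim("Omnibus-Example", "Volunteer B"))  # False
```

A second claim on the same issue is refused until the first volunteer releases it. This is the duplication the tracking is meant to prevent.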