Difference between revisions of "Back Issue Digitizing Study Group"

From Augustan Society Staff Wiki
(add copyright notice)
(revision of process)
Line 10: Line 10:
 
<li>The next step is to process the PDF file through OCR (Optical Character Recognition) software. This can be done from the same interface that is used for the scanner. The resulting file shall be given the same name, changing the PDF suffix to RTF.</li>
 
<li>The next step is to process the PDF file through OCR (Optical Character Recognition) software. This can be done from the same interface that is used for the scanner. The resulting file shall be given the same name, changing the PDF suffix to RTF.</li>
  
<li>These two files are then uploaded to the [[AugustanConvert YahooGroup]]. If any file is larger than 5M, it must be broken up into pieces not larger than that limit.
+
<li>These two files are then given to this Study Group in whatever form and manner seems best.
  
Note: One of the folders used to store these files is named "Urgent", the others being named after the magazine titles. At present, the Urgent folder is used for magazines containing articles by Arthur Germond, as it is planned to combine these into a [[Germond|book]].</li>
+
Note: When the original is in poor condition, it is reasonable to flag the issue as "Urgent", and the Study Group should prioritize work on such issues. A flag of only slightly lower priority is "Books", which is used for issues containing articles due for inclusion in a book project, such as the articles by Arthur Germond.</li>
  
<li>Members of the Back Issue Digitizing Study Group then select the files they wish to work on. Starting with the RTF file, they compare it to the PDF images and make such edits as may be needed, producing a file that is (preferably) in ODT format. If this isn't possible, DOC format will do. At this point, no effort to reformat the text is made other than to strip odd type styles, indents, frames, and spacing, though this isn't strictly required. It will generally be found easier to convert multiple columns into one, and articles that skip to distant pages may be combined. When work is complete, the ODT (or DOC) file is uploaded back to the AugustanConvert YahooGroup.</li>
+
<li>Members of the Back Issue Digitizing Study Group then select the files they wish to work on. Starting with the RTF file, they compare it to the PDF images and make such edits as may be needed, producing a file that is (preferably) in ODT format. (If this isn't possible, another plain or improved text format will do.) No effort should be expended on images, as they will be inserted later; they may be left as placeholders. No effort to reformat the text is required, though stripping odd page sizes, type styles, indents, frames, and spacing is helpful and easy to do at this stage. Multiple columns should be converted into one; tables should generally be rebuilt or left untouched. When this work is complete, the ODT (or text) file is returned to the Coordinator for final formatting and insertion of improved images.</li>
  
<li>The Editor or Associate Editor assigned to work on the project will then remove all three files for an issue from the AugustanConvert YahooGroup, and begin work on the ODT file, first converting from DOC format if needed. It is at this stage that formatting is imposed. Where possible, a style close to that used for ''The [[Augustan Omnibus]]'' is used, though it is often necessary to reduce the font size so that articles will begin on the same page as the original.</li>
+
<li>The editor assigned to work on the project will then begin work on the ODT file, first converting from DOC or other text format if needed. It is at this stage that formatting is imposed. Where possible, a style close to that used for ''The [[Augustan Omnibus]]'' is used, though it is often necessary to change the font size so that articles will begin on the same page as the original. (This is to preserve any citations or references as far as possible.)</li>
  
 
<li>Some things will need to be redacted in these copies. Obsolete addresses and prices are the primary targets, and Society addresses need to have the current address inserted nearby. No effort to add current prices should be made. Care needs to be taken with the copyright notice, if any. Issues copyrighted by a predecessor of the Society (like the [[Octavian Society]] or the [[Hartwell Company]]) need to have a new notice added with the current date. A few issues claimed copyright belonged to the individual authors; these will take substantial legal effort before we can reprint them. In all cases, a notice that it is the "Second Printing" is required, along with the year.</li>
 
<li>Some things will need to be redacted in these copies. Obsolete addresses and prices are the primary targets, and Society addresses need to have the current address inserted nearby. No effort to add current prices should be made. Care needs to be taken with the copyright notice, if any. Issues copyrighted by a predecessor of the Society (like the [[Octavian Society]] or the [[Hartwell Company]]) need to have a new notice added with the current date. A few issues claimed copyright belonged to the individual authors; these will take substantial legal effort before we can reprint them. In all cases, a notice that it is the "Second Printing" is required, along with the year.</li>
  
<li>Photographs in the ODT file are generally replaced, as the OCR process is unkind to images. This may be done by pulling the same image from the PDF file, or a replacement image may be used. As many images are public domain, it is often possible to replace a black & white photo with a color photo. Depictions of arms should be colorized, provided a blazon can be found without excess effort. In the process of replacing the images, it may be found helpful to reformat some pages or resize some images. This is left to the discretion of the Editor, with the understanding that the goal is to recreate the original, particularly in regard to the page things originally appear on. This is important as it permits indexes to be built that apply equally to the original and this reproduction.</li>
+
<li>Photographs in the ODT file are generally replaced, as the OCR process is unkind to images. This may be done by pulling the same image from the PDF file, or a replacement image may be used. As many images are public domain, it is often possible to replace a black & white photo with a color photo. Depictions of arms should be colorized, provided a blazon can be found without excess effort. In the process of replacing the images, it may be found helpful to reformat some pages or resize some images. This is left to the discretion of the editor, with the understanding that the goal is to recreate the original, particularly in regard to the page things originally appear on. This is important as it permits indexes to be built that apply equally to the original and this reproduction.</li>
  
 
<li>The original PDF file and the corrected ODT file (which may also be converted to PDF for this purpose) are then sent to a [[Proofreader]] to confirm that errors weren't introduced.</li>
 
<li>The original PDF file and the corrected ODT file (which may also be converted to PDF for this purpose) are then sent to a [[Proofreader]] to confirm that errors weren't introduced.</li>
Line 27: Line 27:
 
</ol>
 
</ol>
  
It must be admitted that only a few small issues have gone through this entire process, and some of those skipped the scanner and were transcribed by hand. It may very well prove that changes to the above procedure will be needed.
+
Care needs to be taken by the Coordinator that the same issues aren't taken up by different members or editors to avoid duplication of work. This will become increasingly important as the size of this Study Group grows.
  
One obvious change would be for the Coordinator to assign magazines for work, or at least to track them, so that two volunteers don't spend time working on the same issue. One expects that to be addressed as soon as a second transcriber joins the Study Group.
+
There have been proposals to convert the back issues to HTML for posting on the web site. This Study Group will not undertake that task, but will support, as far as possible, another Study Group chartered for that purpose.
  
 
----
 
----

Revision as of 12:59, 24 February 2022

As part of the mission of The Augustan Society, Inc., to provide educational offerings, the Society sells back issues of our publications. When stock is on hand, we ship that stock, but other means must be employed to reproduce issues that are out of stock. Thus there has been created the Back Issue Digitizing Study Group. It is headed by a Coordinator who is named by and serves at the pleasure of the Dean of Studies.

The process of digitizing involves a number of distinct steps:

  1. The first step is to obtain a decent copy of the original. (As some of the issues are over fifty years old, the best available may not be very good, and in some cases they will be photocopies.) Assuming a better copy is available for the files, magazines are taken apart into individual sheets, letter-size or smaller, and these are passed through the sheet-feed scanner. When only one original exists, the staples should be removed and the individual pages passed through the scanner using the transparent carrier guide. This will result in a PDF file, which is to be named according to the prevailing standard. Note that this PDF file can be printed out to fill orders before the rest of the process is complete. Given the slow rate of progress, such printouts represent the majority of copies to date.
  2. The next step is to process the PDF file through OCR (Optical Character Recognition) software. This can be done from the same interface that is used for the scanner. The resulting file shall be given the same name, changing the PDF suffix to RTF.
  3. These two files are then given to this Study Group in whatever form and manner seems best. Note: When the original is in poor condition, it is reasonable to flag the issue as "Urgent", and the Study Group should prioritize work on such issues. A flag of only slightly lower priority is "Books", which is used for issues containing articles due for inclusion in a book project, such as the articles by Arthur Germond.
  4. Members of the Back Issue Digitizing Study Group then select the files they wish to work on. Starting with the RTF file, they compare it to the PDF images and make such edits as may be needed, producing a file that is (preferably) in ODT format. (If this isn't possible, another plain or improved text format will do.) No effort should be expended on images, as they will be inserted later; they may be left as placeholders. No effort to reformat the text is required, though stripping odd page sizes, type styles, indents, frames, and spacing is helpful and easy to do at this stage. Multiple columns should be converted into one; tables should generally be rebuilt or left untouched. When this work is complete, the ODT (or text) file is returned to the Coordinator for final formatting and insertion of improved images.
  5. The editor assigned to work on the project will then begin work on the ODT file, first converting from DOC or other text format if needed. It is at this stage that formatting is imposed. Where possible, a style close to that used for The Augustan Omnibus is used, though it is often necessary to change the font size so that articles will begin on the same page as the original. (This is to preserve any citations or references as far as possible.)
  6. Some things will need to be redacted in these copies. Obsolete addresses and prices are the primary targets, and Society addresses need to have the current address inserted nearby. No effort to add current prices should be made. Care needs to be taken with the copyright notice, if any. Issues copyrighted by a predecessor of the Society (like the Octavian Society or the Hartwell Company) need to have a new notice added with the current date. A few issues claimed copyright belonged to the individual authors; these will take substantial legal effort before we can reprint them. In all cases, a notice that it is the "Second Printing" is required, along with the year.
  7. Photographs in the ODT file are generally replaced, as the OCR process is unkind to images. This may be done by pulling the same image from the PDF file, or a replacement image may be used. As many images are public domain, it is often possible to replace a black & white photo with a color photo. Depictions of arms should be colorized, provided a blazon can be found without excess effort. In the process of replacing the images, it may be found helpful to reformat some pages or resize some images. This is left to the discretion of the editor, with the understanding that the goal is to recreate the original, particularly in regard to the page things originally appear on. This is important as it permits indexes to be built that apply equally to the original and this reproduction.
  8. The original PDF file and the corrected ODT file (which may also be converted to PDF for this purpose) are then sent to a Proofreader to confirm that errors weren't introduced.
  9. When proofed and corrected, the file is placed with the other back issues and this new file is used to generate copies for sale or shelving.
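
The file-naming rule in steps 2 and 4 (same name, different suffix) could be sketched as follows. This is only an illustration: the function name companion_names and the sample file name are hypothetical, the prevailing naming standard itself is not described here, and keeping the upper or lower case of the original suffix is an assumption.

```python
from pathlib import PurePath


def companion_names(pdf_name: str) -> dict:
    """Derive the OCR and edited file names from a scanned PDF's name.

    Hypothetical helper: only the suffix changes, as steps 2 and 4 describe.
    """
    path = PurePath(pdf_name)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a PDF file name, got {pdf_name!r}")

    # Assumption: follow the case of the original suffix (PDF -> RTF, pdf -> rtf).
    upper = path.suffix.isupper()
    rtf = ".RTF" if upper else ".rtf"
    odt = ".ODT" if upper else ".odt"

    return {
        "scan": path.name,                      # step 1: scanner output
        "ocr": path.with_suffix(rtf).name,      # step 2: OCR output
        "edited": path.with_suffix(odt).name,   # step 4: corrected text
    }


if __name__ == "__main__":
    # "Augustan_101.pdf" is a made-up example name, not the Society's standard.
    for role, name in companion_names("Augustan_101.pdf").items():
        print(f"{role}: {name}")
```

Deriving all three names from the one PDF name keeps the scan, the OCR output, and the edited text for an issue easy to match up when they are handed from one person to the next.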

Care needs to be taken by the Coordinator that the same issues aren't taken up by different members or editors to avoid duplication of work. This will become increasingly important as the size of this Study Group grows.
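
One way the Coordinator might keep track of who holds which issue is a simple claim log. The sketch below is an assumption about how such tracking could work, not a description of any tool the Society uses; the class AssignmentLog, its methods, and the sample issue and volunteer names are all hypothetical.

```python
class AssignmentLog:
    """Hypothetical record of which volunteer is working on which issue."""

    def __init__(self):
        self._assigned = {}  # issue label -> volunteer name

    def claim(self, issue: str, volunteer: str) -> bool:
        """Assign an issue; refuse if someone else already holds it."""
        holder = self._assigned.get(issue)
        if holder is not None and holder != volunteer:
            return False
        self._assigned[issue] = volunteer
        return True

    def release(self, issue: str) -> None:
        """Free an issue, e.g. once its file is returned to the Coordinator."""
        self._assigned.pop(issue, None)

    def holder(self, issue: str):
        """Return the current holder of an issue, or None if unclaimed."""
        return self._assigned.get(issue)


if __name__ == "__main__":
    log = AssignmentLog()
    print(log.claim("Omnibus 7", "first volunteer"))   # accepted
    print(log.claim("Omnibus 7", "second volunteer"))  # refused: already held
```

A second claim on the same issue is refused until the first is released, which is the duplication the paragraph above warns against.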

There have been proposals to convert the back issues to HTML for posting on the web site. This Study Group will not undertake that task, but will support, as far as possible, another Study Group chartered for that purpose.