Digitizing Historical Texts and the Possible Role of Optical Character Recognition in That Process

We had two assigned readings for class on Tuesday: “How does OCR document scanning work?” by Chris Woodford and “Is Digitizing Historical Texts a Bad Idea?” by Mills. First I would like to discuss the article by Woodford, which covers OCR, or Optical Character Recognition. In simple terms, OCR is software that turns an image of printed or handwritten text into text that a computer can store, search, and edit. In other words, it lets your computer recognize the characters on a scanned page. There are two basic ways for OCR to work: pattern recognition and feature detection. Pattern recognition compares each character, taken as a whole, with stored examples of letters, while feature detection identifies a letter from its component parts, such as the lines and strokes that make it up. OCR is a very useful tool. With clear print it can often give at least a decent guess at what is written. Even a message in sloppy handwriting can sometimes be decoded using pattern recognition and feature detection, although the results are less reliable. The process gets easier if you have some background information on what the person may have been trying to say. OCR was also invented earlier than I would have thought. Its roots date all the way back to 1928.
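
To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is not how Woodford's article or any real OCR program implements them. The 5x5 letter grids, the function names, and the stroke rules are all assumptions made up for illustration. Pattern recognition is shown as counting how many pixels agree with a stored template, and feature detection as checking which long strokes a letter contains and roughly where they sit.

```python
# Toy 5x5 glyphs: '#' is ink, '.' is blank paper. These templates are
# hypothetical and far cruder than anything a real OCR engine uses.
TEMPLATES = {
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "H": ["#...#", "#...#", "#####", "#...#", "#...#"],
}

# Hypothetical stroke rules: which long strokes each letter is made of.
FEATURE_RULES = {
    "L": {("vertical", "start"), ("horizontal", "end")},
    "T": {("horizontal", "start"), ("vertical", "middle")},
    "H": {("vertical", "start"), ("vertical", "end"), ("horizontal", "middle")},
}


def pattern_match(glyph):
    """Pattern recognition: pick the template that agrees on the most pixels."""
    def score(template):
        return sum(
            g == t
            for glyph_row, template_row in zip(glyph, template)
            for g, t in zip(glyph_row, template_row)
        )
    return max(TEMPLATES, key=lambda letter: score(TEMPLATES[letter]))


def zone(index, size):
    """Name the rough position of a row or column: start, middle, or end."""
    center = (size - 1) / 2
    if index < center:
        return "start"
    if index > center:
        return "end"
    return "middle"


def extract_features(glyph, min_ink=4):
    """Find rows and columns with at least min_ink inked pixels (long strokes)."""
    height, width = len(glyph), len(glyph[0])
    features = set()
    for r, row in enumerate(glyph):
        if row.count("#") >= min_ink:
            features.add(("horizontal", zone(r, height)))
    for c in range(width):
        column = [row[c] for row in glyph]
        if column.count("#") >= min_ink:
            features.add(("vertical", zone(c, width)))
    return features


def feature_detect(glyph):
    """Feature detection: match the strokes found against the stroke rules."""
    found = extract_features(glyph)
    for letter, rule in FEATURE_RULES.items():
        if found == rule:
            return letter
    return None  # no rule matched; this toy version simply gives up


# An "L" with a stray blot and a slightly short foot.
smudged_l = ["#....", "#....", "#.#..", "#....", "####."]
print(pattern_match(smudged_l))   # L
print(feature_detect(smudged_l))  # L
```

Both functions still read the smudged letter as an L, but for different reasons: the first because most of its pixels line up with the L template, the second because it still has one long stroke down the left side and one along the bottom.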

How Does OCR Scanning Work? by Chris Woodford

Postal Worker controlling OCR letter scanner.

In our next reading, by Mills, the topic is the digitizing of historical texts. His main point seems to be that when we digitize texts, readers lose some of their connection to the original object. He describes showing some students an old book: they were not excited at first, but they became intrigued and by the end enjoyed the whole experience. Would this have happened if he had shown them a digitized version of the same book? The answer may well be no, but there is also an argument on the other side. There are a lot of advantages to digitizing texts, and the main one is that the audience you can reach expands greatly. Mills and his colleagues debated this topic for what he describes as a few weeks, and while writing the article he was still unsure which side he was on. I believe the pros outweigh the cons. In my opinion, digitizing historical texts is beneficial because it opens them up to a much larger range of people and makes them easier to access and sometimes even easier to read. The only argument against digitizing, although it is a big one, is that it takes away from the physical aspect of the text and the historical connection it carries. This is a good point, but I lean towards digitizing these texts.

Is Digitizing Historical Texts a Bad Idea? by Mills

Codex from the book that Mills showed his students.

These two articles are very much intertwined. The OCR article is about computers being able to recognize printed text and handwriting and digitize it, and the second article is about the debate over digitizing historical texts. If you side with me on that debate, then Optical Character Recognition seems like a great tool for the process of digitizing historical texts. In our homework for this week we have to transcribe part of an old Albany census. There are certainly going to be some names that we are not sure about when we read it. An OCR program may be able to pick up on patterns in the writing that we cannot and digitize the text for us, but it also may not. Our homework is also a good example of why digitizing historical texts is necessary. Our transcription of the census into print makes for a more usable resource than the original, because the content will be easier to get at, even if it takes away from the historical connection.
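
The idea that background information helps with uncertain readings can also be sketched in a few lines of Python. This is only an illustration, not part of either article or of the homework. The surname list and the function name `suggest_surname` are hypothetical, and the similarity cutoff of 0.6 is simply the default of the standard library's `difflib.get_close_matches`. The sketch takes an uncertain reading of a name and looks for the closest match in a list of surnames we already expect to see.

```python
from difflib import get_close_matches

# Hypothetical list of surnames we already expect to see. In practice this
# would come from other records, not from this example.
KNOWN_SURNAMES = ["Schuyler", "Livingston", "Van Rensselaer", "Ten Broeck"]


def suggest_surname(uncertain_reading, known=KNOWN_SURNAMES, cutoff=0.6):
    """Return the closest known surname, or None if nothing is similar enough."""
    matches = get_close_matches(uncertain_reading, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_surname("Schuyter"))  # Schuyler
print(suggest_surname("Xqzv"))      # None
```

A suggestion like this is still only a guess. Whoever is transcribing would need to check it against the page, since a rare name that is not on the list could be wrongly "corrected" into a common one.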

Questions I would like to pose:

1: Do you think that Optical Character Recognition is a reliable tool for transcribing?

2: The obvious question of this write-up: do you think it is a good idea to digitize historical texts?

3: Do you think that Optical Character Recognition could play a part in the digitizing of historical texts?