A Kludge By Any Other Name

Human-computer interactions are funny. We don’t usually think alike, us and our computers, but we do manage to complement each other, most of the time. But a lot gets lost in the translation. This ground-shattering realization came to me as I appreciated the winding path by which I figured out a minor little detail.

For background: the father of a friend wrote a diary about his experience working on the Manhattan Project. The several-hundred-page diary, unfortunately, exists only as a text file on a 5.25″ floppy disc. My friend has a very old personal computer that can read it, but it’s from the pre-Internet, pre-USB era. I know that there are likely companies that will transfer such legacy files, and you could probably pay someone to retype it (which would require giving them the original computer and disc), but he was wondering if there was some easier way to transfer the text to a modern file, through some combination of transfers from 5.25″ floppy to Zip drive to who-knows-what. Let this be a lesson for those who think backwards compatibility is limited to upgrading from Word 2007 to Word 2010.

Since hardware isn’t my cup of tea, I thought about how I would do it with the tools I have. What quickly came to mind was my memory of sitting in front of a microfilm machine in 2004, taking digital photographs of a microfilmed book on interlibrary loan, one screen at a time. My thinking was that my friend could use the same process and take it a step further: take a digital photo of each screen of text (a pain, I realize, but something we do all the time in the archives these days, unless we happen to research at the British Library), import the photos into OCR software (like ABBYY FineReader) and convert them all to text. Not the most elegant of techniques, I realize, but that’s probably the route I would take if it were me.

(As a bizarre aside, this concept of multiple ways to ‘copy’ content from an un-networked computer also came up in an episode of the spy show Burn Notice a few seasons back. The protagonist happened to get his hands on some explosive data on an encrypted USB flash drive. To keep the plot going, the dialog emphasized several times that although they could clearly read the unencrypted content displayed on the computer screen, the flash drive’s encryption somehow prevented the data from being copied to another computer, thereby allowing the dramatic tension of who would physically possess the flash drive and its unique data. I kept yelling at the screen: “Just photograph each screen dammit! You’re a spy! You’ve used tiny cameras in previous episodes to steal info!” But they didn’t listen. They never listen.)

So in order to see how well my process would work, I needed to do a test run for proof of concept. For whatever reason, I decided to try it out on one of the photos from my 2004 microfilm experiment – it didn’t hurt that I had just recently purchased the new version of ABBYY FineReader, and was testing out its accuracy on various types of sources anyway. This in turn made me realize I needed to find one of those digital photos. My 2-terabyte external hard drive is almost full with PDFs and images and documents, which gives you a sense of how many thousands of files I’d need to sift through. But I did remember one of the works I photographed back in 2004. Unfortunately, I couldn’t remember the title or the author. You know, it was that multi-volume book that listed all sorts of letters from French commanders early in Louis XIV’s reign. In the Preface the editor even talked about how these letters were models of elegant French prose. You know the one, right? So how can I remember the title or author? The title was a pretty generic French title, possibly with the word “lettres” in it, or maybe “correspondance.” That doesn’t narrow it down too much. I cited the work in several different things I’d written, so I could probably skim through my 20-page bib until I find it. Alternatively, it’s one record among 30,000 in my bibliography database, so I could limit the records to those written in French, before 1800, source type = primary source, and wade through the several hundred results until I came across it. And I also remember I was talking about it with a friend via email several years back, so I could find all the emails sent to him over the past 8 years and it would probably be listed there somewhere. Or maybe I could sort my sources by file type in the file browser, and look through all the JPG file titles.

All sorts of options, but one paradoxical impact computers have on their users (or at least ‘power users’) is to make them lazy – “surely you don’t expect me to ‘manually’ look through things on the computer; isn’t automation what computers are for?”

Then, just as I realized I might have to spend more than 30 seconds finding the photos, it hit me. The one outstanding feature of the sought-for work wasn’t its title or its editor, nor its date of publication. The most salient fact that I remembered was that it was published in 8 volumes. That’s a lot of volumes. And, fortunately, my bib database just happens to have a separate field for Number of Volumes. Voilà, presto, five seconds later I had nine records that matched my search for Number of Volumes = 8, and the eighth of the nine records was my source: Père Henri Griffet, Recueil de lettres pour servir d’éclaircissement à l’histoire militaire du règne de Louis XIV, 8 vols. (The Hague, 1760-1764). Ta da! Now I could search for files titled Griffet in my massive primary source folder, import one of the photos into ABBYY FineReader and away I go.

All those twists and turns. And what do we learn from this? First, this is what computers do. They allow you to take all sorts of paths, all sorts of different methods, all towards reaching (hopefully) the same goal. And at the same time, your comfort level and past experience tend to push you in a particular direction, whether or not that technique is the most efficient way to achieve your objective. Second, none of this was really necessary, since I could have just photographed some text on my current monitor and run it through OCR. But that’s also what computers do (to me at least) – they make you want to use what you already have in the ways you usually use it. It’s much harder to step back and ‘take a fresh look.’

So what does this have to do with EMEMH? Notes – it always seems to come back to notes. What I learned yet again on this little adventure is how important it is to control your own notes. In this case, that meant being able to search and sort by a variety of fields and keywords. Sometimes a field that has almost no substantive meaning for a human (say, the number of volumes) may actually be the quickest and easiest way to find what you’re looking for, because to the database the Number of Volumes field is just as meaningful as the Title field, and even easier to process. Sometimes you need to think more like a computer and less like a human. Because computers still aren’t very good at thinking like us.
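For the code-minded, here is a rough sketch of that idea in Python. It is not how my bibliography software actually works; the `Record` fields, the `find` helper and the two filler records are all made up for illustration (only the Griffet entry is real). The point is simply that, to the machine, a search on number of volumes is the same operation as a search on title.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names; a real bibliography database will differ.
    author: str
    title: str
    language: str
    year: int
    source_type: str
    num_volumes: int


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


records = [
    Record("Griffet, Henri",
           "Recueil de lettres pour servir d'éclaircissement à "
           "l'histoire militaire du règne de Louis XIV",
           "French", 1760, "primary", 8),
    # The two records below are invented placeholders.
    Record("Placeholder, A.", "Lettres et correspondance",
           "French", 1700, "primary", 2),
    Record("Placeholder, B.", "A Modern Survey",
           "English", 1995, "secondary", 1),
]

# The "meaningless" field is queried exactly like any other field.
matches = find(records, num_volumes=8)
for r in matches:
    print(r.author, "-", r.title)

# The broader search described earlier: French primary sources.
french_primary = find(records, language="French", source_type="primary")
```

With only three records either search is instant, but on 30,000 records the volume count narrows things down to a handful, while the language and source-type filter still leaves several hundred to wade through.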

And those photographs? Turns out I deleted those old photos when I downloaded a more complete version of the Griffet work from Google Books. Shaggy dog indeed.



3 responses to “A Kludge By Any Other Name”

  1. Andy Tumath says :

    Oh, the British Library and its prohibition of digital cameras... the bane of my existence. What takes 10 minutes in the National Archives is a day’s labour in the BL.

    There was a pilot scheme for the use of readers’ cameras in the BL, I have been told. I envy those who enjoyed such days of plenty.

  2. Björn Thegeby says :

    On the original question, I would suggest copying to an IDE internal drive, the most likely HD interface on the old computer. That drive can then be inserted in a modern (non-laptop) computer using an IDE-to-SATA adapter for a few bucks. The power connections should be the same. An old IDE drive can be had for nothing.

    It definitely beats JPGs+OCR:-)
