Historians and their damned eclecticism
After a recent post, I received an email from a blog reader pointing me to an online project focused on presenting information on WW2 in a digital environment: Envisioning History. If you’re interested in the potential for digital history, you should check out some of its YouTube videos.
Watching a few of the videos made me appreciate once again how different note-taking needs can be from one academic inquiry to another. For historians, those needs appear to break down into two categories:
- Storing the unstructured data of the original sources themselves, whether they be scans of archival documents or photographs of battle paintings or fortifications.
- Keeping tabs on the structured data extracted from that unstructured data for a particular analytical purpose, most often in the form of summaries, keywords or quantities, or maybe a quote or two.
These two types of information are fundamentally different yet related, as many methodological treatises will no doubt explain. They also require different types of note-taking capabilities. Historians are generally generalists and a surprising number still rely on 3″x5″ notecards and simple Word documents, but those on the cutting edge (like me?) also want all the cool toys our colleagues in other fields play with. Back in the day, I referred to the options as a quadrangle or rectangle.
On the one hand, historians want the unstructured, original documents, in all their messy glory. That means, ideally, transcripts of the full documents, as well as images of the originals. But the hard work for historians is to find structure in this chaos. We need the originals in case we need to verify a quote or ask a new question. But usually we are creating structured data by extracting information from the original sources according to a pre-determined method – ideally with the methodological choices made clear to the reader. Structuring the unstructured requires, at a minimum, keywords and categories, with a link back to the original whenever possible. There’s an immense amount of winnowing in the journey from source to analysis.
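This source-to-structure workflow maps naturally onto a relational schema: one table for the messy originals, another for the structured notes extracted from them, with a foreign key linking every note back to its source. Here is a minimal sketch in Python’s built-in sqlite3 — the table and column names, and the sample archival record, are my own invented illustration, not the author’s actual database design:

```python
import sqlite3

# One table holds the unstructured originals, another the structured
# notes extracted from them, linked back via source_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (
    source_id   INTEGER PRIMARY KEY,
    archive_ref TEXT,   -- where the original document lives
    transcript  TEXT    -- full, messy transcription
);
CREATE TABLE notes (
    note_id   INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES sources(source_id),
    keyword   TEXT,     -- analytical category
    summary   TEXT      -- extracted, structured content
);
""")

# Hypothetical sample data for illustration only:
conn.execute("INSERT INTO sources VALUES (1, 'Carton 1774', 'Letter of 12 May 1694 ...')")
conn.execute("INSERT INTO notes VALUES (1, 1, 'siege logistics', 'Complaint about bread supply')")

# Every structured note can always lead back to its original:
row = conn.execute("""
    SELECT n.keyword, s.archive_ref
    FROM notes n JOIN sources s ON n.source_id = s.source_id
""").fetchone()
print(row)  # ('siege logistics', 'Carton 1774')
```

The point of the join is the "link back to the original": the analytical layer never has to contain the full transcript, only a key that can retrieve it when a quote needs verifying.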
Unfortunately, few off-the-shelf software packages handle both unstructured and structured information with equal facility, a limitation I am once again confronting in my transition from MS Access to Devonthink. The CLIO (or Kleio) software by Manfred Thaller was an early, extremely historio-centric attempt, but most historians fall back on packages with a broader user base. The specialization of software is probably explained by cultural factors as much as purely technical ones.

Over the several decades of the Personal Computing Age, increasing processor power has expanded our toolkit far beyond the simple, business-friendly relational databases and punch-card legacy systems of the early days. But it took time – well into the 1990s, social scientists were still shoehorning their qualitative data into the quantitative model preferred by most early software, creating dummy variables (0=No, 1=Yes) and numeric codes (1=French, 2=English, 3=Dutch) that could be handled by the slower processors of the period. Though the 0/1 dummy variable retains its elegant simplicity, today’s more powerful relational databases can handle big chunks of rich text, and we also have many other types of analysis to choose from: quantitative analysis software like SPSS and Minitab, GIS software that lets us analyze geospatial relationships between objects, and textual analysis software for those seeking word frequencies, KWIC concordances, co-occurrences, topic modeling and the like. Modern programs even allow photographic (and increasingly video) analysis, Picasa’s facial recognition being one of the simplest examples. Today, “big data” is as likely to be composed of text or images as of numbers.
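The old shoehorning described above can be sketched in a few lines. The category labels (Yes/No, French/English/Dutch) come from the text; the record names and field layout are my own hypothetical illustration:

```python
# Encoding qualitative data as numbers, as social scientists once did
# to fit their material into number-oriented early software.
nationality_codes = {"French": 1, "English": 2, "Dutch": 3}

# Hypothetical archival records for illustration:
records = [
    {"name": "Dupont", "nationality": "French",  "literate": "Yes"},
    {"name": "Smith",  "nationality": "English", "literate": "No"},
]

# A 0/1 dummy variable for literacy, an integer code for nationality:
encoded = [
    (nationality_codes[r["nationality"]],
     1 if r["literate"] == "Yes" else 0)
    for r in records
]
print(encoded)  # [(1, 1), (2, 0)]
```

The cost of this trick, of course, is that the code table (1=French, 2=English, 3=Dutch) lives outside the data: lose the codebook and the numbers become meaningless — exactly the fragility that richer text-capable databases later removed.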
This embarrassment of computational and data riches (the data-mining metaphor isn’t accidental) comes at the price of having to deal with often-incompatible software packages, not to mention distinctive methodologies. Unfortunately I don’t have a solution for methodologically eclectic historians, but I figured I could at least give those unfamiliar with relational databases a better sense of what this type of software does well. So here I post my PowerPoint slides from a presentation I gave 14 years ago, describing the basic design of my note-taking Access database. It preserved the unstructured original sources (transcriptions only, alas) and built a layer of structured data on top of them. As it turns out, I didn’t use it to its full potential, but perhaps it might be of use to others, if they can suss out what the slides mean without my accompanying explanation (that’ll cost you extra). Maybe I’ll even return to relational databasing in my next project.