Books vs. websites
The holidays are almost over, and I have two research projects to try to finish up before classes start in a few weeks, plus service obligations to attend to. But by way of diversion, I give you more meta-talk on note-taking and digital resources.
Since I always try to teach my students the variety of ways we collect, store, search and use historical information, a recent post on the Humanist listserv provides an interesting contrast between how works in print (i.e. books and journal articles) are indexed and cataloged, vs. the simplistic way in which we have to search for information on websites. On its own terms, Google’s search engine is impressive, but we all know how much effort it takes to winnow the wheat from the chaff (or, for a more modern metaphor, to separate the signal from the noise).
But techno-optimism runs rampant, particularly among the ‘digital natives’ – just look at all the web sources our students cite in the first drafts of their research papers. I recall way back in the late 1990s having to inform a library student worker (as I sat at the microfilm machine scrolling through some 18C newspapers) that, contrary to his belief, most information was not in fact online. With 2013 now dawning, much of history is still offline, and much of the good stuff that is “online” is in fact hidden behind paywalls or is otherwise proprietary. Just as importantly, the information in print is much better organized and more findable than most of what we have on all but the most popular websites – for someone used to Library of Congress subject headings and multiple defined-vocabulary keywords (like my Notes database), the brute force Google uses in its searches is a giant step backwards, although it does have the advantage of being automated. A ‘keyword’ search is a dumb search, as I like to say.
As the Humanist post above points out, the print world is much more searchable because its content has already been aggregated into human terms, the concepts by which we meta-organize information. The full text of print works may not always be available online (although some are in Google Books and you can always scan your own), but their catalog records certainly are, along with subject heading info. Look at all the useful ways my Vauban book has been cataloged:
Siege warfare — Europe — History — 18th century
Offensive (Military science) — Europe — History — 18th century
Manpower — Europe — History — 18th century
Spanish Succession, War of, 1701-1714
Vauban, Sébastien Le Prestre de, 1633-1707
Or, in my own system:
We can even find an increasing number of tables of contents and book indices online, for example on Amazon. In short, full-text search of digital sources is great for finding proper nouns, but keywords and subject headings are even more useful for finding meaningful chunks of information, particularly when most websites don’t include much searchable metadata. I’m reminded of this fundamental digital limitation every time I have to search through my hundreds of scanned (and OCRed) PDFs of secondary sources and thousands of un-OCRed primary-source PDFs. You’ll see the same thing when you search Google Books – limited semantic search capabilities mean lots of false hits to wade through (not to mention OCR issues). As a result, I’ve recently realized that I really should be scanning the indices of the books I scan, as well as the text itself. Previously I had rather stupidly assumed that I didn’t need the index, because I already had the full text. Oops.
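For the programmatically inclined, the difference between a "dumb" full-text search and a subject-heading search can be sketched in a few lines of Python. This is only a toy illustration: the `Record` class, the two sample records, and both search functions are hypothetical and invented for the example, and do not reflect how any real library catalog or search engine is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical catalog record: some text plus human-assigned headings."""
    title: str
    full_text: str
    subjects: list = field(default_factory=list)


def full_text_search(records, term):
    """Return titles whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [r.title for r in records if term in r.full_text.lower()]


def subject_search(records, heading):
    """Return titles with a subject heading containing the given phrase."""
    heading = heading.lower()
    return [
        r.title
        for r in records
        if any(heading in s.lower() for s in r.subjects)
    ]


# Invented sample data, for illustration only.
RECORDS = [
    Record(
        title="Hypothetical book on early modern sieges",
        # The passage is about a siege but never uses the word itself.
        full_text="The garrison capitulated after the trenches reached the covered way.",
        subjects=["Siege warfare -- Europe -- History -- 18th century"],
    ),
    Record(
        title="Hypothetical article on retail economics",
        # The word appears, but only as a metaphor.
        full_text="Small retailers felt under siege from online competitors.",
        subjects=["Retail trade -- History -- 21st century"],
    ),
]

if __name__ == "__main__":
    print(full_text_search(RECORDS, "siege"))
    print(subject_search(RECORDS, "siege warfare"))
```

In this made-up data set the full-text search for "siege" returns only the retail article (a false hit) and misses the book that is actually about sieges, while the subject search finds the book and nothing else. The catch, of course, is that somebody had to assign those headings by hand.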
Now if only we could hire a thousand librarians to provide paragraph-level keywording and subject headings for all EMEMH books, articles, and archival documents! Maybe next Christmas.