Archive | July 2013

Publishing as a signal for historical quality?

I should probably clarify a point I was indirectly making in my previous post. It’s been on my mind for several years as I study the historiography of a Great Captain and national hero, and as I pay attention to what types of works get published, what early modern textbooks are available from various presses, what types of history positions get filled, what types of historical topics dominate conferences, and what assumptions are being made when we debate issues like open access, making history more popular, and embargoing dissertations.

I find myself increasingly concerned, as an Associate Professor with tenure at a teaching university, with what publishing market pressures seem to be doing to academic history. (I may well be imagining a ‘golden age’ of academic history.) Academic publishing and academic history seem, increasingly to me, to be working on quite different assumptions. As I hinted at in the last post, I’m particularly concerned about what this disconnect means for early modern European history. What does it mean for that whole notion that “you are only a good historian if you publish a book”? If there’s very little market for early modern titles (or diplomatic history, or…), as there seems to be, this presumably means that few presses will publish early modern monographs, as seems to be the case. So does that mean early modernists aren’t particularly good historians because they can’t get their work published with more than a handful of publishers? For those early modern works that will be published, will publishers insist that they be much broader in scope, inevitably taking young scholars beyond their expertise, possibly to their detriment when it comes time for other scholars to review the book? (I know of one case where a work on 20th century American history, addressing a topic ten times more popular than any early modern topic, had to tack on a final chapter to turn it into contemporary history – I find that appalling, and disturbing if it’s part of a broader tendency.)

What impact do these market forces have on what early modernists can publish? Monographs are seen as the measure of a tenure-worthy historian, at least at research universities. And monographs are facing significant market challenges as their academic customer base dwindles. So how does that combination impact early modern faculty on the tenure-track? Do we historians really want, as the AHA seems to insist, to define our discipline by what the publishing industry considers salable, especially when that market has radically changed over the past few decades? Should market forces in publishing, forces that increasingly require presses to sell their books to a popular market (since purchases from academic libraries and academics are declining), a market that isn’t particularly interested in early modern history generally, be determining tenure status at research universities? Is that a good measure of scholarly worth? Do we even know what book publication is really measuring anymore?

More disconcerting to me is the question of how this affects the type of early modern history that is written. How much dumbing down and stretching of topics is acceptable to get that early modern monograph published? Methodologically, how do we reconcile the need to provide representative evidence with the publishing preference for the telling anecdote and straightforward story? What is hiding behind that scholar’s anecdote – a mountain of data or nothing but suppositions and assumptions? How many more books do we need from established scholars and university presses that are a mile wide and an inch deep? How many more surveys of a period do we need when we don’t have the detailed case studies to justify their sweeping generalizations? How many more syntheses do we need based on a handful of non-random case studies and a few easily-accessible published sources? Do we publish these types of works because we historians are interested in such broad topics, or because we know those types of proposals are more likely to be published, or because a press approaches us with such a book idea because it will sell better than the alternative?

How do we square the demand for published monographs with the widespread academic disdain of popularized history? Are monographs being significantly shaped (deformed?) by popular market pressures the publishers are under, and which they pass on to the author? Market pressures that don’t necessarily align with how academic historians see themselves? The most “pure” monographs would seem to be those published by a press like Brill, which allows the author to publish in as much detail as they want, but at a hefty price tag. Is that good or bad?

As usual, I have more questions than answers. But I keep wondering whether reconciling what publishers want and what ‘academic history’ wants is sustainable. Maybe I’m missing something?

A sign of things to come?

Derek Croxton’s general narrative of the history of the treaty of Westphalia has finally been published.

Croxton, Derek. Westphalia: The Last Christian Peace. New York: Palgrave Macmillan, 2013.

The Peace of Westphalia, which brought to a close the Thirty Years War, is arguably the most important treaty signed before the twentieth century. It was a signal event of the early modern era, thoroughly of its time even as it prefigured the radical political developments of subsequent centuries. This sweeping, exhaustively researched history is the first comprehensive account of the treaty and its wider significance to appear in the English language. Bringing together the latest scholarship with an engaging narrative, it retraces the European situation leading up to the Congress of Westphalia, exploring its political and intellectual underpinnings and placing it in a broad global and chronological context. In doing so, it definitively fills a massive lacuna in the scholarly literature while offering fascinating insights into the long historical transition to modernity.

In addition to being envious that Derek’s published yet another book, I was also a bit taken aback by the book. Not the content mind you. But the context. Let me explain.

So here we have a narrative on the Peace of Westphalia, surely the most well-known peace conference of the early modern period – even present-minded political scientists are familiar with it, Westphalia often being described as a turning point in modern Western diplomatic history, the beginning of the modern international state-system. Given its name recognition, it’s not a surprise that it would be published by a hybrid academic-popular publisher like Palgrave Macmillan.

What did surprise me, however, is its cost. A narrative history of probably the most famous early modern peace treaty, providing the only recent full-length narrative history of the peace that I can think of. Yet this book lists at $115. From Palgrave Macmillan – not a press I normally think of as giving Brill pricing a run for its money. In fact, it’s priced only $10 more than Derek’s 2001 edited dictionary on the peace, by Greenwood. What’s going on here?

Theories:

  • Academic publishers are in a world of hurt, and even books like this mandate a $100+ price tag. I suppose the 400 pages may have increased the price, but I wouldn’t think by that much. Would a 250-page book have decreased the price by $60? Somebody should calculate how many more dollars your book will cost for every year you delay its publication.
  • Diplomatic history is in trouble. If this is what a book on Westphalia costs, I shudder to think of how much they would charge for a history of the peace of Rijswijk (Ryswick)!
  • Early modern European history is in trouble. It’s not just this book. All sorts of books on early modern European history, by a variety of presses, are regularly selling in hardcover for $100 these days. Is this because English-reading audiences don’t care about the period?

All of these theories happen to be backed up by other data, whether it’s the number of faculty researching in the fields, the number of job postings in the fields, the number of publications and presentations in the fields…

My assumptions (that may not be correct, and haven’t really been clarified in past Publishing discussions):

  • List price is a signal for the popularity of a book, a calculation made by the publisher to make a profit while not pricing the book out of the market.
  • Some topics will not be published at all, since publishers aren’t willing to charge the astronomical amount that would be required for them to get their money back from the printing.
  • Price is largely driven by the format: hardcover vs. paper. Hardcover books are much more expensive, and if your book is only published in hardcover, that’s an indication that the publisher doesn’t think it will sell many copies regardless of the price.

Thoughts?

Random thoughts on transcribing and typing sources

Now that there’s software that can actually take advantage of full text, that means each historian has hundreds of documents that could, nay, should, be entered as full text. So what’s a digital early modern historian to do? (I’ll ignore the more challenging handwritten documents and focus on the ‘easy’ published documents.)

  1. Type all those treatises, campaign narratives and histories yourself. Been there, done that. Not fun, especially when you keep finding more and more of them. It’s labor-intensive work that ideally would be done by someone who charges far less than what you get paid per hour, or far less than what you think your time is worth.
  2. Prioritize. What do you really need fully transcribed, and what can you get by with just reading and taking notes? Unfortunately certain types of documents, and certain methods/questions, may require full text. Or if you’re hoping to create some kind of monster textbase that will allow you to find every instance of a particular term, person or place in an instant.
  3. Optical character recognition. OCR software has been a godsend, especially given all those 19C-20C published collections of documents. The problem comes when trying to OCR published works from the 18C and earlier (even some 19C texts), or when dealing with imperfect copies of the most-pristine text: blurry photos, phantom hands, smudged pages… Still waiting for a solution to that.
    Even if the copy is perfect, the original typeface might not be. From personal experience I can confirm that editing OCRed text, even at 90% accuracy, takes forever. And don’t get me started on old italicized text. OCR no-likey.
    There’s ‘dirty’, i.e. uncorrected, OCR, and then there’s filthy OCR. Like this:

    The Ugly Side of OCR

    The gutter is a black hole, the text on either side of it wasn’t even recognized as text (the box in the middle represents an image), and the text that was recognized has many errors. Oddly though, this is actually usable output, as long as you don’t expect every instance of every word to be found (or even most), and you aren’t trying to look at word frequencies and the like. So sometimes you just accept the dirty OCR as a semi-useful supplement to an image PDF.

  4. Or you could download full texts from online databases. Google Books and Archive.org both have OCRed text versions of many of their works, but that doesn’t help too much given the OCR issues with early print works.
  5. Some text versions of early modern works are available to subscribers of databases like EEBO, ECCO, Burney… Their hefty subscription prices help pay for their transcription costs. EEBO offers downloadable full-text versions for a small selection of its works (see the EEBO-Text Creation Partnership), while ECCO doesn’t allow you access to the underlying text at all, other than showing snippets in the results window. On one of my research jaunts I ended up just taking screenshots of each page of results (dozens), so that I could later match them up with my own PDFs at home. Pain in the ass.
    ECCO search results

    EEBO will be further extending its utility over ECCO by adding more text versions, as well as releasing more texts to the public, over the next several years. So one option is to simply delay your research till those get released. Assuming the works you need will be included in future releases.

  6. Recently I explored another option: I purchased a copy of voice recognition software (Dragon Naturally Speaking 11.5), went through the brief training process, and then started reading one of my English campaign narratives out loud. The process seemed to work well – by page 30 or so I was reading along in a monotone voice at a pretty decent clip. And the results looked really good on the screen. Too good, as it turns out. I discovered that voice recognition software is, in some ways, more dangerous than OCR software. The errors from OCR software occur on the level of individual letters, which means that you can quickly skim over a page and the misspelled words will jump out at you, and almost every error (aside from the stray speck interpreted as the letter i) will be an incorrect letter, at most two letters substituted for one. For many of these you can use fuzzy search and they won’t make much of a difference – assuming the misspelling is only off by one letter or two. OCR may give you a lot of errors, but they also tend to be consistent: in a given book, the letter g may always get confused for a q, the ii is usually a u, and uneven inking might result in many e’s being read as c’s. Those can be easily fixed with a well-thought-out global find and replace. This also means that most OCR errors can be easily ‘fixed’ by reading them in context – ctc. becomes etc., sepamte becomes separate, and so on.
    Voice recognition eliminates the inability of OCR software to interpret irregular and faded fonts, because your human eye is processing the text instead. But it replaces the computer’s difficulty interpreting imperfect visual symbols with the computer’s difficulty interpreting imperfect aural symbols. The result: completely different types of errors. It insists on typing real words, so if it doesn’t know the word you’re saying, it types one (or more) that sounds (somewhat) like that word. Which means it’s almost impossible to just glance over a page and have the errors jump out at you – they’re all words alright, just not the right ones. So you have to read every sentence to see if the words actually make sense in the sentence. Grammar checkers aren’t helpful because one noun or verb or adjective is as good as another in most cases – try grammar checking Chomsky’s famous Colorless green ideas sleep furiously. Even worse, if Dragon can’t figure out what four-syllable word you just said, it figures you must’ve said two two-syllable words, or maybe a three-syllable word and a one-syllable word. Whichever combination of words is in its dictionary and sounds most similar to what you said. To prevent this, you could watch what’s typed on the screen as you dictate, but that would require you to stop looking at the printed text you’re reading from. That slows down your pace quite a bit, unless you’re good at memorizing paragraphs of text at a glance. (Come to think of it, maybe having the image PDF in an adjacent window on the screen might help? I’ll have to experiment, but it’d probably interfere with what allows us to read quickly – glancing ahead.)
    But that’s not the end of your problems. Any document that discusses places, say a campaign narrative describing all the places around which armies maneuvered or mentioning the polyglot officers and regiments performing such maneuvers, will require you to program an alias in for each proper name, or train the software to type Esquerchin properly, or you’ll be surprised by the results. The problem is multiplied because on one page you’ll be talking about Esquerchin, and on the next page the army will have moved on to Hénin-Beaumont. So you may have dozens of small places not in the standard Dragon dictionary, and not really predictable until you come across them in the text. Nor does Dragon like it when you pronounce those French villages in French, at least when dictating an otherwise-English document. But it can even have problems with English. It’s less likely to choose an archaic English word over a more common modern one (or two, or three…). The complex, long 18C sentences also seem to play havoc with its ability to guess a word based on grammatical context. Sometimes even I need the commas to make sense of the sentence, but saying “comma” four times in every sentence gets old real quick (and you never know which commas are important until you’re at the comma). Speaking out 18C punctuation probably increases the number of words to say by a third! And you can forget about preserving all the orthographic complexities of early modern English, fuffice to fay.
    There are some fixes to these problems. You can, for example, check the transcription every sentence or so. Or you can just be content with the 1% (less? more?) of the words that make absolutely no sense – which is fine unless they are important words, or unless you need to understand the sentences they appear in (maybe I need to practice my phonetic reading). Or if you hear like a computer, you could pre-read through the text, trying to identify which words will be problematic. The only other alternative I see is to read through every sentence afterwards just to find that small number of unpredictable but possibly important (and certainly confusing) errors. All these fixes necessarily slow down the data entry speed, which is kinda the whole point after all. Which makes me ambivalent about voice recognition for transcribing early modern historical sources, unless of course they don’t mention any people and places, or use old-fashioned language!
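
For the OCR side of things, the "global find and replace" and "fuzzy search" ideas from item 6 can be sketched in a few lines of Python. This is only a rough illustration, not a tool I actually use: the substitution table, the function names apply_fixes and fuzzy_find, and the 0.75 similarity cutoff are all assumptions made up for the example, and any real table would have to be built by hand for each scanned book.

```python
import re
from difflib import get_close_matches

# Hypothetical per-book table of consistent OCR misreadings (an assumption:
# each scanned book would need its own hand-built list).
OCR_FIXES = {
    "ctc.": "etc.",
    "sepamte": "separate",
}

def apply_fixes(text, fixes=OCR_FIXES):
    """Global find and replace, restricted to whole words so that longer
    words merely containing a misreading are left untouched."""
    for wrong, right in fixes.items():
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        text = re.sub(pattern, lambda match, r=right: r, text)
    return text

def fuzzy_find(term, text, cutoff=0.75):
    """Return the distinct words in text that are close to term,
    so a search for 'siege' also catches a misread 'sieqe'."""
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    return sorted(get_close_matches(term.lower(), words, n=10, cutoff=cutoff))

if __name__ == "__main__":
    page = "The sieqe of Lille, ctc. was kept sepamte from the campaign."
    print(apply_fixes(page))
    print(fuzzy_find("siege", page))
```

The fuzzy search is deliberately loose, so it will also return real words that merely resemble the search term. That is tolerable for letter-level OCR errors, but it does nothing for the voice recognition problem, where the wrong words are perfectly good words.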

So what’s left? The only thought left is the old-fashioned way – hire somebody to type it all out. That’s what I’m seriously considering at this stage. Suggestions?

To embargo or not to embargo, that is the question

Interesting discussion going on about a recent American Historical Association policy calling on History departments and libraries to allow PhD holders to keep their online dissertations closed to public access for up to six years, instead of the now-standard 1-3 years.

Two overviews: Chronicle of Higher Ed and Inside Higher Ed. Be sure to check out the comments as well. There’s also a Q&A on the AHA website defending the policy, with a few comments.

On the one hand we have the AHA, combined with increasingly-nervous PhD graduates facing a horrible academic job market. Their argument essentially boils down to protecting the potential future job prospects of said grads. Without a book (i.e. a revised dissertation), it’s said PhD grads won’t be able to get academic jobs and acquire tenure. And, according to some university press editors, an openly-available online dissertation discourages them from offering a publishing contract. They only want to publish fresh original arguments that will blow your mind. The argument is pretty simple: No book, no tenure. So make it easier to get a book contract.

On the other hand, just about every point and assumption above is contested by critics: that university presses won’t publish work derived from online theses; that scholars won’t buy works derived from online theses; that most dissertations will even get published as a monograph; that History should remain focused on the published monograph. In contrast with the AHA’s professional (read: job) concerns, critics of the new policy tend to gravitate around several related philosophical beliefs: in timely open access to publicly-funded research; that historical knowledge is about disseminating information rather than hiding it; that the profession needs to squarely face up to the reality that an increasing number of History PhDs won’t ever get an academic teaching position (even if they want one) regardless of whether they publish or not – hence the rise of (short-term?) jobs in the Digital Humanities; and a general desire to drag History into the digital age, which means lessening the totemic status of the monograph. As the joke goes: “monograph (noun): a book written by one person and read by one person.”

I’m a bit conflicted on the question, so I thought I’d look back at my own experience (“write what you know,” they always say).

My Experience

Untangling all these arguments would require a lot of empirical evidence that we probably don’t have. I have no idea whether anyone will be denied a job or tenure if they don’t embargo their dissertations. But I can speak to how my online dissertation spread, who used it, and where I am now as a result. My main conclusion is primarily descriptive: if anybody cares (and you want people to care), your dissertation will get released into the wild regardless of what you do.

I successfully defended my dissertation in October 2002, on the anniversary of Lepanto in fact. Fortunately I played the role of Don Juan of Austria, and made only a few minor changes to the manuscript before it was “presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University.” I chose to forgo an embargo although I had the option of up to three years. Don’t really know why I refused – maybe I was lazy, or freaking out over the impending job search, or maybe there was already a glimmer of the open-access proponent bubbling up inside me. I wasn’t particularly embarrassed by the product, although I’m very glad I was able to delete a chapter or two for the book and add a few new chapters. The dissertation was thus immediately placed online (full PDF) on Ohiolink for the world to download – still there in fact.

I don’t know how often the dissertation has been downloaded over the past 11 years – there’s a Download count on the webpage, but since that only says ‘2’, which includes my download just now, I’m thinking it’s either a new site design, or it doesn’t include views, or something. It certainly doesn’t include other people sharing the PDF file, and it probably doesn’t include the various database services subscribed to by many Research I university libraries (e.g. Electronic Theses and Dissertations).

Why do I think more than two people looked at my dissertation? First, my ego won’t allow me to consider the alternative. But second, because I know others have referenced the dissertation. The one thing I can say for certain is that your dissertation will get out there, assuming anyone is interested in the subject.

A simple Google search of “Vauban’s Siege Legacy” (a silly title I realize) results in 411 hits. Don’t worry, I won’t cite them all, but here’s a flavor of the venues in which your dissertation will likely appear if it’s online. Whether this publicity is worth the risk of having your manuscript refused by a publisher is up to you.

At the beginning, the basics of your dissertation will appear in the various online catalogs. That meant, in my case, OhioLink and the Ohio State Library catalog, which include the abstract. So when crafting your abstract you need to balance your paranoia about getting scooped with your desire to entice those reading the abstract into checking out the whole thing. Or maybe you want a really boring diss abstract to keep the vultures away? Hopefully a milquetoast abstract won’t reflect poorly on you.

In the pre-digital days, dissertations still managed to find their way to scholars who cared – you could order a copy from UMI or the degree-granting university; some foreign dissertations were passed around like so many trading cards. Such dissemination was largely invisible to the author, unless you happened to get a royalty statement listing the number of copies sold.

Online dissertations have brought all this out into the open. Your dissertation will get much more attention if it’s easily available: online = on the mind. If my example is any indication, your dissertation will make its way into Google Books – perhaps surprisingly as No Preview, without the ability to search the full text (with an uncertain “20??” year to boot). It will also appear in WorldCat, where I discovered that my diss apparently floated over the ocean currents to the University of Wollongong (Australia). In this flat-earth age, your dissertation download link might be brought to the attention of Chinese web surfers, or your abstract even translated into Vietnamese (2011). The flotsam and jetsam of academic history I guess.

If you wrote on a subject that has a broader historiography (and if you didn’t, why bother?), your dissertation will likely come to the attention of bibliographers. In my case, the diss and an annotation were included in Bill Young’s International Politics and Warfare bibliography, published a mere two years after the dissertation was finished. The timing was rather fortuitous (or not, if you’re paranoid) in this case, since such bibliographies see very sporadic publication.

A relevant dissertation is a cited dissertation, which means that your dissertation will likely get cited before you are able to see the book in print. This depends in part on how long it takes to convert the diss into a monograph – mine took four years (October 2002 to November 2006), and the conventional wisdom suggests a shelf life of six years, depending on the overall activity of your subfield’s historiography. But whether your dissertation gets cited before it becomes a book also depends on your dissertation committee. I was fortunate to have two prolific authors on mine who are kind enough to highlight the work of their students, but this also guaranteed that my diss would be cited by them before I could finish my revisions (John Lynn in 2004 and Geoffrey Parker in 2004 and 2005). In that case, you hope they don’t divulge all of your trade secrets, but you’re not normally in a position to tell them what to publish, nor do you want to refuse their imprimatur on your scholarship. Networking with productive scholars in your field is a requirement for academic advancement, yet the only thing most newly-minted PhDs have to network with is the dissertation.

If you’re trying to get an academic job, you’ll have to decide for yourself how much to lift the secret veil surrounding your conclusions. Here’s where job hunting dovetails with concerns about plagiarism: beyond the intellectual theft, will your ideas be publication-worthy if somebody already published them? How far will you go to prevent such premature release? Will you present your results at a conference, to your peers, to those most likely to steal your ideas? In this situation the paranoid person is damned-if-you-do, damned-if-you-don’t: don’t present and nobody will know when someone else steals your ideas; present and you’re letting the would-be thief know the combination to your safe. Or perhaps you’ll reason that getting some articles or book chapters into print quickly will provide a broader audience for your ideas, whet their appetite for the full book, and maybe even forestall any potential plagiarists. After all, it takes a long time to convert the average dissertation into a book. So maybe you publish a few small pieces before the book, or even a summary of the main findings (pink rectangle):

Pre-monograph publications

Publications relating to my dissertation in pink

Of course a lot of this is beyond your control: if you chose a crowded subfield back in grad school, you’ll have more competition, you’ll need to publish more quickly in order to keep up with the historiography, the pressure to plagiarize will likely be greater, as will the likelihood that “your” ideas will be published under someone else’s name.

Now that social media are all the rage, you can also expect all sorts of other sites to mention your dissertation. My own ad on my website (without mentioning the full version online) let the cat peek out of the bag. Posts on your blog might also steal the future book’s thunder. You can also expect word-of-mouth to spread your online dissertation to the winds. Sites like GoodReads mention my dissertation. More and more websites simply automate the inclusion of materials that mention a term, for example here. The more accessible your dissertation is online, the wider it will spread. Does that lead to greater (eventual) book sales? To more opportunities to present your work? To an inability to get it published at all? Who knows.

If your topic is something which non-academics are interested in reading, you can expect your dissertation to appear in all sorts of other places, especially if the dissertation is free while your book costs just south of $200. (That would be an interesting argument for open-access dissertations – nobody will read your astronomically-expensive book, so you might as well offer the dissertation draft for free.) Amateur military historians (i.e. non-academics) are quite capable of using the Internets to find items of interest. This website on Military Architecture includes a paragraph from my diss as well as a link to the full PDF (April, year unclear). This wargamer message board includes a link to a wargaming review of the book (2009). Here’s a Napoleonic website forum reference (May 2013), and another in a wargamer online journal (Winter 2013). Heck, your dissertation might even get mentioned in the comments of a Russian military history blog (2011).

But, it being the Internet, vultures abound as well. For example, this site offers to sell you an iBook copy of my diss for less than 3 euros (“published” May 2013). What a bargain! I won’t mention that you can also download a pirated copy of the published book for that matter.

More seriously, you may very well find your online dissertation work plagiarized, or perhaps ‘copied without proper attribution’ is a more generous phrase. That happened with my dissertation, in a French book on Vauban that shall go unnamed. At least with my dissertation online (and cited in the bibliography in question), it’s easy for any curious reader to pull up the dissertation and see how closely the French text follows my own formulation, even quoting exactly the same examples that I did. But it doesn’t help that the very concept of plagiarism is itself nebulous: not only is its definition unclear here in the US (witness the AHA’s hesitancy to engage the issue), but it is even less clear once we look at graduate student research assistants, not to mention standard practices in other disciplines and the academic cultures of other countries. If you consider overseas markets critical to your work, probably better to err on the side of paranoia if you can.

In short (if you can call 2000 words ‘short’), the availability of my dissertation online hasn’t killed my career, such as it is. But there are plenty of possible rebuttals, beyond the impossibility of (dis)proving counterfactuals. I had a book contract soon after graduation. I was also somewhat protected against plagiarism in that my book was published around the same time as the French plagiarism came out, and a summary of my work was soon thereafter published in French. And to be honest, my career path may be less than ideal for those young, aspiring PhDs still hoping to get a job with a 1-0 teaching load: I’ve yet to be interviewed as an up-and-coming young historian after all.

My impressionistic takeaway on this whole debate: whether a dissertation is embargoed or not probably has little impact on whether most History PhDs get a tenure-track job, and whether most receive tenure. But I’m left with many larger questions about the state of the profession and publishing, and how they relate to jobs. To what extent does the current mandatory online publishing of dissertations explain the tough job market for History PhDs over the past five years? Have recent graduates published their works at lower rates than in the pre-digital age? How many tenure-track History jobs are there overall, compared to the number of applicants, compared to the number of History PhD graduates each year (this policy’s intended target)?  How many of those jobs are at research universities that require a prestigious university press monograph? How many History faculty are denied tenure for failing to publish a book? How much difficulty did those denied tenure have because their dissertation was already online? Is publishing a book enough for tenure at a research school, or do you need glowing reviews as well, which means you really don’t have six years anyway? If you need six years to complete your book, does that mean you’re not competitive for a position that requires a book for tenure in the first place? What is the relative harm when every new PhD has the same 1-3 year embargo – are publishers simply not publishing young PhDs without a longer embargo, and are schools refusing to hire new PhDs as a result? How many dissertations – as a percentage and in absolute numbers – will actually get published by academic presses these days compared to previous decades? To what extent should the AHA base its policies on what the History publishing industry says it wants? Who is this policy really for: the AHA, History departments, History publishers, or all those History PhDs madly scrambling for jobs and adjuncting in the meantime? Too many big questions for me to answer.

The AHA’s policy may assuage the field’s collective conscience, might protect the publishing prospects of a few PhDs, and could serve as a sop to History PhD students livid at their field for preparing them to graduate into un- or under-employment. But it doesn’t address the real crises in academic History: 1) helping the average History PhD land a tenure-track position, and 2) making History more relevant to the broader public. To me, the debate is about a First World problem in a world increasingly made up of Third Worlders.

Using DTPO: The Hard Part

Now that we understand why tags are for provenance and groups are for topics, as well as the weaknesses of metadata and content keywords, we can get to the nitty-gritty: the data entry part.


Rediscovering the older past through the younger past

Great essay at the Chronicle of Higher Ed on Google Books’ rediscovery of the long 19C.

High notes for EMEMHians like myself include:

Good stuff. And free.

Organizing with Devonthink

Yet more Devonthink.

The advantages of Devonthink Pro Office for me compared to my old Access database are straightforward:

  • DTPO collects all my research documents in one place, making it easy to view them all together. This includes PDFs (both image and OCRed), jpgs, Word docs, rich text docs, Excel files, web pages (you can easily clip items from your web browser), emails, RSS feeds (in fact, Skulking posts get automatically fed into DTPO), not to mention many other file types. It’s so liberating to simply press the down arrow key to shift from viewing an image PDF to reading a text doc, rather than hunting in the file viewer and waiting for Word, Adobe Acrobat, Photoshop… to open. You can also open up multiple windows, for example, to transcribe a manuscript or compare documents. Ideally I’d have all of my sources and notes already in text format, but that’ll never happen: 6,661 text files currently in DTPO, but 15,596 image PDFs in my main database, 68 GB overall. (FWIW you can achieve some of this with OS X’s Preview/Quick Look.)
  • DTPO allows me to group and tag documents regardless of their file format. I can search these varied documents with a consistent syntax within a single interface. Previously, I had an excellent keywording system in my Access database, but it only included text documents, not the thousands of PDFs I had in various folders. (FWIW you can achieve some of this functionality by simply tagging your individual files and using Spotlight searches to find them.)
  • DTPO allows for rapid, robust text searches, including wildcards, nested terms, proximity searches, and fuzzy searches. Its AI (Classify, See Also, Auto Group) will also identify textual documents that share similar word patterns, which eliminates much of the need to manually categorize sources. Related words are also identified with its Similar Words command. These search features are critical to my DTPO usage.
  • DTPO allows for easy hyperlinking (or wikilinking if you prefer) between RTF and other documents, so you can link notes to original sources, textual notes to the PDF documents they sprang from, etc. Your text note can include a hyperlink that will whisk you to the exact PDF page of the original document. A shortcut or back arrow will take you right back to the note.
  • DTPO allows easy syncing between my iMac desktop and MacBook Air laptop, so that all of my sources and notes and thoughts are available wherever I go. There’s even an iPad version of DT, although it’s received mixed reviews.
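
To give a rough sense of what “documents that share similar word patterns” could mean in practice, here is a minimal Python sketch. It is not DTPO’s actual algorithm, which isn’t described here; it assumes a plain bag-of-words comparison scored with cosine similarity. The function names (`word_counts`, `cosine_similarity`, `see_also`) and the sample notes are hypothetical, invented purely for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words in a note (a crude bag-of-words model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty note is similar to nothing
    return dot / (norm_a * norm_b)


def see_also(target, notes):
    """Rank every other note by similarity to the target note.

    Hypothetical stand-in for a 'See Also' style feature; an assumption,
    not how DTPO is known to work.
    """
    target_counts = word_counts(notes[target])
    scores = [
        (name, cosine_similarity(target_counts, word_counts(text)))
        for name, text in notes.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented sample notes, purely for illustration.
notes = {
    "siege_note": "the garrison surrendered after the siege trenches reached the covered way",
    "trench_report": "engineers opened the siege trenches against the garrison",
    "grain_prices": "grain prices rose sharply in the market towns that winter",
}

for name, score in see_also("siege_note", notes):
    print(f"{name}: {score:.2f}")
```

Even something this crude ranks the trench report above the grain-price note when asked what resembles the siege note, which is the basic idea: shared vocabulary stands in for shared topic, without anyone having to tag or group the documents by hand.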

Less important advantages of Devonthink Pro Office for me (i.e. things I could already do, or probably won’t do much in DT but the capability exists):

  • DTPO converts file formats (e.g. from PDF to rich text, Word to rich text…).
  • DT’s Pro Office version comes with ABBYY FineReader to OCR imported documents. I usually use the full FineReader 11 Pro for PC because I want more control over the results.
  • DTPO can run a web server so you can share your databases online.
  • DTPO runs AppleScripts and Perl scripts, and supposedly Python scripts can be run from within AppleScripts. Something I might look into in the far future, assuming I ever learn Python.

So much for the main advantages.

But this is old hat if you’ve read previous posts on DT. If you aren’t a computer power user, follow the straightforward DT organizations described on other blogs. They’re simple setups, even if they don’t take advantage of some of DT’s more powerful features, i.e. the AI. The rest of this post provides a more powerful (and complicated) variation on how to make efficient use of DT’s myriad organizational structures for historical research. If you’re not familiar with DT already and you want to be, go read some of the previous posts on the software, check out the forum on the Devonthink website, and buy the Take Control of Getting Started with Devonthink 2 book. Reading through the PDF manual wouldn’t hurt either. If you can’t or won’t RTFM, then you should stop reading now and stick with what you know.