Soldiering through Text
I figured I might as well add a bit more info to my posts on new books/articles/chapters – mostly because I can, rather than because it necessarily provides a fundamental insight into the work. Starting with our old friend the word cloud:
The above cloud was made at www.wordle.net, which allows for some customization. [For the record, www.tagcrowd.com had a bit more customizability, e.g. displaying the word count beside each word, although it has a text length limit.]
Unfortunately, wordle doesn’t do what Google Books does, i.e. provide a list of words that are unique (or at least uncommon) in this source compared to other sources – this would require wordle to have a huge database of documents. And of course Google Books doesn’t include journal content.
If you want to play around with the text a bit more, the free Voyant (formerly the awkwardly-named Voyeur) website provides all sorts of bells and whistles, although it can be a bit buggy at times, especially when you’re trying to use it in the classroom 😦
An example of its features:
As you can see, Voyant gives you a whole bunch of additional info – word clouds (with extremely limited customization), word counts, the original text with chosen words highlighted (“identity” here), the occurrence of each word throughout the work (Word Trends), a KWIC (keywords in context) view of the chosen word, etc. You can also load multiple texts and do comparisons across documents.
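For anyone curious what a KWIC view boils down to, here is a minimal Python sketch. It is not how Voyant does it internally (I have no idea), just an illustration of the idea; the function name `kwic`, the `width` parameter and the sample sentence are all made up for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return each occurrence of keyword with `width` characters of context on either side."""
    lines = []
    # Whole-word, case-insensitive match on the chosen keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, for illustration only.
sample = ("National identity mattered to officers. "
          "Soldiers often defined their identity by regiment.")
for line in kwic(sample, "identity", width=20):
    print(line)
```

Lining the keyword up in a column is the whole point of the format: you can scan down the page and see what kinds of words keep turning up on either side of it.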
One of my other recent software acquisitions (spoiler alert!), Devonthink Pro Office*, provides a few of the same functions, but it allows you to compare any document against all your other documents (a theoretical limit of 300 million words and 200,000 documents per database, I’m told – but you can have multiple databases). I’ll show just a couple of screenshots of my far-from-complete system in DTPO. Currently it consists almost exclusively of secondary sources, totaling 14 million words (265,000 unique words), and 7,680 documents.
First, a word count of the article (which only counts words of three characters or more, hence some of the stats are different from Voyant above):
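To see why a three-character minimum changes the stats, here is a rough Python sketch of a word count with a length cutoff. The `word_counts` function and its `min_length` parameter are my own invention for illustration; I'm assuming a simple letters-only tokenizer, which may well differ from what DTPO or Voyant actually use.

```python
import re
from collections import Counter

def word_counts(text, min_length=3):
    """Count words, ignoring any shorter than min_length characters."""
    # Runs of letters only (no digits or underscores), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)

# Invented sample sentence, for illustration only.
sample = "An army is an army, but the war is the war of the armies."
print(word_counts(sample).most_common(3))                 # short words dropped
print(word_counts(sample, min_length=1).most_common(3))   # every word counted
```

With the cutoff, words like "an", "is" and "of" simply vanish from the tally, so both the total word count and the ranking can differ from a tool that counts everything.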
Then there’s a list of the words that this article uses more frequently than the other documents in my DTPO database do (mostly on the War of the Spanish Succession thus far):
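The general idea of comparing one document against everything else can be sketched in a few lines of Python. To be clear, this is not DTPO's algorithm (which I don't know); it's a simple ratio of a word's relative frequency in the article to its relative frequency in the rest of the collection, with add-one smoothing so that words absent from the collection don't cause a division by zero. The names `tokenize` and `distinctive_words` and the toy texts are assumptions made up for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased words of three or more letters."""
    return [w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) >= 3]

def distinctive_words(document, corpus_docs, top=5):
    """Rank the document's words by how much more often it uses them than the corpus does."""
    doc = Counter(tokenize(document))
    corpus = Counter()
    for other in corpus_docs:
        corpus.update(tokenize(other))
    doc_total = sum(doc.values())
    corpus_total = sum(corpus.values())
    vocab = len(set(doc) | set(corpus))
    scores = {}
    for word, count in doc.items():
        doc_rate = count / doc_total
        # Add-one smoothing: a word missing from the corpus still gets a small rate.
        corpus_rate = (corpus[word] + 1) / (corpus_total + vocab)
        scores[word] = doc_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy texts, invented for illustration only.
article = "identity identity identity siege war"
others = ["siege war siege war battle", "war siege campaign"]
print(distinctive_words(article, others, top=3))
```

A word that is common everywhere ("war", in a database full of military history) scores low even if the article uses it a lot, while a word the article leans on and the rest of the collection rarely uses floats to the top.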
I’m still playing around with how to organize things in DTPO. I’ve been focusing on secondary sources thus far because: 1) it’s helped me the most with my immediate projects, and 2) it’s obvious how I can use DTPO for secondary sources, whereas I still need to give some thought to whether DTPO will be able to replace my Access database or not.
There are lots of other applications for analyzing text sources, but I’ll end with these three for right now. Of course this is just the tip of the digital humanities iceberg.
* Yes, I finally bit the bullet and joined the Apple fanboys, mostly for two pieces of software: Scrivener and Devonthink Pro Office. I absolutely love my MacBook Air BTW (it’s the iPad with keyboard I’d been wanting). I am, however, currently typing on my old PC desktop – I’ll keep both. I’ll post more about my use of them in the future.