There’s gotta be a better way
In preparation for a new introductory digital history course that I’ll be teaching in the fall, I’ve been trying to think about how to share my decades of accumulated computer wisdom with my students (says the wise sage, stroking his long white beard). Since my personal experience with computers goes back to the 80s – actually, the late 70s with Oregon Trail on dial-up in the school library – I’m more of a Web 1.0 guy. Other than blogs, I pretty much ignore social media like Facebook and Twitter (not to mention Snapchat, Instagram, Pinterest…), and try to do most of my computer work on a screen larger than 4″. So I guess that makes me a kind of cyber-troglodyte in 2017. But I think that does allow me a much broader perspective of what computers can and can’t do. One thing I have learned to appreciate, for example, is how many incremental workflow improvements are readily available, shortcuts that don’t require writing Python from the terminal line.
As a result, I’ll probably start the course with an overview of the variety of ways computers can help us complete our tasks more quickly and easily, which requires understanding the variety of ways in which we can achieve these efficiencies. After a few minutes of thought (and approval from my “full-stack” computer-programming wife), I came up with this spectrum that suggests the ways in which we can make computers do more of our work for us. Toil, silicon slave, toil!
So I’ll want my students, for example, to think about low-hanging fruit (efficiency fruit?) that they can spend five minutes googling and save themselves hours of mindless labor. As an example, I’m embarrassed to admit that it was only when sketching this spectrum that I realized that I should try to automate one of the most annoying features of my current note-taking system, the need to clean up hundreds of PDFs downloaded from various databases: Google Books, Gale’s newspaper and book databases, etc. If you spend any time downloading early modern primary sources (or scan secondary sources), you know that the standard file format continues to be Adobe Acrobat PDFs. And if you’ve seen the quality of early modern OCR’d text, you know why having the original page images is a good idea.
But you may want, for example, to delete pages from PDFs that include various copyright text – that text will confuse DTPO’s AI and your searches. I’m sure there are more sophisticated ways of doing that, but the spectrum above should prompt you to wonder whether Adobe Acrobat has some kind of script or macro feature that might speed up deleting such pages from 1,000s (literally) of PDF documents that you’ve downloaded over the years. And, lo and behold, Adobe Acrobat does indeed have an automation feature that allows you to carry out the same PDF manipulation again and again. Once you realize “there’s gotta be a better way!”, you only need to figure out what that feature is called in the application in question. For Adobe Acrobat it used to be called batch processing, but in Adobe Acrobat Pro DC such mass manipulations now fall under the Actions moniker. So google ‘Adobe Acrobat Actions’ and you’ll quickly find websites that allow you to download various actions people have created. Which allows you to quickly learn how the feature works, and to modify existing actions. For example, I made this Acrobat Action to add “ps” (primary source) to the Keywords metadata field of every PDF file in the designated folder:
I already copied and tweaked macros and Applescripts that will add Keywords to rich text files in my Devonthink database, but this Adobe solution is ideal after I’ve downloaded hundreds of PDFs from, say, a newspaper database.
Similarly, this next action will delete the last page of every PDF in the designated folder. (I just hardcoded to delete page 4, because I know newspaper X always has 4 pages – I can sort by file size to locate any outliers – and the last page is always the copyright page with the nasty text I want to delete. I can, for example, change the exact page number for each newspaper series, though there’s probably a way to make this a variable that the user can specify with each use):
Computers usually have multiple ways to do any specific task. For us non-programmers, the internet is full of communities of nerds who explain how to automate all sorts of software tasks – forums (fora?) are truly a god-send. But it first requires us to expect more from our computers and our software. For any given software, RTFM (as they say), and then check out the software’s website forum – you’ll be amazed at the stuff you find. Hopefully all that time you save from automation won’t be spent obsessively reading the forum!