Quick Tech Tip: The Power of Placeholders and Cobbling Things Together on the Cheap

So you just received a whole box full of newish French books on warfare in the age of Louis XIV, but don’t have time to read through them all now, much less copy them for search purposes? Just scan and OCR the Table of Contents and Index of each book into your digitized note-taking system of choice (mine still being DTPO, thank you very much). These will serve as placeholders (a virtual index, if you will) whose keywords will show up in your search-for-a-string results, leading you to the bookshelf and the relevant pages. For even better utility, add some keywords (or file the placeholders in a topical group) for metadata-powered filtering and sorting.
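For those keeping the OCRed placeholders outside DTPO, here is a minimal Python sketch of the same search-for-a-string idea. It is only an illustration, not how DTPO searches: the function name `find_in_placeholders` and the book titles are hypothetical, and it assumes each placeholder is plain text with one index or contents entry per line.

```python
def find_in_placeholders(placeholders: dict[str, str], term: str) -> list[tuple[str, str]]:
    """Return (book, matching line) pairs for a case-insensitive substring search."""
    needle = term.casefold()
    hits = []
    for book, text in placeholders.items():
        for line in text.splitlines():
            if needle in line.casefold():
                # The matching line carries the page numbers that send you to the shelf.
                hits.append((book, line.strip()))
    return hits


# Hypothetical placeholder texts: one OCRed index per book.
placeholders = {
    "Book A": "Artillerie, 12, 45.\nVauban, 7.",
    "Book B": "Sieges, 3.\nVauban, 88, 90.",
}

for book, entry in find_in_placeholders(placeholders, "vauban"):
    print(f"{book}: {entry}")
```

Each hit names the book and the index line, which is all a placeholder needs to do.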

If you have a little more time, use a batch find/replace (e.g. in MS Word, which can open DT documents) to split each Index entry into its own record automatically. For example, if, after you’ve OCRed a book’s Index into a text document, each entry ends with a page number, a period, and then a paragraph mark, just search (in Word) for “.^p” and replace it with the same string plus a unique marker, e.g. replace “.^p” with “.#####^p”. Save and close the Word document, go back to DT, then run the Explode by Delimiter AppleScript (from the DEVONthink script forum; you can also find similar Word VBA code online). Enter the delimiter ##### and you’ll get hundreds of records on individual topics that can then be sent (automatically via AutoClassify) to an appropriate topic group. With such automation it’s always a good idea to skim through the results first to make sure there weren’t any errors or snafus. If there are many, delete the resulting files, fix the delimiters in the original doc in Word, and repeat the exploding. Note: with DT you may need to convert the resulting documents from .txt to .rtf; you can do it in a single batch, though, right after the parsing process while they’re all still selected. And if you keep your provenance data in the Spotlight Comments, open the Info window and type in the source info while all those documents are still selected.
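If you ever do want to script it, the mark-then-explode steps above can be sketched in a few lines of Python. This is a rough equivalent under stated assumptions, not the DEVONthink script itself: the function names `mark_entries` and `explode` are made up for illustration, and the sketch assumes every entry ends with a digit followed by a period at the end of a line.

```python
import re

DELIMITER = "#####"  # same marker as in the Word find/replace step


def mark_entries(index_text: str, delimiter: str = DELIMITER) -> str:
    """Append the delimiter wherever a line ends with a page number and a period."""
    # Stricter than replacing ".^p": the period must follow a digit,
    # so a line that merely ends in an abbreviation is not split.
    return re.sub(r"(\d\.)[ \t]*(?=\n|$)", r"\1" + delimiter, index_text)


def explode(marked_text: str, delimiter: str = DELIMITER) -> list[str]:
    """Split on the delimiter, rejoin wrapped lines, and drop empty chunks."""
    chunks = (chunk.strip() for chunk in marked_text.split(delimiter))
    return [" ".join(chunk.split()) for chunk in chunks if chunk]


# Hypothetical OCR output; the second entry wraps onto a second line.
sample = "Artillerie, 12, 45.\nBoufflers, 101,\n  130.\nVauban, 7.\n"

records = explode(mark_entries(sample))
for record in records:
    print(record)
```

Skimming `records` before filing anything is the scripted version of checking for snafus: an entry that runs on for several lines usually means a missed delimiter.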


This takes several steps and is not particularly elegant, but if you’re dealing with hundreds or thousands of records and don’t want to take the time to learn Regex or AppleScript or Python or the next programming-language-of-the-month, it will be well worth your while. I just used this process to import and parse 25,000 records from my old Access database into DTPO, and to parse 1600 letters from Marlborough’s Letters & Dispatches that I hadn’t yet entered individually into Access. As “one thought-one note” cultists already know, small chunks make searching and processing much, much easier. And it makes a huge difference when using DTPO’s proximity search. Digitize, man!




One response to “Quick Tech Tip: The Power of Placeholders and Cobbling Things Together on the Cheap”

  1. jostwald says :

    Reminder: If you want to turn an Index into separate records, make sure that your OCR software reads each page as two separate columns, rather than merging the two columns into single lines.
