Using DTPO: The Hard Part
Now that we understand why tags are for provenance and groups are for topics, as well as the weaknesses of metadata and content keywords, we can get to the nitty-gritty – the data entry part.
Types of Documents
What types of documents do I store in my DTPO databases? Funny you should ask:
- A copy of an original document. It could be either a primary or secondary source, although they will be tagged differently. Sources can be either transcripts in rich text format or image PDFs – being able to view and use both in the same user interface is nice. When I want to import an original source into DTPO, I create a new tag under the appropriate parent tag (e.g. Primary>Published>biography>Lediard Life of John Duke of Marlborough, or Primary>Archives>BL>Add MSS>Add61175, or Secondary>book>Barnett First Churchill) and drag the document directly into it. If you want to keep a pristine copy of an original source, use PDF.
To make the documents a bit smaller, over time I’m parsing the original texts. It helps with the AI and proximity searches (assuming they’re in text format), and it also makes a “tighter fit” when grouping (as the library catalogers would say). Some journal articles and book chapters focus on a specific topic and don’t require parsing – they can be assigned a provenance tag and copied to a thematic group. But many books and source collections consist of multiple topics that need to be separated into their individual documents – not only for “one thought, one note”, but also to facilitate Classify/See Also. For the first pass I’m parsing by year, and will eventually break them down further into separate documents for each theater and EventID (e.g. siege of Douai 1710…).
Since documents can reside in a Tag without being in a Group, ideally I try to keep the original document in the tag hierarchy and not in groups, unless I don’t have time to take notes on a source, or unless the entire original source focuses on a single topic.
Some bloggers have suggested ‘bursting’ a document so each page is its own document. It seems like a good idea since it superficially follows the ‘keep notes short’ rule of note-taking. So I tried it, but it didn’t work for me. Way too many single-page documents to scroll through in both the document list pane and search results window, plus it destroys the proximity search capability if your proximate terms happen to range across two pages. It also breaks up the context of a sentence, making it difficult to find the previous (or next) page to see the beginning (or end) of a sentence in search results. Very annoying. Hopefully my (admittedly longer-term) parsing by substantive categories (by year, by theater, by EventID) will make proximity searching easier.
- Here’s an example of one source tag ‘folder’ (with a newer setup than the screenshot in the previous post):
Notice the Tag Bar in the above screenshot (underneath the page images) – blue bullets are Tags, while gray bullets indicate Groups. (Perhaps paradoxically, you can actually exclude Tags from Tagging, which deletes the blue tag from the Tag Bar and converts the yellow tag icon into a gray folder Group icon in the group list pane, yet they still remain with the tags in the group list pane – like the Branch through Place tags in the image above. This is helpful if, for example, you want to include a parent tag in the list hierarchy without cluttering up the Tag Bar.) In the document list pane I’ve started using the flag (flag icon) to identify archival volumes that I own.
- Catalog info for each archive volume. In the above screenshot I have a document in the Add28918 tag folder that reproduces the info from the British Library’s online catalog. I imported various archive catalogs and inventories into DT so even if I only have an image PDF of Additional MSS 61134, any search of “Somerset” will find its catalog description. The catalog document also includes a link to the ms images – adding such pointer links is useful whenever you might come across a single document in a search result.
- Bibliographic info for each published source. You can create a similar bibliographic document for every secondary source, akin to your bibliographic database metadata. Below is an early example I’ve been working on, which includes citations (for easy pasting), an official abstract of the work, and live links to Google Books’ Common Terms. If you have the text of the original, of course, you can also use DT’s search, Concordance and Keyword features to display a word frequency list and uncommon words. You can also create a document in this provenance tag folder to keep track of future archive volumes to consult.
This works for both primary and secondary sources. In the example below, I created a smart group (shown as a nested purple folder below the Churchill tag folder in the Group list pane) that lists all the documents that mention the text string “Winston Churchill” in their Content.
- Note taken on an original document. This is a separate rich text document with a summary/paraphrase of the original. Each note document is created in the source tag folder (alongside the original) and then copied to a thematic group – so you can find the note in either the related source tag folder or in the relevant group. If you have the original source in DT, your note can include a link to the original as well, for quick reference. You can even paste a quote from the original into the note document if it’s in full text. Hell, throw in an image if you’re feeling cheeky. You can also indicate the status of each document, e.g. I use a colored label to indicate documents that require parsing (when I’m not using them to indicate which project a document belongs to). As with the Winston Churchill example above, you can also add smart groups that list all the documents that cite this particular source – search for “<source title>” in Content.
- Thoughts on a topic. Distinct from notes on a topic. Thoughts are my musings (and I have many), whereas notes include summaries and quotations from other sources. I haven’t quite decided yet where to record the distinction between note and thought for each document – most likely, I’ll either add ‘thought’ in the title (as a suffix) or possibly in one of the metadata fields. I also decided to scan in all those musings I’ve scribbled on the back of conference programs over the years – maybe I’ll get around to transcribing them someday, but in the meantime they’re tagged and grouped and viewable with my Mark I eyeball.
- Drafts of my work, both published and unpublished. You can also use See Also on these drafts to find other related documents. Did I mention I need to figure out how to distinguish documents as notes, thoughts, drafts?
- Reference documents. I’ve already mentioned documents that include bibliographic info on each source (catalog entries for archives) – especially helpful when you’d like to search the content of an image-only PDF.
But there’s all sorts of other information – dictionary entries from the ODNB, Wikipedia entries (!), maps, and so on.
For tabular info, this could be a sheet (think Excel worksheet) with a sortable timeline, or a rich text document that includes a table, say if you want to summarize daily operational activities (with links to other documents as needed in various cells).
For visual information, you can import a jpg like a marked-up map or picture.
If you have a website that you frequently visit – a site to convert between Old Style and New Style dates, or the Lexicons of Early Modern English – you can create a document whose title is the URL. When clicked, DT will open the site in its built-in browser.
- To Do Lists. Often categorized by project. This might include a list of items to check next time I’m in archive X, or at a regional library, or what I need to do for a research project. If you’re into organizational cults like GTD, DT has a template for that as well. Not sure where best to locate them in my layout – some people use labels for this purpose.
- Saved searches (smart groups) that refresh every time you click on them. We’ve already seen the “Winston Churchill” example, but I could also have a smart group to pull together all the documents relating to a specific year (see the Chronology-Source by Date smart groups in the last screenshot above), a separate series of smart groups showing all those documents published in specific years, all those documents that mention (or were written by/to) Marlborough… Any query you run frequently can be saved as a smart group.
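To make the earlier complaint about bursting concrete, here is a minimal sketch of why one-page documents break proximity searches. It is not how DT implements its search; the `near` function, the five-word window, and the two sample "pages" are all invented for illustration.

```python
import re

def near(text, a, b, window=10):
    """Return True if words a and b occur within `window` words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

# Hypothetical sample text: one sentence that runs across a page break.
page1 = "The garrison at Douai held out for weeks, and the Duke of"
page2 = "Marlborough pressed the siege until the governor capitulated."

# Searching the unburst document: both terms sit in the same text.
whole = page1 + " " + page2
hit_whole = near(whole, "Duke", "siege", window=5)

# Searching burst pages: each page is searched on its own.
hit_burst = any(near(p, "Duke", "siege", window=5) for p in (page1, page2))

print(hit_whole, hit_burst)
```

The unburst text matches because the two terms are five words apart; neither burst page contains both terms, so the same query finds nothing.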
The real test of a note-taking system is whether it can perform the common tasks that you need to do, and make it easy to do them. After you’ve set up your groups and tags (or at least started), the next step is to get info into DT. There are numerous ways that you can get your notes into DT; a lot will depend on what type of notes, and what media your original sources are on.
If you’re taking notes off of a printed source, it’s as simple as creating a new RTF note document in DT, adding the provenance info, taking notes, and adding tags/metadata as appropriate. If you know exactly which group the note belongs in, create the note document in the appropriate group. Otherwise you could just create all your notes in the provenance tag folder and assign them to groups later.
If you have PDFs or photographs of documents, drag them into the appropriate provenance tag folder and add the necessary tag/group/metadata. (Reminder: I keep the original source files in the provenance Tag folder, and create separate note files; I generally only put the note files in the Group folders unless the entire original relates to a specific group.) Realize, however, that the AI won’t work on such image documents.
Here’s what my workflow looks like at this stage, though it might change in the future:
- Import documents into DT.
- Think long and hard about your import strategy, and automate as much as your skills allow. My goal has been to get as many of my sources into DTPO as quickly as possible, importing all those documents that share a single tag together (drag lots of documents to the tag in one fell swoop; add other tags).
- If you first import documents to tags in your main database and then distribute them to the language-specific databases, the tags you’ve already assigned will appear in the other databases, although without the hierarchy.
- If you just drag a bunch of documents to your Inbox and then assign them tags, they’ll remain in your Inbox until you move them out.
- If you import documents directly into provenance tags they will bypass the Inbox and exist solely in your tag folder. If you then want to assign them to groups, be sure to replicate the document to a group rather than simply move it. Moving it will delete the tag; deleting the tag will delete the document if it isn’t in another tag or group. In the latest version (DT 2.6), you can assign groups to documents in a tag folder by typing the group name in the tag bar – previously that would move the document out of the tag.
- If you are importing documents, make sure to create your full group/tag hierarchy first, and then drag the document(s) to the lowest child tag/group. I learned the hard way that if you drag a bunch of documents to a parent tag and then later drag them to a child, the parent tag will remain (because it was manually assigned). Those documents will appear in the child folder, but also in the parent level. (Oddly, you can manually delete the parent tag and DT will automatically replace it with the same tag, but since it wasn’t manually created, the document won’t appear in the parent folder anymore.)
If you want to see all the documents underneath a parent, e.g. involving all aspects of being a Great Captain in a single document list pane (see all five child groups together), then simply select all the child groups in the group list pane.
- Transcribe an original source. You can have a window open showing the original source and another window open to transcribe into. You can also link the RTF transcript to the original PDF document.
- Take notes on an original source. Remember that each note should be its own document (“one thought, one note”), and each note should be put in a specific group, even if the original source file remains in its provenance tag folder. If you create the note within the provenance tag folder, it will automatically get tagged and appear with the original document. You can manually add tags anywhere. I like each note to be found in the source tag folder and in the thematic group(s) it relates to.
I don’t think you can mark up image PDFs in DT, and I create separate (linked) RTF note documents anyway, so you’re on your own if marking up is your cup of tea. I create RTF notes using a variety of methods. Generally I’ll assume that, whenever possible, you want your sources digitized for full-text (via OCR or manual/verbal transcription), which means that you’ll have a DT document that contains the original text in RTF/PDF, and another RTF document (several actually) that contains notes about the content in the original. To do this:
- You can have a window open showing the original source and another window open to type your notes in. Always be sure to include the provenance info in your new note: at the top of the content (e.g. “Denman.234”), in the metadata, and/or with a tag. Whenever you need to check the original later on, you can do a quick search, or:
- If you want to take the extra step, you can also link a note to the original document. Not only will your note say Denman.234 at the top, but it will also include a hyperlink that will take you to the original copy of Denman in DT. If you link to the bibliography document as well, you’ll have a quick link to the citation that you can paste.
-To do this from the original, create a link from the original source document (Ctl-Opt-Cmd-C or right-click Copy Link Item), then create a new document (Ctl-Cmd-N), name it with a summary of the note’s content, paste in the copied link. Then type in your notes.
-Or you could start by creating a new note, make your note, and then link to the original. Copy a link from the original in the document list pane with the mouse (right-click Copy Link Item), then just paste directly into the new note (which will still have the focus). If you have the Automatic Wikilinks preference set, you can also type the file name in your note document and DT will automatically create a link to that document. It helps here to use a predictable yet unique title for your originals and other documents.
-You can add other info to your note as needed, such as pasting a copied quote from the original. You can add links to other documents, e.g. to the bibliographic document with the citation info.
-If you delete the provenance info from the file name, make sure the source info is included in the new note somewhere. The hyperlink will maintain the link to the original in any case, as will the provenance tag. But if you want to sort your notes by provenance, you’ll need to add it to the metadata (tags aren’t ideal for sorting since they automatically alphabetize and you can only sort by the first tag). If you create the note in the provenance tag folder, the new note will automatically receive the provenance tags. But if you don’t create the note in the provenance tag folder, you should transfer the source info to some of the metadata fields and/or tags.
-If you are taking notes on an original located in a group, assign a provenance tag if you haven’t already.
- You can create multiple notes from the same source, e.g. you want to parse an imported Word document full of notes on Denman into separate note documents. One way is to add provenance info and then simply duplicate the first note (Cmd-D), then delete the first note’s content from this second note. Duplicate the second note, then, while still in the second note, delete the content after the second note. Now go to the third note and repeat the process. This will reproduce the link to the original (you’d need to update the page links if you’re creating links to a specific page within the PDF) as well as all of the metadata and tags, so you only need to enter that once per source. Each duplicate document will shrink as you go along since you’re deleting the previous note content.
Another option if you don’t use metadata: within your document full of notes, separate each note (if they aren’t already separated by returns), add the provenance info to each paragraph, then type a short summary phrase above each paragraph to serve as the note document title. Then copy the first summary+note and use New with Clipboard (Cmd-N) – this will paste the selected text into a new note and name that note with the first line, i.e. your summary. Rinse, lather and repeat. Always repeat.
- If you wanted to add another step and are working with text originals, you can paste a quote directly into your note, in addition to your summary/paraphrase. I think the DT forum also has some script that should allow you to create a link to a specific selection of text within a RTF document.
- FWIW, DT’s built-in note-taking options aren’t very useful. There is a Split Document command, but for some reason it doesn’t copy any of the metadata or tags to the new split document. So I don’t use it.
- DT also has an Annotation template that will create a new annotation doc with an automatic link to the original document. But there are a few problems. First, since the annotation file name is auto-generated as ‘Original file name (Annotation)’, you’ll need to manually rename it if you want your file name to have a meaningful title. More importantly, DT won’t allow more than one annotation document per source. Say it: “one thought, one note”. Granularity requires that each original source have more than one thought and therefore more than one note. Even if you rename the annotation and try to create another new annotation off the original, it will still open up the (renamed) original annotation. There may be some way to fix these issues with code, but at this stage annotations are a non-starter for those of us who worship at the Holy Altar of “One Thought, One Note.” Amen.
- If you are reading through a PDF and want to take a note on a particular page, you can select the page thumbnail, right-click Copy Page Link, then create a new note document, paste the link, and take your notes. Click the link to jump back to the PDF when you’re done.
- Add notes to your own work. If you are narcissistic like me and like to add new evidence to your work (“piling on”), you can do that as well. I create a separate note document for each new piece of evidence and then link to them in my article.
- Transfer marginal notes from a photocopy. You can create a separate note document for each marginal comment and then link it to the original, as in the Adding Additional Evidence example above.
- Record which pages cover a particular topic or event. Not quite sure about this yet. You could create a separate table of contents document with links to each section in the PDF. Or you could create a short document in the topic/EventID group and simply link to the specific page of the original PDF. I try to have the Index for each book in DT, so I suppose I could store them as separate Index documents to search.
- Ad hoc note-taking. If you’re outside of DT and need to take a quick note, you can open up the Sorter (a little tab that allows you to import things into DT from elsewhere) and use the Take Note command to type up a quick note. If you want to quickly place a selection of text into DT, use the similar Copy Selection command. DT also has an add-in to web browsers that allows you to import a webpage. In fact, anytime you’re feeling lazy, you can simply make a note in your Inbox and it will be waiting for you to tag and group whenever you want (or never, if you’re that kind).
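The "summary line above each paragraph" approach to parsing a document full of notes can be roughed out outside DT before importing. This is only a sketch under stated assumptions: the plain-text layout (notes separated by blank lines, first line of each is the title), the `split_notes` function, and the placeholder sample text are all mine, and nothing here calls DT itself.

```python
import re

def split_notes(text, source):
    """Split blank-line-separated notes into one record per note.

    The first line of each chunk becomes the title (mimicking how the first
    line names the new note); the rest is the body. `has_provenance` flags
    whether the body starts with the source name, e.g. "Denman.234".
    """
    notes = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        if not chunk.strip():
            continue  # skip empty input or stray blank chunks
        lines = chunk.strip().splitlines()
        title = lines[0].strip()
        body = "\n".join(lines[1:]).strip()
        notes.append({
            "title": title,
            "body": body,
            "has_provenance": body.startswith(source),
        })
    return notes

# Placeholder notes; the third one is missing its provenance on purpose.
sample = """First summary phrase
Denman.234 Paraphrase of the first passage.

Second summary phrase
Denman.240 Paraphrase of the second passage.

Third summary phrase
Paraphrase with no source info."""

notes = split_notes(sample, "Denman")
for n in notes:
    flag = "" if n["has_provenance"] else "  <-- add provenance"
    print(n["title"] + flag)
```

The flag is the useful part: a note that loses its source info during parsing is much harder to fix later than one caught before import.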
Miscellaneous Tips with DT
- Your group hierarchies should reflect your conceptual understanding of your research. For example, if you think there are five aspects to being a Great Captain, each aspect should be a child group in the Great Captain parent group.
- To avoid putting documents in a parent group (as recommended by DT), make a Miscellaneous child group.
- Spotlight Comments is the only metadata field that can be batch edited, so it could be used in the place of tags for batch importing, or batch keywording.
- Assigning tags and groups can be sped up by simply jumping to a document’s tag bar (Ctl-Return) and typing it in. In DT 2.5, if you tried to assign a group to a tagged document, it moved the document rather than replicate it, deleting the tag. But in DT 2.6 that seems to have been fixed.
- Feel free to temporarily create new groups, tags and databases on the fly, especially if you don’t want to use the Search results window – the font is quite small, even with a 27″ monitor. You can merge replicates of documents to a new document, and quickly move documents to a new group.
- If you use a nested hierarchy, you can selectively exclude some of the levels from tagging. For example, if you have a grouping of Battles>BattleID>BattleGe>Blenheim1704 (vs. Battles>BattleID>BattleLC>Ramillies, and Battles>Tactics), you can exclude the BattleID group from tagging, so it won’t appear in the tag bar. The same works for tags.
- Although you can edit PDFs within DT, deleting or rearranging pages for example, doing so will vastly increase the size of the file – multiplying its size several fold in some cases. So if you want to parse a PDF, or delete the large Google page (which will throw off the Automatically Resize view command), you might want to do it outside of DT and then import the PDF.
- If you like tags a lot, there’s a separate Ammonite software add-in you can purchase, which will provide tag clouds. It won’t be that useful if you’re using tags for provenance though.
Issues I still haven’t figured out:
- How best to combine tags and metadata. I waffle between putting provenance info in the tags or the metadata (I sometimes do both). Since tags are easy to assign to multiple documents (whereas you must enter all metadata, except Spotlight Comments, one document at a time), they’re probably preferable to metadata, especially when importing. Yet metadata would be best for sorting and precise searching without wildcards.
- I want to name my note documents substantively, e.g. a note based on Add 61134 f50 that talks about sieges should be titled “1702.03.24 Granville warns Marlborough to avoid sieges” even if I title the original source document “Add 61134 Granville”. Thus right now I’m using two distinct naming conventions for file names: provenance for the originals (though this is admittedly duplicating the tag info), and content summary for the note documents. Not really efficient, so I’ll see how it works going forward.
- I think I should create a separate child tag for each source (e.g. BL>Add MSS>Add62343, or my Churchill example above), secondary and primary. This way I can gather all the documents together (or do it with a smart group). This, however, requires creating a folder for each source (and there are 10,000+), so we’ll see how long I stay with this.
- DT’s use of dates is a mess, definitely not historian-friendly. In Access you can define a field as a date field, and it will treat it as a date – it can display the date in different formats (M/D/YY or YYYY.MM.DD), and it can perform date-like calculations, e.g. it can convert Old Style dates to New Style dates by adding 11 days, plus it knows that 3/4/1702 is after 2/20/1702 so you can perform a query like “find me all the letters written after 1706.” If a program doesn’t know a string is a date however, adding 11 days to 6/28 is just as likely to result in 6/39 – history makes it even more complicated because sometimes a document won’t have a day, just “May 1704”. There aren’t any custom date fields in DT, so people have suggested you reassign the document date metadata (date document created, last modified…) for your historical dates. That’s a bad idea for numerous reasons, not least because you may actually need to use those metadata fields for other purposes.
As a result I’m currently using a convoluted system: the date of an event discussed in the letter’s content is a prefix in the name, date of publication is in the Subject field (because it’s one of the few fields that works with both RTF and PDF). But I’m not yet sure how to record both the OS and NS dates. You need both to do different things – to find a particular document, you need the date of the English letter or newspaper as it appears on the document (in OS), yet when looking at a list of documents in chronological order, you need to standardize them in NS. Nor have I figured out what to do if a treatise was published in 1671 but actually written in 1650. In short, I haven’t figured out a good solution (Access made it simple to create a dozen date fields), but I am hoping that a future version of DT will at least include a custom date field – hopefully several.
- Timelines. You can use a smart group in DT to create a timeline, i.e. a list of documents sorted by date – either your document names need to start with the date, or you put it in one of the metadata fields. You can also manually create a summary timeline in a sheet or table. This runs into the problems mentioned in the above point however. If I ever want to do any kind of quantitative analysis of the correspondence, I’ll still need to use Access.
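For what it's worth, the date handling I keep wishing for is simple to sketch once a program knows a value is a date. The example below is illustrative only and has nothing to do with DT's internals: the function names and sample titles are hypothetical, the 11-day offset is the one mentioned above (the gap was 10 days before March 1700), and a missing day is written as "00" so that "May 1704" still sorts sensibly in a YYYY.MM.DD name prefix.

```python
from datetime import date, timedelta

# Old Style -> New Style offset used in this post. Assumption: all dates
# fall in the period when the gap was 11 days (March 1700 onward).
OS_TO_NS = timedelta(days=11)

def os_to_ns(year, month, day):
    """Convert an Old Style calendar date to its New Style equivalent."""
    return date(year, month, day) + OS_TO_NS

def date_prefix(year, month=None, day=None):
    """Build a sortable YYYY.MM.DD prefix; unknown parts become 00."""
    return f"{year:04d}.{month or 0:02d}.{day or 0:02d}"

# 28 June OS becomes 9 July NS, not "6/39".
ns = os_to_ns(1704, 6, 28)
prefix = date_prefix(ns.year, ns.month, ns.day)

# Hypothetical document names; plain string sorting gives a timeline.
titles = [
    date_prefix(1706, 5, 23) + " hypothetical letter C",
    date_prefix(1704, 5) + " hypothetical letter B (no day given)",
    date_prefix(1702, 3, 4) + " hypothetical letter A",
]
timeline = sorted(titles)

print(prefix)
for t in timeline:
    print(t)
```

This still leaves the real problem untouched: you need somewhere to store both the OS date as written and the NS date for sorting, which is exactly what a custom date field would provide.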
Other common tasks that I haven’t mentioned?
Coming up: From Notes to Draft.