
You might be Millner

If words like “Army”, “Camp”, “march”, “Day”, “pitch”, and “Leagues” outnumber many common stopwords…

Screenshot 2019-01-11 13.07.01.png

You might be a campaign journal.

And if the fifth-most common word token is “d”, and if “Duke” and “Prince” are close behind, and if you capitalize your common nouns, you are pretty well assur’d that you are, in fact, an 18th century Campaign Journal.

Millner’s Compendious Journal (1733), to be precise.

For those modern sticklers for method, lowercasing the text doesn’t invalidate the point:

Screenshot 2019-01-11 13.25.30.png
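For readers who want to try this kind of count themselves, here is a minimal sketch in Python. The screenshots do not show how the counts were produced, so the tokenizer (plain runs of letters, which is what turns “assur’d” into “assur” plus a stray “d”), the function name `top_tokens`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def top_tokens(text, n=10, lowercase=False):
    """Return the n most common word tokens as (token, count) pairs."""
    if lowercase:
        text = text.lower()
    # Assumption: a "token" is any unbroken run of letters, so an
    # apostrophe splits "march'd" into "march" and "d".
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens).most_common(n)


# Made-up sentence in the style of a campaign journal (not from Millner).
sample = "The Army march'd from the Camp and pitch'd near the Duke; the Army encamp'd."

case_sensitive = dict(top_tokens(sample, n=5))
lowercased = dict(top_tokens(sample, n=5, lowercase=True))
```

Comparing the two dictionaries shows the effect of lowercasing: “The” and “the” merge into one token, while “d” and “Army”/“army” keep their counts.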

Now, cleaning the dirty OCRed text? That’s another matter…


From historical source to historical data

Where I offer a taste of just one of the low-hanging fruits acquired over my past five months of Python: The Sabbatical.

Digital history is slowly catching on, but, thus far, my impression is that it’s still limited to those with deep pockets – big, multi-year research projects with a web gateway and lots of institutional support, including access to computer scientist collaborators. Since I’m not in that kind of position, I’ve set my sights a bit lower, focusing on the low-hanging fruit that’s available to historians just starting out with python.

Yet much of this sweet, juicy, low-hanging fruit is, tantalizingly, still just out of reach. Undoubtedly you already know that one of the big impediments to digital history generally, and to historians playing with the Python programming language specifically, is the lack of historical sources in a structured digital format. We’ve got thousands of image PDFs, even OCRed ones, but it’s hard to extract meaningful information from them in any structured way. And if you want to clean that dirty OCR, or analyze the text in any kind of systematic way, you need it digitized, but in a structured format.

My most recent python project has been to create some python code that automates a task I’m sure many historians could use: parsing a big long document of textual notes/documents into a bunch of small ones. It took one work day to create it, without the assistance of my programming wife, so I know I’m making progress! Eventually I’ll clean the code up and put it on my GitHub account for all to use. But for now I’ll just explain the process and show the preliminary results. (For examples of how others have done this with Python, check out The Programming Historian, particularly this one.)

Parsing the Unparseable: Converting a semi-structured document into files

If you’re like me, you have lots of historical documents – most numerous are the thousands of letters, diary and journal entries from dozens of different authors. Each collection of documents is likely drawn from a specific publication or archival collection, which means they start out isolated in their own little silos. If you’re lucky, they’re already in some type of text format – MS Word or Excel, a text file, what have you. And that’s great if you just want to search for text strings, or maybe even use regular expressions. But if you want more, if, say, you want to compare person A’s letters with person B’s letters over the same timespan, or compare what they said about topic X, or what they said on date Z, then you need to figure out a way to make them more easily compared, to quickly and easily find those few needles in the haystack.

The time-tested strategy for historians has been to physically split up all your documents into discrete components and keyword and organize those individual letters (or diary entries, or…). In the old days – which are still quite new for some historians – you’d use notecards. I’ve already documented my own research journey away from Word documents to digital tools (see Devonthink tag). I even created/modified a few Applescripts to automate this very problem in Devonthink in a rudimentary way: one, for example, can ‘explode’ (i.e. parse) a document by creating a new document for every paragraph in the starting document. Nice, but it can be better. Python to the rescue.

The problem: lots of text files of notes and transcriptions of letters, but not very granular, and therefore not easily compared, requiring lots of wading through dross, with the likelihood of getting distracted. This is particularly a problem if you’re searching for common terms or phrases that appear in lots of different letters. Wouldn’t it be nice if you could filter your search by date, or some other piece of metadata?

The solution: use Python code to parse the documents (say, individual letters, or entries for a specific day) into separate files, making it easy to home in on the precise subject or period you’re searching for, and to tag and keyword each piece precisely.

Step 1:

For proof of concept, I started with a transcription of a campaign journal kindly provided me by Lawrence Smith, in a Word document. I’m sure you have dozens of similar files. He was faithful in his transcription, even to the extent of mimicking the layout of the information on the page with the use of tabs, spaces and returns.

Newhailes_sample1.png

Great for format fidelity, but not great for easily extracting important information, particularly if you want, for example, June to be right next to 20th, instead of on the line below, separated by a bunch of officers’ names. (‘Maastricht’ and ‘London’ are actually a bit confusing, because I’m pretty sure the place names after the dates are that day’s passwords, at least that’s what I’ve seen in other campaign journals. That some of the entries explicitly list a camp location reinforces my speculation.) Of course people can argue about which information is ‘important,’ which is yet another reason why it’s best if you can do this yourself.

Aside: As you are examining the layout of the document to be parsed, you should also have one eye towards the future. In this case, that means swearing to yourself that: “I will never again take unstructured notes that will require lots of regex for parsing.” In other words, if you want to make your own notes usable by the computer and don’t already have a sophisticated database set up for data entry, use a consistent format scheme (across sources) that is easy to parse automatically. For example, judicious use of tabs and unique formatting:

Early_formatting_ideas.png

Step 2:

Clean up the text, specifically: make the structure more standardized so different bits of info can be easily identified and extracted. For this document, that means making sure each first line only consists of the date and camp location (when available), that each entry is separated by two carriage returns, and adding a distinctive delimiter (in this case, two colons, ‘::’) between each folio – because you’ll ultimately have the top level of your structured data organized by folio, with multiple entries per folio (this is a one-to-many relationship, for those of you familiar with relational databases like Access). Cleaning the text can be easily done with regex, allowing you to cycle through and make the appropriate changes in minutes. Assuming you know your regular expressions, that is.

The result looks like this:

Newhailes_sample.png

Note that this stage is not changing the content, i.e. it’s not ‘preprocessing’ the text, doing things like standardizing spelling, or expanding contractions, or what have you. Nor did I bother getting rid of extra spaces, etc. Those can be stripped with python as needed.

For this specific document, note as well that some of the formatting for the officers of the day is muddled (the use of curly brackets seems odd), which might equal loss of information. But if that info’s important, you should take care to figure out how to robustly record it at the transcription stage. If you’re relying on the kindness of others, ‘beggars can’t be choosers.’ But, if you’re lucky, you happen to have a scanned reproduction of a partial copy of this journal from another source, which tells you what information might be missing from the transcription:

Newhailes_sample_BL_Add61404.png

Camp journal sample of above, from British Library, Add MS 61404, f. 45.

You probably could do this standardizing within your Python code in Jupyter Notebook, but I find it easier to interact with regex in my text editor (BBEdit). Your mileage may vary.
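For anyone who would rather stay in Python, here is a minimal sketch of what that standardizing pass might look like with the `re` module. It is not the set of regexes I ran in BBEdit. The layout it assumes (a folio heading such as `f. 40` alone on its own line, optionally followed by `r` or `v`) and the function name `standardize` are hypothetical, so adjust the patterns to whatever your transcription actually looks like.

```python
import re


def standardize(raw):
    """Normalize line endings, blank lines and folio headings."""
    # Word and old Mac files may use \r\n or \r; settle on \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of blank lines so entries are separated by exactly two returns.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Assumption: a folio heading is a line like "f. 40" or "f. 40v";
    # append the '::' delimiter to it.
    text = re.sub(r"(?m)^(f\. \d+[rv]?)[ \t]*$", r"\1::", text)
    return text.strip() + "\n"


# Made-up input with messy spacing, for illustration only.
raw = (
    "f. 40  \r\n"
    "May 8th 1705\r\nText.\r\n"
    "\r\n\r\n\r\n"
    "May 9th 1705\r\nMore.\r\n"
    "f. 41\r\n"
    "May 10th 1705\r\n"
)
cleaned = standardize(raw)
```

The order matters a little: trailing whitespace is removed first so that lines containing only spaces count as blank when the runs of returns are collapsed.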

Step 3:

Once you get the text in a standard format like the above, you read it into python and convert it into a structured data set. If you don’t know Python at all, the following details won’t make sense. So go read up on some Python! One of the big hurdles for the neophyte programmer, as I’ve discovered over and over, is to see how the different pieces fit together into a whole, so that’s what I’ll focus on here. In a nutshell, the code does the following, after you’ve cleaned up the structure of the original document in your text editor:

  1. Read the file into memory as one big, long string.
  2. Perform any other cleaning of the content you want.
  3. Then you perform several passes to massage the string into a dictionary with a nested list for the values. There may be a better, more efficient way to do this in fewer lines, but my beginner code does it in three main steps:
    1. Convert the document to a list, splitting each item at the ‘f. ‘ delimiter. Now you have a list with each folio as a separate item.
      list_items.png
    2. Always look at your results. For some reason, the first item of the resulting list is empty (it doesn’t seem to be an encoding error), so just delete that item from the list before moving on.
    3. Now, read the resulting list items into a python dictionary, with the dictionary key the folio number, and all of the entries on the folio as the value of that folio. Use the ‘::’ as the delimiter here, with the following line of code, a ‘comprehension’, as they call it. Notice how the strip and split methods are chained together, performing multiple changes on the item object in that single bit of code:
      dictionary.png
    4. Now you use a for loop to parse each value into separate list items, using the other delimiter of ‘\n\n’ (two returns) between entries, using the string of the value (since otherwise it’s a list item and the strip and split methods only work on strings). This gives you a dictionary with the folio as the dict key, and the value is now a nested list with each of the entries associated with its folio as a separate item, as you can see with folio 40’s four entries:
      dictionary_nested_list.png
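Since the code itself only appears in the screenshots, here is a minimal sketch of the passes just described, reconstructed from the description and not copied from my notebook. It assumes each folio starts on its own line as `f. 40::` followed by its entries. It also departs from a plain `str.split('f. ')` in one respect: it uses `re.split` anchored to the start of a line, so that words ending in “f.” inside an entry are not mistaken for folio markers. The name `parse_journal` and the sample text are made up.

```python
import re


def parse_journal(text):
    """Turn a cleaned journal string into {folio: [entry, entry, ...]}."""
    # Pass 1: one list item per folio. Splitting on a delimiter that also
    # opens the string yields an empty first item, which is most likely
    # why the first list item comes out empty; it is filtered out here.
    items = [item for item in re.split(r"(?m)^f\. ", text) if item.strip()]

    folios = {}
    for item in items:
        # Pass 2: the '::' separates the folio number from its entries.
        folio, _, body = item.partition("::")
        # Pass 3: two returns separate the entries within a folio.
        entries = [e.strip() for e in body.strip().split("\n\n") if e.strip()]
        folios[folio.strip()] = entries
    return folios


# Made-up sample in the assumed layout.
sample = (
    "f. 40::\n"
    "May 8th 1705\nFirst entry text.\n"
    "\n"
    "May 9th 1705\nSecond entry text.\n"
    "f. 41::\n"
    "May 10th 1705\nThird entry text.\n"
)
folios = parse_journal(sample)
```

Looking up `folios["40"]` then returns a list of that folio’s entries, which is the nested structure shown in the last screenshot.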

That’s pretty much it. Now you have a structure for your text. Congratulations, your text has become data, or data-ish at least. The resulting python dictionary allows you to search any folio and it will return a list of all the letters/entries on that folio. You can loop through all those entries and perform some function on/with them.  So that’s a good thing to “pickle”, i.e. write it to a binary file, so that it can be easily read back as a python dictionary later on.
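Pickling takes only a few lines with the standard library’s `pickle` module. A minimal sketch follows; the dictionary contents, the filename `newhailes.pickle` and the helper names are placeholders, and the temporary directory is only there so the example cleans up after itself.

```python
import pickle
import tempfile
from pathlib import Path


def save_folios(folios, path):
    """Write the dictionary to a binary pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(folios, fh)


def load_folios(path):
    """Read the dictionary back from a pickle file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Placeholder data standing in for the parsed journal.
folios = {"40": ["May 8th 1705\nFirst entry."], "41": ["May 10th 1705\nSecond entry."]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "newhailes.pickle"  # hypothetical filename
    save_folios(folios, path)
    restored = load_folios(path)
```

One caution: only unpickle files you created yourself, since loading a pickle from an untrusted source can run arbitrary code.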

Once you have your data structured, and maybe add some more metadata to it, you can do all sorts of analysis with all of Python’s statistical, NLP, and visualization modules.

But if you are still straddling the Devonthink-Python divide, like I am, then you’ll also want to make these parsed bits available in Devonthink. Add a bit of code to write out each dictionary key-value pair to a separate file, and you end up with several hundreds of files:

Newhailes_finder_folder.png

Each file will have only the content for that specific entry, making it easy to precisely target your search and keywording. The last thing you want to do is cycle through several dozen hits in a long document for that one hit you’re actually looking for.
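That “bit of code” could look something like the following sketch. The filename pattern (`Newhailes f40 entry01.txt`) and the function name `write_entries` are assumptions, not the scheme used for the files in the screenshot, so change them to suit your own naming conventions.

```python
import tempfile
from pathlib import Path


def write_entries(folios, out_dir, prefix="Newhailes"):
    """Write every entry to its own text file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for folio, entries in folios.items():
        for n, entry in enumerate(entries, start=1):
            # Hypothetical naming scheme: source, folio, running number.
            path = out_dir / f"{prefix} f{folio} entry{n:02d}.txt"
            path.write_text(entry + "\n", encoding="utf-8")
            written.append(path)
    return written


# Placeholder data standing in for the parsed journal.
folios = {
    "40": ["May 8th 1705\nFirst entry.", "May 9th 1705\nSecond entry."],
    "41": ["May 10th 1705\nThird entry."],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_entries(folios, tmp)
    names = [p.name for p in paths]
    first_text = paths[0].read_text(encoding="utf-8")
```

Putting the folio and a zero-padded running number in the filename keeps the files sorted in journal order once they are imported elsewhere.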

Newhailes_sample_entry.png

That’s it. Entry of May 8th, 1705 in its own file.

The beauty is that you can add more to the code – try extracting the dates and camps, change what information you want to include in the filename, etc. Depending on the structure of the data you’re using, you might need to nest dictionaries or lists several layers deep, as discussed in my AHA example. But that’s the basics. Pretty easy, once you figure it out, that is.

Even better: now you can run the same code, with a few minor tweaks, on all of those other collections of letters and campaign journals that you have, allowing you to combine Newhailes’ entries with Deane’s and Millner’s and Marlborough’s letters and… The world’s your oyster. But, like any oyster, it takes a little work opening that sucker. Not that I like oysters.

Where the historians are, 2017

“Shaving the yak” is a phrase used to describe the process of programming. It alludes to the fact that you often have to take two, or more, steps backward in order to eventually move one step forward. You want a sweater, so first you need to get some yarn, but to do that you have to… and eventually you find yourself shaving a yak. The reason why you even consider shaving a yak is that, once you’ve shaved said yak, you now have lots of yarn, which allows you to make many sweaters. This colorful analogy has a surprising number of online images, and even an O’Reilly book. It’s a thing.

I have been doing a lot of digital yak-shaving over the past four months. Come to think of it, most of my blog posts consist of yak shaving.

So if you’re interested in learning to code with Python but not sure whether it’s worth it, or if you just want to read an overview of how I used Python and QGIS to create a map like this from a big Word document, then continue reading.

history_programs_ba_map.png

 


Wars of Italy, pt 2

A few more random maps of the Wars of Italy, just because it’s all I’ve got time for.

First off, the locations of various combats (battles and sieges mostly) from 1494-1559, color-coded by war, with the Natural Earth topo layer as base map. It might be more useful to group the wars together into a smaller number of categories (make a calculated field). Or maybe make them small multiples by war. But it’s a start.

Screenshot 2018-03-16 10.42.18.png

Then, using the Data defined override and Size Assistant style in QGIS 2.18, you can add army sizes to the symbols (sizeA+sizeB), to create a multivariate map. Note, however, that I don’t have very many army size statistics (the no-data events are all those tiny dots), but you get the idea – add a continuous variable to a categorical variable, and you’ve got two dimensions.

Screenshot 2018-03-16 10.50.10.png

Remember, with GIS and a good data set, the world’s your oyster.

Next up – getting that good data set. In other words, setting up the Early Modern Wars database in MS Access. What? You want to see my entity-relationship diagram so far? Sure, why not:

EMEWars ER diagram.PNG

And, once sabbatical hits this summer, I’ll be appealing to y’all (just got back from Texas) to help me fill in the details, to share our knowledge of early modern European warfare with the world.

Historical Research in the 21st Century

So let’s say you’ve become obsessed with GIS (geographical information systems). And let’s also posit that you’re at a teaching institution, where you rotate teaching your twelve different courses plus senior seminars (three to four sections per semester) over multiple years, which makes it difficult to remember the ins and outs of all those historical narratives of European history from the 14th century (the Crusades, actually) up through Napoleon – let’s ignore the Western Civ since 1500 courses for now. And let’s further grant that you are particularly interested in early modern European military history, yet can only teach it every other year or so.

So what’s our hypothetical professor at a regional, undergraduate, public university to do? How can this professor possibly try to keep these various periods, places and topics straight, without burdening his (errr, I mean “one’s”) students with one damned fact after another? How to keep the view of the forest in mind, without getting lost among the tree trunks? More selfishly, how can one avoid spending way too much prep time rereading the same narrative accounts every few years?

Why, visualize, of course! I’ve posted various examples before (check out the graphics tag), but now that GIS makes large-scale mapping feasible (trust me, you don’t want to manually place every feature on a map in Adobe Illustrator), things are starting to fall in place. And, in the process, I – oops, I mean our hypothetical professor – ends up wondering what historical research should look like going forward, and what we should be teaching our students.

I’ll break my thoughts into two posts: first, the gritty details of mapping the Italian Wars in GIS (QGIS, to be precise); and then a second post on collecting the data for all this.

So let’s start with the eye-candy first – and focus our attention on a subject just covered in my European Warfare class: the Italian Wars of the early 16th century (aka Wars of Italy). I’ve already posted my souped-up timechart of the Italian Wars, but just to be redundant:

ItalianWars1494-1532PPT

Italian Wars timechart

That’s great and all, but it really requires you to already have the geography in your head. And, I suppose, even to know what all those little icons mean.

Maps, though, actually show the space, and by extension the spatial relationships. If you use PowerPoint or other slides in your classes, hopefully you’re not reduced to re-using a map you’d digitized in AutoCAD twenty years earlier, covering a few centuries in the future:

ItalySPM

Instead, you’ve undoubtedly found pre-made maps of the period/place online – either from textbooks, or from other historians’ works – Google Images is your friend. You could incorporate raster maps that you happen across:

Screenshot 2018-02-17 13.59.49

Maybe you found some decent maps with more political detail:

Screenshot 2018-02-17 13.59.58

Maybe you are lucky enough that part of your subject matter has been deemed important enough to merit its own custom map, like this digitized version of that old West Point historical atlas:

campaigns_charles_7

If you’re a bit more digitally-focused, you probably noticed a while back that Wikipedia editors have started posting vector-based maps, allowing you to open them in a program like Adobe Illustrator and then modify them yourself, choosing different fills and line styles, maybe even adding a few new features:

Italian Wars 1494 map

Now we’re getting somewhere!

But, ultimately, you realize that you really want to be your own boss. And you have far more questions than what your bare-bones map(s) can answer. Don’t get me wrong – you certainly appreciate those historical atlases that illustrate Renaissance Italy in its myriad economic, cultural and political aspects. And you also appreciate the potential of the vector-based (Adobe Illustrator) approach, which allows you to add symbols and styling of your own. You can even search for text labels. Yet they’re just not enough. Because you’re stuck with that map’s projection. Maybe you’re stuck with a map in a foreign language – ok for you, but maybe a bit confusing for your students. And what if you want to remove distracting features from a pre-existing map? What if you care about what happened after Charles VIII occupied Naples in early 1495? What if you want to significantly alter the drawn borders, or add new features? What if you want to add a LOT of new features? There are no geospatial coordinates in the vector maps that would allow you to accurately draw Charles VIII’s 1494-95 march down to Naples, except by scanning in another map with the route, twisting the image to match the vector map’s boundaries, and then eye-balling it. Or what if you want to locate where all of the sieges occurred, the dozens of sieges? You could, as some have done, add some basic features to Google Maps or Google Earth Pro, but you’re still stuck with the basemap provided, and, importantly, Google’s (or Microsoft’s, or whoever’s) willingness to continue their service in its current, open, form. The Graveyard of Digital History, so very young!, is already littered with great online tools that were born and then either died within a few short years, or slowly became obsolete and unusable as internet technology passed them by. 
Those online tools that survive for more than five years often do so by transforming into a proprietary, fee-based service, or by getting swallowed up by one of the big boys. And what if you want to conduct actual spatial analysis, looking for geospatial patterns among your data? Enter GIS.

So here’s my first draft of a map visualizing the major military operations in the Italian peninsula during the Italian Wars. Or, more accurately, locating and classifying (some of) the major combat operations from 1494 to 1530:

Screenshot 2018-02-17 13.40.19

Pretty cool, if you ask me. And it’s just the beginning.

How did I do it? Well, the sausage-making process is a lot uglier than the final product. But we must have sausage. Henry V made the connection between war and sausage quite clear: “War without fire is like sausages without mustard.”

So to the technical details, for those who already understand the basics of GIS (QGIS in this case). If you don’t know anything about GIS, there are one or two websites on the subject.

  • I’m using Euratlas’ 1500 boundaries shapefile, but I had to modify some of the owner attributes and alter the boundaries back to 1494, since things can change quickly, even in History. In 1500, the year Euratlas chose to trace the historical boundaries, France was technically ruling Milan and Naples. But, if you know your History, you know that this was a very recent change, and you also know that it didn’t last long, as Spain would come to dominate the peninsula sooner rather than later. So that requires some work fixing the boundaries to start at the beginning of the war in 1494. I should probably have shifted the borders from 1500 back to 1494 using a different technique (ideally in a SpatiaLite database where you could relate the sovereign_state table to the 2nd_level_divisions table), but I ended up doing it manually: merging some polygons, splitting other multi-polygons into single polygons, modifying existing polygons, and clipping yet other polygons. Unfortunately, these boundaries changed often enough that I foresee a lot of polygon modifications in my future…
  • Notice my rotation of the Italian boot to a reclining angle – gotta mess with people’s conventional expectations. (Still haven’t played around with Print Composer yet, which would allow me to add a compass rose.) More important than being a cool rebel who blows people’s cartographic preconceptions, I think this non-standard orientation offers a couple of advantages. First, it allows you to zoom in a bit more, to fit the length of the boot along the width rather than height of the page. More subtly, it also reminds the reader that the Po river drains ‘down’ through Venice into the Adriatic. I’m sure I’m not the only one who has to explicitly remind myself that all those northern European rivers aren’t really flowing uphill into the Baltic. (You’re on your own to remember that the Tiber flows down into the Tyrrhenian Sea.) George “Mr. Metaphor” Lakoff would be proud.
  • I converted all the layers to the Albers equal-area conic projection centered on Europe, for valid area calculations. In case you don’t know what I’m talking about, I’ll zoom out, and add graticules and Tissot’s indicatrices, which illustrate the nature of the projection’s distortions of shape, area and distance as you move away from the European center (i.e. the main focus of the projection):
    Screenshot 2018-02-17 14.21.17
    And in case you wanted my opinion, projections are really annoying to work with. But there’s still room for improvement here: if I could get SpatiaLite to work in QGIS (damn shapefiles saved as SpatiaLite layers won’t retain the geometry), I would be able to re-project layers on the fly with a SQL statement, rather than saving them as separate shapefiles.
  • I’m still playing around with symbology, so I went with basic shape+color symbols to distinguish battles from sieges (rule-based styling). I did a little bit of customization with the labels – offsetting the labels and adding a shadow for greater contrast. Still plenty of room for improvement here, including figuring out how to make my timechart symbols (created in Illustrator) look good in QGIS.
    After discovering the battle site symbol in the tourist folder of custom markers, it could look like this, if you have it randomly-color the major states, and include the 100 French battles that David Potter mentions in his Renaissance France at War, Appendix 1, plus the major combats of the Italian Wars and Valois-Habsburg Wars listed in Wikipedia:
    Screenshot 2018-03-01 14.18.11.png
    Boy, there were a lot of battles in Milan and Venice, though I’d guess Potter’s appendix probably includes smaller combats involving hundreds of men. Haven’t had time to check.
  • I used Euratlas’ topography layers, 200m, 500m, 1000m, 2000m, and 3500m of elevation, rather than use Natural Earth’s 1:10m raster geotiff (an image file with georeferenced coordinates). I wasn’t able to properly merge them onto a single layer (so I could do a proper categorical color ramp), so I grouped the separate layers together. For the mountain elevations I used the colors in a five-step yellow-to-red color ramp suggested by ColorBrewer 2.0.
  • I saved the styles of some of the layers, e.g. the topo layer colors and combat symbols, as qml files, so I can easily apply them elsewhere if I have to make changes or start over.
  • You can also illustrate the alliances for each year, or when they change, whichever happens more frequently – assuming you have the time to plot all those crazy Italian machinations. If you make them semi-transparent and turn several years’ alliances on at the same time, their overlap will allow you to see which countries switched sides (I’m looking at you, Florence and Rome), vs. which were consistent:
    Screenshot 2018-03-01 14.27.00.png
  • Plotting the march routes is also a work in progress, starting by importing the camps as geocoded points, and then using the Points2One plugin to connect them up. With this version of Charles’ march down to Naples (did you catch that south-as-down metaphor?), I only had a few camps to mark, so the routes are direct lines, which means they might display as crossing water. More waypoints will fix that, though it’d be better if you could make the march routes follow roads, assuming they did. Which, needless to say, would require a road layer.
    Screenshot 2018-03-01 14.44.52.png
  • Not to mention applying spatial analysis to the results. And animation. And…

More to come, including the exciting, wild world of data collection.

Voyant-to-web also a success

In case you need proof, here’s a link (collocate) graph from Voyant tools, based off the text from the second volume of the English translation of the “French” Duke of Berwick’s memoirs published in 1779: http://jostwald.com/Voyant/VoyantLinks-Berwick1.html. Curious which words Berwick used most frequently, and which other words they tended to be used with/near? (Or his translator, in any case.) Click the link above and hopefully you’ll see something like this, but interactive:

Screenshot 2017-06-25 14.49.23.png

After you upload your text corpus in the web version of Voyant, you can then export any of the tools and embed it in your own website using an iframe (inline frame). Note that you can also click on any of the terms in the embedded web version and it will open up the full web version of Voyant, with the corpus pre-loaded. Something like this, but oh-so-much-more-interactive:

Screenshot 2017-06-25 14.47.08.png

Apparently the Voyant server keeps a copy of the text you upload – no idea how long the Voyant servers keep the text, but I guess we’ll find out. There’s also a VoyantServer option, which you install on your own computer, for faster processing and greater privacy.

Never heard of Voyant? Then you’d best get yourself some early modern sources in full text format and head on over to http://voyant-tools.org.

Automating Newspaper Dates, Old Style (to New Style)

If you’ve been skulking over the years, you know I have a soft spot for Devonthink, a receptacle into which I throw all my files (text, image, PDF…) related to research and teaching. I’ve been modifying my DTPO workflow a bit over the past week, which I’ll discuss in the future.

But right now, I’ll provide a little glimpse into my workflow for processing the metadata of the 20,000 newspaper issues (yes, literally 20,000 files) that I’ve downloaded from various online collections over the years: Google Books, but especially Gale’s 17C-18C Burney and Nicholls newspaper collections. I downloaded all those files the old-fashioned way (rather than scraping them), but just because you have all those PDFs in your DTPO database, that still doesn’t mean that they’re necessarily in the easiest format to use. And maybe you made a minor error, but one that is multiplied by the 20,000 times you made that one little error. So buckle up as I describe the process of converting text strings into dates and then back, with AppleScript. Consider it a case study of problem-solving through algorithms.

The Problem(s)

I have several problems I need to fix at this point, generally falling under the category of “cleaning” (as they say in the biz) the date metadata. Going forward, most of the following modifications won’t be necessary.

First, going back several years I stupidly saved each newspaper issue by recording the first date for each issue. No idea why I didn’t realize that the paper came out on the last of those dates, but it is what it is.

Screen Shot 2014-03-09 at 7.53.14 PM

London Gazette: published on Dec. 13 or Dec. 17?

Secondly, those English newspapers are in the Old Style calendar, which the English stubbornly clung to till mid-century. But since most of those newspapers were reporting on events that occurred on the Continent, where they used New Style dates, some dates need manipulating.

Automation to the Rescue!

Because I’m not going to re-date 20,000 newspaper issues manually, I’ve enlisted my programmer-wife (TM) to help me automate the process. She doesn’t know the syntax of AppleScript very well, but since she programs in several other languages, and because most programming languages use the same basic principles, and because there’s this Internet thing, she was able to make some scripts that automate most of what I need. So what do I need?

First, for most of the newspapers I need to add several days to the listed date, to reflect the actual date of publication – in other words, to convert the first date listed in the London Gazette example above (Dec. 13) into the second date (Dec. 17). So I need to take the existing date, listed as text in the format 1702.01.02, convert it from a text string into an actual date, and then add several days to it, in order to convert it to the actual date of publication. How many days exactly?

Well, that’s the thing about History – it’s messy. Most of these newspapers tended to be published on a regular schedule, but not too regular. So you often had triweekly publications (published three times per week), that might be published in Tuesday-Thursday, Thursday-Saturday, and Saturday-Tuesday editions. But if you do the math, that means the Saturday-Tuesday issue covers a four-day range, whereas the other two issues per week only cover a three-day range. Since this is all about approximation and first-pass cleaning, I’ll just assume all the issues are three-day ranges, since those should be two-thirds of the total number of issues. For the rest, I have derivative code that will tweak those dates as needed, e.g. add one more to the resulting date if it’s a Saturday-Tuesday issue, instead of a T-R or R-S issue. If I was really fancy, I’d try to figure out how to convert it to weekday and tell the code to treat any Tuesday publication date as a four-day range (assuming it knows dates before 1900, which has been an issue with computers in the past – Y2k anyone?).
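The weekday idea is at least feasible outside AppleScript: Python’s `datetime.date` covers years 1 through 9999, so the early 1700s pose no problem. One wrinkle is that Python counts in the Gregorian calendar, so an Old Style date has to be shifted to its New Style equivalent before asking for the weekday. A minimal sketch, not part of the actual scripts; it assumes the simple rule of 11 days from 1700 onward (10 before), and the function names are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def os_weekday(year, month, day):
    """Weekday name of an Old Style (Julian) date."""
    # Simplification: 11 days from 1700 on, 10 days before that.
    offset = 11 if year >= 1700 else 10
    # Same physical day, expressed in the Gregorian calendar Python uses.
    ns = date(year, month, day) + timedelta(days=offset)
    return WEEKDAYS[ns.weekday()]


def days_to_publication(year, month, day):
    """Days to add to an issue's first date to reach its last date."""
    # A Saturday-Tuesday issue spans four days; the others span three.
    return 3 if os_weekday(year, month, day) == "Saturday" else 2


saturday_start = os_weekday(1702, 1, 3)            # OS 3 January 1702
saturday_shift = days_to_publication(1702, 1, 3)
tuesday_shift = days_to_publication(1702, 1, 6)    # OS 6 January 1702
```

Keying on the issue’s first date (a Saturday start) is equivalent to keying on a Tuesday publication date, and it avoids having to guess the range before knowing the weekday.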

So the basic task is to take a filename of ‘1702.01.02 Flying Post.pdf’, convert the first part of the string as text (the ‘1702.01.02’) into a date by defining the first four characters as a year, the 6th & 7th characters as a month…, then add 2 days to the resulting date, and then rename the file with this new date, converted back from date into a string with the format YYYY.MM.DD. Because I was consistent in that part of my naming convention, the first ten characters will always be the date, and the periods can be used as delimiters if needed. Easy-peasy!
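As a rough illustration of that basic task, here is a minimal Python sketch (the real script is AppleScript and renames records inside DEVONthink; this only builds the new name as a string). The function name `shift_filename_date` is hypothetical, and the sketch assumes the first ten characters are always a YYYY.MM.DD date, as described above.

```python
from datetime import date, timedelta


def shift_filename_date(filename: str, days: int) -> str:
    """Add `days` to the leading YYYY.MM.DD date of a filename (hypothetical helper)."""
    # The first ten characters are the date; everything after is kept as-is.
    stamp, rest = filename[:10], filename[10:]
    # The periods act as delimiters between year, month, and day.
    year, month, day = (int(part) for part in stamp.split("."))
    shifted = date(year, month, day) + timedelta(days=days)
    # Format by hand so the result is always zero-padded YYYY.MM.DD.
    new_stamp = f"{shifted.year:04d}.{shifted.month:02d}.{shifted.day:02d}"
    return new_stamp + rest


print(shift_filename_date("1702.01.02 Flying Post.pdf", 2))
# prints: 1702.01.04 Flying Post.pdf
```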

But that’s not all. I also need to then convert that date of publication to New Style by adding 11 days to it (assuming the dates are 1700 or later – before 1700 the OS calendar was 10 days behind the NS calendar). But I want to keep the original OS publication date as well, for citation purposes. So I replace the old OS date on the front of the filename with the new NS date, and append the original date to the end of the filename with an ‘OS’ after it for good measure (and delete the .pdf), and Bob’s your uncle. In testing, it works when you shift from one month to another (e.g. January 27 converts to February 7), and even from year to year. I won’t worry about the occasional leap year (1704, 1708, 1712). Nor will I worry about how some newspapers used Lady Day (March 25) as their year-end, meaning that they went from December 30, 1708 to January 2, 1708, and only caught up to 1709 in late March. Nor does it help that their issue numbers are often wrong.
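The Old Style to New Style step might look something like this in Python. It is a sketch only: the names `os_to_ns`, `fmt`, and `new_style_name` are hypothetical, the filename is assumed to already carry the Old Style publication date, and the 10-versus-11-day switch follows the simple "1700 or later" rule of thumb given above.

```python
from datetime import date, timedelta


def os_to_ns(os_date: date) -> date:
    """Shift an Old Style date to New Style using the rule of thumb above."""
    gap = 11 if os_date.year >= 1700 else 10
    return os_date + timedelta(days=gap)


def fmt(d: date) -> str:
    """Format a date as zero-padded YYYY.MM.DD."""
    return f"{d.year:04d}.{d.month:02d}.{d.day:02d}"


def new_style_name(filename: str) -> str:
    """Build 'NS-date title OS-date OS' from 'OS-date title.pdf' (hypothetical helper)."""
    os_date = date(*(int(part) for part in filename[:10].split(".")))
    title = filename[10:]  # keeps the leading space before the title
    if title.lower().endswith(".pdf"):
        title = title[:-4]  # drop the extension
    return f"{fmt(os_to_ns(os_date))}{title} {fmt(os_date)} OS"


print(new_style_name("1702.01.27 Flying Post.pdf"))
# prints: 1702.02.07 Flying Post 1702.01.27 OS
```

Because `date` is Gregorian, an Old Style 29 February 1700 cannot be represented at all; the leap years mentioned above (1704, 1708, 1712) are leap years in both calendars, so they pose no problem here.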

I’m too lazy to figure out how to make the following AppleScript code format like code in WordPress, but the basics look like this:
-- Convert English newspaper Title from OSStartDate to NSEndDate & StartDate OS, +2 for weekday
-- Based very loosely off Add Prefix To Names, created by Christian Grunenberg Sat May 15 2004.
-- Modified by Liz and Jamel Ostwald May 26 2017.
-- Copyright (c) 2004-2014. All rights reserved.
-- Based on (c) 2001 Apple, Inc.

tell application id "DNtp"
    try
        set this_selection to the selection
        if this_selection is {} then error "Please select some contents."

        repeat with this_item in this_selection

            -- Split the name into the leading date and the rest (minus ".pdf")
            set current_name to the name of this_item
            set mydate to texts 1 thru ((offset of " " in current_name) - 1) of current_name
            set myname to texts 11 thru -5 of current_name

            -- Build a real date from the YYYY.MM.DD text
            set newdate to the current date
            -- Set the day to 1 first so the month cannot roll over when run on the 29th-31st
            set the day of newdate to 1
            set the year of newdate to (texts 1 thru 4 of mydate)
            set the month of newdate to (texts 6 thru 7 of mydate)
            set the day of newdate to (texts 9 thru 10 of mydate)

            -- enddate: OS publication date (+2); newdate: NS publication date (+2 +11)
            set enddate to newdate + (2 * days)
            set newdate to newdate + (13 * days)
            tell (newdate)
                set daystamp to day
                set monthstamp to (its month as integer)
                set yearstamp to year
            end tell

            -- Zero-pad day and month to two digits
            set daystamp to (texts -2 thru -1 of ("0" & daystamp as text))
            set monthstamp to (texts -2 thru -1 of ("0" & monthstamp as text))

            set formatdate to yearstamp & "." & monthstamp & "." & daystamp as text

            tell (enddate)
                set daystamp2 to day
                set monthstamp2 to (its month as integer)
                set yearstamp2 to year
            end tell

            set daystamp2 to (texts -2 thru -1 of ("0" & daystamp2 as text))
            set monthstamp2 to (texts -2 thru -1 of ("0" & monthstamp2 as text))

            set formatenddate to yearstamp2 & "." & monthstamp2 & "." & daystamp2 as text

            -- New name: NS date, title, then the OS date tagged "OS"
            set new_item_name to formatdate & myname & " " & formatenddate & " OS"
            set the name of this_item to new_item_name

        end repeat
    on error error_message number error_number
        if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
    end try
end tell

So once I do all those things, I can use a smart group and sort the Spotlight Comment column chronologically to get an accurate sense of the chronological order in which publications discussed events.

This screenshot shows the difference – some of the English newspapers haven’t been converted yet (I’m doing it paper by paper because the papers were often published on different schedules), but here you can see how OS and NS dates were mixed in willy-nilly, say comparing the fixed Flying Post and Evening Post with the yet-to-be-fixed London Gazette and Daily Courant issues.

DTPO Newspapers redated.png

Of course the reality has to be even more complicated (Because It’s History!), since an English newspaper published on January 1, 1702 OS will publish items from continental newspapers, dating those articles in NS – e.g., a 1702.01.01 OS English newspaper will have an article dated 1702.01.05 NS from a Dutch paper. So when I take notes on a newspaper issue, I’ll have to change the leading NS date of the new note to the date on the article byline, so it will sort chronologically where it belongs. But still.