Archive | Research RSS for this section

For the EMEMDH in your life

So now I have to add another letter to the abbreviation – Early Modern European Military Digital Historian. We are approaching LGBTQIA territory here – except narrowing instead of broadening.

And who leads the pack in this exciting sub-sub-sub-subfield? For my money, it would be Spanish scholar Xavier Rubio-Campillo, who’s already published an article using GIS for early modern siege reconstruction (Barcelona 1714), which I highlighted here several years back.

Now he’s applying computer modeling to early modern field battle tactics, during the War of the Spanish Succession, ‘natch: “The development of new infantry tactics during the early eighteenth century: a computer simulation approach to modern military history.” To reproduce his abstract from Academia.edu:

Computational models have been extensively used in military operations research, but they are rarely seen in military history studies. The introduction of this technique has potential benefits for the study of past conflicts. This paper presents an agent-based model (ABM) designed to help understand European military tactics during the eighteenth century, in particular during the War of the Spanish Succession. We use a computer simulation to evaluate the main variables that affect infantry performance in the battlefield, according to primary sources. The results show that the choice of a particular firing system was not as important as most historians state. In particular, it cannot be the only explanation for the superiority of Allied armies. The final discussion shows how  ABM can be used to interpret historical data, and explores under which conditions the hypotheses generated from the study of primary accounts could be valid.

Link at https://www.academia.edu/2474571/The_development_of_new_infantry_tactics_during_the_early_eighteenth_century_a_computer_simulation_approach_to_modern_military_history?auto=download&campaign=weekly_digest. Though it may require a subscription.

Maybe someday we military historians will collectively set our sights a little higher than tactics (note the military metaphor), and a little lower than grand strategy? Though, admittedly, that’ll require a lot of hard work at the operational level of war. And maybe even a better sense of what we call these different levels.

Advertisements

Playing around with GIS

More samples of maps I made in a few hours. These are drawn from my War of the Spanish Succession siege dataset, derived from the research appearing in my Vauban under Siege book. In that book I created some maps of the Low Countries theater using Adobe Illustrator – some were decent, others not so much. I’ve posted a few other examples of early modern European military maps here, mostly from the Iberian theater, which I discussed in a Spanish-language article I authored (some examples here).

But now, with QGIS in da house, I can make them a lot quicker. So here are a few examples of my entire WSS siege database mapped, with a few mistakes and a few errors, of course. Ideally, maps like this would’ve been in my dissertation, but that would have meant me graduating in late 2003 instead of late 2002.

The process, for those playing at home: I took my Excel spreadsheet listing 116 sieges (I deleted a few fort sieges because I didn’t want to have to research their lengths and locations), added a column identifying the modern country of each place, converted the spreadsheet into a UTF-8 csv file, then used QGIS’s MMQGIS Geocode plugin to get the lat and long coordinates from Google for each place, placing it as a new layer on top of the Natural Earth base map. I then had to change a few of the coordinates in the QGIS attribute table, mostly because either a) I didn’t specify which Castiglione (or Reggio or Aire) was besieged, or I thought it was Haguenau, Germany, when it was actually Haguenau, France. Fortunately, most of these were pretty obvious from looking at the map, given my knowledge of where the campaigns were conducted. You use the Numerical Vertex Edit plugin to edit coordinates – they cannot be edited in the attribute table. And, fortunately, changing the feature on any level updates it on all other layers.

Then I had to make a new calculated field for the siege length values, because they were imported in as a text string field rather than a decimal numerical field (‘3.8’ instead of 3.8). Once the data was cleaned up, I either used rule-based formatting or graduated symbols to display various attributes about the sieges. Now that I know the procedure, it’ll take just a few minutes to make variations of the map. No more calculating circle diameters in Excel and manually placing them on the map!

First, a map showing 116 siege locations during the war, with black circles indicating those sieges where the besiegers managed to capture the fortress (about 85% overall).

Screenshot 2017-08-27 15.35.04.png

 

Next, the same map (sans the Layers Panel), but with rule-based symbolism where red circles indicate Allied-conducted sieges, and blue circles indicate sieges undertaken by the Bourbons.

Screenshot 2017-08-27 15.31.21.png

Now, the same basic map, but this time we’re using the numerical siege length field to create graduated point symbols, so we can see the relative length of the sieges. I could, of course, define any min-max diameter for the circles, but if they get too large, you lose the smaller sieges.

Screenshot 2017-08-27 15.31.32.png

Of course, if you just want to be goofy, or simulate what my vision will be like in another ten years, you can make a raster heat map, using the Layer Style-Heatmap option, create your own color ramp from transparent to red, and make a smaller radius. That gives you a map that emphasizes regions which saw many sieges:

Screenshot 2017-08-27 15.41.59.png

Miscellaneous Notes:

I turned on the modern political boundaries, which helps distinguish the Iberian vs. Spanish sieges. Digitized early modern boundaries, and other features, will have to wait until sabbatical.

I haven’t offset those siege symbols for towns that were besieged more than once. Thus, for the first two maps, only one symbol is visible. This is particularly germane for Landau near the Rhine, which saw four sieges, but even the third map doesn’t help much, since three of the sieges lasted between 2.3 and 2.8 months and therefore all three have the same-sized point symbol stacked on top of each other. The heat map, however, emphasizes Landau’s four sieges.

That being said, I did change the render oder (Symbol Levels) on map 1 to have the white circles be drawn on top (Layer Order, white = 1, black = 0). I also put a white outline around each black circle for both maps 1 and 3, so you can see when the circles of several proximate, successful sieges overlap each other (for map 3, Layer Order with smallest/shortest circle drawn on top, with largest circle drawn on the bottom).

Most importantly, I haven’t yet figured out how to combine two attributes into one point symbol (e.g. size of circle as length and besieging side as color of the same circle), but you always need to have goals.

But wait, there’s more! There’s probably some way I could split the Allied and Bourbon into separate layers, make a raster heat map for each of those, and then overlay them.

Just spitballin’ here, but you could also calculate a siege index (maybe number of siege-days) and map that, possibly as a raster heat map. If you run the raster heat map on the siege length layer, you get a rasterized version of map 3:

Screenshot 2017-08-27 15.58.53.png

And, of course, the beauty of GIS is that you can combine this data in any way you’d like, combine it with other data, and focus on subsets of the data. Maybe you want a separate map for each campaign year. Throw in field battles, or the amphibious landings. Add in roads, fortresses, logistical centers, and so on. Maybe you want to spatially analyze these features. The world’s your oyster. Mine too.

Where have you been all my life?

Seriously though. I’ve known about the concept of ‘regular expressions’ for years, but for some reason I never took the plunge. And now that I have, my mind is absolutely blown away. Remember all those months in grad school (c. 1998-2000) when I was OCRing, proofing and manually parsing thousands of letters into my Access database? Well I sure do.

Twenty years later, I now discover that I could’ve shaved literally months off that work, if only I’d adopted the regex way of manipulating text. I’ll blame it on the fact that “digital humanities” wasn’t even a thing back then – check out Google Ngram Viewer if you don’t believe me.

So let’s start at the beginning. Entry-level text editing is easy enough: you undoubtedly learned long ago that in a text program like Microsoft Word you can find all the dates in a document – say 3/15/1702 and 3/7/1703 and 7/3/1704 – using a wildcard search like 170^#, where ^# is the wildcard for any digit (number). That kind of search will return 1701 and 1702 and 1703… But you’ve also undoubtedly been annoyed when you next learn that you can’t actually modify all those dates, because the wildcard character will be replaced in your basic find-replace with a single character. So, for example, you could easily convert all the forward slashes into periods, because you simply replace every slash with a period. But you can’t turn a variety of dates (text strings, mind you, not actual date data types) from MM/DD/YYYY into YYYY.MM.DD, because you need wildcards to find all the digit variations (3/15/1702, 6/7/1703…), but you can’t keep those values found by wildcards when you try to move them into a different order. In the above example, trying to replace 170^# with 1704 will convert every year with 1704, even if it’s 1701 or 1702. So you can cycle through each year and each month, like I did, but that takes a fair amount of time as the number of texts grow. This inability to do smart find-replace is a crying’ shame, and I’ve gnashed many a tooth over this quandary.

Enter regular expressions, aka regex or grep. I won’t bore you with the basics of regex (there’s a website or two on that), but will simply describe it as a way to search for patterns in text, not just specific characters. Not only can you find patterns in text, but with features called back references and look-aheads/look-backs (collectively: “lookarounds”), you can retain those wildcard characters and manipulate the entire text string without losing the characters found by the wildcards. It’s actually pretty easy:

Read More…

The Summer of Digital

Yep, it’s been a computational summer. Composed mostly of reading up on all things digital humanities. (Battle book? What battle book?) Most concretely, that’s meant setting up a modest Digital History Lab for our department (six computers, book-microfilm-photo scanners, a Microsoft Surface Hub touch display, and various software), and preparing for a brand new Intro to Digital History course, slated to kick off in a few weeks.

I’ve always been computer-curious, but it wasn’t until this summer that I fully committed to my inner nerdiness, and dove into the recent shenanigans of “digital humanities.” Primarily this meant finally committing to GIS, followed by lots of textual analysis tools, and brushing up on my database skills. But I’ve even started learning Python and a bit more AppleScript, if you can believe it.

So, in future posts, I’ll talk a little less about Devonthink and a bit more about other tools that will allow me to explore early modern European military history in a whole new way.

Stay tuned…

The Google Giveth and the Google Taketh Away

In other words, hopefully you’ve already downloaded all those tasty EMEMH works from Google Books, like I’ve warned. Because some of them are disappearing from Full View, as publishing companies (I’m guessing) pay Google some money to sell print copies on Amazon and elsewhere. (See, I knew my hoarding instincts and general obsessive-compulsiveness would come in handy.)

But all hope is not lost, for if you can still find interest EMEMH PDFs, Google Books has recently decided to include the OCRed text layer with the PDF download as well, which means they are searchable. Just don’t look too closely at the results…

Automating Newspaper Dates, Old Style (to New Style)

If you’ve been skulking over the years, you know I have a sweet spot for Devonthink, a receptacle into which I throw all my files (text, image, PDF…) related to research and teaching. I’ve been modifying my DTPO workflow a bit over the past week, which I’ll discuss in the future.

But right now, I’ll provide a little glimpse into my workflow for processing the metadata of the 20,000 newspaper issues (yes, literally 20,000 files) that I’ve downloaded from various online collections over the years: Google Books, but especially Gale’s 17C-18C Burney and Nicholls newspaper collections. I downloaded all those files the old-fashioned way (rather than scraping them), but just because you have all those PDFs in your DTPO database, that still doesn’t mean that they’re necessarily in the easiest format to use. And maybe you made a minor error, but one that is multiplied by the 20,000 times you made that one little error. So buckle up as I describe the process of converting text strings into dates and then back, with AppleScript. Consider it a case study of problem-solving through algorithms.

The Problem(s)

I have several problems I need to fix at this point, generally falling under the category of “cleaning” (as they say in the biz) the date metadata. Going forward, most of the following modifications won’t be necessary.

First, going back several years I stupidly saved each newspaper issue by recording the first date for each issue. No idea why I didn’t realize that the paper came out on the last of those dates, but it is what it is.

Screen Shot 2014-03-09 at 7.53.14 PM

London Gazette: published on Dec. 13 or Dec. 17?

Secondly, those English newspapers are in the Old Style calendar, which the English stubbornly clung to till mid-century. But since most of those newspapers were reporting on events that occurred on the Continent, where they used New Style dates, some dates need manipulating.

Automation to the Rescue!

To automate this process (because I’m not going to re-date 20,000 newspaper issues manually), I’ve enlisted my programmer-wife (TM) to help me automate the process. She doesn’t know the syntax of AppleScript very well, but since she programs in several other languages, and because most programming languages use the same basic principles, and because there’s this Internet thing, she was able to make some scripts that automate most of what I need. So what do I need?

First, for most of the newspapers I need to add several days to the listed date, to reflect the actual date of publication – in other words, to convert the first date listed in the London Gazette example above (Dec. 13) into the second date (Dec. 17). So I need to take the existing date, listed as text in the format 1702.01.02, convert it from a text string into an actual date, and then add several days to it, in order to convert it to the actual date of publication. How many days exactly?

Well, that’s the thing about History – it’s messy. Most of these newspapers tended to be published on a regular schedule, but not too regular. So you often had triweekly publications (published three times per week), that might be published in Tuesday-Thursday, Thursday-Saturday, and Saturday-Tuesday editions. But if you do the math, that means the Saturday-Tuesday issue covers a four-day range, whereas the other two issues per week only cover a three-day range. Since this is all about approximation and first-pass cleaning, I’ll just assume all the issues are three-day ranges, since those should be two-thirds of the total number of issues. For the rest, I have derivative code that will tweak those dates as needed, e.g. add one more to the resulting date if it’s a Saturday-Tuesday issue, instead of a T-R or R-S issue. If I was really fancy, I’d try to figure out how to convert it to weekday and tell the code to treat any Tuesday publication date as a four-day range (assuming it knows dates before 1900, which has been an issue with computers in the past – Y2k anyone?).

So the basic task is to take a filename of ‘1702.01.02 Flying Post.pdf’, convert the first part of the string as text (the ‘1702.01.02’) into a date by defining the first four characters as a year, the 6th & 7th characters as a month…, then add 2 days to the resulting date, and then rename the file with this new date, converted back from date into a string with the format YYYY.MM.DD. Because I was consistent in that part of my naming convention, the first ten characters will always be the date, and the periods can be used as delimiters if needed. Easy-peasey!

But that’s not all. I also need to then convert that date of publication to New Style by adding 11 days to it (assuming the dates are 1700 or later – before 1700 the OS calendar was 10 days behind the NS calendar). But I want to keep the original OS publication date as well, for citation purposes. So I replace the old OS date on the front of the filename with the new NS date, and append the original date to the end of the filename with an ‘OS’ after it for good measure (and delete the .pdf), and Bob’s your uncle. In testing, it works when you shift from one month to another (e.g. January 27 converts to February 7), and even from year to year. I won’t worry about the occasional leap year (1704, 1708, 1712). Nor will I worry about how some newspapers used Lady Day (March 25) as their year-end, meaning that they went from December 30, 1708 to January 2, 1708, and only caught up to 1709 in late March. Nor does it help that their issue numbers are often wrong.

I’m too lazy to figure out how to make the following AppleScript code format like code in WordPress, but the basics look like this:
–Convert English newspaper Title from OSStartDate to NSEndDate & StartDate OS, +2 for weekday
— Based very loosely off Add Prefix To Names, created by Christian Grunenberg Sat May 15 2004.
— Modified by Liz and Jamel Ostwald May 26 2017.
— Copyright (c) 2004-2014. All rights reserved.
— Based on (c) 2001 Apple, Inc.

tell application id “DNtp”
try
set this_selection to the selection
if this_selection is {} then error “Please select some contents.”

repeat with this_item in this_selection

set current_name to the name of this_item
set mydate to texts 1 thru ((offset of ” ” in current_name) – 1) of current_name
set myname to texts 11 thru -5 of current_name

set newdate to the current date
set the year of newdate to (texts 1 thru 4 of mydate)
set the month of newdate to (texts 6 thru 7 of mydate)
set the day of newdate to (texts 9 thru 10 of mydate)

set enddate to newdate + (2 * days)
set newdate to newdate + (13 * days)
tell (newdate)
set daystamp to day
set monthstamp to (its month as integer)
set yearstamp to year
end tell

set daystamp to (texts -2 thru -1 of (“0” & daystamp as text))
set monthstamp to (texts -2 thru -1 of (“0” & monthstamp as text))

set formatdate to yearstamp & “.” & monthstamp & “.” & daystamp as text

tell (enddate)
set daystamp2 to day
set monthstamp2 to (its month as integer)
set yearstamp2 to year
end tell

set daystamp2 to (texts -2 thru -1 of (“0” & daystamp2 as text))
set monthstamp2 to (texts -2 thru -1 of (“0” & monthstamp2 as text))

set formatenddate to yearstamp2 & “.” & monthstamp2 & “.” & daystamp2 as text

set new_item_name to formatdate & myname & ” ” & formatenddate & ” OS”
set the name of this_item to new_item_name

end repeat
on error error_message number error_number
if the error_number is not -128 then display alert “DEVONthink Pro” message error_message as warning
end try
end tell

So once I do all those things, I can use a smart group and sort the Spotlight Comment column chronologically to get an accurate sense of the chronological order in which publications discussed events.

This screenshot shows the difference – some of the English newspapers haven’t been converted yet (I’m doing it paper by paper because the papers were often published on different schedules), but here you can see how OS and NS dates were mixed in willy-nilly, say comparing the fixed Flying Boy and Evening Post with the yet-to-be-fixed London Gazette and Daily Courant issues.

DTPO Newspapers redated.png

Of course the reality has to be even more complicated (Because It’s History!), since an English newspaper published on January 1, 1702 OS will publish items from continental newspapers, dating those articles in NS – e.g., a 1702.01.01 OS English newspaper will have an article dated 1702.01.05 NS from a Dutch paper. So when I take notes on a newspaper issue, I’ll have to change the leading NS date of the new note to the date on the article byline, so it will sort chronologically where it belongs. But still.

There’s gotta be a better way

In preparation for a new introductory digital history course that I’ll be teaching in the fall, I’ve been trying to think about how to share my decades of accumulated computer wisdom with my students (says the wise sage, stroking his long white beard). Since my personal experience with computers goes back to the 80s – actually, the late 70s with Oregon Trail on dial-up in the school library – I’m more of a Web 1.0 guy. Other than blogs, I pretty much ignore social media like Facebook and Twitter (not to mention Snapchat, Instagram, Pinterest…), and try to do most of my computer work on a screen larger than 4″. So I guess that makes me a kind of cyber-troglodyte in 2017. But I think that does allow me a much broader perspective of what computers can and can’t do. One thing I have learned to appreciate, for example, is how many incremental workflow improvements are readily available, shortcuts that don’t require writing Python from the terminal line.

As a result, I’ll probably start the course with an overview of the variety of ways computers can help us complete our tasks more quickly and easily, which requires understanding the variety of ways in which we can achieve these efficiencies. After a few minutes of thought (and approval from my “full-stack” computer-programming wife), I came up with this spectrum that suggests the ways in which we can make computers do more of our work for us. Toil, silicon slave, toil!

Computer automation spectrum.png

Automation Spectrum: It’s Only a Model

Undoubtedly others have already expressed this basic idea, but most of the digital humanities/digital history I’ve seen online is much more focused on the extreme right of this spectrum (e.g. the quite useful but slightly intimidating Programming Historian) – this makes sense if you’re trying to distantly read big data across thousands of documents. But I’m not interested in the debate whether ‘real’ digital humanists need to program or not, and in any case I’m focused on undergraduate History majors that often have limited computer skills (mobile apps are just too easy). Therefore I’m happy if I can remind students that there are a large variety of powerful automation features available to people with just a little bit of computer smarts and an Internet connection, things that don’t require learning to speak Javascript or Python fluently. Call it kaizen if you want. The middle of the automation spectrum, in other words.

So I’ll want my students, for example, to think about low-hanging fruit (efficiency fruit?) that they can spend five minutes googling and save themselves hours of mindless labor. As an example, I’m embarrassed to admit that it was only when sketching this spectrum that I realized that I should try to automate one of the most annoying features of my current note-taking system, the need to clean up hundreds of PDFs downloaded from various databases: Google Books, Gale’s newspaper and book databases, etc. If you spend any time downloading early modern primary sources (or scan secondary sources), you know that the standard file format continues to be Adobe Acrobat PDFs. And if you’ve seen the quality of early modern OCR’d text, you know why having the original page images is a good idea.

But you may want, for example, to delete pages from PDFs that include various copyright text – that text will confuse DTPO’s AI and your searches. I’m sure there are more sophisticated ways of doing that, but the spectrum above should prompt you to wonder whether Adobe Acrobat has some kind of script or macro feature that might speed up deleting such pages from 1,000s (literally) of PDF documents that you’ve downloaded over the years. And, lo and behold, Adobe Acrobat does indeed have an automation feature that allows you to carry out the same PDF manipulation again and again. Once you realize “there’s gotta be a better way!”, you only need to figure out what that feature is called in the application in question. For Adobe Acrobat it used to be called batch processing, but in Adobe Acrobat Pro DC such mass manipulations now fall under the Actions moniker. So google ‘Adobe Acrobat Actions’ and you’ll quickly find websites that allow you to download various actions people have created. Which allows you to quickly learn how the feature works, and to modify existing actions. For example, I made this Acrobat Action to add “ps” (primary source) to the Keywords metadata field of every PDF file in the designated folder:

Screenshot 2017-05-10 18.52.17.png

I already copied and tweaked macros and Applescripts that will add Keywords to rich text files in my Devonthink database, but this Adobe solution is ideal after I’ve downloaded hundreds of PDFs from, say, a newspaper database.

Similarly, this next action will delete the last page of every PDF in the designated folder. (I just hardcoded to delete page 4, because I know newspaper X always has 4 pages – I can sort by file size to locate any outliers – and the last page is always the copyright page with the nasty text I want to delete. I can, for example, change the exact page number for each newspaper series, though there’s probably a way to make this a variable that the user can specify with each use):

Screenshot 2017-05-10 18.52.43.png

Computers usually have multiple ways to do any specific task. For us non-programmers, the internet is full of communities of nerds who explain how to automate all sorts of software tasks – forums (fora?) are truly a god-send. But it first requires us to expect more from our computers and our software. For any given software, RTFM (as they say), and then check out the software’s website forum – you’ll be amazed at the stuff you find. Hopefully all that time you save from automation won’t be spent obsessively reading the forum!