“Shaving the yak” is a phrase used to describe the process of programming. It alludes to the fact that you often have to take two, or more, steps backward in order to eventually move one step forward. You want a sweater, so first you need to get some yarn, but to do that you have to… and eventually you find yourself shaving a yak. The reason why you even consider shaving a yak is that, once you’ve shaved said yak, you now have lots of yarn, which allows you to make many sweaters. This colorful analogy has a surprising number of online images, and even an O’Reilly book. It’s a thing.
I have been doing a lot of digital yak-shaving over the past four months. Come to think of it, most of my blog posts consist of yak shaving.
So if you’re interested in learning to code with Python but not sure whether it’s worth it, or if you just want to read an overview of how I used Python and QGIS to create a map like this from a big Word document, then continue reading.
At least until the lights (or internet) go down.
I’m preparing my appeal to you faithful skulkers to assist me in my quixotic quest to create a more robust and usable dataset on early modern European wars. I envision keeping it simple, at least at the start, posting a series of spreadsheets (possibly on Google Sheets) with information about various aspects of early modern warfare. We don’t want to start from scratch, so I’ve downloaded the basic information on the period’s wars and combats (“battles”) from Wikipedia, via Wikidata queries using SPARQL. And I’ve been learning about graph databases in the process, which someone might consider a bonus.
Wikipedia??? Well, the way I see it, they’ve already entered in a lot of basic information, and many of the factual details are probably correct, at least to a first order approximation. So it should speed up the process and allow us to refine and play around with the beta data (say that fast three times) before it’s “complete,” however that’s defined.
Once the data sheets are up online, we can clean that information, I can collate it, and then we can open it to the world to play with – analyze, map, chart, combine with other data, whatever one’s heart desires. If someone wants to deal with the Wikipedia bureaucracy, they can try to inject it back into The Source of All Knowledge.
In the meantime, if you’re curious as to what someone with some programming skills and an efficiency-oriented mindset can create, you should check out the following blog post, wherein a data scientist collects all of the wars listed in Wikipedia (Ancient to recent), and then explores their durations and a few other attributes. Very cool stuff, and you gotta love the graphics. Check it out at https://www.gokhan.io/post/scraping-wikipedia/. And just imagine what one could do with more granular data, and possibly more accurate data as well! Hopefully we’ll find out.
In the meantime, here’s a real simple map from a SPARQL query locating all of the “battles” listed in Wikidata (that have location information).
I’ll let you decide whether Europe and the eastern US really were that much more belligerent than the rest of the world. To the Methodology!
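For the curious, a query of the kind behind that map can be sketched in Python. This is a minimal sketch under my own assumptions, not the exact query used here: it assumes Wikidata's public endpoint at query.wikidata.org, the item Q178561 for "battle", and the properties P31 (instance of), P279 (subclass of) and P625 (coordinate location). The function names are made up for illustration, and a query this broad may well time out on the public service.

```python
import json
import re
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query: every item that is a battle (or a subclass of battle)
# and that has a coordinate location.
BATTLES_QUERY = """
SELECT ?battle ?battleLabel ?coord WHERE {
  ?battle wdt:P31/wdt:P279* wd:Q178561 ;
          wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

# Wikidata returns coordinates as WKT text, e.g. "Point(10.6333 48.6333)",
# with longitude first and latitude second.
_POINT = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(wkt):
    """Turn 'Point(lon lat)' into a (lat, lon) tuple, or None if unparseable."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def rows_from_results(results):
    """Flatten SPARQL JSON results into (label, lat, lon) rows."""
    rows = []
    for binding in results["results"]["bindings"]:
        point = parse_point(binding["coord"]["value"])
        if point is None:
            continue  # skip anything that is not a simple point
        label = binding.get("battleLabel", {}).get("value", "")
        rows.append((label, point[0], point[1]))
    return rows


def fetch_battles():
    """Run the query against the live endpoint (needs network access)."""
    params = urllib.parse.urlencode({"query": BATTLES_QUERY, "format": "json"})
    request = urllib.request.Request(
        WIKIDATA_ENDPOINT + "?" + params,
        headers={"User-Agent": "battle-map-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return rows_from_results(json.load(response))
```

The rows that come back can be written out as a CSV with latitude and longitude columns, which is the sort of file a GIS program can load as a point layer.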
Where I’m at now, after reading more on GIS, historical and Quantum. Here we have the beginnings of my Low Countries theater map, for operational military history.
Features include rivers, the (modern) coastline, capital cities, fortifications (fortresses and forts) by side of garrison, a light tracing of the pré carré fortresses in northern France, and, for kicks, the woods of northern Belgium traced from the Austrian Ferraris maps, c. 1770s.
And more to trace, e.g. from the Pelet 1837 atlas:
Still lots of work to do, cleaning things up and adding additional features, like army marches and camps. Eventually, I’ll even work up to Print Composer and stop taking screenshots.
But in the meantime, progress moves forward.
A few more random maps of the Wars of Italy, just because it’s all I’ve got time for.
First off, the locations of various combats (battles and sieges mostly) from 1494-1559, color-coded by war, with the Natural Earth topo layer as base map. It might be more useful to group the wars together into a smaller number of categories (make a calculated field). Or maybe make them small multiples by war. But it’s a start.
Then, using the Data defined override and Size Assistant style in QGIS 2.18, you can add army sizes to the symbols (sizeA+sizeB), to create a multivariate map. Note, however, that I don’t have very many army size statistics (the no-data events are all those tiny dots), but you get the idea – add a continuous variable to a categorical variable, and you’ve got two dimensions.
Remember, with GIS and a good data set, the world’s your oyster.
Next up – getting that good data set. In other words, setting up the Early Modern Wars database in MS Access. What? You want to see my entity-relationship diagram so far? Sure, why not:
And, once sabbatical hits this summer, I’ll be appealing to y’all (just got back from Texas) to help me fill in the details, to share our knowledge of early modern European warfare with the world.
So now I have to add another letter to the abbreviation – Early Modern European Military Digital Historian. We are approaching LGBTQIA territory here – except narrowing instead of broadening.
And who leads the pack in this exciting sub-sub-sub-subfield? For my money, it would be Spanish scholar Xavier Rubio-Campillo, who’s already published an article using GIS for early modern siege reconstruction (Barcelona 1714), which I highlighted here several years back.
Now he’s applying computer modeling to early modern field battle tactics, during the War of the Spanish Succession, ‘natch: “The development of new infantry tactics during the early eighteenth century: a computer simulation approach to modern military history.” To reproduce his abstract from Academia.edu:
Computational models have been extensively used in military operations research, but they are rarely seen in military history studies. The introduction of this technique has potential benefits for the study of past conflicts. This paper presents an agent-based model (ABM) designed to help understand European military tactics during the eighteenth century, in particular during the War of the Spanish Succession. We use a computer simulation to evaluate the main variables that affect infantry performance in the battlefield, according to primary sources. The results show that the choice of a particular firing system was not as important as most historians state. In particular, it cannot be the only explanation for the superiority of Allied armies. The final discussion shows how ABM can be used to interpret historical data, and explores under which conditions the hypotheses generated from the study of primary accounts could be valid.
Link at https://www.academia.edu/2474571/The_development_of_new_infantry_tactics_during_the_early_eighteenth_century_a_computer_simulation_approach_to_modern_military_history?auto=download&campaign=weekly_digest. Though it may require a subscription.
Maybe someday we military historians will collectively set our sights a little higher than tactics (note the military metaphor), and a little lower than grand strategy? Though, admittedly, that’ll require a lot of hard work at the operational level of war. And maybe even a better sense of what we call these different levels.
More samples of maps I made in a few hours. These are drawn from my War of the Spanish Succession siege dataset, derived from the research appearing in my Vauban under Siege book. In that book I created some maps of the Low Countries theater using Adobe Illustrator – some were decent, others not so much. I’ve posted a few other examples of early modern European military maps here, mostly from the Iberian theater, which I discussed in a Spanish-language article I authored (some examples here).
But now, with QGIS in da house, I can make them a lot quicker. So here are a few examples of my entire WSS siege database mapped, with a few mistakes and a few errors, of course. Ideally, maps like this would’ve been in my dissertation, but that would have meant me graduating in late 2003 instead of late 2002.
The process, for those playing at home: I took my Excel spreadsheet listing 116 sieges (I deleted a few fort sieges because I didn’t want to have to research their lengths and locations), added a column identifying the modern country of each place, converted the spreadsheet into a UTF-8 CSV file, then used QGIS’s MMQGIS Geocode plugin to get the latitude and longitude coordinates from Google for each place, placing the result as a new layer on top of the Natural Earth base map. I then had to change a few of the coordinates in QGIS, mostly because either a) I didn’t specify which Castiglione (or Reggio or Aire) was besieged, or b) I thought it was Haguenau, Germany, when it was actually Haguenau, France. Fortunately, most of these were pretty obvious from looking at the map, given my knowledge of where the campaigns were conducted. You use the Numerical Vertex Edit plugin to edit coordinates – they cannot be edited in the attribute table. And, fortunately, changing the feature on any level updates it on all other layers.
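If you'd rather script the spreadsheet-prep step than do it by hand, here is a minimal Python sketch of the "add a country column, save as UTF-8 CSV" part. It is not what I actually did (that was all Excel). The column name `place`, the function name and the lookup dictionary are hypothetical, and it assumes the list of rows is not empty.

```python
import csv
import io


def to_geocoding_csv(rows, country_by_place):
    """Return UTF-8 CSV bytes with a 'country' column added to each row.

    rows: non-empty list of dicts, each with a 'place' key (assumed name).
    country_by_place: dict mapping place name to modern country.
    """
    buffer = io.StringIO(newline="")
    fieldnames = list(rows[0].keys()) + ["country"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        # Leave the country blank when the place is not in the lookup,
        # so the gap is easy to spot before geocoding.
        out["country"] = country_by_place.get(row["place"], "")
        writer.writerow(out)
    return buffer.getvalue().encode("utf-8")


# Illustrative usage with two places mentioned in this post.
sample_rows = [{"place": "Haguenau"}, {"place": "Landau"}]
sample_lookup = {"Haguenau": "France", "Landau": "Germany"}
csv_bytes = to_geocoding_csv(sample_rows, sample_lookup)
```

Spelling out the country is what keeps a geocoder from guessing between same-named towns, though it would not settle which Castiglione was meant if several sit in the same country.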
Then I had to make a new calculated field for the siege length values, because they were imported in as a text string field rather than a decimal numerical field (‘3.8’ instead of 3.8). Once the data was cleaned up, I either used rule-based formatting or graduated symbols to display various attributes about the sieges. Now that I know the procedure, it’ll take just a few minutes to make variations of the map. No more calculating circle diameters in Excel and manually placing them on the map!
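The text-versus-number fix can also be made before the file ever reaches QGIS. A minimal Python sketch, assuming the siege lengths arrive as strings that may carry stray spaces or quote marks (the function name is made up, and this is an alternative to the calculated field, not a description of it):

```python
def to_decimal(text):
    """Convert a text value such as "3.8" or " '3.8' " to the float 3.8.

    Returns None for blanks and for anything that is not a number,
    so no-data rows stay visibly empty instead of silently becoming 0.
    """
    cleaned = text.strip().strip("'\"").strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None rather than zero matters on a map: a zero-length siege would get a symbol, while a missing value can be filtered out or styled separately.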
First, a map showing 116 siege locations during the war, with black circles indicating those sieges where the besiegers managed to capture the fortress (about 85% overall).
Next, the same map (sans the Layers Panel), but with rule-based symbolism where red circles indicate Allied-conducted sieges, and blue circles indicate sieges undertaken by the Bourbons.
Now, the same basic map, but this time we’re using the numerical siege length field to create graduated point symbols, so we can see the relative length of the sieges. I could, of course, define any min-max diameter for the circles, but if they get too large, you lose the smaller sieges.
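For anyone wondering what the min-max diameter setting amounts to, here is a rough Python sketch of linear scaling between a smallest and largest circle. This is my simplification, not QGIS's actual code: QGIS may group values into classes or use a different scaling method, and the function name is made up.

```python
def graduated_diameters(values, min_diameter, max_diameter):
    """Scale each value linearly into the range [min_diameter, max_diameter]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: give every point the smallest circle.
        return [min_diameter for _ in values]
    span = max_diameter - min_diameter
    return [
        min_diameter + (value - lowest) / (highest - lowest) * span
        for value in values
    ]
```

Widening the gap between the two diameters exaggerates the differences between sieges, which is the trade-off mentioned above: make the largest circle too big and it swallows its smaller neighbors.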
Of course, if you just want to be goofy, or simulate what my vision will be like in another ten years, you can make a raster heat map, using the Layer Style-Heatmap option, create your own color ramp from transparent to red, and make a smaller radius. That gives you a map that emphasizes regions which saw many sieges:
I turned on the modern political boundaries, which helps distinguish the Iberian vs. Spanish sieges. Digitized early modern boundaries, and other features, will have to wait until sabbatical.
I haven’t offset those siege symbols for towns that were besieged more than once. Thus, for the first two maps, only one symbol is visible. This is particularly germane for Landau near the Rhine, which saw four sieges, but even the third map doesn’t help much, since three of the sieges lasted between 2.3 and 2.8 months and therefore all three have the same-sized point symbol stacked on top of each other. The heat map, however, emphasizes Landau’s four sieges.
That being said, I did change the render order (Symbol Levels) on map 1 to have the white circles be drawn on top (Layer Order, white = 1, black = 0). I also put a white outline around each black circle for both maps 1 and 3, so you can see when the circles of several proximate, successful sieges overlap each other (for map 3, Layer Order with smallest/shortest circle drawn on top, with largest circle drawn on the bottom).
Most importantly, I haven’t yet figured out how to combine two attributes into one point symbol (e.g. size of circle as length and besieging side as color of the same circle), but you always need to have goals.
But wait, there’s more! There’s probably some way I could split the Allied and Bourbon into separate layers, make a raster heat map for each of those, and then overlay them.
Just spitballin’ here, but you could also calculate a siege index (maybe number of siege-days) and map that, possibly as a raster heat map. If you run the raster heat map on the siege length layer, you get a rasterized version of map 3:
And, of course, the beauty of GIS is that you can combine this data in any way you’d like, combine it with other data, and focus on subsets of the data. Maybe you want a separate map for each campaign year. Throw in field battles, or the amphibious landings. Add in roads, fortresses, logistical centers, and so on. Maybe you want to spatially analyze these features. The world’s your oyster. Mine too.
Seriously though. I’ve known about the concept of ‘regular expressions’ for years, but for some reason I never took the plunge. And now that I have, my mind is absolutely blown away. Remember all those months in grad school (c. 1998-2000) when I was OCRing, proofing and manually parsing thousands of letters into my Access database? Well I sure do.
Twenty years later, I now discover that I could’ve shaved literally months off that work, if only I’d adopted the regex way of manipulating text. I’ll blame it on the fact that “digital humanities” wasn’t even a thing back then – check out Google Ngram Viewer if you don’t believe me.
So let’s start at the beginning. Entry-level text editing is easy enough: you undoubtedly learned long ago that in a text program like Microsoft Word you can find all the dates in a document – say 3/15/1702 and 3/7/1703 and 7/3/1704 – using a wildcard search like 170^#, where ^# is the wildcard for any digit (number). That kind of search will return 1701 and 1702 and 1703… But you’ve also undoubtedly been annoyed when you next learn that you can’t actually modify all those dates, because in a basic find-replace whatever the wildcard matched gets overwritten by a single fixed character. So, for example, you could easily convert all the forward slashes into periods, because you simply replace every slash with a period. But you can’t turn a variety of dates (text strings, mind you, not actual date data types) from MM/DD/YYYY into YYYY.MM.DD, because you need wildcards to find all the digit variations (3/15/1702, 6/7/1703…), but you can’t keep those values found by wildcards when you try to move them into a different order. In the above example, trying to replace 170^# with 1704 will convert every year to 1704, even if it’s 1701 or 1702. So you can cycle through each year and each month, like I did, but that takes a fair amount of time as the number of texts grows. This inability to do smart find-replace is a cryin’ shame, and I’ve gnashed many a tooth over this quandary.
Enter regular expressions, aka regex or grep. I won’t bore you with the basics of regex (there’s a website or two on that), but will simply describe it as a way to search for patterns in text, not just specific characters. Not only can you find patterns in text, but with features called back references and look-aheads/look-backs (collectively: “lookarounds”), you can retain those wildcard characters and manipulate the entire text string without losing the characters found by the wildcards. It’s actually pretty easy:
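For instance, here is a minimal Python sketch of the date reordering described above, using back references. The pattern and function names are mine, chosen for illustration. Other regex tools write back references a little differently, and this version assumes every date is written M/D/YYYY with a four-digit year.

```python
import re

# Three capture groups: month (1-2 digits), day (1-2 digits), year (4 digits).
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def reorder_dates(text):
    """Rewrite M/D/YYYY as YYYY.M.D, keeping the digits each group matched."""
    # \3, \1 and \2 are back references to the year, month and day groups.
    return DATE_PATTERN.sub(r"\3.\1.\2", text)


def reorder_dates_padded(text):
    """Same reordering, but zero-pad month and day to two digits."""
    def _swap(match):
        month, day, year = match.groups()
        return f"{year}.{int(month):02d}.{int(day):02d}"
    return DATE_PATTERN.sub(_swap, text)


print(reorder_dates("3/15/1702 and 3/7/1703 and 7/3/1704"))
# 1702.3.15 and 1703.3.7 and 1704.7.3
print(reorder_dates_padded("3/15/1702 and 3/7/1703 and 7/3/1704"))
# 1702.03.15 and 1703.03.07 and 1704.07.03
```

The parentheses do the remembering that Word's wildcards could not: whatever digits each group finds get carried into the replacement, so 1701 stays 1701 and 1702 stays 1702. Lookarounds are not shown here.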