Back in the fall of 2004 I found myself staring at the Wikipedia page for Mali, trying to code something about the nature of Tuareg rebels. This was for a stats class where I needed to supplement Nick Sambanis’ civil war dataset (I think I was trying to explain which civil wars end up conventional and which end up insurgencies; the data were later lost to a hard drive crash, pre-Dropbox). I found the whole affair fairly absurd, since I knew nothing about Mali, or anywhere else. Which is when I decided to focus on more case-specific analyses, and eventually ended up specializing in South Asia.
In the last few years I’ve come back to the big quantitative, cross-national datasets. It began after I published a conceptual piece in 2012 on “wartime political orders,” laying the early basis for my current book project. I had grand ambitions of pulling some of the existing data on civil wars, armed groups, peace deals, ceasefires, and the like off the shelf (Walter, Fortna, UCDP, PRIO, Sambanis, Fearon and Laitin, etc.) to code these wartime political orders, and complement them with my own case studies – and voila, book!
But it quickly turned out that the landscape of political violence I was familiar with in South Asia was only partially, at best, represented in the most-cited and important datasets. Some groups and conflicts didn’t exist in the datasets; some were coded radically differently across datasets (depending on who you believe among two of the most prominent early 2000s datasets, India’s Northeast got itself a civil war either in 1952 or 1990); others popped in and out for only specific years of their much longer existences; and most of the ceasefires and some of the peace deals I was seeing in the cases were missing.
Much of this was because I was interested in a broader phenomenon – the full range of armed group-government interactions, across levels of violence and state-group cooperation. So it was unfair to blame the datasets for that: they were just measuring something different. But some of the issues were because there were no or very thin/inappropriate sources, which led to lots of missingness even where there should have been data. It became clear that if I wanted to actually measure this stuff, I’d have to do it as a complement to existing data, rather than using them directly.
The toughest thing was nailing down exactly what dependent variable I wanted to measure – I became dissatisfied with wartime political orders (what about non-wartime contexts? the active/passive/no cooperation distinction wasn’t ideal; the territorial segmentation wasn’t all that theoretically interesting in retrospect) – but I didn’t know precisely what to put in its place, much less how to measure it.
This triggered a long process of conceptualizing, operationalizing, and, finally, now, building new data to represent these state-group armed orders, as part of a broader approach I call armed politics. A 2015 JCR piece, ostensibly about militias but really about all kinds of groups, started to think through these outcomes, but didn’t figure out how to measure anything and remained conceptually fuzzy in some important ways (was I looking at government strategy? Dyadic orders?). I had to figure this out through fieldwork in India, Thailand, and Burma/Myanmar (plus some awesome libraries in Singapore), reading books, looking over datasets, and accumulating government reports and press accounts.
The first piece to offer an empirical agenda for systematically(-ish?) measuring these orders just came out in the Journal of Peace Research. I’ve disaggregated the JPR version of the coding slightly (as I note in a footnote, military hostilities can be broken further into containment and total war orders), but the basic approach is what a research team has been working on for the last year in my Armed Orders in South Asia (AOSA) project. They’ve been writing up case studies on each state-group dyad based on a variety of sources, many of them local or specialist. Some of the key characteristics we get at in the case studies are then quantified, in a way that allows many of the state-group dyads to be directly linked to Uppsala conflict dyads – though many of my dyads don’t show up in UCDP at all. Eventually both the cases and dataset will get publicly posted with lots of footnotes and bibliographies.
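Mechanically, that kind of crosswalk to an external dataset can be sketched as a left merge keyed on a shared dyad identifier, so that dyads with no UCDP counterpart survive with missing values rather than being dropped. This is a minimal illustration only – the group names, column names, and numbers below are entirely invented and are not from AOSA or UCDP:

```python
import pandas as pd

# Hypothetical AOSA-style dyad-year rows (all names and values invented).
# A missing ucdp_dyad_id marks a dyad that UCDP does not cover.
aosa = pd.DataFrame({
    "group": ["Group A", "Group B", "Group C"],
    "year": [1995, 1995, 1995],
    "armed_order": ["total war", "ceasefire", "containment"],
    "ucdp_dyad_id": [123, 456, None],
})

# Hypothetical external (UCDP-style) dyad table.
ucdp = pd.DataFrame({
    "ucdp_dyad_id": [123, 456],
    "best_fatalities": [210, 0],
})

# Left merge: keeps every AOSA dyad; indicator shows which ones matched.
merged = aosa.merge(ucdp, on="ucdp_dyad_id", how="left", indicator=True)
print(merged[["group", "armed_order", "best_fatalities", "_merge"]])
```

The `how="left"` choice is the substantive point: an inner join would silently discard exactly the dyads that the external dataset misses, which is the coverage gap being described.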
So what have I learned from all of this?
- Taking a look under the hood of prominent datasets, and especially looking at their sourcing documents (if they exist), is incredibly valuable. There can be some interesting stuff under there. We face powerful incentives to grab data and run with them, while sidestepping questions of quality and sourcing. That’s fine – tons of great papers get into top journals doing so – but there are some issues that are rarely discussed. Open up the sourcing and take a hard look at the cases you know well, and see what you think – and what the implications are for description and inference.
- Most of the cross-national and cross-dyad data rest on underlying case studies. Good qualitative evidence, in this context, is necessary for good quantitative evidence; if the qualitative evidence is bad, there’s no reason to expect good quantitative data. This is especially problematic if we have reason to think the quality of the qualitative sources is systematically biased across regions, time, types of conflicts, etc. (I have a sneaking suspicion that Manipur, Shan State, and Balochistan get way less international attention and scholarly coverage than, say, Northern Ireland or Iraq). All datasets, including mine, will miss or get things wrong; the question is how much bias there is in these misfires, and what you can live with for the questions you’re asking.
- There are hard trade-offs involved, though – really thorough deep dives into local sources, the secondary literature, etc. are only possible for a subset of cases; you can’t cover everything all the time. Choices have to be made, none will be perfect, and there are a variety of legitimate bets to make on what will be most fruitful.
- Going from concept to theory to data takes a very long time. Trying to figure out a new dependent variable at any of these levels is tricky. I don’t have the methodological firepower of many of my (brilliant) peers, so I make my living, in part, by coming up with interesting new ideas and concepts. At various points this took me down dead ends, through aimless wanderings, and into data approaches that unambiguously failed. And once I got grant money to do the broader armed politics data project and figured out the timeline involved, I realized the book would take a lot longer than I expected. The best-laid plans often fall quickly by the wayside; that’s just part of the business.