This is going to be a nerdy post.
Now, I get it, this is a Substack about using data to make sense of international relations, foreign policy, and other global issues. How much nerdier could it get?
A lot more.
A ways back I wrote a post about country dissatisfaction with the benefits it currently receives from the international system and how this predicts international conflict. The premise was simple: the less a country gets out of the international system relative to what it thinks it ought to get, the more likely it is to intervene to change the system in its favor, and one way to intervene is to use military force. Hence, a simple hypothesis: more dissatisfaction leads to an increase in the likelihood that a country initiates international conflicts.
In that post, I found strong support for that hypothesis, which is all well and good. One thing I didn’t do was spend much time thinking about alternative ways of measuring dissatisfaction. The measure I used came from a PhD dissertation by Daniel Kent completed in 2020. Kent’s approach involved quantifying two concepts that he argued were relevant to measuring dissatisfaction: (1) the benefits a country expects to get out of the international system, and (2) the benefits a country actually gets out of the international system. Once those two concepts were quantified, calculating dissatisfaction would be as simple as taking the difference between expectations and benefits.
One of the hardest things to do as an international relations scholar is quantify concepts of interest to international relations scholars. I know this first-hand because it’s what I tried to do in my own dissertation, which blessedly later ended up published in a solid academic journal (though in much shorter form and without the game theoretic model I developed to work out my argument). All that to say, I have mad respect for what Kent did in his dissertation.
So in today’s post, I want to tread with all due caution and reverence for the work that went into quantifying dissatisfaction as I simultaneously question whether all that work yielded much additional benefit beyond what a much simpler approach would have provided.
Let’s dig in.
What did you expect?
Kent’s measure, as I mentioned, is composed of two parts: expectations and benefits. Today I want to home in on the first of those — expectations. I’m going to take Kent’s measure of benefits as given in this post. This measure seems sound to me. Basically he defined benefits as a country’s average centrality in global trade, arms, alliance, and diplomacy networks. The more central a country across any one of these dimensions, the more it benefits from its position in the international system. Sounds good to me, and as long as you have data for those things, centrality in these networks is pretty trivial to quantify.
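For the curious, benefits of this sort really are cheap to compute once you have the network data. Here’s a minimal sketch in Python using toy edge lists and simple degree centrality; Kent’s actual choice of centrality measure may differ, and the data below are made up for illustration:

```python
from collections import defaultdict

# Toy edge lists standing in for a single year's trade, arms,
# alliance, and diplomacy networks (hypothetical data)
networks = {
    "trade":     [("USA", "CHN"), ("USA", "DEU"), ("CHN", "DEU")],
    "arms":      [("USA", "DEU")],
    "alliance":  [("USA", "DEU"), ("USA", "CHN")],
    "diplomacy": [("USA", "CHN"), ("CHN", "DEU")],
}

def degree_centrality(edges, countries):
    """Share of the other countries a country is directly tied to."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    n = len(countries)
    return {c: deg[c] / (n - 1) for c in countries}

def benefits(networks, countries):
    """Average a country's centrality across the four networks."""
    per_net = [degree_centrality(e, countries) for e in networks.values()]
    return {c: sum(d[c] for d in per_net) / len(per_net) for c in countries}

countries = ["USA", "CHN", "DEU"]
print(benefits(networks, countries))  # {'USA': 0.75, 'CHN': 0.625, 'DEU': 0.625}
```

The averaging step is what puts every country on a single 0-1 benefits scale, regardless of which network it draws most of its centrality from.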
The measure I most want to interrogate is expectations — not because I think Kent’s approach was necessarily wrong, but because it was highly technical and computationally intensive.
He captured expectations by multiplying two things together. Thing 1 was a country’s Composite Index of National Capabilities (or CINC) score. This index is produced by folks at the Correlates of War Project, and it’s composed of six component measures:
A country’s military expenditures.
A country’s total number of military personnel.
A country’s iron and steel production.
A country’s primary energy consumption in coal-ton equivalents.
A country’s total population.
A country’s urban population.
These measures are combined and converted into a 0-1 scale where a 1 means a country has all the power in the world along these six dimensions and a 0 means it has none.
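If you want a feel for how that combination works: each component is expressed as the country’s share of the world total for that component, and the six shares are averaged. A quick sketch with made-up numbers (the component labels loosely follow COW’s abbreviations):

```python
# Six CINC components: military expenditures, military personnel,
# iron and steel production, primary energy consumption,
# total population, urban population
COMPONENTS = ["milex", "milper", "irst", "pec", "tpop", "upop"]

def cinc(country_vals, world_totals):
    """Average the country's world share across the six components."""
    shares = [country_vals[k] / world_totals[k] for k in COMPONENTS]
    return sum(shares) / len(shares)

# Made-up numbers: 10% of the world total on every dimension yields a CINC of 0.1
country_vals = {k: 10.0 for k in COMPONENTS}
world_totals = {k: 100.0 for k in COMPONENTS}
```

Because every share is between 0 and 1, the average is too, which is where the 0-1 scale comes from.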
CINC certainly has its critics, but it remains the workhorse measure of power in the field of international relations, so Kent wasn’t doing anything particularly controversial by using it.
Thing 2 that he incorporated was a refined version of a new measure recently introduced to the field known as Dispute Outcome Expectations (or DOE). I thought this choice was cool because one of the people responsible for developing it, Rob Carroll, was a member of my dissertation committee. The idea behind DOE was to go beyond measuring power in pure material terms by thinking about its reputational dimensions. Power isn’t just a raw resource; it’s something you have a reputation for using either effectively or ineffectively. DOE scores are based on the aggregated predictions of a variety of machine learning algorithms for the likelihood that a country would win a militarized dispute with another country. The factors used to make these predictions are the separate components of CINC. Importantly, the original DOE scores were unique to pairs of countries. Kent decided to average them by country to yield a single measure of a country’s average chance of winning a conflict in general in a given year.
Kent then defined expected benefits as the product of CINC and a country’s averaged DOE score. Expectations are one part material power and one part reputational, and the idea is that more material power combined with a reputation for using that power effectively to win disputes ought to make a country expect to get more benefits from its position in the international system.
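In code, that final construction boils down to almost nothing. A sketch with made-up dyadic DOE values (the real ones come from the machine learning step described above):

```python
from statistics import mean

# Made-up dyadic DOE scores for one country-year: its estimated
# probability of winning a militarized dispute against each opponent
doe_vs_opponent = {"CHN": 0.55, "RUS": 0.60, "DEU": 0.80}

def expected_benefits(cinc_score, dyadic_doe):
    """Kent-style expectations: CINC times the yearly average DOE score."""
    return cinc_score * mean(dyadic_doe.values())
```

So for a country with a CINC of 0.15 and these DOE scores, expectations come out to 0.15 × 0.65 = 0.0975. The hard part isn’t this arithmetic; it’s producing the DOE scores in the first place.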
You still with me? I hope so, because this is where I’m going to throw a wrench in the works. There are easier ways to quantify expectations, and I want to consider some simple alternatives that anyone with some basic data wrangling skills could put together.
K.I.S.S.
I’ve written before about competing measures of power, in particular about the difference between CINC and another measure called surplus domestic product (or SDP). Whereas CINC captures something akin to realized capabilities, SDP is closer to capturing potential capabilities — those latent resources a country can call upon to increase its realized capabilities if it wanted or needed to.
Here’s my question: what if I just used a country’s yearly share of SDP as a measure of expectations? Would this give me an otherwise similar measure of dissatisfaction? And would it perform just as well in predicting conflict initiation as Kent’s original measure?
To answer this question, I got my hands on the dataset Kent put together for his dissertation and added to it a measure of SDP, which I normalized to a 0-1 scale by taking a given country’s SDP and dividing it by the sum total of all countries’ SDPs in a given year.
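That normalization is about as simple as data wrangling gets. A sketch with hypothetical SDP values:

```python
def sdp_shares(sdp_by_country):
    """Normalize each country's SDP by the yearly world total, yielding 0-1 shares."""
    total = sum(sdp_by_country.values())
    return {c: v / total for c, v in sdp_by_country.items()}

# Hypothetical SDP values for a single year
sdp_year = {"USA": 4000.0, "CHN": 1000.0, "DEU": 1000.0}
```

By construction the shares sum to 1 within each year, just like CINC scores do.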
I played with the data a bit, too, and tried out four different approaches to measuring expectations: using CINC scores and SDP shares on their own, and then multiplying each by DOE scores. The figure below shows the smoothed average trend in each of these measures of expectations over the past couple of centuries. Two patterns stand out in the data. The first is that all of these measures trend downward over time. That’s to be expected: each is a measure of relative power, and over time more and more countries have come into existence. The second is that all of the measures track each other quite closely. Raw CINC and SDP shares nearly overlap, and the versions of each multiplied by averaged DOE scores also overlap, though these modified versions sit a bit lower than the unadjusted CINC and SDP shares in the first century and a half of data.
All in all, there isn’t that much difference on average between these alternative measures of expectations. But when we go a step further and use them to quantify dissatisfaction (taking the difference between each and actual benefits) we do see some differences, as you can see in the next figure. When using raw CINC or SDP shares to measure dissatisfaction, countries look on average more satisfied in the first 150 years in the data. When you multiply each of those by DOE scores, their measured dissatisfaction is much higher in those same 150 years. A clear convergence in these measures only appears after the 1950s. But where differences are apparent, the magnitude is actually quite small. The dissatisfaction measures I’ve created are just the simple difference between expectations and benefits, both of which are on the 0-1 scale. That means at the extremes dissatisfaction can range from -1 to 1. The average differences in these measures over time are on the order of 0.015 points at most.
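For concreteness, here’s how the four dissatisfaction variants look side by side in code, using made-up values for a single country-year:

```python
# The four alternative expectation measures, applied to one country-year row
EXPECTATIONS = {
    "cinc":     lambda r: r["cinc"],
    "sdp":      lambda r: r["sdp_share"],
    "cinc_doe": lambda r: r["cinc"] * r["doe"],
    "sdp_doe":  lambda r: r["sdp_share"] * r["doe"],
}

def dissatisfaction(row, measure):
    """Expectations minus actual benefits. Both sit on 0-1 scales, so the
    result can range from -1 (over-benefited) to 1 (under-benefited)."""
    return EXPECTATIONS[measure](row) - row["benefits"]

# Hypothetical country-year values
row = {"cinc": 0.12, "sdp_share": 0.10, "doe": 0.6, "benefits": 0.08}
```

Note how multiplying by a DOE score below 1 shrinks expectations, which is why the DOE-adjusted variants register more dissatisfaction (or less satisfaction) than the raw shares.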
The absence of dramatic differences in these measures, at least on average, suggests that keeping it simple might be just as good as doing something more complicated. The real test, however, will come from seeing how well these different measures do in predicting conflict initiation. So I estimated some simple regression models where I just alternated which measure I used to capture dissatisfaction. In each, I controlled for country population, democracy scores, and random country-level differences. You can see the results in the figure below which shows the chance of conflict initiation by level of dissatisfaction. Each of the measures of dissatisfaction predicts an increase in the chances that a country initiates a militarized dispute against another country, and you really have to squint to see much difference in the results.
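To make the setup concrete, here’s a bare-bones sketch of how a logit model of this kind turns dissatisfaction and the controls into a predicted probability. The coefficients below are placeholders, not estimates from the actual models, and the random country-level effects are omitted:

```python
import math

# Placeholder coefficients: intercept, dissatisfaction, log population, democracy
COEFS = (-3.0, 5.0, 0.1, -0.05)  # made-up values for illustration

def p_initiate(dissat, log_pop, democracy, coefs=COEFS):
    """Predicted probability of dispute initiation from a simple logit."""
    b0, b_dissat, b_pop, b_dem = coefs
    xb = b0 + b_dissat * dissat + b_pop * log_pop + b_dem * democracy
    return 1.0 / (1.0 + math.exp(-xb))
```

With a positive dissatisfaction coefficient, predicted risk rises as dissatisfaction rises, holding the controls fixed, which is the pattern all four measures produce.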
As an added step to see if there’s much material difference in the predictive power of each of these measures, I checked the root mean squared error in the model predictions. This is just a way to quantify how far the predicted probabilities from the model are from the actual discrete outcomes (did a country initiate a dispute or not?). You have to go to the fourth (yes, the fourth!) decimal point to find a difference between these competing models. Practically speaking, using SDP shares to measure expectations yields predictions of no better or worse quality than using Kent’s more complex product of CINC and averaged DOE scores, which require diverse measures to construct and, for the DOE scores in particular, intensive computational power to calculate.
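Root mean squared error itself is easy to compute by hand. A quick sketch:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted probabilities and 0/1 outcomes."""
    sq_errors = [(p - y) ** 2 for p, y in zip(predicted, actual)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Perfect predictions give an RMSE of 0, and a model that always guesses 0.5 on binary outcomes gives 0.5; differences at the fourth decimal point are, for practical purposes, no difference at all.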
The Hard Lesson of Hard Work
There’s a good lesson to be learned from this exercise. Sometimes, hard work adds up to very little added benefit. But just as important: you have to do the hard work to find out. If we academics were really doing our jobs well, you’d see slews of studies reporting that researchers tried an idea and it didn’t make much difference. Unfortunately, that kind of study rarely sees the light of day. This, of course, is why there’s a replication crisis in scientific fields. Editors and reviewers like seeing statistically significant results that support most or all of the hypotheses tested by authors. And because people have to publish research to keep their jobs, this creates a vicious cycle.
But the replication crisis is beside the point. Kent’s dissertation is very good, and his proposed measure of dissatisfaction is, as far as I’m aware, the first attempt to quantify how dissatisfied a country is with its position in the international system. That’s a major contribution. What I’ve stumbled onto here is the realization that measuring dissatisfaction probably didn’t have to be as hard as Kent made it out to be. His measure of benefits is quite straightforward; his measure of expectations could have been, too. The allure of fancy techniques has the same power over researchers that a flame has over a moth. The more time I spend doing research, the more I realize that simple is often best.
You can access the code to replicate the analysis in this post here.
Thanks for reading! If you enjoyed this post, you can show your support by liking and sharing, buying me a coffee, or subscribing!