The Metaculus Threat To Democracy Index

In recent posts on Trump and dictatorship, people have asked me - how do you know you’re not suffering from Trump Derangement Syndrome?

I take this seriously; we’ve all lost loved ones to this condition. The best check on my reasoning would be an objective measure of the health of American democracy. There are several “democracy indices” that purport to do this, but they have a mixed reputation. My impression is that most current accusations of bias are relatively weak - I agree with Claude’s analysis here - but they rely enough on “expert” opinion that I don’t expect them to convince a skeptic.

The newest entrant in this space - the Metaculus Democracy Threat Index - works differently, and deserves a closer look.

Metaculus is a prediction site - like a prediction market, except that no money changes hands. People can record their guesses for how future events will turn out, which get aggregated by an algorithm (currently just a recency-weighted median, although they’ve done fancier things in the past).
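For readers who want a feel for what a "recency-weighted median" does, here is a minimal sketch. The exponential decay scheme, the `decay` parameter, and the function name are my assumptions for illustration; I don't know the exact weights Metaculus uses, so treat this as the general technique rather than their implementation.

```python
from typing import Sequence


def recency_weighted_median(forecasts: Sequence[float], decay: float = 0.9) -> float:
    """Weighted median of probabilities listed oldest-first.

    Assumption: the newest forecast gets weight 1 and each older one is
    multiplied by `decay` again. This is an illustrative scheme only.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    n = len(forecasts)
    # Index n-1 is the newest forecast, so it gets decay**0 == 1.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    # Sort by forecast value, carrying each forecast's weight along.
    pairs = sorted(zip(forecasts, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        # The weighted median is the first value at which the
        # cumulative weight reaches half of the total weight.
        if running >= half:
            return value
    return pairs[-1][0]


# Three old forecasts at 10%, then two recent ones at 2%.
history = [0.10, 0.10, 0.10, 0.02, 0.02]
print(recency_weighted_median(history, decay=1.0))  # no recency weighting
print(recency_weighted_median(history, decay=0.5))  # recent forecasts dominate
```

With no decay this is an ordinary median and the three older forecasts win; with strong decay the two recent forecasts outweigh them, so the aggregate follows the latest opinion.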

Their Democracy Threat Index is a collection of 153 questions relevant to US democracy. For example:

This question says there’s a 3.5% chance that a political party will keep an opponent off the ballot in a state election in 2026. You can see that 86 people have made forecasts about this. When probabilities like this go up on the 153 questions, the index gets higher.

This has some advantages over traditional democracy index design. It’s transparent: you can see all the questions and how they fit together. It’s crowdsourced, so there’s limited opportunity for ideologues or biased experts to put their fingers on the scale. It does a good job limiting itself to things which naturally seem democracy-related, resisting the pressure to add “and they support my preferred policy” to the definition of democracy.

But the biggest change is that Metaculus leverages modern forecasting science (popularly called “superforecasting” - although I think the people with that trademark would be unhappy to hear me use it this way). Does this help?

We care most about questions like “Did they cancel elections?” or “Did they murder protesters?” But these are coarse binary outcomes - when someone wants to know if democracy is “under threat”, they want to know when there’s an increased chance of these things happening, even though they haven’t happened yet. As a forecasting platform, Metaculus can distinguish not just between “keeps elections” vs. “cancels elections”, but between a 10% chance of cancelling the next elections vs. a 50% chance. This lets them both focus on the big questions (rather than the small questions that are most likely to differ between one mildly-concerning regime and another) and give finer-grained estimates than just “not cancelled yet”.

What are the remaining risks/biases?

Susceptibility to crowd attack: If a hundred Democrats join Metaculus and give maximally pessimistic answers to all democracy-related questions, would that make Trump look worse? Currently Metaculus has near-perfect security through obscurity - nobody cares about this index enough to attack it. I asked them what happened if that changed, and they said they had good security, and that medians are naturally more secure than means to this kind of threat. I’m still disappointed to have to rely on the security of one centralized site.

These problems could be solved by transitioning from Metaculus to a true prediction market. Prediction markets’ main advantage over Metaculus-style forecasting engines is resilience to attack (because new bettors are incentivized to come take the attacker’s money). Unfortunately right now Metaculus is focusing on being a responsible academic institution, and prediction markets are focusing on attracting sports-gambling-obsessed degens, so we’ll have to settle for the former.
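The claim that medians resist crowd attacks better than means is easy to check with a toy example. The numbers below are hypothetical: 86 honest forecasters who all say 3.5% (borrowing the figures from the ballot question above), plus a bloc of attackers who all say 99%. It uses a plain median and ignores recency weighting and whatever other defenses Metaculus has.

```python
from statistics import mean, median

# Hypothetical honest crowd: 86 forecasters, all at 3.5%.
honest = [0.035] * 86


def aggregate_under_attack(n_attackers: int, attack_value: float = 0.99):
    """Return (mean, median) after n_attackers all submit attack_value."""
    forecasts = honest + [attack_value] * n_attackers
    return mean(forecasts), median(forecasts)


for n in (0, 20, 85, 100):
    m, med = aggregate_under_attack(n)
    print(f"{n:>3} attackers: mean={m:.3f}  median={med:.3f}")
```

In this sketch, twenty attackers drag the mean above 20% while the median stays at 3.5%. The median only flips once attackers outnumber the honest forecasters, which is why a hundred new accounts on an 86-forecaster question would still be a problem.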

That still leaves one possible source of risk:

Susceptibility to question selector bias: Who decides which questions get added to the Index? Right now, it’s a “nonpartisan group” called Bright Line Watch. Are they really nonpartisan? You can see some discussion here, but even if we trust them, it’s disappointing that the question of trust has to come up at all, given the otherwise-trustless design.

I thought about this most when seeing the question about whether government departments will change policy in response to bribes, which mentions as a possible bribe vector an “expenditure to a cryptocurrency” - for example, some high official creates a crypto token, and the would-be-bribe-giver buys into it. Democrats and Republicans are corrupt in different ways, and privileging memecoin buyers is so far a uniquely Republican form of corruption. If they’d missed that vector, the bribery question would have been biased toward finding corruption in Democrats (who so far haven’t used memecoins as much). But how could we prove that they didn’t include that vector specifically because they searched very hard for Republican-coded forms of bribery and made sure to include them, while thinking less hard about Democrat-coded forms? Are they allowed to use their judgment and say that there seem to be more things like this on one side of the aisle than the other?

Maybe it would be safer to stick to really obvious questions like “will they cancel elections?” But even this might be subjective - maybe Republicans are more likely to cancel elections, but Democrats are more likely to censor speech. Is censoring speech as much of a “threat to democracy” as canceling elections? Hard to say.

As far as I can tell, questions like these make it hard to conclude anything from the index’s “historical backfilling”:

That is: the index was started in 2025. The 2021-2022 and 2023-2024 data points come from retrospectively checking how many of the questions they invented in 2025 had already happened by 2021-2022 etc. There are many reasons it could be fewer besides declining democracy. For example, it might have been boring to start with questions that had already happened. Conversely, it could be especially exciting to consider new (as of 2025) threats to democracy like memecoin corruption.

On the other hand, as far as I can tell the third and fourth data points - the squares - are pretty believable. These represent forecasters’ opinions when the index was first formed, and their forecasts about what will be true two years later. They seem to expect that democracy as measured by these questions won’t get any worse over the remainder of Trump’s term - which I find reassuring.

Here’s another interesting graph:

This shows the expected index level for 2027-2028 over time. We see that concerns about democracy peaked in November/December 2025, when the expected value of the index was 47%. Then they started declining in mid-December, and are currently stable at 39%. What changed? Maybe the results of various November special elections (both the elections going without a hitch, and Trump’s preferred candidates losing), or the failure of a grand jury to indict Trump opponent Letitia James in a case widely perceived as politicized. I respect this graph for telling me a non-obvious thing (the level of predicted threat to democracy decreased in December 2025) that makes some sense in retrospect.

As time goes on, the Index will become more valuable insofar as it gets further from any possibility of biasing the questions. That is: any biases in the question set are the biases of 2025, when (for example) Republicans are more into memecoin-based corruption than Democrats. But as time goes on, the biases will drift - the biases of 2035 will be correlated <1 with the biases of 2025 (for example, Democrats might take up memecoin-based corruption, or Republicans might drop it) and as long as the question set is stable, biased question-selectors can’t optimize it for the most up-to-date biases.

Suppose that the 2028 election is Newsom vs. Vance, and that each party claims the other’s nominee will destroy democracy. We could have a conditional prediction market - what will the Democracy Index be in 2032 conditional on Vance/Newsom getting elected? Or we could just watch how the Democracy Index responds to shocks in election outcome percentages: for example, if Newsom has a bad debate and his chances drop 5% on Polymarket, does that drive the Democracy Index up or down?
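Here is a rough sketch of the arithmetic behind the "watch how it responds to shocks" idea. It assumes the shock changes only the election odds, not what forecasters expect under each candidate. Then the law of total expectation, E[index] = p × E[index | Newsom] + (1 − p) × E[index | Vance], means the index's move divided by the move in p recovers the expected gap between the two candidates. The function name and all the numbers are hypothetical, and I'm treating "drops 5%" as five percentage points.

```python
def implied_gap(index_before: float, index_after: float,
                p_before: float, p_after: float) -> float:
    """Implied E[index | Newsom] - E[index | Vance], in index points.

    p_before and p_after are Newsom's win probability before and after
    the shock. Assumes the conditional expectations did not change.
    """
    dp = p_after - p_before
    if dp == 0:
        raise ValueError("election odds did not move; gap is not identified")
    return (index_after - index_before) / dp


# Hypothetical shock: Newsom falls from 50% to 45% and the expected
# index rises from 39 to 40 (higher means more threat to democracy).
gap = implied_gap(index_before=39.0, index_after=40.0,
                  p_before=0.50, p_after=0.45)
print(f"Implied Newsom-minus-Vance gap: {gap:.1f} index points")
```

A negative gap would mean forecasters expect a lower (less threatened) index under Newsom than under Vance, and a positive gap the reverse. In practice the debate itself might also change the conditional expectations, so this is at best a first-pass reading.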

What would we need to make this really work? My wish list looks like:

Duplicates of some of these questions on Polymarket and Kalshi, as a backup against attacks on Metaculus.
More thought put into the questions, including a policy on how new questions will (or won’t) be added.
More forecasters, and existing forecasters adjusting their guesses more often. There’s now $10,000 in prize money on the line, so consider participating!