Why is judgmental forecasting still underused by policy-makers and business decision-makers? Forecasters have likely been discussing this for as long as the forecasting community has existed. So, how old is that community, exactly? If it were up to me to choose when the forecasting community came to life, I would personally pick the year 2015. That was the year when several important things happened:
Good Judgment Project was concluded and transformed into Good Judgment Inc.
GJOpen.com and Metaculus.com websites were launched.
Tetlock and Gardner published “Superforecasting”, the book that popularized the idea that regular people can learn to produce high-quality forecasts useful for governments, private companies, and NGOs.
More than 7 years have passed since these events. Our community has grown. Good Judgment Inc., Metaculus, and other companies such as Maby have sold numerous forecasting projects to their clients. Good Judgment Inc., Metaculus, and INFER have some clients in administration and government who are willing to listen to and use their high-quality forecasts. However, we still don’t see wide adoption of forecasting in corporate and government decision-making processes. A full analysis of the causes of this situation would be lengthy enough for a separate article. Here, I will focus on the factors affecting a forecast’s applicability and usefulness for its end user. Some problems with typical forecasting questions in that regard are:
The forecasts’ end user must not only grasp the concept of subjective probability and probabilistic thinking, but also see the value in using it.
The end user must not only trust the source of the forecast, but also understand and be able to use the communicated probabilities. Do they even grasp the difference between a 10% and a 20% probability, or do they just see both as unlikely?
Well-written forecasting questions with precise resolution criteria might be seen as too specific and too focused on details, when decision-makers are more interested in the big picture of the future. A group of many such questions can create a big-picture view, but unless there is a summary article or analytical report, readers must do a lot of work to assemble that picture themselves, not to mention keeping track of how it changes over time.
Forecasting platforms were designed with forecasters in mind as the users; only now are they also adapting to the needs of readers who would like to explore and use forecasts.
Usually, the decision-maker sees only the probability of some event happening; they are not given any suggestions about ways to reduce or improve those chances.
Unlike intelligence reports, crowd forecasts do not come with clearly stated assumptions and a list of key judgments. Decision-makers can see the end result - the probability - but it is not always clear how it was reached. Even if rationales or comments from some forecasters are available, they are not necessarily representative of the crowd as a whole. Because most forecasters are usually not subject-matter experts on a given topic, the need to check the assumptions and understanding on which their reasoning was based can be critical, especially when forecasting complicated, highly specialized topics where understanding rests on large amounts of domain-specific knowledge.
Decision-makers sometimes have information that is classified or at least non-public. This can make forecasts based only on open sources less relevant at times - and, more often, merely make them look that way, as people with access to secret sources are known to rely on them too much.
Most of the time, decision-makers have one huge advantage over forecasters: they know what they think and what their dilemmas and goals are. It is great if a forecasting organization gets insight into this perspective, and even better if they share it with their forecasters, but this is rarely the case, so forecasters need to make educated guesses and assumptions. INFER is a noteworthy exception here that I know about, as they try to address this by organizing online chats between their government clients/experts and their pro forecasters.
Binary (yes/no) forecasting questions, which are the easiest to understand, are tied to some arbitrary timeframe, which does not always correspond to the client’s perspective.
Typically, the subjective probability of something happening decreases as we get closer to the resolution date, because there is less time left for it to happen. These adjustments make it harder to compare probabilities over time - does the new evidence suggest that the risk has increased compared to 6 months ago?
Questions about quantities and dates are not easy to interpret from charts for someone who is not trained.
Conditional forecasting - a new hope
In order to make their predictions more relevant for decision-makers, and especially policy-makers, forecasting companies have recently turned more to conditional forecasts. As Robert de Neufville, Good Judgment Project superforecaster and author of the Telling the Future substack, described it in his highly recommended article “Possible Worlds”:
One approach is to pose conditional questions. That is, you can ask forecasters what’s likely to happen, all else being equal, if any one of a range of different policies were enacted. You can then compare their answers for each different policy scenario, as well as to their forecast for the base case scenario with no policy intervention. You can in this way essentially forecast the effectiveness of a range of prospective policies.
This itself is not a new tool. For example, the “Possible Worlds Series” was created on Metaculus by Ought Inc. in August 2020. Robert de Neufville also mentioned an “experimental tournament to test conditional forecasting questions” that took place in 2020.
However, when the Swift Centre for Applied Forecasting entered the stage in early 2022, it used conditional forecasts very skillfully, combined with high-quality articles written with both policy-makers and the general public in mind. Examples of the conditional questions they used are:
As you can see, they do not build these conditionals only from the perspective of possible policies that could be undertaken to achieve or avoid something. To create conditionals, they also look for events that could have a significant impact on the probabilities of their main questions for each topic, as well as for possible significant consequences of these key events.
In February 2023, the Metaculus team introduced a new question type called “conditional pairs”, which brings clearer presentation and better usability to conditional questions published on their platform. With this development, everyone can now create conditional forecasting questions, and we can expect this approach to forecasting to become even more popular.
Here are some examples of conditional pairs from Metaculus:
I believe that the wider use of conditional forecasting is a step in the right direction. There are, however, some limitations and challenges to this way of forecasting, which I would like to discuss here along with some possible solutions.
Thinking in terms of possible worlds
The first group of problems I see is related to the “all else being equal” part of the definition Robert provided above. In his fascinating article “Common errors in reasoning about the future: Three informal fallacies”, Adam Dorr wrote:
The ceteris paribus fallacy is the error of attempting to reason about the future by considering a single aspect of change while holding “all else equal”. (…)
We scientists are particularly vulnerable to the ceteris paribus fallacy because our training emphasizes the importance of isolating variables in order to examine their relation to observations in a controlled fashion. After all, reasoning in the form of analysis means to gain an understanding of something complex by breaking it into its component parts. Ideally, analysis of a complex system, object, or substance is then followed by synthesis (…) But it is important to recognize that analytic reasoning can only be meaningfully applied to the present state of a complex system, object, or substance; predicting future states requires synthetic reasoning. Visions of the future that suffer from the ceteris paribus fallacy make the mistake of considering only one technological, social, economic, political, or ecological change – one variable – at a time. (…)
the longer the timeframe of any prediction about our global coupled human and natural system, the less plausible the assumption of ceteris paribus becomes.
When we are dealing with complex systems where change happens quickly, the “all else being equal” assumption only holds very close to the present day. From the forecaster’s perspective, any policy decision will happen in the future, after their forecast is completed and published. In most cases, the decision will not be immediate, so we have to make assumptions about when the policy could be adopted and forecast the state of the world at that specific time. To avoid the ceteris paribus fallacy, as Adam Dorr wrote, we must use synthetic reasoning. That means we must take into account all the key factors that are likely to change with the passage of time, and also how these factors will interact. Some forecasters could use scenario-like thinking to consider different times of decision and implementation and different possible states of the world at those times. It gets complicated, but it is still possible to come up with a weighted average of these possibilities.
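To make the "weighted average of these possibilities" concrete, here is a minimal sketch with entirely made-up scenario weights and probabilities - the numbers are illustrative assumptions, not forecasts:

```python
# Toy sketch (all numbers are invented): combining scenario-specific
# conditional probabilities into a single weighted average.

# Each scenario: (probability that this scenario occurs,
#                 P(desired outcome | policy adopted, in that scenario))
scenarios = [
    (0.5, 0.60),  # policy adopted quickly
    (0.3, 0.45),  # policy adopted after a delay
    (0.2, 0.30),  # policy adopted very late
]

# Scenario weights must form a full partition of the possible worlds.
assert abs(sum(w for w, _ in scenarios) - 1.0) < 1e-9

weighted_p = sum(w * p for w, p in scenarios)
print(f"P(outcome | policy) ≈ {weighted_p:.2f}")
```

The final number is a law-of-total-probability average: each scenario contributes in proportion to how likely the forecaster thinks that scenario is.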
But what if time itself has a huge impact on the probability of a positive outcome - as with epidemics, where we deal with exponential growth? In a situation like that, an average success probability across different possible times of intervention could be very misleading for decision-makers.
One solution I see for such a case is to provide probabilities for different timeframes - that is, to provide a pair of conditional forecasts for each period of time that we see as meaningfully different for our assessment of these probabilities.
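A toy model shows why the averaged forecast fails here. Assume (purely for illustration) that cases grow exponentially and that an intervention freezes the case level at the week it is applied:

```python
# Toy epidemic model (all parameters are assumed): exponential growth,
# intervention stops further growth at the week it is applied.
# Point: outcomes differ so much by intervention time that a single
# averaged forecast hides what a decision-maker actually needs.

C0 = 100   # initial weekly cases (assumed)
R = 2.0    # weekly growth factor (assumed)

def cases_at_intervention(week: int) -> float:
    """Case level reached when growth is stopped in the given week."""
    return C0 * R ** week

for week in (2, 4, 8):
    print(f"intervention in week {week}: ~{cases_at_intervention(week):,.0f} cases")

# A single forecast averaging over possible intervention times
# (equal weights, purely for illustration):
avg = sum(cases_at_intervention(w) for w in (2, 4, 8)) / 3
print(f"averaged forecast: ~{avg:,.0f} cases")
```

With these assumed numbers the three timeframes give roughly 400, 1,600, and 25,600 cases, while the average (9,200) is close to none of them - which is exactly why a separate conditional pair per timeframe is more useful.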
So let’s look at our task again. We need two probabilities of some desired outcome happening: the first in the world where the considered policy is not introduced, and the second in the world where it is implemented. The problem is that, from a forecaster’s perspective, these two worlds (or rather, two distributions of possible worlds) might differ significantly.
Tom Liptay, Good Judgment Project superforecaster and Product Director at Metaculus, explained this to me with a great example. I will try to reconstruct his reasoning from our conversation, which took place 5 months ago.
Let’s say that we are back in early 2020. We don’t yet know much about SARS-CoV-2, but the government is asking us what the number of deaths among those who tested positive would be under two conditions: with a lockdown and without a lockdown.
Straightforward logic suggests that a lockdown should not make the spread worse or cause more deaths - at least as long as people have access to water, food, and basic medicine, and the lockdown is not long enough to critically affect a significant number of people who need hospital treatment for other serious diseases like cancer.
But if we do not know much about COVID yet, we can take a different perspective. We can ask ourselves: is COVID a big deal? How infectious and deadly is it? From that perspective, we see a different picture. Back then, lockdowns were seen as a serious, disruptive, and unpopular measure. So if COVID turned out to be really deadly, we would be more likely to see lockdowns than if it did not. Following that logic, in the possible worlds where lockdowns were introduced, the average number of deaths among those who tested positive was higher than in the worlds without lockdowns. So, thinking in terms of possible worlds, we can arrive at a forecast of more deaths for the lockdown condition. This does not mean that lockdowns cause an increase in the number of deaths, or that decision-makers should avoid them.
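This selection effect is easy to demonstrate with a Monte Carlo sketch. All the parameters below are invented for illustration: in every simulated world a lockdown causally reduces deaths by 30%, yet lockdowns are more likely in severe worlds, so the average across lockdown-worlds still comes out higher:

```python
# Monte Carlo sketch of the "possible worlds" effect (parameters assumed).
# In EVERY world a lockdown causally reduces deaths, but deadlier worlds
# are more likely to get a lockdown, so averaging across lockdown-worlds
# gives a HIGHER death count than across no-lockdown-worlds.
import random

random.seed(0)  # deterministic run for reproducibility

lockdown_deaths, no_lockdown_deaths = [], []
for _ in range(100_000):
    severity = random.uniform(0.0, 1.0)   # how deadly the virus turned out
    p_lockdown = severity                 # deadlier virus -> likelier lockdown
    lockdown = random.random() < p_lockdown
    # Assumed causal effect: a lockdown cuts deaths by 30% in that world.
    deaths = severity * 1_000_000 * (0.7 if lockdown else 1.0)
    (lockdown_deaths if lockdown else no_lockdown_deaths).append(deaths)

def avg(xs):
    return sum(xs) / len(xs)

print(f"avg deaths in lockdown worlds:    {avg(lockdown_deaths):,.0f}")
print(f"avg deaths in no-lockdown worlds: {avg(no_lockdown_deaths):,.0f}")
# The lockdown-worlds average is higher, despite the causal benefit.
```

The correlation between severity and the decision confounds the comparison: conditioning on "lockdown happened" selects the worst worlds, exactly as in Tom Liptay’s example.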
Another example I can think of is a fictional question we could have asked at the beginning of this year:
If at least one NATO member country provides F-16s or F-18s to the Armed Forces of Ukraine by the end of 2023, will Ukraine retake control of Crimea by the end of 2024?
As forecasters, we would first have to make some assumptions (or consider different scenarios) about:
the number of fighter jets delivered,
their type and series (generation), as this affects their performance,
what kind of training will be provided for pilots and ground crews, and how long it will take,
whether Western contractors will be sent for maintenance,
what weapon systems and support equipment will be provided with these fighters,
when the decision will be made and when the delivery will happen,
whether the Armed Forces of Ukraine will use these fighters skillfully (not only having skilled pilots, but also having good ideas about how to use them: in what kinds of missions, and where and how to deploy them).
The possible-worlds perspective raises other important questions.
If these fighter jets are provided to Ukraine, then:
what other advanced weapon systems are likely to be provided as well? Long-range missiles, maybe? I would assume that support like this is correlated: if we are providing more weapons, we are more likely to provide something else as well, unless it would be redundant. The question doesn’t mention any other advanced weapon systems. Still, if their delivery is in fact more likely in the worlds where these fighter jets are delivered, then the probabilities of the desired outcome take those other systems into account as well in most worlds. That means the difference in probabilities between the two conditions does not represent only the delivery of this single weapon system.
what is Ukraine’s military situation at the time of delivery? Are these deliveries a result of Ukraine winning on the battlefield? Or do they happen because there is a stalemate and Western supporters hope to overcome it with new weapon systems? Or maybe even because Ukraine is losing at the moment, and NATO is providing what it can to keep Ukraine in the battle?1 If we see the delivery of these fighter jets by the West as more likely in one of these scenarios than in another, this affects forecasters’ calculations and makes the resulting probabilities less useful for the decision-maker.
With these examples, you can see the difference between probabilities based on thinking in terms of possible worlds and the ideal but unlikely case of pure cause and effect, where the policy is the only difference between two worlds in which all else is equal. The most important point is that this kind of situation can confuse a decision-maker, who expects a single point of difference between the two scenarios and likely assumes that the difference between the probabilities comes only from the policy being applied or not.
How can we address this challenge? My idea is to identify all the other key variables affecting the picture and either use them to create scenarios as combinations of these factors, or state them explicitly as assumptions. For example, we can add to the question about Western military support for Ukraine an additional factor regarding the battlefield situation at the time of the fighter jets’ delivery:
“Ukraine is winning”,
“There is a stalemate”,
“Ukraine is losing”.
That way, we would have a modified version of this question - three conditional pairs, six questions in total:
Given that Ukraine is winning at the time, if [at least one NATO member country | no NATO member country] provides F-16s or F-18s to the Armed Forces of Ukraine by the end of 2023, will Ukraine retake control of Crimea by the end of 2024?
Given that there is a stalemate at the time, if [at least one NATO member country | no NATO member country] provides F-16s or F-18s to the Armed Forces of Ukraine by the end of 2023, will Ukraine retake control of Crimea by the end of 2024?
Given that Ukraine is losing at the time, if [at least one NATO member country | no NATO member country] provides F-16s or F-18s to the Armed Forces of Ukraine by the end of 2023, will Ukraine retake control of Crimea by the end of 2024?
All of these additional conditions need some operationalization. For example, the first one could probably be operationalized as “Ukraine is retaking between X and Y square km of its territory per month on average”, or something like that.
We can also state key assumptions such as the number of fighter jets delivered, their generation, and other factors like training, support equipment, and the organization of ground support services. This part could require input from subject-matter experts. If we know from decision-makers that different options are being considered for some of these factors, it would be best to turn them into additional conditions for our question sets. If the number of conditions grows too large, we might select some of the scenarios (combinations of conditions) and skip others that are less likely to occur or less important to forecast about. Then, even if the situation changes, the decision-maker knows whether the forecast still applies or requires an update. You can think of this approach as selecting some of the possible worlds as a base for your forecasts and openly communicating this fact.
In this article, I presented some of the challenges in making forecasting useful for policy-makers, along with some of my ideas for addressing these issues. My only hope is that this will be useful. I would appreciate your feedback.
As Thomas C. Theiner said on the Mriya Report podcast, Western-made fighter jets might be a necessity for Ukraine simply because there are not enough air defense systems available to replace the post-Soviet S-300 systems Ukraine is using, which are running out of missiles - and the only producer of these missiles is Russia.