November 25, 2019

Forecasting, better together

by Dardo Ferreiro and Guillaume Dezecache

Predicting the future is very hard. On TV, in newspapers and on social media, many people claim to predict the results of political elections, football games and more, but serious research has shown that, over time, celebrity pundits and star analysts do no better than flipping a coin.

To appreciate the difficulty of forecasting, consider these two questions:

  • Who won the presidential elections held in Argentina on 27 Oct 2019?
  • What was the price of a Bitcoin (in euros) on November 1st 2019 at 1pm CET?

These are not easy questions to answer unless you follow Latin American politics or buy, sell or mine Bitcoin. You could, however, ask Google or Wikipedia. Now imagine how much harder it would be to answer them in mid-October, when the Argentinian elections had not yet happened and 1 November was still in the future. Google and Wikipedia could not have helped you then.

You may say, "I do not know anything about Argentina or Bitcoin. Is there any way at all that I could have made a good prediction?" In our research at CrowdCognition, we address this question by asking: would predictions get any better if people discussed with one another what they think will happen before making their forecasts?

Last October, we ran a small pilot experiment to test this idea. We asked 23 people to forecast future events such as the two questions above (more on the others in future CrowdCognition blog posts). Each person answered the questions three times:

  • First privately (we call this prediction i1, for Individual response 1),
  • Then as part of a group of 3 people, who had to reach a consensus (C),
  • Finally privately again (i2). This second individual stage allowed people to change their minds if they wanted to.

We asked 12 questions. Every participant gave individual answers (i1 and i2) to all of them. Each group, however, discussed and answered only 6 of the 12 questions (a different subset for each group, with some overlap). This design gave us two types of i2 response, depending on whether the question had been discussed (D) or un-discussed (UD) at the group stage.
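To make the D/UD distinction concrete, here is a minimal Python sketch of how i2 responses could be tagged, assuming each participant belongs to exactly one group and each group has a known set of discussed questions. The `Response` record, the `label_i2` function and the example mapping are hypothetical illustrations, not our actual analysis code.

```python
from dataclasses import dataclass


# Hypothetical data model; not the study's actual analysis code.
@dataclass
class Response:
    participant: str
    question: int   # question index, 1..12
    stage: str      # "i1", "C" or "i2"
    value: float    # probability (election) or price in euros (Bitcoin)


def label_i2(responses, group_of, discussed_by_group):
    """Tag each i2 response as 'D' (discussed) or 'UD' (un-discussed).

    group_of: participant -> group id (assumed: one group per participant)
    discussed_by_group: group id -> set of question indices the group discussed
    """
    labels = {}
    for r in responses:
        if r.stage != "i2":
            continue  # only the second private answer gets a D/UD label
        discussed = discussed_by_group[group_of[r.participant]]
        labels[(r.participant, r.question)] = "D" if r.question in discussed else "UD"
    return labels


# Hypothetical example: one group that discussed questions 1 to 6 only.
group_of = {"p1": "g1", "p2": "g1", "p3": "g1"}
discussed_by_group = {"g1": {1, 2, 3, 4, 5, 6}}
responses = [
    Response("p1", 2, "i2", 0.30),
    Response("p1", 9, "i2", 0.55),
    Response("p2", 2, "i1", 0.60),
]
print(label_i2(responses, group_of, discussed_by_group))
```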

This experimental idea comes from a rich tradition in social psychology that has recently been revived by Joaquin Navajas’ great work in collaboration with CrowdCognition.

Of the 12 questions participants forecast in the experiment, so far we know the actual outcome for only the two posted above: the Argentinian election and the price of Bitcoin.

For the Argentinian election, participants reported the probability (on a scale from 0 to 1) that Mauricio Macri would be re-elected. For the price of Bitcoin, they gave a positive number (a price in euros).

The correct answers are:

  • No, on 27 Oct 2019 Macri was NOT re-elected.
  • 1 Bitcoin = 8386.59 euros (on 1 Nov 2019 at 1pm CET)

We wanted to know whether discussion changed the accuracy of predictions. For the election question, which called for a probabilistic prediction, we measured accuracy by comparing the Brier scores of i1, C and i2. Roughly speaking, this score is the squared error of the prediction. If I predict that A will happen with high probability and A does happen, my Brier score is near zero. If I say that A will certainly not happen and it does, my score reaches its maximum of 2. The more accurate the prediction, the smaller the score (you can read more about the Brier score here).
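The two-outcome Brier score described above can be sketched in a few lines of Python. We assume the original two-category formulation, which sums the squared error over the "yes" and "no" outcomes and so ranges from 0 to 2; `brier_score` is a hypothetical helper, and the probabilities in the loop are illustrative, not our participants' forecasts.

```python
def brier_score(p_event: float, event_happened: bool) -> float:
    """Two-category Brier score for a yes/no question (range 0 to 2).

    Assumption: the original two-category formulation, as implied by the
    0-to-2 range in the text. p_event is the forecast probability that the
    event happens; the forecast for "no" is taken as 1 - p_event.
    """
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    o = 1.0 if event_happened else 0.0
    # squared error on the "yes" outcome plus squared error on the "no" outcome
    return (p_event - o) ** 2 + ((1.0 - p_event) - (1.0 - o)) ** 2


# Macri was not re-elected, so the event "Macri re-elected" did not happen.
for p in (0.0, 0.5, 0.9, 1.0):  # hypothetical forecasts, not our data
    print(p, brier_score(p, event_happened=False))
```

Because (1 − p) − (1 − o) equals o − p, this score is simply 2(p − o)². Many tools, such as scikit-learn's `brier_score_loss`, instead report the single-term version (p − o)², which ranges from 0 to 1 and ranks forecasts in the same order.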

Figure 1. Forecasting error (Brier scores) for the Argentinian election, calculated for each type of prediction. Lower values mean less error and a better prediction. The dotted line shows the median of the baseline (i1). The left panel shows the median i2 for participants who did not discuss the question. Error bars show the median absolute deviation. Group discussion improved the accuracy of the forecast: the consensus (middle) and the final individual forecasts after discussion (right) had lower Brier scores.

In both cases, political and economic, group discussion improved the accuracy of the predictions. The improvement held not only for the consensus answer; it also persisted in the i2 answers given after group discussion.

Figure 2. Prediction accuracy for the price of Bitcoin on 1 Nov 2019. The green horizontal line indicates the correct answer (8386.59 euros). The dotted black line shows the median of the initial individual (i1) opinions. The left bar shows the median i2 response for individuals who did not discuss the question; the right bar shows i2 for individuals who did. Error bars show the median absolute deviation. The consensus (middle) and i2 after discussion show a marked improvement in forecast accuracy.
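The figure captions summarise each condition by its median and median absolute deviation (MAD). As a minimal sketch, assuming the unscaled MAD (no 1.4826 consistency factor; the captions do not say which was used), the Python below computes the median forecast, its MAD and the absolute error of that median against the true price. The `summarize` helper and the forecast values are hypothetical.

```python
from statistics import median

# True price from the post: euros per Bitcoin on 1 Nov 2019 at 1pm CET.
TRUE_PRICE = 8386.59


def median_abs_deviation(values):
    """Unscaled median absolute deviation around the median (assumption)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def summarize(forecasts):
    """Hypothetical helper: (median forecast, MAD, |median - true price|)."""
    m = median(forecasts)
    return m, median_abs_deviation(forecasts), abs(m - TRUE_PRICE)


# Hypothetical i1 forecasts in euros, for illustration only.
i1 = [7000, 8000, 9500, 10000, 12000]
print(summarize(i1))
```

Using the median and MAD rather than the mean and standard deviation keeps a single wild guess (say, 100,000 euros) from dominating the summary, which matters with only a handful of forecasts per condition.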

Of course, these comparisons are not yet statistically significant (Kolmogorov-Smirnov test, p > 0.05), but our sample is still very small: 23 participants and 2 of 12 questions.
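For readers curious about the Kolmogorov-Smirnov test, here is a minimal, standard-library-only Python sketch of the two-sample version, comparing, for example, Brier scores before and after discussion. The p-value uses the asymptotic Kolmogorov distribution with a commonly used small-sample correction (as in Numerical Recipes); this is an assumption about how one might compute it, not necessarily the implementation behind our result, and the scores below are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # step past every value equal to x in both samples (handles ties)
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution.

    Assumption: uses the small-sample correction
    lam = (sqrt(ne) + 0.12 + 0.11 / sqrt(ne)) * d, with ne = n*m / (n + m).
    """
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # the series converges slowly here, and Q(lam) is effectively 1
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Hypothetical Brier scores, for illustration only (not our data).
i1_scores = [0.98, 1.28, 0.50, 1.62, 0.72]
i2_scores = [0.18, 0.50, 0.32, 0.08, 0.98]
d = ks_statistic(i1_scores, i2_scores)
p = ks_pvalue_asymptotic(d, len(i1_scores), len(i2_scores))
print(f"D = {d:.2f}, approximate p = {p:.3f}")
```

With samples this small, the asymptotic p-value is only a rough guide: for two samples of three values each that do not overlap at all (D = 1), the exact two-sided p-value is 2/20 = 0.1, while this approximation gives about 0.03. SciPy's `scipy.stats.ks_2samp` can compute exact p-values for small samples.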

Forecasting, as we said, is very difficult. It is so difficult, in fact, that serious organisations run forecasting tournaments and are ready to pay good money for reliable forecasts. If you think you are good at this, we suggest you try your luck here. Over the next few weeks, as the outcomes of more of our questions become known, we will post more of our pilot results here.
