A few weeks ago, Dardo Ferreiro and Guillaume Dezecache reported the first results of a pilot study on the advantage of consensus and group discussion in forecasting future events. When I shared the blog post on Facebook, an old friend of mine (Keyvan Mallahi Karai) asked a question that triggered a very useful conversation in the comments. Here I have reproduced and edited that conversation and added some new results that Dardo Ferreiro has provided.
As a reminder, the key claim was that group discussions improved the accuracy of the forecasts. This tendency was not only found in the consensus answer, but also had a lasting effect on the revised opinions of individuals after group discussions.
Keyvan Mallahi Karai (KMK): It sounds strange to me to determine how good a prediction was based on a one-shot trial. If I predict the outcome of a coin flip based on the outcome of another coin flip, I will be right with probability 1/2, but I would not call it a good prediction if that happens. Put differently, a reliable method of prediction must prove itself many times to deserve that badge.
Bahador Bahrami (BB): This is indeed true, which is why the sources linked in the blog post test reliability over an extended period of time with multiple events. Here too, we did the same (with 12 events), but that report was only meant to describe the outcome of the first two events.
Here, in Figures 1 and 2 below, we report the results of two more questions. We were delighted to see how consistent these results are with the previous ones.
After a private audience at Buckingham Palace a few days ago, Boris Johnson symbolically kissed the Queen’s hand and was thereby royally confirmed as prime minister. Assuming he stays in office until the end of the year, this gives us the answer to another question from our experiment:
– Will there be a new prime minister of the United Kingdom before 1 January 2020?
Given that he had already been prime minister since July, the correct answer to the question is ‘no’. Since we asked participants to report a probability, the correct answer is ‘zero’. As before, we calculated the Brier score to compare forecasting accuracy when people decided alone, in groups, and after having discussed the question. Remember: the lower the Brier score, the better the prediction. In a manner spectacularly similar to what we found for question 1, about the presidential elections in Argentina, group discussions improved the accuracy of the predictions. This tendency was not only true for the consensus answer, but also had a lasting effect on the revised individual (i2) answers after group discussions.
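For a single binary event, the Brier score is simply the squared difference between the reported probability and the outcome (1 if the event happened, 0 if it did not). A minimal sketch in Python (the forecast values here are made-up illustrations, not our participants’ data):

```python
def brier_score(forecast_prob, outcome):
    """Brier score for one binary event: squared difference between
    the forecast probability and the outcome (1 = happened, 0 = not)."""
    return (forecast_prob - outcome) ** 2

# The prime-minister question resolved 'no' (outcome = 0), so a
# forecast of 0.0 is perfect and higher forecasts score worse:
print(brier_score(0.0, 0))  # 0.0  (best possible)
print(brier_score(0.5, 0))  # 0.25 (fence-sitting)
print(brier_score(1.0, 0))  # 1.0  (worst possible)
```

Because the score is a squared error, a confident wrong answer is penalised much more heavily than a hedged one.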
We also asked people whether France would win the Davis Cup in tennis this year. Spain won the tournament, and therefore the correct answer was ‘no’.
Guillaume Dezecache (GD): Note that the success of a prediction is also measured in terms of the person’s judgment of the likelihood of the event. So you can only count as a very good forecaster if you say something will happen (and it does) with 100% confidence (and, obviously, if you can do that repeatedly).
KMK: I think for something like this to work, one needs each participant to have some clue about what they are supposed to predict. If this is the case, I can see how the aggregation of these clues can lead to a better prediction. But if they are all completely clueless, there is really no way that the aggregate prediction is any better than each individual one. As an example, there is no way that a group does better than an individual in predicting a coin flip, or anything that is as random as a coin flip to them.
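This intuition can be checked with a toy simulation (an illustration only, not the model used in the study): give each of n independent agents a private clue that is correct with some probability, and take a majority vote. When agents are no better than a coin flip, the majority stays at chance; when each is even slightly informed, the majority pulls ahead (Condorcet’s jury theorem).

```python
import random

def majority_accuracy(n_agents, p_correct, n_events=10000, seed=1):
    """Fraction of events on which a majority vote of n independent
    agents, each individually correct with probability p_correct,
    picks the true outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_events):
        # count how many agents' private clues point the right way
        votes = sum(rng.random() < p_correct for _ in range(n_agents))
        if votes > n_agents / 2:
            hits += 1
    return hits / n_events

print(majority_accuracy(15, 0.5))  # ~0.50: coin-flippers stay at chance
print(majority_accuracy(1, 0.6))   # ~0.60: one mildly informed agent
print(majority_accuracy(15, 0.6))  # ~0.79: fifteen of them, aggregated
```

The key assumption doing the work here is independence of the clues, which the conversation turns to next.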
BB: This is exactly what our working hypothesis is. The reason this intuitive idea is non-trivial and worth writing about is that there is a lot of evidence to the contrary (in social psychology and elsewhere) that a group of interacting individuals becomes a stupid “herding mass” that loses its common sense in the process of social influence (for example, through normative conformity). In addition, there is a key point in your comment about “each participant”. Our working hypothesis is that, with deliberation and discussion, you can do better even if the assumption that each participant is better than chance does not hold.
KMK: OK, now I agree that this is very reasonable. I assume that herd behaviour presupposes some sort of strong dependence between the “clues” each (or most) of the participants have or receive. For instance, suppose that they all get their clue from an oracle who tells the truth 51 percent of the time and lies the rest of the time. It is clear that everyone is going to follow the oracle, which leads exactly to the herd behaviour you mentioned. But my understanding is that deliberation and exchange seem to matter a lot in your model. I am not sure how to model that probabilistically.
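One simple way to put numbers on the oracle example (a sketch under that toy assumption, not a model of the actual experiment) is to contrast perfectly correlated clues — everyone copies the same 51%-reliable oracle — with independent clues of the same reliability. With a shared clue the majority vote just is the oracle, so the group is stuck at 51%; with independent clues, a large enough group slowly pulls above it.

```python
import random

def group_accuracy(n_agents, p_right, shared_clue, n_events=20000, seed=2):
    """Majority-vote accuracy when clues are all copied from one
    oracle (perfectly correlated) versus drawn independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_events):
        if shared_clue:
            # one oracle draw that every agent repeats verbatim
            votes = n_agents if rng.random() < p_right else 0
        else:
            # each agent draws their own independent clue
            votes = sum(rng.random() < p_right for _ in range(n_agents))
        if votes > n_agents / 2:
            hits += 1
    return hits / n_events

print(group_accuracy(101, 0.51, shared_clue=True))   # ~0.51: herding
print(group_accuracy(101, 0.51, shared_clue=False))  # ~0.58: independence helps
```

This captures the dependence-between-clues part of the question; how deliberation itself changes individual opinions is exactly the harder modelling problem the conversation raises.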
BB: The correlation between clues, and how members of collectives learn from them (to make better collective or individual choices), is a matter of both mind-boggling complexity and super-exciting research. One of my favourite papers is Kao and colleagues’ work. Intentionally, they chose a very simple model for their individual agents (fish). Those types of models, however, are not very useful when we turn to humans, who talk to each other and affect one another’s opinions.