Bayesian Decision Making in Human Collectives with Binary Choices

Here we focus on the description of the mechanisms behind the process of information aggregation and decision making, a basic step to understand emergent phenomena in society, such as trends, information spreading or the wisdom of crowds. In many situations, agents choose between discrete options. We analyze experimental data on binary opinion choices in humans. The data consists of two separate experiments in which humans answer questions with a binary response, where one is correct and the other is incorrect. The questions are answered without and with information on the answers of some previous participants. We find that a Bayesian approach captures the probability of choosing one of the answers. The influence of peers is uncorrelated with the difficulty of the question. The data is inconsistent with Weber’s law, which states that the probability of choosing an option depends on the proportion of previous answers choosing that option and not on the total number of those answers. Last, the present Bayesian model fits reasonably well to the data as compared to some other previously proposed functions although the latter sometime perform slightly better than the Bayesian model. The asset of the present model is the simplicity and mechanistic explanation of the behavior.


I. INTRODUCTION
The process of information aggregation in social systems gives rise to emergent phenomena like the wisdom of crowds [1,2].In order to understand such phenomena a quantitative understanding of the mechanisms by which information is aggregated and used in opinion formation and decision making is needed.In the case of the wisdom of crowds, which refers to having a better estimation of the solution to a question when the opinions of multiple heterogeneous agents are aggregated, it has been shown that social interaction can lead to misleading estimations [3].The issue of information aggregation is a hot topic which is expected to give insights into the solution of many societal problems.For example, 2014's World Economic Forum's meeting has the title "Leveraging collective intelligence for unprecedented challenges".
Models of opinion dynamics are based on assumptions on the decision making process on interacting individuals.Simple decision making rules employed in these models include proportional imitation (i.e., the rate of the opinion conversion is proportional to the number of peers possessing the different opinion), employed in the voter model, majority rules (i.e., the same rate is a superlinear function), thresholding rules (i.e., thresholding function), reinforcement rules (i.e., adaptive function depending on experiences of agents), and homophily rules (i.e., similar individuals more likely interact) [4][5][6].The type of the employed decision making rule affects the possibility, final state, speed, and other dynamical phenom- * Electronic address: victor@ifisc.uib-csic.esena of collective opinion formation.However, in physics and even social sciences literature, justification of these different types of models is at best based on a qualitative assessment of human behavior.Beyond opinion dynamics, social dilemmas, which in many cases are based on binary decision making, also offer an opportunity to bridge theory to experiments [7,8].
For animals in groups, recent work in this direction has identified Bayesian inference as a mechanism behind their collective behavior [9][10][11][12][13][14][15][16][17][18][19][20][21].In humans, experimental evidence of Bayesian inference has been provided in the realm of perceptual and cognitive domains [22,23].Effects of Bayesian types of inference on collective behavior have been investigated with the use of mathematical and individual-based models [16,[24][25][26][27][28][29].Toward quantitative understanding of social decision making of humans, the seminal experiment by Milgram and colleagues [30] designed to assess the probability to stop by a group of bystanders has recently been reproduced [31] whose results are fitted by a heuristic function.There are also other recent studies attempting to fit Bayesian (see the references above), evolutionary dynamical [32], and other [33,34] models to behavioral data.The wisdom of crowds when interaction among participants is allowed is also a target of recent experimental studies [3,35,36].However, a unifying quantitative framework to infer models of social decision making on the basis of behavioral data of humans is still lacking and much preceded by accumulating modeling frameworks for social animals [11,15,17].
In the present study, we address the potential of the Bayesian approach to explain human decision making under social interaction.We focus on subjects answering questions with binary options, one of which is correct.This situation contrasts with that of the previous stud- ies on the wisdom of crowds that allowed virtually real values of answers [3,35,36].We examine binary choices because many options in nature are discrete, as exemplified by voting, purchasing, and deciding where to live.
In many of such situations, extrapolation from continuous settings is not obvious.We use previously published data sets in which the participants first answer in the absence of social information and later with the information about the answers submitted by the r previous respondents; r gradually increases for the same question [33,34].The participants answer in a sequence, the situation akin to that for previous Bayesian models of the emergence of herd behavior [24,37].We show that simple Bayesian models reasonably explain the behavioral data.

A. Model
We denote the two options of a question by A and B Without loss of generality, we assume that A and B are the correct and wrong answers of the question q, respectively.We label the N agents 1, . . ., N and denote the option that agent i (i = 1, . . ., N ) selects in question q by x i (q) ∈ {A, B}.We denote by P [x i (q) = A] the strength of the belief (hereafter, simply the belief), with which agent i believes in A. A parallel definition is applied to We update the agent i's belief as follows.We assume that the answer of the previous respondent j, i.e., x j (q), is generated according to the probability specified by the belief of agent j, i.e., P [x j (q) = A], which equals 1 − P [x j (q) = B].Then, by using the Bayes' theorem, agent i is assumed to update the belief on the basis of the old belief and x j (q).The posterior belief of agent i is given by where P [x i (q) = A] pre and P [x i (q) = B] pre are prior beliefs summing up to unity.Parameter c ≡ P [x j (q) = A|x i (q) = A] (1/2 ≤ c < 1) represents the flexibility of agent i in response to agent j's answer.If c is close to unity, P [x j (q) = B|x i (q) = A] = 1 − c is small such that 1 − P [x i (q) = A] post , i.e., P [x i (q) = B] post is large once agent i observes x j (q) = B for a given P [x i (q) = A] pre .If c is close to 1/2, P [x i (q) = A] post is insensitive to x j (q).By symmetry, we assumed that P Iterative application of Eq. ( 1) leads to and where n A and n B are the accumulated numbers of A and B responses of the previous respondents observed by agent i, respectively.The initial belief of agent i in option A is denoted by It should be noted that the order in which the previous responses are observed does not affect i's behavior.The belief of each agent i is uniquely determined by n A − n B and the initial belief.We can rewrite Eq. (2) as where . Previous studies used Eq. ( 3) to account for consensus decision making by fish [38,39].

B. Data set
In the present study, we use the two data sets collected in Refs.[33,34].The first data set, which we denote by D 1 , consists of two sets of face-to-face experiments [33].Data set D 1 consists of the results obtained from two populations of subjects each of which contains N = 31 subjects (KUE-A and KUE-B in Ref. [33]).Each subject went through 100 questions.Each question allowed binary options, one being correct and the other being incorrect.Generally speaking, the subjects were asked to answer each question more than once under different information conditions.We refer to a sequence of answering sessions under a given question q (1 ≤ q ≤ 100) and information condition parameterized by r as a round.Subjects went through several rounds for each question in general.
The number of rounds that a subject experienced for each question depends on the subject.The N subjects in a population were randomly assigned labels 1, 2, . .., N .In the first round, all subjects answered the question without referring to others' responses.This is the memoryless condition (r = 0).If everybody answered within the allocated time, there were N data points for each population and question.
The second round was implemented as follows.First, subject 1 left this question without participating in the second and following rounds.Second, subject 2 observed the answer of subject 1 in the first round and possibly updated the private answer.Similarly, subject i observed subject (i − 1)'s answer in the first round and possibly updated the answer, where i runs from i = 3 to i = N in an ascending order.In the best case whereby everybody answered, N − 1 data points were collected in the second round.The collected data correspond to information condition r = 1.
The third round, corresponding to r = 2, was implemented as follows.First, subject 2 left without participating in the third and further rounds.Second, subject 3 observed the number of answers (n A , n B ) submitted most recently by the previous r = 2 respondents and answered the question again.It should be noted that (n A , n B )= (2, 0), (1, 1), or (0, 2).To calculate (n A , n B ), the answer of subject 1 in the first round and that of subject 2 in the second round were used.This is because subject 1 already left the question before the second round.In other words, the answer of subject 1 is assumed to be quenched to that made in the first round in the subsequent (i.e., second and later) rounds.Third, subject i answered after observing (n A , n B ) calculated on the basis of the most recent choice of subjects i − 1 and i − 2, where i runs from 4 to N .There are at most N − 2 answers obtained from the third round.
After the third round was completed, further rounds were carried out with r = 3, 5, 7, 9, and ∞ in this order, where r = ∞ implies that the subjects can refer to the most recent answers of all the preceding respondents.Subject 3 had left before the fourth round, corresponding to r = 3, started.Subjects 4 and 5 had left before the fifth round, corresponding to r = 5, started.There are eight rounds in total.The labels of the subjects were fixed throughout the 100 questions.
The second data set, which we denote by D 2 , consists of two sets of web-based experiments.They are denoted by HUE-A and HUE-B in Ref. [33] and the O and C treatments, corresponding to r = 0 and r > 0, respectively, in Exp-II in Ref. [34].Data set D 2 consists of the results obtained from two subject populations each of which contains N = 52 subjects.Each subpopulation of subjects went through 120 questions.In D 2 , each subject experienced up to 6 rounds, i.e., r = 0, 1, 5, 11, 21, and ∞ for each question.The labels of the subjects were randomly shuffled in the beginning of each question.

III. RESULTS
Let us first consider the aggregate results for each experiment.As described previously, a subject answers a question after observing the number of the correct answer, n A , and that of the incorrect answer, n B , from the last r = n A + n B respondents.By the aggregate results we mean that we aggregate the number of correct answers across questions for the same condition (n A , n B ).We denote by R(n A , n B ) the number of answers obtained under condition (n A , n B ), summed over respondents i and questions q.Out of these answers, the number of answer A, denoted by N A (n A , n B ), is given by The fraction of A answers under condition (n A , n B ) is given by N A (n A , n B )/R(n A , n B ).This fraction for various (n A , n B ) pairs is plotted in Fig. 1(a) and 1(b) for D 1 and D 2 , respectively.We fit P [x i = A] given by Eq. ( 3) to the experimental data, where we suppress q in the argument of x i because we have aggregated the data over the questions.We estimate the values of p and s by an exhaustive sampling in the parameter space.For each sampled (p, s) pair, we calculate the error by the total square distance between Eq. ( 3) and the empirical values summed over the available (n A , n B ) pairs.The parameter values yielding the smallest error are adopted.The results of the best fitting are shown by the solid curves in Fig. 1(a) and 1(b) for D 1 and D 2 , respectively.For D 1 , the best fit is obtained for p = 0.81, s = 0.75 which lead to a root mean squared error RMSE≈ 0.042.For D 2 , we obtain p = 0.82, s = 0.87 leading to RMSE≈ 0.059.Figure 1 indicates that Eq. ( 3) fits both data sets reasonably well.The value of the RMSE as a function of both parameters is shown in Fig. 6.An alternative hypothesis of collective decision making is that P [x i = A] obeys Weber's law such that it is a function that only depends on (n A − n B )/(n A + n B ), or equivalently, n A /(n A + n B ) [15,17].To test this hypothesis, we aggregate the data over q and i using the same aggregation as that used in Fig. 1, but separately for r to examine the effect of r on the decision making, and plot P [x i = A] as a function of n A /(n A +n B ).The results are shown in Fig. 2(a) and 2(b) for D 1 and D 2 , respectively.Each color corresponds to a value of r = n A + n B .If Weber's law holds true, all curves collapse on a single curve.Figure 2 indicates that it is not the case.To be more quantitative, in Fig. 2(c), we plot the slope of the curves obtained by applying the least square method to the data shown in Fig. 2(a) and 2(b).The figure indicates that the slope increases with r(= n A + n B ) and seems to saturate.That would mean that Weber's law is correct for sufficiently large r values.Nevertheless, for the r values accessed by the experiment, Weber's law does not hold.With data for larger r values one could assess if Weber's law holds and from which r value on.
We have a reasonable fit of the data to Eq. ( 3) even without aggregation over the questions.To show this, for a given question, we calculate the fraction of the correct answers N q A (n A , n B )/R q (n A , n B ), where R q (n A , n B ) is the number of answers to question q obtained under condition (n A , n B ), and N q A (n A , n B ) ≡ i x i (q, n A , n B ) is the corresponding number of answer A. The relationship between P [x i (q) = A] and z = (n A −n B ) ln s q +ln p q for different questions is plotted in Fig. 3(a).If Eq. ( 3) holds true, the results for different questions should collapse on a single curve P shown by the solid line.The results for the different ques- tions do roughly collapse on this curve.The estimated values of p q and s q for individual questions are shown in Fig. 3(b).As before, we obtained parameter values p q and s q by sampling the parameter space and finding the values giving the smallest error.Figure 3(b) shows that the estimated parameter values depend on the question to a large extent.For some questions, p > 1, implying that the initial belief in the correct answer is worse than the random coin flip, i.e., P 0 [x i (q) = A] < 0.5.For a majority of questions, however, the initial belief is better than the random coin flip, and for some questions, it is quite accurate (for example, p = 0.1 corresponds to P 0 [x i (q) = A] = 0.91).Another remark is that p and s are apparently uncorrelated.This implies that the flexibility of the opinion change does not depend on the difficulty of the question.
The different colors correspond to nB = 0 (black), In the literature one can find different models that propose different functional forms for P [A|n A , n B ]. Following [17], we fitted several of them [17,[39][40][41][42] to the current data.The quality of fitting is shown in Figs. 4 and  5 for data sets D 1 and D 2 , respectively, in different colors for different values of n B .For D 1 , the best results are produced with the model in Ref. [42] with a RMSE ≈ 0.035, followed closely by the model presented in this paper (RMSE ≈ 0.042).For D 2 , the best fitting (RMSE ≈ 0.046) is produced with the model in Ref. [42], and followed closely by the models in Refs.[40,41] (RMSE ≈ 0.053) and Ref. [17] (RMSE ≈ 0.054), with none of them being the one in this paper.See Table I for more information.
It should be noted that the first model in Table I is equivalent to a special case of our model (i.e., p = 1).Therefore, the fitting cannot be better than the present model.Note also that we fitted the model in Ref. [17] with k = 0.The result of fitting with k as a free parameter gives rise to very small values of k (k = 0.04 for D 1 and k = 0.065 for D 2 ), in the order of 10 −2 .The parameter is insensitive to the small value of k being different to 0 ( = 1.60 for D 1 and = 1.30 for D 2 ), while parameter δ is a much more sensitive (δ = 7.04 for D 1 and δ = 8.72 for D 2 ) (compare to results in Table I), as the minimum in the optimization is flatter in the direction of the δ parameter, as happens also for parameter p in the present model (see Fig. 6).The quality of the fittings is of the same order as when using k = 0 (RMSE = 0.054 for D 1 and RMSE = 0.054 for D 2 ).This also happens for the zebrafish data in Ref. [17].

IV. DISCUSSION
We showed that the simple Bayesian model provides a quantitative agreement with behavioral data of humans sequentially answering questions with binary options.At least two other studies used the same model as ours to be fit to data in different contexts.In Ref. [39], sequential choices by fish between two identical refugia are modeled.Depending on whether the two refugia are identical or nonidentical (i.e., only one arm was with a replica predator), the unbiased prior (p = 1 in our notation) or a biased one (p = 1) is used, respectively.In both unbiased and biased prior cases, the authors concluded s ≈ 0.4 (and the results are robust for 0.25 ≤ s ≤ 0.5), translating into c = 1/(s + 1) ≈ 0.7 in our notation.In another experiment with a different fish species, where fish individuals chose one of the two arms of a maze to avoid replica predators, Ward and colleagues [38] estimated s ≈ 1/e 0.478 ≈ 0.62, translating into c ≈ 0.62.In contrast, our results indicate s ≈ 0.7 − 0.8 and hence c ≈ 0.56 − 0.59.This difference may result from different species; humans may have lower responsitivity to social stimuli (i.e., c value closer to 0.5) than fish (see Ref. [32] for related experiments).The type of the task may also contribute to this difference.In the current study, the data set used are quizzes asking general knowledge of the participants.By contrast, in the fish experiments, the binary choice between two pathways that were identical except for the possible presence of a replicator predator was made by fish.
Quantitatively, some models fit better to our data than the present model does, in particular for data set D 2 (Table I).However, it should be noted that some of these previous models were proposed as fits, without particular mechanistic derivation [40][41][42].Another model, i.e., the fourth model in Table I [17], which results from the Taylor expansion of the model proposed in Ref. [15], has mechanistic underpinning.However, the model is derived from ant's random walk on a specific arena [15].In particular, the exit point that corresponds to the decision of one of the two alternatives is literally the spatial exit point of the animal.That may be why this model [15,17] does not fit well to the present data.Compared to Arganda's model [17] (fifth model in Table I), the present model fits better to data set D 1 and worse to D 2 .
A way to differentiate between models is to have data on the behavior for large number of information sources (large r).In that limit the different models provide different functional forms for P [x = A].Therefore, the models from Table I give rise to different limits r → ∞.The first and the last one (model used in this paper) give rise to a step function.The second model converges to x /(x + (1 − x) ), where x is the fraction of A responses, which coincides with Weber's law for = 1.However, the values of estimated for our data are much larger than unity.The third function for large r approximates the fraction of A responses.The fourth function gives 1/2 + δ(2x − 1), which is a good approximation of the previous model given that the fitting parameter δ 1/2 for our datasets.The fifth model gives a constant value P [x = A] = 1/2 in the limit r → ∞.More experimental data for large r would enable the further validation of models.
There are some limitations of the present study.First, we ignored the individuality of the respondents.In fact, for each question, there should be those who know the correct answer and those who do not.Such personal knowledge can be incorporated to models for sequential answering [24,33].Clarifying this issue warrants future work.Second, we tried to incorporate the information about the previous responses into our model.However, the design of the experiment makes it difficult to cope with this issue.The answers offered to subject i in each round are not a random sample from the pool of responses in the previous round, but are the responses of the previous respondents i−1, i−2, ..., i−r as initially labeled, which represents a biased sampling.Together with the influence of the history of self-responses on the new decision, these features affect the decision making process of the subjects and thus the evolution of the fraction of correct answers.Indeed in many situations individuals are not making decision from a tabula rasa but they are shaping decisions continuously from social interactions and external signals.Future developments of the theory are expected to incorporate these ingredients to deal with more realistic situations.Besides, large scale experiments taking advantage of the new technologies available would be welcome to confront with decision making theories.(δ+n A ) (δ+n A ) +(δ+n B ) δ = 3.90, = 1.95 0.054 δ = 3.66, = 1.52 0.053 [40,41]

FIG. 1 :
FIG. 1: Bayesian inference and experimental data.We plot the probability to report a correct answer A as a function of nA − nB for various (nA, nB) pairs: (a) Data set D1, (b) Data set D2.The circles correspond to the data.The solid curves indicate the best fits of Eq. (3): (p, s) = (0.80, 0.75) in (a) and (0.82, 0.87) in (b).

FIG. 2 :
FIG. 2: Dependence on the fraction of correct answers.(a) Probability to answer correctly as a function of the fraction of correct answers of the previous respondents for data set D1. Black r = 1, red r = 2, green r = 3, blue r = 4, yellow r = 7, and brown r = 9.(b) Same results for data set D2. Black r = 1, red r = 5, green r = 11, and blue r = 21.(c) Slope as a function of r obtained by the least square method applied to the plots in panels (a) and (b).The closed and open circles correspond to D1 and D2, respectively.

FIG. 3 :
FIG. 3: Dependence on the question.(a) Probability of correct answers as function of rescaled accumulated answers of previous respondents (nA − nB) ln(s) + ln(p).Each symbol represents a question.Black, open symbols correspond to D1, and blue filled circles to D2.We estimated the s and p values for each question by applying the least square method to the data for the corresponding question.(b) Estimated p and s values for different questions.The top and side panels show the distributions of p and s, respectively.

FIG. 6 :
FIG. 6: Parameter estimation.Root mean squared error associated to the fitting of the model given by Eq. eq:final to datasets (a) D1 and (b) D2.The contour line shows a level of 0.05 and 0.07 in (a) and (b), respectively.

TABLE I :
Fitting results for different models