The Wisdom of Crowds in Matters of Taste

Johannes Müller-Trede
Rady School of Management, University of California, San Diego
Shoham Choshen-Hillel
The University of Chicago Booth School of Business
Ilan Yaniv & Meir Barneron
Department of Psychology, Hebrew University of Jerusalem
Aggregating predictions
How far is it from Mihaylo Hall to John Wayne Airport?
Individual estimates: 20 mi, 27 k, 9 k, 5 mi, 16 mi, 13 mi
Average: 12.7 mi
14.9 mi
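The averaging step on this slide can be reproduced in a few lines. The sketch below assumes that "k" abbreviates kilometres (the slide does not say so) and converts those two estimates to miles before averaging; the variable and function names are illustrative only.

```python
# Estimates from the slide as (value, unit).
# Reading "k" as kilometres is an assumption, not stated on the slide.
KM_PER_MILE = 1.609344

estimates = [(20, "mi"), (27, "km"), (9, "km"), (5, "mi"), (16, "mi"), (13, "mi")]


def to_miles(value, unit):
    """Convert a distance estimate to miles."""
    return value / KM_PER_MILE if unit == "km" else float(value)


miles = [to_miles(value, unit) for value, unit in estimates]
average = sum(miles) / len(miles)
print(round(average, 1))  # 12.7, matching the average shown on the slide
```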
Aggregating taste predictions
How enjoyable will the Friday afternoon session at the conference be?
Individual predictions: very, very, can't wait, so-so, 9/10, 8/10
Average: ???
???
Can we benefit from aggregating predictions in matters of taste?
The taste prediction problem
1. Taste predictions are imperfect. Errors can be random (noise) or systematic (bias) (March, 1978; Gilbert, 2006).
2. We consider the problem from the point of view of single individuals predicting their own tastes.
The taste aggregation problem
3. In predicting their own tastes, should individuals take into account others' predictions of their respective tastes?
Prediction accuracy
We define accuracy as the squared error between taste predictions and taste criteria (MSE).
For individual $i$ with prediction $x_i$ and taste criterion $u_i$, the squared error is $(x_i - u_i)^2$, for example $(x_1 - u_1)^2$, $(x_2 - u_2)^2$, ...
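As a minimal sketch of this accuracy measure, the snippet below computes the per-item squared errors and their mean. The ratings are hypothetical values chosen for illustration, not data from the studies, and the function names are not from the source.

```python
def squared_errors(predictions, criteria):
    """Per-item squared error (x_i - u_i)^2."""
    return [(x - u) ** 2 for x, u in zip(predictions, criteria)]


def mse(predictions, criteria):
    """Mean squared error between predictions and criteria."""
    errors = squared_errors(predictions, criteria)
    return sum(errors) / len(errors)


# Hypothetical predicted (x) and experienced (u) enjoyment ratings.
x = [3, 4, 9, 7]
u = [5, 8, 9, 7]

print(squared_errors(x, u))  # [4, 16, 0, 0]
print(mse(x, u))             # 5.0
```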
Modeling assumptions
1. Predictions are unbiased: $x_i = u_i + \varepsilon_i$, with $E[\varepsilon_i] = 0$ and $\mathrm{Var}[\varepsilon_i] = \sigma^2_{\varepsilon_i}$
2. Prediction errors are not correlated with tastes: $\mathrm{Cov}[u_i, \varepsilon_j] = 0$ for all $i, j$
Simple averages
The Wisdom of Crowds
We show that simple averages of taste predictions can lead to accuracy gains when predictability is low (i.e., $\sigma^2_\varepsilon$ is large).
$$\left(\frac{x_1 + x_2}{2} - u_1\right)^2 < (x_1 - u_1)^2$$
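One way to see why low predictability favors averaging is to compare expected squared errors under the modeling assumptions. The sketch below adds two assumptions that the slides do not state: the two individuals' prediction errors are uncorrelated with each other, and both have the same error variance $\sigma^2_\varepsilon$. Tastes are treated as random variables.

```latex
\begin{align*}
\frac{x_1 + x_2}{2} - u_1
  &= \frac{\varepsilon_1 + \varepsilon_2}{2} + \frac{u_2 - u_1}{2} \\
E\!\left[\left(\frac{x_1 + x_2}{2} - u_1\right)^2\right]
  &= \frac{\sigma^2_\varepsilon}{2} + \frac{E\!\left[(u_2 - u_1)^2\right]}{4} \\
E\!\left[(x_1 - u_1)^2\right]
  &= \sigma^2_\varepsilon \\
\text{averaging helps in expectation}
  &\iff E\!\left[(u_2 - u_1)^2\right] < 2\,\sigma^2_\varepsilon
\end{align*}
```

The cross term vanishes because $E[\varepsilon_j] = 0$ and $\mathrm{Cov}[u_i, \varepsilon_j] = 0$. Under these assumptions, the larger the error variance relative to the taste difference between the two individuals, the more the simple average gains over one's own prediction.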
Optimal weights
Taste Similarity
We show that individuals should often place a larger weight on their own predictions and on predictions of others who share their tastes (i.e., $r(u_1, u_2)$ is large).
$$\min_{w_1, w_2} \left((w_1 x_1 + w_2 x_2) - u_1\right)^2$$
$w_1 > w_2$
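The minimization above can be illustrated as an ordinary least-squares problem over several items. The sketch below solves the two-weight normal equations directly. It is an illustration under stated assumptions, not the authors' estimation procedure, and both the function name and the data are hypothetical.

```python
def optimal_weights(x1, x2, u1):
    """Least-squares weights (w1, w2) minimising sum((w1*x1 + w2*x2 - u1)**2)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1u = sum(a * c for a, c in zip(x1, u1))
    s2u = sum(b * c for b, c in zip(x2, u1))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictions are collinear; weights are not unique")
    # Cramer's rule on the 2x2 normal equations.
    w1 = (s1u * s22 - s2u * s12) / det
    w2 = (s2u * s11 - s1u * s12) / det
    return w1, w2


# Hypothetical data: own predictions x1, another person's predictions x2,
# and own experienced tastes u1 constructed as 0.75*x1 + 0.25*x2.
x1 = [1, 2, 3]
x2 = [1, 0, 2]
u1 = [1.0, 1.5, 2.75]

w1, w2 = optimal_weights(x1, x2, u1)
print(w1, w2)  # 0.75 0.25
```

In this constructed example the own prediction receives the larger weight, which mirrors the $w_1 > w_2$ case on the slide.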
Study 1: Music
Method
N = 104 (108) undergraduate participants.
Stimuli. Participants listened to 22 1-minute excerpts from a
variety of musical pieces including different styles such as
classical music, national and international pop music, and ethnic
music from Africa. The 22 pieces consisted of 11 pairs (e.g., 2
orchestral pieces by Bach, 2 songs from a Bob Dylan album…).
Procedure. After listening to each piece, participants rated how much they liked it and how familiar they were with it on 10-point Likert scales.
Figure: Averages of n random others' ratings (horizontal axis: n)
Decomposing inaccuracy
1. Mean squared errors can be decomposed into different parts (Lee & Yates, 1992; Theil, 1966):
$$\mathrm{MSE}_x = (M_x - M_u)^2 + (S_x - S_u)^2 + 2(1 - r_a) S_x S_u$$
$$\mathrm{MSE}_x = (M_x - M_u)^2 + (S_x - r_a S_u)^2 + (1 - r_a^2) S_u^2$$
The three terms, in order, are:
Bias: Systematic over- or underprediction.
Variability: Predictions should “regress to the mean”.
Correlation: Between the predictions and the criteria.
2. We use this decomposition to identify the nature of the error in the (averages of the) taste predictions.
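The decomposition can be checked numerically. The sketch below assumes the conventional reading of the symbols, which the slides do not define: $M$ is a mean, $S$ a population standard deviation, and $r_a$ the correlation between predictions and criteria. It computes the three terms of the second form and confirms that they sum to the MSE. The data and the function name are hypothetical.

```python
from math import isclose, sqrt


def decompose_mse(predictions, criteria):
    """Split the MSE into bias, variability, and correlation terms."""
    n = len(predictions)
    m_x = sum(predictions) / n
    m_u = sum(criteria) / n
    # Population standard deviations (divide by n, not n - 1).
    s_x = sqrt(sum((x - m_x) ** 2 for x in predictions) / n)
    s_u = sqrt(sum((u - m_u) ** 2 for u in criteria) / n)
    cov = sum((x - m_x) * (u - m_u) for x, u in zip(predictions, criteria)) / n
    r_a = cov / (s_x * s_u)  # undefined if either series is constant
    return {
        "bias": (m_x - m_u) ** 2,
        "variability": (s_x - r_a * s_u) ** 2,
        "correlation": (1 - r_a ** 2) * s_u ** 2,
    }


# Hypothetical predictions (x) and criteria (u).
x = [3, 4, 9, 7]
u = [5, 8, 9, 7]

parts = decompose_mse(x, u)
total = sum((a - b) ** 2 for a, b in zip(x, u)) / len(x)
print(isclose(sum(parts.values()), total))  # True
```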
Figure: Averages of n similar others' ratings (horizontal axis: n)
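The comparison behind these figures can be sketched as follows. The slides do not say how similarity was measured, so the sketch assumes Pearson correlation computed on one set of items, with accuracy evaluated on a separate set. All function names and the toy ratings are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def mse(predictions, criteria):
    return mean((x - u) ** 2 for x, u in zip(predictions, criteria))


def pearson(a, b):
    m_a, m_b = mean(a), mean(b)
    cov = sum((x - m_a) * (y - m_b) for x, y in zip(a, b))
    var_a = sum((x - m_a) ** 2 for x in a)
    var_b = sum((y - m_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat constant ratings as uninformative
    return cov / sqrt(var_a * var_b)


def pick(values, items):
    return [values[j] for j in items]


def random_others(ratings, target, n, rng):
    """n other participants drawn at random."""
    pool = [p for p in ratings if p != target]
    return rng.sample(pool, n)


def similar_others(ratings, target, n, sim_items):
    """n other participants whose ratings on sim_items correlate most with the target's."""
    pool = [p for p in ratings if p != target]
    own = pick(ratings[target], sim_items)
    pool.sort(key=lambda p: pearson(pick(ratings[p], sim_items), own), reverse=True)
    return pool[:n]


def mse_of_average(ratings, target, others, eval_items):
    """MSE of the others' average rating against the target's own ratings."""
    avg = [mean(ratings[o][j] for o in others) for j in eval_items]
    return mse(avg, pick(ratings[target], eval_items))


# Toy ratings: participant -> ratings of four items.
ratings = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 5], "c": [4, 3, 2, 1], "d": [2, 2, 3, 3]}

closest = similar_others(ratings, "a", 1, sim_items=[0, 1])
print(closest)                                                   # ['b']
print(mse_of_average(ratings, "a", closest, eval_items=[2, 3]))  # 0.5
drawn = random_others(ratings, "a", 2, random.Random(0))
```

Separating the items used to measure similarity from the items used to score accuracy keeps the comparison from being circular.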
Study 1, results
1. Averaging can be beneficial in matters of taste.
2. Averaging effects are more pronounced for similar others.
…
Participants in Study 1 did not make predictions, though.
Study 2: Short films
Method
N = 62 (66) undergraduate participants.
Session 1. Participants viewed 10-second clips from 7 short
films along with a brief description of the films. They then made
predictions on a 100-point scale regarding how much they
expected to enjoy each of the films.
Session 2. Two weeks later, participants returned to the lab and
were shown the 7 short films. They then had to rate how much
they enjoyed each of the films on a 100-point scale.
Averaging n taste predictions
Total MSE = Bias + Variability Bias + Lack of Correspondence
Figure: MSE as a function of n for the average of the predictions of n randomly chosen other participants and for the average of the predictions of the n most similar other participants. Benchmark: own prediction.
Decomposing accuracy gains
Figure: the same comparison shown in separate panels for Total MSE, Bias, Variability Bias, and Lack of Correspondence, each plotted against n.
Applications and Implications
1. Preference predictions inform decisions. By taking into account what other people might like, decision makers (DMs) can make better decisions.
2. Psychological foundations for similarity-based algorithms in recommender systems (e.g., collaborative filtering; Ansari, Essegaier, & Kohli, 2000; Koren & Bell, 2011).
Conclusions
1. Aggregating predictions can lead to accuracy gains even in the context of taste predictions and other "subjective truths".
2. Taste predictability and taste similarity determine the potential for, and the nature of, these accuracy gains.
Taste subjectivity: optimal weights
Figure: optimal weights w* (vertical axis, 0 to 0.6) for Weight Self, Weight Other 1, and Weight Other 2.