Quality Assurance and Crowd Access Optimization: Why does diversity matter?

Besmira Nushi, Adish Singla, Anja Gruenheid, Andreas Krause, Donald Kossmann
ETH Zurich, Universitätstrasse 6, 8092 Zurich, Switzerland
[email protected] [email protected] [email protected] [email protected] [email protected]

ICML Workshop on Crowdsourcing and Human Computing, Beijing, China, 2014.

Abstract

Quality assurance is among the most important challenges in crowdsourcing. Assigning tasks to several workers to increase quality can be expensive if no target-oriented strategy is applied. Hence, efficient crowd access optimization methods are crucial to the problem. This work argues that optimization needs to be aware of the diversity and correlation of information within groups of individuals. Based on this intuition, we introduce a novel crowd model that leverages the notion of access paths as an alternative way of retrieving information. Moreover, we devise a greedy optimization algorithm that works on this model and finds a good approximate plan to access the crowd.

1. Introduction

Crowdsourcing integrates humans into collaboratively solving problems that are difficult to handle with machines alone. This field of research has attracted the interest of many communities, such as machine learning, database systems, and human-computer interaction. Two crucial challenges in crowdsourcing, irrespective of the field of application, are quality assurance and crowd access optimization. Both are important for building strategies that can proactively plan and ensure the quality of the results deduced from crowdsourced data. In this work, we propose a novel crowd model, named the Access Path Model (APM), which seamlessly tackles both challenges and is applicable in a wide range of use cases.

In current crowdsourcing platforms, redundancy (i.e., assigning the same task to multiple workers) is the most common and straightforward way of confirming results. Simple as it is, redundancy can be expensive if used without any target-oriented approach, especially if the errors of workers are correlated. Asking people whose answers are expected to converge to the same opinion is neither efficient nor insightful. For example, in a sentiment analysis task, one would prefer to consider opinions from different, unrelated groups of interest before forming a final interpretation. This is the basis of the diversity principle introduced by Surowiecki (Surowiecki, 2005), which states that the best answers arise from discussion and contradiction rather than agreement and consensus.

The Access Path Model (APM) that we describe here exploits crowd diversity not at the individual worker level but at the level of the common bias that workers share while performing a task. In this context, an access path is a way of retrieving a piece of information from the crowd. The configuration of access paths may be based on the source of the answer (e.g. book, yellow pages, web page), workers' demographics (e.g. profession, group of interest, age), or task-specific attributes (e.g. time of completion, task design).

Example 1 Jane is researching the impact of lifestyle on the development of Alzheimer's disease. More specifically, she wants to answer the question: "Can physical exercise prevent Alzheimer's disease?".
She can ask three different groups of people:

Access Path         | Error rate | Cost
--------------------|------------|-----
Neurologist         | 10%        | $20
Personal trainer    | 20%        | $15
Alzheimer patient   | 25%        | $10

Each of the groups brings a different perspective to the problem and has an associated error rate and cost. Considering that Jane has a limited budget to spend and that she can ask more than once on the same access path, she is interested in finding the optimal combination that will give her the best answer. Throughout this paper, a combination of access paths is referred to as an access plan; it tells how many people to ask on each available access path.

Previous work in quality assurance and crowd access optimization estimates the individual performance of each worker and targets those with the best accuracy. This scheme is useful for spam identification and pricing, but it does not reveal the coarse-grained diversity of the crowd and risks falling into partial-consensus traps. For instance, in the previous example, spending the whole budget on doctors alone would forgo the personal experiences of real Alzheimer patients and training professionals. Moreover, crowd participation is dynamic, which makes it difficult to accurately estimate the errors of individuals. For example, a single worker might not have enough sample answers for evaluating his or her skills. Additionally, in a free and competitive marketplace, the vote of a particular person is never guaranteed. The model that we propose overcomes these difficulties by planning the optimization over groups rather than individuals. In summary, this work makes the following contributions:

• Modelling the crowd. We design the Access Path Model as an acyclic Bayesian network where each access path is represented as a latent random variable. To the best of our knowledge, APM is the first model able to capture and utilize crowd diversity from a non-individual viewpoint. We show that such a model is present in real crowdsourcing settings and that its results are of higher quality than those obtained by relying only on error estimates of separate workers or on simple majority votes.

• Crowd access optimization. We devise a greedy algorithm for the crowd access optimization problem. The algorithm leverages the Access Path Model and produces non-adaptive access plans by using information gain as an objective function for reducing uncertainty. We compare our model and optimization technique with Naïve Bayes approaches and Majority Vote. Our experiments cover tasks from two different domains: sport event prediction and species classification.

2. Related Work

The reliability of crowdsourcing and the relevant optimization techniques are longstanding issues for human computation platforms. We identify the following directions as the ones closest to our study:

Query optimization. Crowdsourced databases extend the functionality of a conventional database system to support crowd-like information sources. Quality assurance and crowd access optimization are envisioned as part of the query optimizer, which in this special case needs to evaluate query plans not only according to their cost but also to their accuracy and latency. Most of the previous work in this area (Franklin et al., 2011; Marcus et al., 2011; Parameswaran et al., 2012) focuses on building declarative query languages with particular support for processing crowdsourced data.
The proposed optimizers take care of (1) defining the order of execution of operators within query plans and (2) mapping the crowdsourcable operators to micro-tasks, while the quality of the results is ensured only by requiring a minimum number of responses for the same micro-task. In our work, we propose a more fine-grained approach by first ensuring the quality of each single database operator executed by the crowd.

Access path selection. Even though the idea of access paths is one of the basic pillars of query optimization in traditional databases (Selinger et al., 1979), this abstraction is not fully explored in crowdsourced databases. One of the few studies that investigates it is Deco (Parameswaran et al., 2012). Deco uses the concept of a fetch rule to define how data can be obtained, either from humans or from other external sources. In this regard, our access path concept is analogous to a fetch rule, with the important distinction that an access path is associated with extra knowledge, such as error rate and cost, which helps the database optimizer use access paths for quality assurance purposes.

Quality assurance and control. One of the central works in this field is presented by Dawid and Skene (Dawid & Skene, 1979). In an experimental design where observers can make errors, these authors suggest using the Expectation Maximization algorithm (Moon, 1996) to obtain maximum likelihood estimates of the observer variation. This has served as a foundation for several subsequent contributions (Wang & Ipeirotis, 2013; Liu et al., 2012; Whitehill et al., 2009) which put the algorithm of Dawid and Skene in the context of crowdsourcing and enrich it to build performance-sensitive pricing schemes. (Zhou et al., 2012) uses the minimax entropy principle for label aggregation from crowds. The main subject of these studies is the individual crowd worker, while in our quality definition the error rates of the workers are also affected by the access path that they follow. A work that follows a similar line and targets tasks not at specific workers but at groups is introduced in (Li et al., 2014). One subtle difference between that method and ours is that our optimization technique does not immediately discard access paths which do not prove to be the best ones. Instead, for the sake of information diversity as well as optimal planning, access plans may contain more than one access path.

Crowd access optimization. The problem of finding the best plan to access the crowd is similar to the problem of expert selection in decision-making. Nevertheless, differently from the expert selection case, in crowd access optimization the assumption that the selected individuals will answer no longer holds, even in paid forms of crowdsourcing. Some previous studies based on this assumption are (Karger et al., 2011; Ho et al., 2013). The proposed techniques are effective for task recommendation, spam detection, and performance evaluation of workers, but they can easily run into situations of low participation and consequently cannot guarantee quality. Instead, the optimization algorithm that we devise chooses workers according to access paths and is less prone to low participation. Relevant works in the management science domain (Lamberson & Page, 2012; Hong & Page, 2004) define the notion of types to refer to forecasters that have similar accuracies and high error correlation.
Crowd access strategies can run in either adaptive or non-adaptive mode. In the adaptive mode (Ho et al., 2013), the optimization is performed after each step of crowdsourcing and the decisions adapt to the latest retrieved samples. The non-adaptive mode (Chen & Krause, 2013) produces global plans which do not change with new crowd evidence. Although these strategies are static compared to the adaptive ones, they allow for a higher degree of parallelization.

3. Problem Statement

In a traditional query optimizer, access paths to a relational table may have different execution times, but they are equivalent in terms of output. Also, the incoming data does not include any kind of uncertainty. In contrast, a crowdsourced database has to deal with uncertain information coming from noisy observations. Thus, an access path Z_i is associated not only with a monetary cost but also with an error rate. Moreover, the observations coming from different access paths have to be aggregated into a single decision, which is not required in a traditional RDBMS. Being aware of these subtle differences, we define the problems that we aim to solve in this work as follows.

Problem 1 Given a task Y and a set of votes collected from crowd workers, what is a good model that can express diversity and compute high-quality predictions?

The model that we are looking for should be able to abstract the common bias that comes with access path usage. The main assumptions to be represented are (1) the correlation of errors within access paths and (2) the independence of errors across access paths. These assumptions mimic situations where groups of people make similar decisions because they read the same media, follow the same lecture, share a common cultural background, etc. Furthermore, the model should offer a decision function whose predictions are not only accurate but also linked to meaningful confidence levels. For example, Naïve Bayes models may offer reasonable accuracy, but their predictions are always highly confident and consequently difficult to interpret.

Problem 2 Given a task Y that can be solved following N different access paths Z = Z_1, ..., Z_N and a budget constraint B, which is the best access plan P that ensures the highest-quality decision with respect to accuracy and diversity?

An access plan tells how many people to ask on each of the access paths. In Example 1, the plan P_1 = [1, 2, 3] will ask one neurologist, two personal trainers, and three patients. Similarly to access paths, each plan is also associated with a cost c(P) and a quality q(P). For example,

$$c(P_1) = \sum_{i=1}^{3} P_1[i] \cdot c_i = \$80$$

where c_i is the cost of obtaining a single answer through access path Z_i. The definition of the quality of an access plan is also the objective function to be used in the optimization scheme. As we argue later, the choice of this function is crucial to the solution. Another interesting optimization problem, beyond the scope of this paper, is the following.

Problem 3 Given a task Y that can be solved following N different access paths Z = Z_1, ..., Z_N and a target quality Q, which is the least expensive plan P that satisfies the quality constraint?

Besides these two base problems, a related research question concerns the discovery and the design of access paths if no intuitive configuration is available. Possible tools that can help in this regard include structural learning based on conditional independence tests and information gain (De Campos, 2006).
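Referring back to Problem 2, the plan and cost notation can be made concrete with a minimal Python sketch; the variable and function names are ours, the numbers are those of Example 1.

```python
# Access plans as integer vectors, following Example 1.
# Costs and error rates come from the example table; the naming
# is illustrative, not code from the paper.

costs = [20, 15, 10]              # $ per answer: neurologist, trainer, patient
error_rates = [0.10, 0.20, 0.25]  # per-path error rates from the table

def plan_cost(plan, costs):
    """c(P) = sum_i P[i] * c_i."""
    return sum(p * c for p, c in zip(plan, costs))

P1 = [1, 2, 3]                    # 1 neurologist, 2 trainers, 3 patients
assert plan_cost(P1, costs) == 80 # matches c(P_1) = $80 in the text
```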
[Figure 1: Bayesian network model for access paths (APM): the task Y at the root, latent access paths Z_1, Z_2, Z_3 in the middle layer, and worker votes X_11, ..., X_3P[3] at the leaves.]

4. Access Path Model

The crowd model presented in this section aims to fulfill the requirements specified in the definition of Problem 1; it also enables our method to learn error rates from historical data and then aggregate worker votes. We design the triple <task, access path, worker> as an acyclic hierarchical Bayesian network. Figure 1 shows an instantiation of the APM model for three access paths. The task Y at the root of the model represents the random variable for the real outcome of the task. The second layer of the network contains the random variables for the access paths Z_1, Z_2, Z_3. Each access path is represented as a latent variable, since its values are not observable; the access paths act as a channeling mechanism for the bias of the workers that follow them. Due to the tail-to-tail network configuration, each pair of access paths is conditionally independent given Y. As we mention in the baseline description, it is possible to have a simpler model that groups the workers according to the access path and does not include the middle latent layer. Although this variant makes more accurate predictions than the other baselines, it cannot make predictions with meaningful confidence. Finally, the last layer contains the random variables for the votes of the workers, grouped by the access path they follow. APM can handle two types of use cases:

1. Workers solve the whole task. The workers' votes are guesses at the true answer and belong to the same domain as the task. This is the case in Example 1.

2. Workers solve subtasks. Often, complicated tasks are decomposed into smaller ones. Each subtask type can then serve as an access path and bring its own signal into the model.

[Figure 2: Parameters θ = (θ_1, θ_2, θ_3) of the Access Path Model: the prior table P(Y), the conditional tables P(Z_i | Y), and the vote tables P(X_ij | Z_i) for a binary setting.]

4.1. Parameter learning

The main prerequisite for applying the Access Path Model is that the task should be repetitive, so that the model can adjust its own parameters, i.e. the conditional probability of each variable with respect to its parents. We refer to the set of all model parameters as θ. Figure 2 shows an example of θ for a purely binary setting of the network. Given a training dataset D of historical data for the same type of task, the goal of the parameter learning stage is to find the maximum likelihood estimate θ_MLE.

Definition 1 θ_MLE is a maximum likelihood estimate for θ if θ_MLE = arg max_θ p(D|θ).

For a training set containing K samples:

$$p(D|\theta) = \prod_{k=1}^{K} p(s_k|\theta) \quad (1)$$

If all the variables in the model <Y, Z, X> were observable, then the likelihood of a training sample s_k given θ would be:

$$p(s_k|\theta) = p(y_k|\theta) \prod_{i=1}^{N} \Big( p(z_{ik}|y_k, \theta) \prod_{j=1}^{P_k[i]} p(x_{ijk}|z_{ik}, \theta) \Big) \quad (2)$$

where P_k[i] is the number of votes on access path Z_i for the sample.
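As a concrete reading of Equation 2, the following sketch computes the fully observed log-likelihood of one sample for a binary instantiation of the network. The parameter layout (plain dictionaries keyed by (value, parent value), one per access path) is our own simplification, not the authors' code.

```python
import math

def sample_log_likelihood(y, z, x, theta):
    """log p(s_k | theta) per Equation 2, for observed outcome y,
    access-path values z[i], and votes x[i][j] grouped by path i."""
    ll = math.log(theta["Y"][y])                         # log p(y_k)
    for i, zi in enumerate(z):
        ll += math.log(theta["Z|Y"][i][(zi, y)])         # log p(z_ik | y_k)
        for vote in x[i]:
            ll += math.log(theta["X|Z"][i][(vote, zi)])  # log p(x_ijk | z_ik)
    return ll

# One access path, binary everything; the numbers are made up.
theta = {
    "Y":   {1: 0.5, 0: 0.5},
    "Z|Y": [{(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.2, (0, 0): 0.8}],
    "X|Z": [{(1, 1): 0.85, (0, 1): 0.15, (1, 0): 0.3, (0, 0): 0.7}],
}
print(sample_log_likelihood(1, [1], [[1, 1, 0]], theta))
```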
Since maximizing the likelihood is equivalent to minimizing the negative log-likelihood, the problem in Definition 1 can be written as:

$$\theta_{MLE} = \arg\min_{\theta} \; -\sum_{k=1}^{K} \log p(s_k|\theta) \quad (3)$$

For this setting, the estimate for θ_{Z_i|Y} can be computed by setting the derivative to zero in order to find the minimum:

$$\frac{\partial \log p(D|\theta)}{\partial \theta_{Z_i|Y}} = \sum_{k=1}^{K} \frac{\partial \log p(z_{ik}|y_k)}{\partial \theta_{Z_i|Y}} \quad (4)$$

For fully observable Z_i the best estimate would be:

$$\theta_{Z_i=z|Y=y} = \frac{\sum_{k=1}^{K} \mathbb{1}(z_{ik}=z, y_k=y)}{\sum_{k=1}^{K} \mathbb{1}(y_k=y)} \quad (5)$$

Here, 𝟙() is an indicator function which returns 1 if the training example fulfills the conditions of the function, and 0 otherwise. Since in our model Z_i is not observable, counting with the indicator function is not possible. For this purpose, we apply the Expectation Maximization algorithm (Moon, 1996). Below we show the instantiation of the EM algorithm for our model.

E-step: Computes the posterior expectation of the latent variables given the current parameters θ'. For a binary Z_i, for each sample this is:

$$E[z_{ik}] = \frac{p(z_{ik}=1, y_k, x_k|\theta')}{p(z_{ik}=1, y_k, x_k|\theta') + p(z_{ik}=0, y_k, x_k|\theta')} \quad (6)$$

M-step: Recomputes θ by maximizing the expected log-likelihood found in the E-step. Differently from Equation 5, the count of the latent variable is replaced by its expected value:

$$\theta_{Z_i=1|Y=y} = \frac{\sum_{k=1}^{K} \mathbb{1}(y_k=y) \, E[z_{ik}]}{\sum_{k=1}^{K} \mathbb{1}(y_k=y)} \quad (7)$$

$$\theta_{X_{ij}=x|Z_i=1} = \frac{\sum_{k=1}^{K} \mathbb{1}(x_{ijk}=x) \, E[z_{ik}]}{\sum_{k=1}^{K} E[z_{ik}]} \quad (8)$$

Notice that Equation 8 models the situation when the votes are always ordered by the id of the workers. This scheme works if the set of workers involved in the task is sufficiently stable to provide enough samples for computing the error rate of each worker (i.e. θ_{X_ij|Z_i}). Since in many crowdsourcing applications (as well as in our experiments and datasets) this is not the case, we assign to the workers an average value:

$$\theta_{X_{ij}=x|Z_i=1} = \frac{\sum_{k=1}^{K} \Big( \frac{1}{P[i]} \sum_{j=1}^{P[i]} \mathbb{1}(x_{ijk}=x) \Big) E[z_{ik}]}{\sum_{k=1}^{K} E[z_{ik}]} \quad (9)$$

This enables us to later apply on the model an optimization scheme that is agnostic with respect to the identity of the workers.

4.2. Inference

After learning the parameters, the model is used to infer the answer of a task given the available votes on each access path. The inference procedure computes the likelihood of each candidate outcome y_c ∈ Y given the votes in the test sample x_t:

$$\text{prediction} = \arg\max_{y_c \in Y} p(y_c|x_t) \quad (10)$$

$$p(y_c|x_t) = \frac{p(y_c, x_t)}{\sum_{y \in Y} p(y, x_t)} \quad (11)$$

Since the test samples contain only the values of the variables X, the joint probability of a candidate outcome and the test sample is computed by marginalizing over all possible values of Z_i. Due to the conditional independence of the access paths given Y, this can be done in polynomial time as follows:

$$p(y, x_t) = p(y) \prod_{i=1}^{N} \sum_{z_i \in Z_i} \Big( p(z_i|y) \prod_{j=1}^{P_t[i]} p(x_{ijt}|z_i) \Big) \quad (12)$$

Besides inferring the most likely outcome, we are also interested in the confidence of the prediction. In other words, we would like to know how likely it is that the prediction is accurate. For our model (APM), confidence corresponds to p(prediction|x_t), computed as in Equation 11. As we demonstrate in the experimental section, the model that we propose is able to distinguish the confidence of its predictions, in contrast to Naïve Bayes variants whose predictions are always strongly confident. Confidence and accuracy can be used together to define loss functions for evaluating the performance of different models and baselines.
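The EM loop of Section 4.1 can be summarized in a short sketch for a single access path with binary variables, using the path-level averaging of Equation 9. The data layout and the initialization are our own choices, not the authors' implementation; the inference equations (10)-(12) are exercised separately in the sampling sketch of the next section.

```python
def em_single_path(samples, iters=50):
    """samples: list of (y_k, votes_k), with y_k in {0,1} and votes_k a
    list of binary votes on this access path.
    Returns estimates of P(Z=1|Y=y) and P(X=1|Z=z)."""
    th_z = {0: 0.3, 1: 0.7}   # P(Z=1 | Y=y), arbitrary asymmetric init
    th_x = {0: 0.3, 1: 0.7}   # P(X=1 | Z=z), shared across workers (Eq. 9)
    for _ in range(iters):
        # E-step (Eq. 6): E[z_k] = p(Z=1 | y_k, votes_k); p(y) cancels
        # in the ratio because it appears in numerator and denominator.
        ez = []
        for y, votes in samples:
            lik = {}
            for z in (0, 1):
                p = th_z[y] if z == 1 else 1 - th_z[y]
                for x in votes:
                    p *= th_x[z] if x == 1 else 1 - th_x[z]
                lik[z] = p
            ez.append(lik[1] / (lik[0] + lik[1]))
        # M-step (Eq. 7): expected counts replace indicator counts.
        for y in (0, 1):
            den = sum(1 for yk, _ in samples if yk == y)
            if den:
                th_z[y] = sum(e for (yk, _), e in zip(samples, ez)
                              if yk == y) / den
        # M-step (Eq. 9): per-sample average vote rate, weighted by
        # E[z_k] for the Z=1 table and by 1 - E[z_k] for the Z=0 table.
        for z, w in ((1, lambda e: e), (0, lambda e: 1.0 - e)):
            den = sum(w(e) for (_, v), e in zip(samples, ez) if v)
            if den:
                th_x[z] = sum(w(e) * sum(v) / len(v)
                              for (_, v), e in zip(samples, ez) if v) / den
    return th_z, th_x

data = [(1, [1, 1, 1]), (1, [1, 0, 1]), (0, [0, 0, 1]), (0, [0, 1, 0])]
print(em_single_path(data))
```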
5. Crowd Access Optimization

The crowd access optimization problem is crucial for both paid and non-paid forms of crowdsourcing. While in paid platforms the goal is clearly to acquire the best quality for a given monetary budget, in non-paid applications the necessity for optimization comes from the fact that highly redundant accesses to the crowd may decrease user satisfaction and increase response latency. In this section, we describe how to estimate the quality of candidate plans and how to choose the plan with the best expected quality.

5.1. Information gain as a measure of quality

The crowd access optimization problem is not only about accuracy. The issue in real-world crowdsourcing is that there is no perfect access path, and even the best ones saturate early if the correlation within the access path is strong. As a result, the quality specification should describe a plan not only in terms of accuracy but also in terms of information gain and diversity. Based on this analysis, we use the information gain of the variable Y in our model for a plan P both as a measure of plan quality and as the objective function of our optimization scheme. Formally, this is defined as the joint information gain:

$$IG(Y; P) = H(Y) - H(Y|P) \quad (13)$$

P, as an access plan, determines how many X variables to choose from each access path Z_i. Since information gain is based on the conditional entropy H(Y|P), access paths that have a lower accuracy than the best one might still be part of the optimal plan. This can happen in two situations: (1) if better access paths are relatively exhausted and asking one more question on less accurate ones reduces the entropy more than continuing to ask on paths that were previously explored; (2) the very low accuracy of an access path can improve the quality of a prediction if interpreted in the opposite way. Similar metrics have been widely used in the field of Bayesian experimental design, which aims to optimally design experiments under uncertainty. In targeted crowdsourcing, the concept has recently been applied by (Li et al., 2014) and (Ipeirotis & Gabrilovich, 2014).

The computation of the conditional entropy H(Y|P) is a #P-hard problem (Krause & Guestrin, 2012), and the full calculation would require enumerating all possible instantiations of the plan with votes. Thus, we follow the sampling approach presented in (Krause & Guestrin, 2012), which randomly generates samples that satisfy the access plan and follow the parameters of the Bayesian network. The final conditional entropy is then the average of the conditional entropies of the generated samples. The method provably gives absolute error guarantees at given confidence levels if enough samples are generated. In addition, it runs in polynomial time if sampling and probabilistic inference in the network can also be done in polynomial time. Both conditions are satisfied by our model due to the hierarchical tree-like configuration of the Bayesian network. They also hold for the Naïve Bayes baselines described in Section 6.2, which are simpler tree versions of our model.

5.2. Optimization scheme

Having determined joint information gain as an appropriate quality measure for a plan, the crowd access optimization problem is to compute:

$$\arg\max_{P \in \mathcal{P}} IG(Y; P) \quad \text{s.t.} \quad \sum_{i=1}^{N} c_i \cdot P[i] \le B \quad (14)$$

where 𝒫 is the set of all possible plans that satisfy the budget constraint B.
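A minimal sketch of the sampling estimator for IG(Y;P) in a binary APM follows: draw plan-shaped vote sets from the network itself, run the posterior of Equation 12 on each draw, and average the posterior entropies. The model interface (per-path tables th_z, th_x) mirrors the earlier sketches and is our simplification, not the authors' code.

```python
import math, random

def entropy(ps):
    return -sum(p * math.log(p, 2) for p in ps if p > 0)

def posterior(votes_by_path, prior, th_z, th_x):
    """p(y | x_t) by marginalizing each Z_i out, as in Eq. (12)."""
    joint = {}
    for y in (0, 1):
        p = prior[y]
        for i, votes in enumerate(votes_by_path):
            s = 0.0
            for z in (0, 1):
                pz = th_z[i][y] if z == 1 else 1 - th_z[i][y]
                for x in votes:
                    pz *= th_x[i][z] if x == 1 else 1 - th_x[i][z]
                s += pz
            p *= s
        joint[y] = p
    tot = joint[0] + joint[1]
    return [joint[0] / tot, joint[1] / tot]

def info_gain(plan, prior, th_z, th_x, n_samples=2000):
    """IG(Y;P) = H(Y) - H(Y|P), with H(Y|P) estimated by sampling."""
    h_cond = 0.0
    for _ in range(n_samples):
        y = 1 if random.random() < prior[1] else 0   # sample the outcome
        votes = []
        for i, n_votes in enumerate(plan):           # sample Z_i, then votes
            z = 1 if random.random() < th_z[i][y] else 0
            votes.append([1 if random.random() < th_x[i][z] else 0
                          for _ in range(n_votes)])
        h_cond += entropy(posterior(votes, prior, th_z, th_x))
    return entropy([prior[0], prior[1]]) - h_cond / n_samples

prior = {0: 0.5, 1: 0.5}
th_z = [{0: 0.2, 1: 0.9}, {0: 0.3, 1: 0.8}]    # P(Z_i = 1 | Y = y)
th_x = [{0: 0.2, 1: 0.85}, {0: 0.3, 1: 0.75}]  # P(X = 1 | Z_i = z)
print(info_gain([2, 1], prior, th_z, th_x))
```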
An exhaustive search would consider $|\mathcal{P}| = \prod_{i=1}^{N} \frac{B}{c_i}$ possible plans, out of which the infeasible ones have to be eliminated; afterwards, the feasible plan with the maximum information gain has to be selected. Nevertheless, efficient approximation schemes can be constructed given the similarity of the problem to analogous maximization problems for submodular functions under budget constraints (Khuller et al., 1999). Based on the non-decreasing property of information gain, we devise a greedy technique, illustrated in Algorithm 1, that incrementally finds a local approximation of the best plan.

Algorithm 1 Greedy Crowd Access Optimization
  Input: budget B, bound α, step s
  Output: best plan P_best
  b = 0
  while b < B do
    U_best = 0
    for i = 1 to N do
      P_pure = GetPurePlan(s, Z_i)
      if b + s · c_i ≤ B and IsBound(P_best ∪ P_pure, αB) then
        ΔIG = IG(Y; P_best ∪ P_pure) − IG(Y; P_best)
        if ΔIG / c_i > U_best then
          U_best = ΔIG / c_i
          P_max = P_best ∪ P_pure
        end if
      end if
    end for
    P_best = P_max
    b = cost(P_best)
  end while
  return P_best

In each step, the algorithm evaluates, for every access path that is still feasible to access, the trade-off U between marginal information gain and cost. The marginal information gain is the improvement in information gain obtained by adding to the current best plan s pure votes from one access path. In our experiments we set s = 1, as it results in a better approximation; nevertheless, it is possible to spend the budget in larger chunks for faster execution. In cases where the number of available votes on each access path is bounded by design, the algorithm stops asking further questions on a path once the predefined bound αB is reached. In the worst case, when all access paths have unit cost, the computational complexity of the algorithm is O(αB²N²S), where S is the number of samples generated to compute information gain. The cost of the sampling process alone is O(BNS), and generating a larger number of samples guarantees a better approximation rate. A Python transcription of this greedy loop is sketched below.
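The sketch assumes an information-gain oracle ig(plan), such as the sampling estimator from Section 5.1; the explicit stopping guard for the case where no feasible path remains is our addition (the pseudocode leaves it implicit).

```python
def greedy_plan(ig, costs, budget, alpha=1.0, step=1):
    """Greedy crowd access optimization (Algorithm 1).
    ig: callable estimating IG(Y; plan); costs: c_i per access path."""
    n = len(costs)
    best = [0] * n                             # P_best
    spent = 0
    while spent < budget:
        base = ig(best)                        # IG(Y; P_best)
        u_best, p_max = 0.0, None
        for i in range(n):
            cand = list(best)
            cand[i] += step                    # add s pure votes on path Z_i
            feasible = spent + step * costs[i] <= budget
            bounded = cand[i] <= alpha * budget    # per-path bound alpha*B
            if feasible and bounded:
                u = (ig(cand) - base) / costs[i]   # marginal IG per cost
                if u > u_best:
                    u_best, p_max = u, cand
        if p_max is None:                      # no affordable path left
            break
        best = p_max
        spent = sum(p * c for p, c in zip(best, costs))
    return best
```

Passed the sampling estimator from the previous sketch as ig, this reproduces the per-dollar greedy choice described above.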
6. Experimental Evaluation

We experimentally evaluated our work on two real-world datasets, one for each use case described in Section 4. The main goal of the experiments is to validate the proposed model and the optimization technique.

6.1. Dataset description

Both datasets consist of real votes gathered from people. For experiments with a restricted budget, we repeat learning and prediction several times by randomly selecting from the votes and via k-fold validation.

CUB-200 birds classification. This dataset was built in the context of attribute-based classification of bird images (Welinder et al., 2010). Since classifying the species directly is difficult, the crowd workers are not asked for the category of the bird but whether a certain attribute (for example, yellow beak) is present in the image. Each attribute brings a piece of information to the problem, and we treat the attributes as access paths in our model. The dataset contains 5-10 answers for each of the 288 available attributes.

ProbabilitySports. This dataset is based on a crowdsourced betting competition (Probability Sports) on NFL games. The participants voted with a certain degree of belief on the question "Is the home team going to win?" for 250 events within a season. Not all participants voted on all events, and different seasons have different popularity. We designed the access paths based on the accuracy of each worker during the season. Since the workers' accuracy in the dataset follows a normal distribution, we divide this distribution into three intervals, where each interval corresponds to one access path (worse than average, average, better than average). In this configuration, the access paths have decreasing error rates. Consequently, for experimentation, we assigned them increasing integer costs (2, 3, 4), although the competition itself was not originally based on money.

6.2. Baseline models

For the purposes of our work, we analyze different crowd models with respect to diversity awareness and the level of granularity of diversity. More specifically, we consider Majority Vote (MV), Naïve Bayes Individual (NBI), and Naïve Bayes for Access Paths (NBAP).

Majority Vote. Being the simplest of the models and also the most popular one, majority voting produces fairly good results if the crowdsourcing redundancy is sufficient. Nevertheless, majority voting considers all votes as equal with respect to quality and has no sense of diversity.

Naïve Bayes Individual. This model assigns an individual error rate to each worker and uses these rates to weigh the incoming votes and form a decision. This means that the results depend heavily on the assumption that each worker has solved roughly the same number of tasks and that each task has been solved by the same number of workers. This assumption generally does not hold for open crowdsourcing markets, where "vote not guaranteed" circumstances are common. As the experimental evaluation also shows, this is harmful not only for estimating the error rates but also for crowd access optimization: the targeted workers might not participate, thereby wasting the budget or increasing the latency. Furthermore, even with fully committed workers, this model does not provide the proper machinery for optimizing the budget distribution, since it does not capture the shared bias between workers.

Naïve Bayes for Access Paths. To correct for the effects of unstable participation of individual workers, we propose yet another baseline (Figure 5), very similar to our original model. The votes of the workers are still grouped according to the access path, but the access paths themselves are not represented through intermediate latent variables. For inference, each vote x_ij is weighed with the average error rate θ_i of the access path it comes from. This means that all the votes belonging to the same access path behave as a single random variable. Note that this generalization is obligatory for this model and only optional for the Access Path Model. Similarly to NBI and to all Naïve Bayes classifiers, this model is not able to make predictions with meaningful confidence.

[Figure 5: Naïve Bayes Model for Access Paths: Y directly parents the grouped votes X_11, ..., X_3P[3], with path-level parameters θ_1, θ_2, θ_3.]
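To make the NBAP decision rule concrete, here is a small sketch reflecting our reading of the baseline, with path-level vote probabilities th[i][y] standing in for the average error rates θ_i; the parameterization is an assumption for illustration, not the authors' code.

```python
def nbap_predict(votes_by_path, prior, th):
    """Naive Bayes over votes with one parameter per access path:
    th[i][y] = P(vote = 1 | Y = y) on path i. There is no latent Z
    layer, so every vote on a path is weighed by the same rate."""
    joint = {}
    for y in (0, 1):
        p = prior[y]
        for i, votes in enumerate(votes_by_path):
            for x in votes:
                p *= th[i][y] if x == 1 else 1 - th[i][y]
        joint[y] = p
    return max(joint, key=joint.get)

prior = {0: 0.5, 1: 0.5}
th = [{0: 0.2, 1: 0.8}, {0: 0.4, 1: 0.7}]   # made-up path-level rates
print(nbap_predict([[1, 1, 0], [1]], prior, th))
```

Because every vote contributes an independent factor, the resulting posteriors are typically extreme; this is the overconfidence that the evaluation below highlights.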
6.3. Model evaluation

To evaluate the Access Path Model independently of the optimization process, we performed experiments first using all the votes available in the datasets and then equally distributing the budget across all access paths. The comparison is based on two measures: accuracy and negative log-likelihood. Accuracy corresponds to the percentage of correct predictions. Negative log-likelihood is computed as the sum, over all test samples, of the negative log-likelihood that the prediction corresponds to the real outcome; the closer a prediction is to the real outcome, the lower its negative log-likelihood:

$$-\text{logLikelihood} = -\sum_{s_t} \log p(\text{prediction} = y_t | x_t) \quad (15)$$

Hence, the negative log-likelihood measures not only the correctness of a model but also its ability to output meaningful probabilities for a prediction to be correct.

[Figure 3: Accuracy with unconstrained budget (left: ProbabilitySports by year; right: CUB-200 by species id).]

[Figure 4: Accuracy and negative log-likelihood for an equally distributed budget in ProbabilitySports (year 2002).]

Unconstrained budget. Figure 3 shows the accuracy of all models on both datasets using all the available votes. The aim of this experiment is to test the robustness of APM under very high redundancy. For ProbabilitySports we also show the accuracy of the odds provided by the betting parties before the matches took place. As expected, in the betting scenario it is challenging to improve over Majority Vote. Nevertheless, we observe a 4%-8% improvement of APM over majority, while the Naïve Bayes baselines achieve no significant improvement. For the birds classification dataset (CUB-200), it is not possible to compare APM and NBAP with MV and NBI, because the votes of the workers do not solve the final task as they do in the betting dataset. For this reason, we performed additional one-vs-all experiments on Mechanical Turk based on the same images as the original dataset. Each HIT consisted of a photo to be classified by the worker, as well as a good-quality sample photo of the species to be identified (the latter was included to train the workers and simplify their task). Each photo was assigned to 10 different users. In this comparison, the access path models generally perform better than the individual models and majority. In specific cases, when the bird has a feature that makes it distinguishable from other species (e.g. speciesId 12, yellow headed blackbird), there is no difference between the models. Note that the accuracy of NBI is sometimes very close to MV because of the unstable participation of MTurk workers across the photos of the same species.

Constrained budget. For this experiment we varied the total budget and distributed it equally across all access paths. Figure 4 shows that while the improvement of APM accuracy over NBI and MV is stable, NBAP starts facing the overconfidence problem for high values of available budget. The phenomenon is more visible in the negative log-likelihood graph. Another expected observation is the improvement of majority in terms of negative log-likelihood but not in terms of accuracy. This reflects the ability of majority to provide meaningful confidence levels even for highly noisy data, provided enough votes are available.

[Figure 6: Information gain and budget spent across access paths in the best approximate plan (ProbabilitySports, year 2002).]
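A short sketch of how the metric in Equation 15 can be computed from model posteriors; the pairing of posterior dictionaries with true labels is our illustrative layout.

```python
import math

def neg_log_likelihood(results):
    """Eq. (15): sum of -log p(y_t | x_t) over test samples, where
    results pairs each posterior dict with the true outcome y_t."""
    return -sum(math.log(post[y_true]) for post, y_true in results)

# A confident-and-right model scores lower (better) than a
# confident-and-wrong one on the same two samples.
print(neg_log_likelihood([({0: 0.2, 1: 0.8}, 1), ({0: 0.9, 1: 0.1}, 0)]))
print(neg_log_likelihood([({0: 0.2, 1: 0.8}, 0), ({0: 0.9, 1: 0.1}, 1)]))
```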
[Figure 7: Greedy optimization results for ProbabilitySports (year 2002) and CUB-200 (speciesId 118, spotted catbird).]

6.4. Optimization scheme evaluation

In this set of experiments, we evaluate the ability of the proposed greedy approximation scheme to choose plans of high quality that take diversity into account. For a fair comparison, we adapted the same scheme to the simpler baselines NBI and NBAP.

Greedy approximation. Figure 6 depicts the development of information gain with varying budget for the optimal plan (OPT), the approximate plan computed by the greedy algorithm (GA), and three pure plans which take votes from a single access path only. The quality of the GA approximation is very close to that of the optimal plan. The third access path in ProbabilitySports (containing the users with better-than-average relative score for the season) reaches the highest information gain. Nevertheless, the quality of that pure plan saturates for higher budget values, which encourages the optimization scheme to select votes from other access paths as well. In the same experiment, the NBAP model with the same optimization strategy chooses votes only from the third access path.

Crowd access optimization. Finally, we combine the model and the optimization techniques to evaluate the impact of both. Figure 7 shows results for both datasets. In ProbabilitySports, APM and NBAP improve over MV and NBI with respect to both accuracy and negative log-likelihood. Since the plans for NBI target specific users in the competition, the accuracy for budget values less than 10 is low, because not all the targeted users voted on all events; thus, we also present the performance of NBI with random access to votes (NBI+RND). In this dataset and configuration the mixed plans do not offer a clear improvement in terms of accuracy, only in terms of negative log-likelihood. This happens because the access paths here are inherently designed based on the accuracy of the workers. In contrast, for CUB-200, where the division of access paths is based on attributes, the discrepancy between NBAP and APM is higher.

7. Conclusion

In this work, we introduced a new approach for representing crowd diversity, named the Access Path Model. We showed that this model can be used to seamlessly handle critical problems in crowdsourcing such as quality assurance and crowd access optimization. Experimental evaluation on real-world datasets demonstrated that leveraging APM along with greedy approximation schemes can improve the quality of results compared to individual models and majority vote. As future work, we plan to investigate the problem of automatically discovering and filtering access paths when no intuitive configuration is available.

References
Chen, Yuxin and Krause, Andreas. Near-optimal batch mode active learning and adaptive submodular optimization. In Proc. of ICML, 2013.

Dawid, Alexander Philip and Skene, Allan M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pp. 20-28, 1979.

De Campos, Luis M. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. JMLR, 7:2149-2187, 2006.

Franklin, Michael J, Kossmann, Donald, Kraska, Tim, Ramesh, Sukriti, and Xin, Reynold. CrowdDB: answering queries with crowdsourcing. In Proceedings of the 2011 ACM SIGMOD, pp. 61-72. ACM, 2011.

Ho, Chien-Ju, Jabbari, Shahin, and Vaughan, Jennifer W. Adaptive task assignment for crowdsourced classification. In Proc. of ICML, pp. 534-542, 2013.

Hong, Lu and Page, Scott E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the United States of America, 101(46):16385-16389, 2004.

Ipeirotis, Panos and Gabrilovich, Evgeniy. Quizz: Targeted crowdsourcing with a billion (potential) users. In WWW, 2014.

Karger, David R, Oh, Sewoong, and Shah, Devavrat. Budget-optimal crowdsourcing using low-rank matrix approximations. In 49th Annual Allerton Conference, pp. 284-291. IEEE, 2011.

Khuller, Samir, Moss, Anna, and Naor, Joseph Seffi. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39-45, 1999.

Krause, Andreas and Guestrin, Carlos E. Near-optimal nonmyopic value of information in graphical models. arXiv preprint arXiv:1207.1394, 2012.

Lamberson, PJ and Page, Scott E. Optimal forecasting groups. Management Science, 58(4):805-810, 2012.

Li, Hongwei, Zhao, Bo, and Fuxman, Ariel. The wisdom of minority: Discovering and targeting the right group of workers for crowdsourcing. In Proc. of the 23rd WWW, 2014.

Liu, Qiang, Peng, Jian, and Ihler, Alexander T. Variational inference for crowdsourcing. In NIPS, pp. 701-709, 2012.

Marcus, Adam, Wu, Eugene, Karger, David R, Madden, Samuel, and Miller, Robert C. Crowdsourced databases: Query processing with people. In CIDR, 2011.

Moon, Todd K. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6):47-60, 1996.

Parameswaran, Aditya Ganesh, Park, Hyunjung, Garcia-Molina, Hector, Polyzotis, Neoklis, and Widom, Jennifer. Deco: declarative crowdsourcing. In Proc. of CIKM, pp. 1203-1212, 2012.

Probability Sports. www.probabilitysports.com.

Selinger, P Griffiths, Astrahan, Morton M, Chamberlin, Donald D, Lorie, Raymond A, and Price, Thomas G. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD, pp. 23-34. ACM, 1979.

Surowiecki, James. The wisdom of crowds. Random House LLC, 2005.

Wang, Jing and Ipeirotis, Panagiotis. Quality-based pricing for crowdsourced workers. 2013.

Welinder, Peter, Branson, Steve, Mita, Takeshi, Wah, Catherine, Schroff, Florian, Belongie, Serge, and Perona, Pietro. Caltech-UCSD Birds 200. 2010.

Whitehill, Jacob, Ruvolo, Paul, Wu, Tingfan, Bergsma, Jacob, and Movellan, Javier R. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, volume 22, pp. 2035-2043, 2009.

Zhou, Dengyong, Platt, John C, Basu, Sumit, and Mao, Yi. Learning from the wisdom of crowds by minimax entropy. In NIPS, pp. 2204-2212, 2012.