Decision support using social media: how to deal with different opinions?

by Robin De Mol

Promotor: prof. dr. Guy De Tré
Thesis coach: Ana Tapia-Rosero

Master thesis submitted to obtain the degree of Master of Science in Computer Science: Software Engineering

Ghent University
Faculty of Engineering and Architecture
Department of Telecommunications and Information Processing
Chairman: prof. dr. ir. Herwig Bruneel
Academic year 2012–2013

Summary

Making decisions is part of our everyday life. Some decisions are more important than others, and some are more complex. They are made in different contexts, ranging from individual to group decisions, from both a social and a business point of view. Because the rise of the internet has greatly increased the communication capabilities between people worldwide, and because of the amount of resources made available this way, group decision support is becoming increasingly relevant. The development of group decision support software is currently scarcely documented, as it is usually created in-house.

In this work, an extension to an existing modern decision support algorithm is presented, enabling multiple applications of group decision support. To this end, aggregation techniques are discussed. Because aggregation entails information loss, we define a new measure that mitigates this loss by reintroducing it as a degree of confidence. The concept of pre-aggregation is explained in depth.

Nowadays, using soft computing techniques, the opinions of several experts can be expressed through membership functions that encode their preference levels, providing a flexible way to express the desired values for the attributes of the problem. Regardless of their expertise, the experts are clustered into groups based on the similarity of their opinions. These are further merged into a single result, which is then used as input to an already proven decision support system. This makes it possible to keep existing decision support algorithms as they are, with only slight modifications. The clustering step uses a new technique based on the shape-similarity of membership functions.

The representativity of a cluster, that is, the degree to which it represents what is most desired by the consulted experts, is referred to as its confidence. It is calculated from the fraction of experts in the cluster, the weights of these experts and the degree of similarity of the opinions within the cluster. In a second iteration, the confidence of each cluster can be readjusted based on its distance to other clusters. For each of these steps there are alternative approaches, which are compared in the discussion. The confidence levels are propagated through the regular decision support aggregation structure and result in a new output parameter for the evaluated systems, the overall confidence.
This represents the degree to which we can trust the result to be representative of the experts' opinions. The way this is calculated is also covered. To illustrate the research, a case study is introduced with generated test data. The results of the calculations using the proposed techniques are presented and discussed. Finally, there is a section on future work introducing other techniques that can be investigated, along with their advantages and disadvantages.

I would like to thank professor De Tré for allowing me to conduct this research at the faculty, and Ana and her colleagues for coaching me during this thesis.

The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

Keywords: LSP, GDSS, Aggregation, Confidence, Soft Computing.

Decision support using social media: how to deal with different opinions?
Robin De Mol
Supervisor(s): prof. dr. Guy De Tré, Ana Tapia-Rosero

Abstract— In this work we propose an extended group decision support technique that deals with a group of possibly different opinions rather than a single person's input. Information can be gathered from social media using a soft computing technique that allows people to express their desires in the form of membership functions. These focus on working with linguistic terms rather than mathematically "crisp" values and are grouped into clusters based on their shape-similarity, for which a new calculation technique is introduced. These clusters represent groups of people with similar opinions. We continue by defining a measure for the gravity of each cluster, representing how important it is. This helps find the "average" opinion of the consulted people, which is then used in the evaluation process of an existing decision support system. Through a slight modification, the system generates an additional output parameter that reflects the accuracy of the results, namely the representativity of the output with respect to the actual average opinion. This technique allows us to evaluate large amounts of possibly different opinions using existing group decision support systems, which was previously impossible.

Keywords— Logic Scoring of Preferences (LSP), Group decision support systems (GDSS), Aggregation, Confidence, Soft Computing

I. INTRODUCTION

Today, people can reach anyone anywhere thanks to the internet. This creates many new possibilities, also from a business point of view. Companies now have to compete with each other globally. To be the most successful, they need to create products that are innovative and user-friendly. There has been a clear trend of shifting funding from the production line to the research department. One of the techniques that is gaining popularity is including as many clients as possible in the business decision making process concerning products in development. The clients are asked to give their opinions so that they have a share in the selection of certain features, something which has long been done only by experts. This new way of gaining information can rapidly lead to huge amounts of different opinions, which are troublesome to handle. We will discuss a few techniques that help with the analysis of large amounts of data.
Furthermore, we will show how this can be used in decision support systems (DSS). First, we introduce some general terminology. After that, we introduce an aggregation methodology to analyse the information based on clustering. This is followed by an explanation of a new concept called confidence. Finally, there is a short conclusion.

II. TERMINOLOGY

In the context of decision making and DSS, some terms are generally used to refer to certain concepts. Usually, there is a given problem for which we want to find the best solution. The analysis of the problem leads to a hierarchically structured requirements tree, of which the leaves are called the performance variables. Possible solutions to the problem are called (candidate) systems. Normally we have a set of systems and we want to find out which one is the best to solve the problem. For each performance variable, a scoring function called the elementary criterion is specified, which indicates the values a system should have to satisfy the corresponding requirement. Evaluating a system produces a set of elementary preference scores, one per performance variable, indicating how well the system satisfies each requirement individually. These are combined through an aggregation structure, which leads to the system's global preference score. This indicates how well the system is fit to solve the problem as a whole. After calculating this for each system, the results are analysed by the decision maker, a person (or possibly a group) responsible for selecting the best system.

What differentiates a group decision support system is the fact that the elementary criteria of more than one person are used in the evaluation. The consulted people are called experts, regardless of their actual expertise. They represent their opinions using membership functions. These are functions which define a degree of satisfaction between 0 and 1 for each value in the range of the performance variable. This specifies which of the values are desired and which are not. Any intermediate value between 0 and 1 indicates a partial tolerance for the value. In the context of decision support, this degree of satisfaction represents the elementary preference score of the corresponding performance variable of a system.

Fig. 1. Example membership functions where the Y-axis has been converted to a percentage of satisfaction, illustrating how a membership function can be used to imply a range of desired values for a variable.
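To make the role of a membership function as an elementary criterion concrete, the following minimal Python sketch evaluates a trapezoidal criterion for a hypothetical performance variable such as battery life in hours. The trapezoidal shape and the parameter values are illustrative assumptions, not values taken from this work.

def trapezoidal_criterion(a, b, c, d):
    """Return an elementary criterion: a trapezoidal membership function
    mapping a performance-variable value to a preference score in [0, 1]."""
    def criterion(x):
        if x <= a or x >= d:
            return 0.0                    # undesired values
        if b <= x <= c:
            return 1.0                    # fully desired values
        if a < x < b:
            return (x - a) / (b - a)      # rising slope: partial tolerance
        return (d - x) / (d - c)          # falling slope: partial tolerance
    return criterion

# Hypothetical expert opinion: "battery life around 8 to 12 hours is ideal,
# anything below 6 or above 14 hours is unacceptable".
battery_criterion = trapezoidal_criterion(6.0, 8.0, 12.0, 14.0)
print(battery_criterion(7.0))    # 0.5 -> partial satisfaction
print(battery_criterion(10.0))   # 1.0 -> elementary preference score of a system

Evaluating a candidate system's value for the variable with such a function directly yields its elementary preference score for that requirement.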
III. AGGREGATION

Our goal is to aggregate the large amounts of information in order to reduce the complexity of the data to a manageable size. For this, we have chosen a "pre-aggregation" technique, in which the aggregation step occurs before the decision support algorithm evaluates the systems. The results of this approach are twofold:
• the experts are combined into a single, merged expert which represents the "average opinion" before the decision support algorithm evaluates the systems;
• we can use existing DSS more or less without changes.
The representative of the average opinion is only an approximation. The correctness of its representativity is indicated by the confidence parameter, which is discussed later on. To aggregate the elementary criteria based on their similarity, we first need to define the similarity between two membership functions. For this, we introduce an alternative notation for them based on two aspects: their shape and the relative lengths of their components. This is called the shape-symbolic notation and consists of a shape-string and a feature-string.

A. The shape-string

The shape-string describes the shape of an elementary criterion. Generally, an elementary criterion consists of several typical components such as high values (indicating preferred values for the performance variable), low values (indicating non-preferred values) and slopes connecting them. These components can be easily identified, and the shape-string can be constructed on sight by stringing together the characters that represent these typical components.

B. The feature-string

The feature-string contains information about the relative lengths of the components of an elementary criterion. We use a soft computing technique to evaluate the length of a component. Instead of measuring its exact value, we classify it into one of the following ranges: extremely short, very short, short, normal, long, very long, extremely long. The actual range of values implied by these terms depends on the total length of the elementary criterion being translated. These intervals were chosen based on research showing that between seven and nine distinct levels is the optimal number for visual assessment by humans.

C. The shape-symbolic notation and similarity

Together, the shape-string and the feature-string form the shape-symbolic notation of an elementary criterion, which consists of shape-symbolic characters. These are couples of a shape component and a length component. The similarity between two elementary criteria is based on a modified version of the Levenshtein distance between two words. Originally, this distance works with characters and basic actions such as insertion, deletion and replacement. We extended these to work with shape-symbolic characters. Each action induces a penalty defined by the gravity of the action. For example, replacing an extremely short character with a very long character has a high cost, which is even higher if the shape component is also different. Using this measure and given a set of elementary criteria, we can calculate a similarity matrix showing the pairwise similarity between every pair of criteria. This matrix has ones on the main diagonal and is symmetric.

Fig. 2. Example of the shape-symbolic notation of a membership function, showing its symbolic characters.

D. Clustering

After calculating the similarity matrix, the elementary criteria are grouped into clusters. This is done using a hierarchical bottom-up clustering technique, combining the most similar criteria first. This approach has the interesting property that the resulting clusters are unique.
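The following Python sketch illustrates the flavour of this similarity computation. It is a simplified illustration rather than the implementation used in this work: the representation of shape-symbolic characters, the penalty values and the similarity normalisation are hypothetical assumptions, and the insertion/deletion cost is kept constant here although in this work it also depends on the length of the character involved.

# A shape-symbolic character is modelled as a (shape, length) couple, e.g. ('h', 'long').
LENGTHS = ['extremely short', 'very short', 'short', 'normal',
           'long', 'very long', 'extremely long']

def char_cost(a, b):
    """Penalty for replacing one shape-symbolic character with another (assumed scheme)."""
    length_gap = abs(LENGTHS.index(a[1]) - LENGTHS.index(b[1])) / (len(LENGTHS) - 1)
    shape_gap = 0.0 if a[0] == b[0] else 1.0
    return (length_gap + shape_gap) / 2.0          # penalty in [0, 1]

def symbolic_distance(s, t, indel=1.0):
    """Levenshtein distance extended to sequences of shape-symbolic characters."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + indel,                                # deletion
                          d[i][j - 1] + indel,                                # insertion
                          d[i - 1][j - 1] + char_cost(s[i - 1], t[j - 1]))    # replacement
    return d[m][n]

def similarity_matrix(criteria):
    """Pairwise similarities in [0, 1]: ones on the diagonal, symmetric."""
    n = len(criteria)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = symbolic_distance(criteria[i], criteria[j])
            sim[i][j] = sim[j][i] = 1.0 / (1.0 + dist)   # assumed normalisation to [0, 1]
    return sim

A standard agglomerative (bottom-up) procedure can then repeatedly merge the pair of clusters whose members are most similar according to this matrix, until a chosen similarity threshold is reached.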
It is important to keep a clear view of what we have done so far. We have gathered information from a large crowd of people. They have all given their preferred range of values for each of the performance variables of the problem. These are represented using elementary criteria. We then cluster those into groups of people with similar opinions per performance variable, based on the shape-similarity method. In order to go to the next step of the decision support system, we need to elect one representative per performance variable for evaluation. To facilitate this decision, we introduce the concept of confidence.

IV. CONFIDENCE

The concept of confidence appears at different stages in the decision making process, each with its own definition. We have already mentioned that it indicates the representativity of the average opinion of the consulted experts after clustering. It also facilitates the election process for finding this average opinion. Confidence is also used in the presentation of the results. Much like the aggregation of the elementary preference scores into one global preference per system, the confidence in each of the representative elementary criteria, which is the same as the confidence of the cluster it is selected from, is propagated and aggregated into a global confidence value per system. We can then interpret this parameter as the degree of trust we can put in the accuracy of the calculated global preference. The three distinct instantiations of confidence are:
• cluster confidence, for each cluster;
• elementary confidence, for the elementary preference score of the elected representative criterion;
• global confidence, for each system (also called system confidence).

A. Cluster and elementary confidence

The first time we use the confidence concept is after the clustering algorithm has run. The goal is to find a measure for the gravity of each of the clusters. Therefore we define the cluster confidence as a combination of the relative size of that cluster and the degree of similarity of the elementary criteria in it, referred to as the compactness of the cluster. The relative size can further be weighted by assigning weights to the experts to distinguish them by their level of expertise. The compactness can be calculated in different ways, but the concept is the same for each of them: this measure defines how similar the opinions in the cluster are. One possible approach is to take the similarity between the most and least typical criteria of a cluster. These are the members of a cluster that have, respectively, the highest and lowest average similarity with all other elements in it. Another method is to calculate and normalize the surface enclosed between the upper and lower bounds of an interval-valued fuzzy set enclosing all elementary criteria of the cluster. These two factors are then combined with weights, depending on the relative importance of the similarity of the opinions versus the importance of representing the majority of the population. This produces a single value, the cluster confidence.

Now we need to select a criterion per performance variable from the clusters. This should represent the average opinion of the consulted population on the desired values for the variable. It is often simplest to select the most typical value from the cluster with the highest cluster confidence. Alternatively, one could try to merge the most typical values of the top k clusters, but this is often troublesome and can lead to illogical results, such as a criterion that accepts all values or none. The confidence in the representativity of the elected criterion, and hence in the elementary preference score that follows from its evaluation, is the same as the cluster confidence of the cluster it was selected from. This is the elementary confidence score, which is in fact just the highest cluster confidence.
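As a sketch of one possible realisation of this measure, the following Python function combines the weighted relative size of a cluster with a compactness based on its most and least typical members. The convex combination with the parameter alpha, and the function and parameter names, are illustrative assumptions; the work above describes several alternative ways to compute the compactness.

def cluster_confidence(cluster, expert_weights, sim, alpha=0.5):
    """Sketch of a cluster-confidence measure: a convex combination of the
    weighted relative size of the cluster and its compactness.
    `cluster` is a list of expert indices, `expert_weights` maps every expert
    index (also those outside the cluster) to a weight, and `sim` is the full
    similarity matrix. `alpha` balances size against compactness (assumed)."""
    # Weighted relative size: share of the total expert weight inside the cluster.
    total_weight = sum(expert_weights.values())
    size = sum(expert_weights[e] for e in cluster) / total_weight

    # Compactness: similarity between the most and least typical members,
    # i.e. those with the highest / lowest average similarity to the rest.
    def avg_sim(e):
        others = [o for o in cluster if o != e]
        return sum(sim[e][o] for o in others) / len(others) if others else 1.0

    most_typical = max(cluster, key=avg_sim)
    least_typical = min(cluster, key=avg_sim)
    compactness = sim[most_typical][least_typical]

    return alpha * size + (1 - alpha) * compactness

The elementary confidence of a performance variable is then simply the confidence of the cluster from which the representative criterion is elected.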
B. System confidence and goodness

After choosing the representatives, we can evaluate the systems using the underlying decision support system. We have chosen to extend Logic Scoring of Preferences (LSP), a modern and flexible DSS based on soft computing techniques. Each system is evaluated in turn. First, all performance variables of a system are evaluated using the elected criteria, resulting in a vector of elementary preference scores. These are then combined into a single global preference score through an LSP aggregation structure, which defines the importance of the attributes. This structure is constructed in advance by the decision maker and depends largely on the original decomposition of the problem. Similarly to the global preference, a global confidence score is calculated, which is called the global confidence or system confidence.

The results of the evaluation step are not trivial to interpret. Each system has two scores, which makes it hard to rank them. To facilitate the final step in the decision making, we propose to combine the preference score and the confidence value into a single parameter. Note that this can also be combined with a possible cost analysis, as discussed in [10]. We call the combined value the "goodness" of a system, which represents both the degree to which the system satisfies the problem's requirements and the confidence we have in the accuracy thereof. It is calculated by taking the weighted average of the global preference and the system confidence. This allows us to rank the systems by their goodness.

V. CONCLUSION

In this work we have proposed a technique to aggregate large amounts of data. This has allowed us to extend an existing DSS in such a way that it remains practically unchanged while extending its field of applicability. Now it can handle the input of more than one person, and there is no real limit to the size of the input. The number of people consulted can vary from a small board of actual experts, to a large population of clients, to a combination of both, using weights to distinguish between them. This allows us to use social media as a source of input without having to worry about how to combine the different opinions of separate individuals.

REFERENCES

[1] J.J. Dujmović and W.Y. Fang, "Reliability of LSP criteria," 2004.
[2] P.J.G. Lisboa, H. Wong, P. Harris, and R. Swindell, "A Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer," Artificial Intelligence in Medicine, vol. 28, pp. 1–25, 2003.
[3] P.G.W. Keen, "Decision support systems: The next decade," Decision Support Systems, vol. 3, pp. 253–265, 1987.
[4] G.P. Huber, "Issues in the design of group decision support systems," MIS Quarterly, pp. 195–204, September 1984.
[5] J.P. Shim, M. Warkentin, J.F. Courtney, D.J. Power, R. Sharda, and C. Carlsson, "Past, present, and future of decision support technology," Decision Support Systems, vol. 33, pp. 111–126, 2002.
[6] A. Tapia-Rosero, A. Bronselaer, and G. De Tré, "Similarity of membership functions - a shape-based approach," in Proceedings of the 4th International Joint Conference on Computational Intelligence, pp. 402–409, SciTePress - Science and Technology Publications, 2012.
[7] A. Tapia-Rosero, A. Bronselaer, and G. De Tré, "A shape-similarity based method for detecting similar opinions in group decision-making," Information Sciences, Special Issue on New Challenges of Computing with Words in Decision Making, 2013.
[8] G. DeSanctis and R.B. Gallupe, "Group decision support systems: A new frontier," 1985.
[9] G. DeSanctis and R.B. Gallupe, "A foundation for the study of group decision support systems," Management Science, vol. 33, no. 5, May 1987.
[10] H.-J. Zimmermann, "Fuzzy programming and linear programming with several objective functions," Fuzzy Sets and Systems, vol. 1, pp. 45–55, 1978.
[11] G. De Tré, Vage Databanken.
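To fix ideas, the following Python sketch aggregates elementary preferences and, in the same way, the elementary confidences, and combines the two results into a goodness value. LSP aggregators (generalized conjunction/disjunction) are commonly realised by weighted power means, as in the literature study later in this work; the specific exponent, weights and the goodness weight beta used here are illustrative assumptions, not values prescribed by this work.

def weighted_power_mean(scores, weights, r):
    """A common realisation of an LSP aggregator: a weighted power mean with
    exponent r (r < 1 behaves conjunctively, r > 1 disjunctively).
    Weights are assumed to sum to 1."""
    if r == 0:  # geometric mean as the limit case
        prod = 1.0
        for s, w in zip(scores, weights):
            prod *= s ** w
        return prod
    return sum(w * s ** r for s, w in zip(scores, weights)) ** (1.0 / r)

def evaluate_system(elem_preferences, elem_confidences, weights, r=0.5, beta=0.7):
    """Aggregate elementary preferences into a global preference, propagate the
    elementary confidences through the same structure, and combine both into a
    'goodness' value as a weighted average (beta is a hypothetical weight)."""
    global_preference = weighted_power_mean(elem_preferences, weights, r)
    global_confidence = weighted_power_mean(elem_confidences, weights, r)
    goodness = beta * global_preference + (1 - beta) * global_confidence
    return global_preference, global_confidence, goodness

# Hypothetical system with three performance variables:
print(evaluate_system([0.8, 0.6, 0.9], [0.7, 0.9, 0.5], [0.5, 0.3, 0.2]))

A real LSP aggregation structure is a tree of such aggregators with varying andness rather than a single mean, but the propagation of confidence alongside preference follows the same pattern.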
[12] J.J. Dujmović and W.Y. Fang, "An empirical analysis of assessment errors for weights and andness in LSP criteria," San Francisco State University, Department of Computer Science, 2004.
[13] J.J. Dujmović, "A comparison of andness/orness indicators," San Francisco State University, Department of Computer Science.
[14] J.J. Dujmović, "Optimum location of an elementary school."
[15] J.J. Dujmović, G. De Tré, and S. Dragićević, "Comparison of multicriteria methods for land-use suitability assessment," 2009.
[16] J.J. Dujmović, G. De Tré, and N. Van De Weghe, "LSP suitability maps," 2009.
[17] R. Bodea and E. El Sayr, "Code coverage tool evaluation," 2008.
[18] J.J. Dujmović and G. De Tré, "Multicriteria methods and logic aggregation in suitability maps," 2011.
[19] J.J. Dujmović and H. Nagashima, "LSP method and its use for evaluation of Java IDEs," 2005.
[20] J.J. Dujmović, "Characteristic forms of generalized conjunction/disjunction."
[21] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, Inc., 1980.
[22] G.J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall, 1995.
[23] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, New York, NY, USA, 1997.
[24] J.J. Dujmović and H.L. Larsen, "Generalized conjunction/disjunction," International Journal of Approximate Reasoning, vol. 46, no. 3, pp. 423–446, Dec. 2007.

Decision support and social media: how should we deal with large amounts of divergent opinions?
Robin De Mol
Supervisor(s): prof. dr. Guy De Tré, Ana Tapia-Rosero

Abstract— In this document we propose an extension to existing (group) decision support algorithms so that they can deal with large amounts of data, which was impossible until now. Social media are used as a source of information. Through this medium, people are asked for their opinion on certain aspects. They enter their opinion in the form of membership functions, a soft computing technique that allows computing with words instead of exact mathematics. These functions are grouped into clusters based on their similarity, which is computed with a new technique derived from the Levenshtein distance. We then compute the representativity of each cluster with respect to the average opinion of the population. This information is used during the evaluation by the decision support program. With a small modification, an extra parameter is computed that indicates how accurate the results are. These techniques make it possible to evaluate large amounts of possibly divergent opinions using existing algorithms.

Keywords— Logic Scoring of Preferences (LSP), Group decision support systems (GDSS), Aggregation, Confidence, Soft Computing

I. INTRODUCTION

Today, one can reach anyone anywhere thanks to the internet. This creates a large number of possibilities, also from an economic point of view. Companies now compete with each other on a global scale. To come out on top, they increasingly have to focus on what exactly the customer wants from a product. This has led to growth in the research departments of many enterprises, with much attention being paid to the opinion of the customer.
For a long time only experts were involved in important decisions, but thanks to the possibilities offered by the internet, the clientele is being consulted more and more as well. This way, they can give their opinion on upcoming products. However, this also means that much more information has to be processed. In this work we discuss a way to extend existing decision support software in such a way that it becomes possible to process these large amounts of information easily and clearly. We discuss a number of techniques that will help us with this. We then also show how this can be used with existing decision support systems. First we introduce a number of important terms. Then we go deeper into the process of aggregating the information. After that we discuss the concept of "confidence". Finally, there is a short conclusion.

II. TERMINOLOGY

In the context of decision support software, a number of terms are used to refer to certain concepts. In the most general scenario we consider a problem for which we are looking for the best solution. Possible solutions are called (candidate) systems. The analysis of a problem leads to a hierarchical tree whose leaves are called performance variables, or variables for short. For each variable an evaluation function is constructed; these are the elementary criteria. They are used to evaluate the values of the variables of the systems, which leads to elementary preference scores. Per system, these are aggregated into a global preference score, a measure that indicates to what extent the system is suited to solve the problem. Once all systems have been evaluated, the results are presented to the decision maker, a person (or possibly a group) responsible for selecting the most suitable system. Note that the decision support program only serves as support for making a decision; it does not make the decision itself.

What distinguishes a group decision support program from the above is the fact that the elementary criteria are derived from a group of experts. These can be real experts, but the term also covers a possibly large group of people consulted through social media. Every person who enters his or her opinion is called an expert. The more people are involved in this process, the more opinions can be collected. Per variable, people are asked to construct a function that indicates which values they prefer. These functions come from soft computing and are called membership functions. As their range they take a value between 0 and 1 (inclusive) that indicates which values from the domain (a performance variable) are preferable. In the context of decision support these are the elementary criteria, and the values obtained after evaluation are the elementary preference scores.

Fig. 1. An example of membership functions in which the Y-axis has been converted to a percentage. The X-axis shows values from the domain of the corresponding represented term.

III. AGGREGATION

Our goal is to reduce the large amounts of data in order to lower their complexity. We have chosen a "pre-aggregation" technique, in which the aggregation takes place before the evaluation of the systems by the decision support system. The consequences of this are twofold:
• the data of all experts are combined into a single, general expert who represents the "average opinion", before the evaluation of the systems takes place;
• we can use existing decision support systems without many changes.
The general expert is only an approximation of the average opinion. The correctness of its representativity is indicated by a confidence parameter, which we return to later. To be able to group the opinions based on their similarity, we first need to define similarity between two membership functions. For this we introduce an alternative notation based on two aspects: their shape and the relative lengths of their components. Together they form the symbolic notation of a membership function.

A. The shape notation

The shape notation describes the shape of the elementary criteria. In general these consist of a number of typical components such as high values (indicating preferred values from the domain), low values (indicating undesired values) and the parts that connect them. The components can easily be identified individually, which makes constructing the shape notation straightforward.

B. The length notation

The length notation contains information about the relative lengths of the components. Here, soft computing techniques are used to classify the length of a component into one of the following seven categories: extremely short, very short, short, normal, long, very long and extremely long. The actual intervals covered by these categories depend on the total length of the elementary criterion.

C. The symbolic notation and similarity

Clearly, the shape notation and the length notation contain the same number of characters. Together they form the symbolic notation of an elementary criterion. They consist of a sequence of symbolic characters, each of which is in fact a couple of a shape component and a length component. The similarity between two elementary criteria is based on the Levenshtein distance between two terms. Originally this works with a number of basic operations on characters, but we have extended it here to work with symbolic characters. We consider three actions: insertion, replacement and deletion. Each action has a certain cost that determines how heavy the operation is. If, for example, one wants to replace an extremely short character with a very long character, the cost will be high, and even higher when the shape component also differs. The cost of insertion or deletion also depends on the length of the symbolic character. With this technique we are able to measure a distance between two symbolic notations. We do this for every pair of elementary criteria and thus obtain a similarity matrix per performance variable. This matrix is symmetric around the main diagonal and has ones on that diagonal.

Fig. 2. Example of the symbolic notation of a membership function.

D. Clustering

After constructing the similarity matrix we can group the elementary criteria into clusters. We do this in a hierarchical fashion, most similar first. This has the interesting property that the final clusters are unique. It is important to keep a good overview of what we have achieved so far. We have gathered information from a large group of people.
They have each given their opinion, in the form of a membership function, on what they consider good and not good for a number of performance variables of a problem. We have converted these functions to their symbolic notation and grouped them based on their mutual similarity. To move on to the next step, we need to choose one representative criterion per performance variable. To be able to make this choice we introduce a new concept, the confidence.

IV. CONFIDENCE

Confidence is a broad concept that appears at several places in the process, each time with its own definition. We have already mentioned that it indicates how well the elected criterion represents the average opinion, but the concept is also used in the results. Just as the final preference score is aggregated from elementary preference scores, a final confidence is computed from the elementary confidences that accompany the elementary preference scores of the elementary criteria. At that point, this parameter serves as an indicator of the degree to which we can trust the correctness of the final preference score. The three different instances in which we encounter confidence are the following:
• cluster confidence, for each cluster;
• elementary confidence, for the elementary preference score of the elected representative elementary criterion;
• global confidence, for each system (also called system confidence or final confidence).

A. Cluster confidence and elementary confidence

The first time we use the confidence concept is after the clustering algorithm has done its work. We are looking for a measure of importance for each of the clusters. To this end we state that the cluster confidence, per cluster, depends on the relative size of the cluster and on the degree to which the members of the cluster are similar. The relative size of the clusters is easy to compute and can further be extended with weights for each expert, so that a distinction can be made based on their professional knowledge. The similarity of the elements within a cluster can be computed in different ways. One possible approach is to take the similarity of the most and least typical element of the cluster. These elements are determined based on their average similarity or dissimilarity with respect to the other elements in the cluster. Another approach leans more towards fuzzy set theory, in which we use the surface between the lower and upper bound of the interval-valued fuzzy set that encloses the cluster. The weighted relative size and the overall similarity are combined with weights, where the weight parameter can be adjusted to increase either the impact of the size of the clusters or that of the similarity within a cluster. The value obtained is the cluster confidence.

Now we have to find a single representative criterion per performance variable. This has to represent the general opinion of the group on what good and bad values are. It is often simplest to choose the most typical value of the cluster with the highest confidence. Merging the typical values of all clusters in one way or another would be too complex and would often also lead to mathematically correct but illogical results, such as membership functions that accept all values or reject all values.
The confidence in this representative criterion corresponds to the cluster confidence of the cluster to which the criterion belongs. Here we speak of the elementary confidence. It indicates how accurately this representative reflects the average opinion.

B. System confidence and goodness

After choosing a representative criterion per performance variable, we are ready to evaluate the systems one by one. For this we use Logic Scoring of Preferences (LSP), an existing decision support system, as a basis. First we compute, for each performance variable, how well each system satisfies it, using the chosen representative criteria. This yields, for each system, a set of elementary preference values with corresponding elementary confidences. These are combined into a global preference score per system by the LSP aggregation structure. This structure reflects the mutual importance of the performance variables and closely follows the original result of the decomposition of the problem into its attributes. Similarly, the elementary confidences are combined into a global confidence, the system confidence.

The results of the evaluation are not trivial to interpret. Ranking the systems is not straightforward, since each of them is characterised by two parameters. If the number of systems is large and no overview can be maintained, it is possible to perform a final step in which the preference and the confidence are combined with certain weights. This can optionally be merged with the cost and preference analysis proposed in [10]. The value obtained is called the goodness of the system and reflects both the degree to which the system satisfies the requirements of the problem (the degree to which it is suitable as a solution) and the degree to which we are certain about the correctness of this result. Once this has been computed for each system, the systems can be ranked by decreasing goodness.

V. CONCLUSION

In this work we have developed a technique to aggregate large amounts of information. This allows us to extend existing decision support software so that its current operation remains untouched while its field of application is enlarged. In this way it becomes possible to consult more than one person when making a decision. The number of people is not limited: one can work with a small group of domain experts or with an enormous group of people contacted through social media.
Contents

1 Introduction
  1.1 Context
  1.2 The Current Situation
  1.3 Approach
  1.4 Applications
  1.5 Structure

2 Literature study
  2.1 Decision Support
    2.1.1 About Decision Support
    2.1.2 Characteristics of Decision Support Software
    2.1.3 Summary
  2.2 Multicriteria Problems
    2.2.1 Decision Support Algorithms
  2.3 Mathematical Preliminaries
    2.3.1 Soft Computing Techniques
    2.3.2 Logic Scoring of Preferences
    2.3.3 Suitability Maps
  2.4 Group Decision Support
    2.4.1 Aspects of GDSS
    2.4.2 Fields of Application

3 Research
  3.1 Aspects
  3.2 Scope
  3.3 Aggregation
    3.3.1 Clustering
    3.3.2 Further Tiers of Aggregation
  3.4 Confidence as a Concept
  3.5 Defining and Calculating Confidence
    3.5.1 Confidence at Cluster Level
    3.5.2 Elementary Confidence at Membership Function Level
    3.5.3 Global Confidence at System Level
  3.6 Combining Confidence and Preference

4 Case Study
  4.1 Background
  4.2 Evaluation
    4.2.1 Required Inputs
    4.2.2 Calculations
    4.2.3 Results
  4.3 Final Remarks

5 Conclusions
  5.1 Future Work

List of Figures

2.1 Membership function for "Slow"
2.2 General membership functions
2.3 Generalized conjunction and disjunction
2.4 Generalized conjunction/disjunction gradations
2.5 Generalized conjunction/disjunction Weighted Power Mean
2.6 LSP Aggregators
2.7 Disjunctive Partial Absorption aggregator
2.8 Mandatory-Desired-Optional compound aggregator, D-nested
2.9 Mandatory-Desired-Optional compound aggregator, A-nested
2.10 Mandatory-Desired-Optional compound aggregator, general
2.11 LSP Compound Aggregators
2.12 Suitability map
3.1 Shape-String
3.2 Shape-String example
3.3 Relative lengths
3.4 Feature-String
3.5 Shape-Symbolic notation
3.6 The Shape-Similarity method
3.7 Most and least typical values
3.8 Interval-Valued Fuzzy Set
3.9 Upper and lower bounds
3.10 Lower bound without core
4.1 Case study aggregation structure
4.2 Clustering dendrogram
4.3 Case study cluster 18
4.4 Case study cluster 30
4.5 Case study cluster 52
4.6 Case study cluster 60

List of Tables

4.1 Generated elementary criteria
4.2 Shape- and feature-strings
4.3 Similarity matrix
4.4 Confidence configuration A1
4.5 Confidence configuration A2
4.6 Confidence configuration A3
4.7 Performance variable confidences
4.8 Global preference and confidence
4.9 Global goodness B1
4.10 Global goodness B2
4.11 Global goodness B3
4.12 Derived optimal game prototype

Chapter 1
Introduction

1.1 Context

Today, the world has become more "flat" than ever before. As described in the book by Thomas Friedman (1), the third wave of globalization is a result of the rise of the internet. This allows us to connect to people on a more global scale than ever before: we have the freedom to talk to people all around the globe. We can "see" people everywhere around the world; the world is flat. This has multiple consequences, but it mainly implies easier access to information, which we can freely share. In the past we were limited to our closest environment, but the arrival of personal transportation introduced a wave of globalization allowing us to connect to more people in a larger circle. Suddenly, other countries were also possible sources of information, but they could also be considered competition. Exchanging technology and expanding economy and trade allowed for a personal enrichment leading to an increased standard of living, but this life was not for everyone. Now, with the internet, we as individuals can reach other individuals anywhere, at any time.

These changes also have important consequences for companies. This new, free medium of communication provides a large potential market, as it allows them to reach more customers. But this is true for all businesses, so in order to stay on top of their game, they had to shift their focus to marketing techniques. In order to stay competitive with their rivals, two things have received a large boost in the last decades: the research and development department, and the development of commercials.

(1) The World Is Flat, Thomas Friedman, 5 April 2005, Farrar, Straus and Giroux, 0-374-29288-4.
In contrast to the past, where a company focussed on producing good products on a steady scale to convince and keep a steady customer base, businesses now invest more money in commercials to reach more people. However, commercials alone are not enough. Because every company commercializes its products, upgrades that appease the crowd are needed to convince possible buyers that product A is more interesting than product B. Products are becoming more personalized, and customers like this. New techniques that focus on ease of use can cause a large boost in popularity. This explains the trend we see nowadays where companies focus more on the customer than on the product. Often the overall quality of products shows a decline (mostly in durability), mainly because companies notice that their clients do not mind consumerism. If they make the product attractive enough, people will buy it and, more importantly, buy a new one if the old one gives up. Even when the old product is still working, people will replace it with a new version as long as the new version is attractive enough. Therefore, there is a strong trend of companies focussing on the needs and desires of the customers.

One type of shift everyone experiences is the use of personalized commercials. Companies realize that bombarding everyone they can reach with commercials is not effective. The spam effect causes people to neglect the overload of information, which has the opposite effect of what is desired: instead of promoting their products, they push the customer away. Personalized commercials are a first step towards handling this. Companies gather information on the habits of their users in multiple ways. A passive way of doing this is by simply observing the behavior of their clients and trying to deduce correlations in their actions. A typical example of this is the "shopping basket analysis", in which an inventory is kept of recently made purchases and companies go looking for trends of items that are often bought together. A famous example of this is the story of how a company discovered a teenage woman was pregnant before her father did, simply from her purchase history (2). New techniques for passive information gathering have been developed in recent years in the domain of data mining, which focuses on handling large amounts of information. Techniques such as clustering, data warehousing and trend analysis are becoming more important in the data management systems of businesses.

(2) http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/

An active way of gathering information about customers is through various types of questionnaires. An important factor of product development now depends on creating the product the client wants, and hence it is important to know what the customer wants. To find out what this is, companies ask their users questions about what they like. Often extra information can be derived from geographical and demographical analysis, but in the new, flat world, where the spectrum of clients has vastly expanded, new techniques are required.

1.2 The Current Situation

It is obvious that data management techniques are needed to stay competitive in a flat environment where everyone can reach everyone.
Incorporating the customer is necessary to create better products and to gain a position in the global market. Therefore, information has to be analysed, as accurately as possible and in a clever way. Artificial intelligence based on ontologies and mathematical analysis tools based on statistics are up and coming. Nowadays, users are being included in business decisions. Gathering information on their preferences is not the problem. The real question when facing the vast amount of data being gathered is how to interpret this information. How can you use this information to derive what product someone really wants? Obviously, everyone has their own opinion, but what steps should be taken during development to satisfy more customers? What types of attributes of the product are mandatory, and which are optional? How important are they compared to each other? Such problems are known as decision support problems. To that end, decision support systems have been developed, which will be further discussed in the next chapter. However, in the evolving flat world, the employed techniques should also evolve, as sticking to old ones will not suffice. New technologies are needed to create accurate insight into the vast amounts of data now available to us.

1.3 Approach

This thesis introduces a set of techniques to handle large amounts of information. The resulting software is based on an existing decision support system (DSS). The main idea is that the use of an aggregation technique simplifies analysing the data. With aggregation comes a loss of information, and to compensate for this a measure of confidence is introduced. A discussion on its interpretation follows later.

The technique relies heavily on soft computing, a branch of mathematics that is rapidly gaining popularity. It revolves around a couple of principles stating that people do not always reason in exact mathematics, but rather in linguistic terms and ranges of values instead of sharp (crisp) and precise values. This is illustrated by the following example. Imagine driving a car and approaching a red light. As the driver, you do not think "I am exactly 24 m and 72 cm away from the lights, so I must brake to exactly 31 km/h in order to stop at the right place". In reality, what happens is that you realise "I am close to the lights, so I should slow down". The exact distance and speed do not matter that much; what matters are terms such as close and slow. These terms differ depending on the context and the individual. What is close, and what is slow? Soft computing techniques allow us to model these concepts on an individual level, while handing us a set of operations that allow us to use this information in further calculations.

It should also be clear that soft computing techniques do not always provide "the optimal" solution to a problem, such as the braking technique that wears out the brakes the least or uses the least fuel. Instead, they provide a robust framework that is able to gracefully handle a range of problems in a smooth way; they simply present a solution. An important property of soft computing techniques is that the solution, when plotted, displays smooth curves, in contrast to the solutions produced by exact techniques, which often show discontinuities and breaking points. That, combined with the fact that these techniques allow the handling of a range of problems rather than reducing every problem to an optimal path, makes soft computing similar to a natural way of reasoning, much like what humans do. This is exactly what we are trying to accomplish, and that makes soft computing a perfect fit for the application field in this research. A more detailed mathematical explanation is given later on, together with the rest of the mathematical preliminaries.
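As a small illustration of how such linguistic terms can be modelled, the following Python sketch defines a membership function for "slow" over vehicle speeds, in the spirit of the membership function for "Slow" shown later in Figure 2.1. The breakpoints are illustrative assumptions chosen only to show the principle.

def slow(speed_kmh):
    """Membership degree of 'slow' for a given speed (breakpoints are illustrative)."""
    if speed_kmh <= 20:
        return 1.0                      # definitely slow
    if speed_kmh >= 60:
        return 0.0                      # definitely not slow
    return (60 - speed_kmh) / 40.0      # gradual transition in between

for v in (10, 31, 50, 70):
    print(v, "km/h ->", round(slow(v), 2))

The gradual transition between 20 and 60 km/h is what gives the smooth behaviour described above: nearby speeds receive nearby membership degrees instead of jumping abruptly between "slow" and "not slow".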
1.4 Applications

It is interesting to shed some light on possible fields of application for the proposed techniques. Besides commercialization, aggregation can be used in different business contexts. Interesting types of applications are social media based, incorporating inputs from a large number of users in business decisions. On the other hand, several techniques can be applied to create a platform of flexible querying for users, allowing them to ask questions to a system built with data aggregation, such as "where would be a good place to go on a sunny day?". Such a platform can provide a geographical map showing the suitability of possible locations matching the flexible query, called a suitability map (S-map). Alternatively, these techniques can be used in Group Decision Support Systems (GDSS) in a highly advanced context, used by groups of experts to reach a consensus on an important business decision, even when the group of experts is not very large. In this work, the focus lies on the social media aspect and on how large amounts of data can be handled efficiently, though here and there the connection to other fields of application is made.

1.5 Structure

The remainder of this document is structured as follows. Chapter 2 is a literature study elaborating on the background of the research conducted in the thesis. Chapter 3 is dedicated to the research itself, where new concepts related to group decision support, such as the proposed aggregation technique and the confidence measure, are introduced. Chapter 4 presents a case study to illustrate how it all works. Chapter 5 is the final chapter, which concludes with a summary of what is accomplished and briefly sets out some opportunities for future work.

Chapter 2
Literature study

The purpose of this literature study is twofold. On the one hand it portrays the current state of decision support systems, algorithms and techniques, backed by the works listed in the bibliography. On the other hand it gives a detailed mathematical explanation of one of the more recent decision support algorithms, which is used as a base for the conducted research. This introduces the mathematical preliminaries that are necessary for understanding the conducted research. The scarcity of literature found on group decision support reflects the fact that this research is modern and ongoing, a fact to which the subject of this thesis further contributes. First we will discuss decision support in greater detail, followed by an elaboration on multicriteria decision problems. Then, some mathematical preliminaries are given to serve as a base for the research. Finally, there is a small part on group decision support.

2.1 Decision Support

2.1.1 About Decision Support

Throughout the day, people make choices concerning trivial matters such as what to eat for lunch or what to do in the next hour. At work, choices that might influence the course of the business are made, which might have an economic impact.
In clinical environments, doctors make decisions regarding the healthcare of their patients. Often this is backed by probabilistic models such as Bayesian dependency networks or neural networks [LWHS03]. Clearly, systems that assist in making decisions, especially difficult ones, are a valuable asset. This is what decision support systems try to do. Not every decision is made by an individual. On a larger scale, decisions are also made by groups of people. It is common practice for business decisions to consult a board of professionals or managers. They come together to make decisions concerning their company. Doing this improves the quality of the decision being made, as they combine their different areas of expertise and share their knowledge with each other. This way, they can accomplish more as a group than they would be able to alone. This concept of sharing knowledge has been in place for a long time, but the way the information is exchanged has changed through time. Originally, meetings were held and all experts had to be physically present to attend. Due to the rise of tele- and videoconferencing there was no longer a direct need for executives to be together at the same location [Hub84], as it became possible to communicate over the internet. This first step has multiple beneficial effects. First, the experts no longer need to travel to a certain location, which takes time and effort; they can simply conference wherever they are, as long as they are online. Second, more experts can now be consulted due to the absence of any geographical restriction. Today, it is not unusual to discuss problems with different individuals all over the world, using the internet as a medium [SWC+02]. An example of this are the many websites dedicated to answering questions in a community-driven sense, often referred to as "stacks" (e.g., http://stackoverflow.com and http://serverfault.com). But decision making on a large scale is not restricted to business decisions and experts. A large crowd of people can also be asked to make decisions. Think of voting for a political leader or a referendum. Alternatively, this can evolve further, allowing businesses to include large crowds in their decisions thanks to the rise of the internet, to the point where social media is involved. Facebook originally had a policy allowing users to vote on "important site governance changes" but recently decided to take away this right from its users (http://venturebeat.com/2012/11/21/no-more-voting/), ironically through that same voting system (which passed in Facebook's favour because not enough votes were cast, which was the reason Facebook chose to call the vote in the first place). However, the idea has been introduced and a door has been opened for companies that want to get a public opinion by using social media as a way of gathering information. Alternatively, the public can be used to create a common place of information, much like community-based projects. Social media can be consulted to produce a map of events, to rate restaurants or to derive which locations are popular on a warm, summery day. This can be done passively, by observing the behaviour of users, or actively, by asking people what they like and then aggregating these results into a publicly accessible map. Decision support has also paved the road to more advanced technology.
Systems like data warehouses (DW) and Online Analytical Processing (OLAP) and techniques like data mining are all examples of software that aids in making complex decisions [SWC+02]. These programs often grant the user an elevated view of the data at hand, allowing smarter choices to be made. It should be clear that the development of DSS has many interesting applications in the modern world of today and tomorrow. Due to its versatile nature, DSS can be employed for different purposes, ranging from end-user personal assistance to making business decisions to massive-scale collaboration. Furthermore, there is a growing market and with it a need for group decision algorithms, which will allow software to gather and aggregate large amounts of data and present them back to groups of users, be they experts or just a large group of social media users, in an orderly fashion, and this with a minimal loss of information.

Definition

Many different definitions have been given in an attempt to define what decision support is exactly. The truth is that decision support is exactly what the name implies: it supports making decisions. This is a broad concept and several people have given their point of view on decision support (DS). One of the earliest definitions is given by Gerrity (1971) [DG85][SWC+02], who said decision support is: "An effective blend of human intelligence, information technology and software which interact closely to solve complex problems". This clearly reflects Gerrity's view that decision support is useful for solving problems that are too complex to handle by hand: problems that involve complex mathematical calculations or have a large range of parameters. Problems that have too many candidate solutions to evaluate each of them manually form a set of example problems that are perfect for computers to solve. It also illustrates that decision support is not autonomous: it simply provides a human-computer interface for problem solving. The user is a required part of the decision support and is often referred to as the decision maker (DM). A different definition is given by Keen (1987): "The application of available and suitable computer-based technology to help improve the effectiveness of managerial decision making in semi-structured tasks". Clearly, Keen focusses on decision support from a business point of view, as he illustrates that managerial decisions are often a dread and not always very effective, something which he hoped could be resolved by decision support systems. Elsewhere, DSS are also referred to as: "Interactive decision aids for managers". For Keen, decision support systems are simply software implementations of decision aids for managers. They vary based on the field of application they are used in, but the bare essentials are the same for all. Gorry and Morton defined the concept of a decision support system as follows: "A decision support system is a computer system that deals with a problem where at least some stage is semi-structured or unstructured." The terms structured, semi-structured and unstructured were defined to indicate degrees in which a decision problem is easily solvable. A new and unknown problem is seen as difficult, whereas an older, well-known problem might already have a known solution algorithm. Regardless, DSS should be able to solve a broad range of problems, but they are particularly interesting for solving those problems whose solution methods are not well known.
Decision support systems are thus exactly what the name says: systems that support the process of decision making. They require input from the decision maker to model both the problem and the evaluation logic. Decision support is especially useful for problems where there are multiple possible solution systems and we want to find the best ones among them to solve the given problem. DSS are, in other words, a human-computer interface exposing the available computational strength to aid in solving a whole range of problems, constructed to increase the efficiency of decision making and the effectiveness of the result. It is important to keep in mind that human evaluation lies at the heart of decision support, from defining what the problem is and what is preferred from the solution, to making the actual selection between the viable systems. A final desired property for DSS is the ability to quickly generate ad hoc results for new problems as they arise, which Keen discusses when talking about levels of support [Kee87]. He emphasizes that DSS should be able to answer "what if" questions.

History

Decision support systems have evolved a lot through the years, growing from analytical tools and information systems to computer-based software. They were originally introduced in the 1970s as a way to allow decision makers to simplify their task of making hard decisions by providing them with a powerful tool to do the calculations for them [Kee87]. Through time, decision support has evolved into several specific branches supporting specific decision making tasks, but generally the idea is the same: to improve the efficiency with which a user can make a decision and to improve the effectiveness of that decision [SWC+02]. Today, decision support has a growing potential due to the increase in resources made available by the internet and cloud computing, allowing observers to ask questions to a large audience through social media and to aggregate the received replies. This has led to the birth of Group Decision Support Systems [DG85], [DG87]. It is widely believed that the future of DSS lies in mobile computing, where mobile devices such as smartphones, tablets or PDAs serve as clients, requesting on-demand decision support from a server [SWC+02]. Possibly, the passive clients can be agents collecting data, or they can serve as small worker nodes doing part of the calculations needed to answer a question asked by another client.

2.1.2 Characteristics of Decision Support Software

In the business world, large and important questions ([DDTD09]) such as "where should we build the next expansion store to enlarge our influence?" or "what location would be best suited for a new industrial infrastructure?" often need to be answered. Because these are important decisions of great weight, experts are consulted for their opinion on the matter. However, they do not always find a solution with ease. They might disagree on vital parts, or the problem might simply be too complex to handle. These questions depend on many factors ([DDTD09], [DDTVDW09]) and are too large to handle alone. The process of coming to a conclusion can take a long period of time, which is not only time-consuming but also costly, and even then there is no guarantee that every bit of information that was presented was used properly and that nothing was forgotten. This is obviously a perfect field of application for group decision support systems, which have certain interesting properties and advantages.
For one, GDSS can handle large amounts of information quickly and will never overlook information or make calculation mistakes. These programs allow experts to model both the problem and their way of reasoning about the best solution. They perform all the tedious calculations, saving lots of time and making sure every piece of information is used and nothing is overlooked. At the heart of a DSS lies a decision support algorithm. There are many different decision support algorithms ([DDT11]). The main factor used to separate them from one another is the degree to which the user is capable of modelling the way he or she evaluates a logical problem ([DDTVDW09]). If we look at how decision support systems have evolved over time, it becomes clear that there is a trend towards systems that involve soft computing techniques [Duja], [SWC+02]. These allow for more flexibility in expressing a human way of thinking, which is often not as strict as classical logic. Sometimes we do not want the exact and strict conjunction of two attributes, but rather want to express a desire for both to be satisfied, with a slight preference for one over the other. Hence we see that simple systems are no longer sufficient to compete in the actual world; complex systems based on advanced mathematical theory are needed. Moving from DSS to GDSS, the underlying decision support algorithm becomes more complex as well. Different experts have different opinions, and it is no longer trivial to say which attributes are more important than others, because not every expert has the same priorities. If you were to ask each expert individually which factors he or she thinks are more important than others, they would generally answer differently. For example, a geologist would have different requirements for a suitable location for building a new school than a biologist. The geologist would deem it necessary that the ground material is solid and rigid and would put this requirement high on his priority list, whereas the biologist might be more concerned with the environment and would hence allocate a higher necessity to the proximity of trees and parks ([Dujc]). Sometimes, experts will not be able to reach an agreement. How should the decision making process then proceed? Whose opinion should be valued more and whose suggestions should be followed? Is this even a correct way of handling things, keeping in mind that picking preferred experts means ignoring the input of some of the other experts, which results in a loss of information, something that is undesirable in any scenario?

2.1.3 Summary

It is clear that good decision support software should be able to overcome many problems. More precisely, the program should be able to handle multiple users with diverging opinions and aggregate them into a single output with as little loss of information as possible. Furthermore, it should be flexible in its modelling capabilities. It should be able to handle difficult problems and process large amounts of data quickly. The development of aggregation techniques is a field of study under active research and is also the subject of this thesis. As will be discussed later, there are many different ways of combining the input into a single output.

2.2 Multicriteria Problems

For any decision problem, we say there are different systems (i.e., candidates or candidate solutions) which are all possible solutions. The purpose of DSS is then to evaluate and rank these systems.
To define which of the evaluated systems are the good ones, we define a measure to indicate how good a system is at solving the problem. In other words, we are looking for the system that best satisfies the requirements of the problem [DN05]. A multicriteria problem is simply a problem that has multiple requirements. These requirements can be decomposed into a hierarchical tree of measurable performance variables, sometimes also called attributes. For each of the performance variables, we define an elementary criterion, which is an evaluator function mapping a system's measured value for that variable onto a value indicating how well that system satisfies that criterion. This value is called an elementary preference score or elementary preference degree [DDT11], [Dujc]. In order to compare multiple systems, we aim to find a single measure that combines all the information of the elementary preference scores. We call this score the global preference score, or just global preference for short. This measure indicates how well the system meets all requirements at the same time. A high value indicates most, if not all, requirements are met. A low value indicates the opposite and usually carries the meaning that the system is not a viable solution. Preference scores are often normalized ([DF04]), so a value of 0 indicates completely unsatisfactory whereas 1 indicates completely satisfactory, and any gradation should be interpreted as the degree to which a requirement is satisfied, where a value closer to 1 means a higher degree of satisfaction ([DDTVDW09]). While evaluating all systems, a mapping between each system and its calculated global preference score is maintained. This makes it easy to evaluate the results at the end. The system with the highest score is in theory the "best" one, though that does not imply it is also the best in practice. It is rare to find a solution that fully satisfies all requirements. The proposed best solution, the one with the highest global preference, should be validated by the decision makers for its viability. It is still their responsibility to select the right system from the ones the algorithm rated as most suitable. The DSS simply aids in the process by evaluating the systems through computation. To find the global preference score of a system, its elementary preference degrees should be aggregated into one single value. It is usually the way this aggregation is done that separates different MultiCriteria Decision Methods (MCDM) from one another.

2.2.1 Decision Support Algorithms

The currently most used decision support algorithms are briefly explained now. Besides the fact that all decision support algorithms have as their main goal to combine elementary preference degrees into a single global preference, they should have at least the following ten fundamental properties, as discussed in [DDT11]:

1. Ability to combine any number of attributes (performance variables)
2. Ability to combine objective and subjective inputs
3. Ability to combine absolute and relative criteria
4. Flexible adjustment of relative importance of attributes
5. Modeling of simultaneity requirements (both soft and hard simultaneity)
6. Modeling of replaceability requirements (both soft and hard replaceability)
7. Modeling of balanced simultaneity/replaceability
8. Modeling of mandatory, desired and optional requirements
9. Modeling of sufficient, desired and optional requirements
10. Ability to express suitability as an aggregate of usefulness and inexpensiveness
The combination of these properties allows for the modelling of a natural way of thinking and provides a flexible method for combining performance variables in many conceivable ways. A few approaches are briefly discussed: Simple Additive Scoring (SAS), the MultiAttribute Value Technique (MAVT) and MultiAttribute Utility Technique (MAUT), the Analytic Hierarchy Process (AHP), Ordered Weighted Averaging (OWA) and the Logic Scoring of Preferences (LSP) [DDTVDW09], [DDT11], [DN05]. SAS is the simplest of these techniques. It is also the oldest and relies simply on the concept of assigning weights to each performance variable to denote their relative importance. The global preference of a system is then calculated as the weighted sum of the elementary preference degrees of the attributes. This model is simple and fast but does not allow for much flexibility and implies a disjunction between all attributes. MAVT and MAUT are much like SAS but replace the way the elementary preference degrees are calculated by specific functions intended to capture human judgements. Furthermore, the weighting factors are chosen so that they sum to 1. Like SAS, MAVT and MAUT are not very flexible. AHP builds on psychological research positing that the highest accuracy is achieved when the number of attributes that have to be compared is small. It is based on a hierarchical structure in which attributes are compared pairwise. The root of the resulting tree represents the global preference score. In OWA, the performance variables are first ranked on their relevance by the decision maker. Then, the relative importance between them is modelled by a vector of weights that sum to 1. The elementary preference scores of each performance variable are used in an OWA aggregator function, which is a parametrized instance of a class of mean-type aggregation operators. By choosing the weights properly, adjustable levels of simultaneity and replaceability can be achieved. LSP is one of the more recent methods. It is based on a step-by-step process where the elementary preference scores are combined with a desired simultaneity or replaceability and aggregated into a new value, which can in turn be aggregated again recursively with other degrees. This can be seen as a non-binary tree with the global preference of the system at its root. In this document, LSP is extended to handle group data, effectively transforming it into a GDSS. To better understand LSP, a few mathematical concepts are needed, which are explained in the next section.

2.3 Mathematical Preliminaries

LSP relies heavily on soft computing techniques, both for the elementary criteria and for the aggregation steps that combine the elementary preference scores into a global preference. Therefore we now briefly explain the necessary soft computing concepts.

2.3.1 Soft Computing Techniques

Membership Functions

Membership functions are a basic soft computing technique used in the context of thinking in terms instead of values. Mathematically, they are the characteristic representation of a fuzzy set [DT]. In this work, they are mainly used to represent elementary criteria. They are best understood by first understanding what a fuzzy set is.

Fuzzy Sets

A fuzzy set [DP80], [KY95] is similar to the concept of a set in the mathematical sense. It is often used to describe a linguistic term.
In classical mathematics, each individual in a universe is either completely a member or completely not a member of a set. This means any individual either "is" the linguistic term, or "is not" the linguistic term. For example, in the universe of people, each individual either is or is not male. We call this a strict or crisp set. With fuzzy sets, each individual is a member of each set with a certain degree of membership. This degree of membership can be interpreted in different ways, depending on the application context [DT]. Whatever the interpretation, each has the same characteristics for the extreme values, namely that a membership degree of 1 means total participation (equivalent to the classical membership of an individual) and a 0 means the opposite (equivalent to non-membership). A fuzzy set can be represented graphically using a membership function. It plots the degree of participation for each element in a universe. The membership function is then characterized by the fuzzy set it represents, though different sets may have the same membership function. As an illustrative example, let us look at a possible membership function for the linguistic term "slow". Keep in mind that this represents the opinion of the person defining the membership function. This is my individual function; others might produce a similar yet slightly different one, or even a completely different one. In the context of driving a car, we aim to model the concept of an average "slow" speed. We say that anything below 30 km/h is slow and that everything above 50 km/h is definitely not slow. In this case, all real values from 0 to 30 receive a membership degree of 1 (indicating fully in accordance with the term "slow"), and everything above 50 gets a 0 (indicating "not slow"). Note that this does not imply that any speed above 50 km/h is necessarily "fast"; that would be modelled by a different membership function. The part between 30 and 50 km/h can be filled in as we please, representing our interpretation of the term slow. A technique that is often used is a linear connection. The reason for this is twofold: it is simple to work with mathematically and it often properly represents human logic. The higher the speed, the less "slow" it is. This gradient is generally perceived as linear, meaning that increasing the speed twice as much results in an interpretation that is twice as much "less slow". The resulting membership function is shown in Figure 2.1.

Figure 2.1: Membership function for the linguistic term "Slow".

Typically, the graphical representation of a membership function is trapezoid-shaped. This property stems from the fact that neighbouring elements in a universe are often close to each other and hence exhibit similar behaviour. Moreover, these are the easiest functions to work with thanks to the linearity of each part, yet they are sufficient in their modelling capabilities. The membership function should be continuous. Note that even if the ordered elements do not produce a continuous membership function, a bijection can be found that reorders the data in such a manner that a continuous membership function is obtained. In the rest of this work we will thus assume all membership functions are continuous without violating the generality of the technique.
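As a minimal illustration of how such a trapezoidal membership function can be evaluated in software, the sketch below implements the "slow" criterion described above (full membership up to 30 km/h, linearly decreasing to 0 at 50 km/h). The function name and the hard-coded breakpoints are illustrative choices only, not part of any existing library.

def slow_membership(speed_kmh: float) -> float:
    """Membership degree of `speed_kmh` in the fuzzy set "slow".

    Full membership (1.0) below 30 km/h, no membership (0.0) above
    50 km/h, and a linear decrease in between, as in Figure 2.1.
    """
    if speed_kmh <= 30.0:
        return 1.0
    if speed_kmh >= 50.0:
        return 0.0
    # Linear slope between the core (up to 30 km/h) and the end of the support (50 km/h).
    return (50.0 - speed_kmh) / (50.0 - 30.0)

# Example: a speed of 40 km/h is considered "slow" to degree 0.5.
print(slow_membership(40.0))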
Finally, fuzzy sets can also be used to represent strict sets. Their corresponding membership function representation can then be split into rectangles: each element is either part of the set or is not, indicated by a membership degree of respectively 1 or 0. This gives a typical rectangular shape. It is thus clear that fuzzy sets extend strict sets with an extra dimension, the gradation of membership.

Interpreting Membership Functions

There are different types of interpretations that can be given to membership functions. The interpretation in the context of decision making is that of elementary criteria, in which a performance variable is chosen whose values are presented on the X-axis. The function is then used to model regions of preferred values and gradients between them. The Y-axis then reflects the degree to which a value is satisfactory for the chosen criterion [DDTVDW09]. Applied to DSS, this value represents the elementary preference score.

Figure 2.2: Membership functions where the Y-axis has been converted to a percentage of satisfaction, illustrating how a membership function can be used to imply a range of desired values for a variable.

We call the support of a membership function the range on the X-axis where the function does not equal zero. The core of a membership function is the range on the X-axis where the function equals one.

Generalized Conjunction and Disjunction

Another technique used in LSP is the generalized conjunction and disjunction (GCD) [DL07]. This is a generalization of the common concepts of conjunction and disjunction in the sense that it allows a gradient between the extremes. Soft computing here extends the classical concepts of disjunction and conjunction by treating them as points; adding a connecting line between them creates a new dimension. The concepts "andness" and "orness" are introduced as degrees of simultaneity and replaceability. The mathematical AND, when weighted, then becomes the generalized conjunction. A new parameter α is introduced. It represents the degree of "andness" between a set of input values. In the extreme case, the full conjunction, or the classical logical AND, is the generalized conjunction with α = 1. The other extreme, with value 0, expresses a full disjunction and is equivalent to the classical logical OR. Any gradation with α between 0.5 and 1 indicates a partial conjunction, also known as the "andor" function, and still implies conjunction over disjunction: there is a higher degree of simultaneity between the inputs than replaceability. Completely dual to this concept of "andness" is "orness", represented by the symbol ω. This is shown in Figures 2.3 and 2.4.

Figure 2.3: Graphical representation of the generalized conjunction/disjunction.

Figure 2.4: Symbolic representation of the GCD and some of its gradations as they are often used in LSP. HPD, SPD, HPC and SPC stand for hard and soft partial disjunction and conjunction, respectively.

There are different implementations of the GCD, but a commonly used one is based on the Weighted Power Mean (WPM; see http://en.wikipedia.org/wiki/Generalized_mean and http://planetmath.org/weightedpowermean) [Duja]. This is a function which calculates a mean given two input vectors, E and W, and a parameter r. The vectors respectively represent the input values and their weights. The parameter r changes the behaviour of the mean. The formula to calculate the weighted power mean is the following:

E1 ⋄ · · · ⋄ Ek = (W1 · E1^r + · · · + Wk · Ek^r)^(1/r),  −∞ ≤ r ≤ +∞

In this implementation of the GCD, the degree of "andness" (or equivalently "orness") is configurable through the choice of the r-parameter.
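To make the role of the r-parameter concrete, the following sketch is a direct transcription of the WPM formula above; it is not taken from any particular LSP implementation. The limit cases for r approaching negative and positive infinity, and the geometric-mean limit at r = 0, are standard properties of the power mean.

import math

def weighted_power_mean(E, W, r):
    """Weighted power mean of preference scores E with weights W and exponent r.

    r -> -inf gives the minimum (full conjunction, logical AND),
    r -> +inf gives the maximum (full disjunction, logical OR),
    r = 1 gives the ordinary weighted arithmetic mean.
    Weights are assumed to be positive and to sum to 1.
    """
    if r == float("-inf"):
        return min(E)
    if r == float("inf"):
        return max(E)
    if r == 0:
        # Limit case of the power mean: the weighted geometric mean.
        return math.prod(e ** w for e, w in zip(E, W))
    if r < 0 and any(e == 0 for e in E):
        # A completely unsatisfied input is absorbing for negative r
        # (mandatory behaviour): the aggregated preference is 0.
        return 0.0
    return sum(w * e ** r for e, w in zip(E, W)) ** (1.0 / r)

# Two inputs with equal weights: a negative r pulls the result towards
# the lower score (partial conjunction), a positive r > 1 towards the
# higher score (partial disjunction).
print(weighted_power_mean([0.9, 0.3], [0.5, 0.5], -2))  # about 0.40
print(weighted_power_mean([0.9, 0.3], [0.5, 0.5], 2))   # about 0.67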
For minimal r, that is in the limit where r approaches negative infinity, the mean equals the minimum of the input values, independent of the individual weights, assuming they are non-zero. This represents a full conjunction between the inputs, better known as the logical AND between predicates. For maximal r, that is in the limit where r approaches positive infinity, the behaviour is reversed and represents a full disjunction, which equals the logical OR. In the specific case where r exactly equals 1, the weighted power mean reduces to the weighted arithmetic mean of its inputs. For any r-value larger than 1, the WPM results in a partial disjunction, growing in expressiveness as r increases. Conversely, for r smaller than 1, the WPM produces a partial conjunction. For negative r, the conjunction gains an additional interpretation: in that case, the inputs to the WPM are considered mandatory. This means that if the value of any of the inputs is 0 (completely unsatisfied), the result of the aggregation will also be 0. In the context of LSP, this is used to indicate that a set of requirements must be fulfilled. Systems that violate one of these receive a global preference score of 0 because one of the mandatory attributes is not respected, leading to the conclusion that the evaluated system is unusable. In order to simplify calculations and to establish a standard for comparisons, there is a set of suggested values for r to express specific gradations in andness and orness [DL07]. These are shown in Figure 2.5.

Figure 2.5: An example of GCD using WPM, mapping the connection between the WPM r-parameter and the respective GCD andness and orness levels between inputs.

2.3.2 Logic Scoring of Preferences

Because LSP uses the soft computing techniques discussed above, it allows accurate and flexible modelling of human evaluation logic. It uses membership functions as elementary criteria and the generalized conjunction and disjunction for the aggregation structure. The steps of LSP are as follows:

• Compose a hierarchical requirement tree indicating the relevance of the attributes
• Define elementary criteria with membership functions [Zim78], [DT]
• Create the aggregation tree composed of LSP aggregators
• Evaluate each system by calculating its global preference

Requirement Trees and Elementary Criteria

For multicriteria decision problems, the requirements can often be decomposed hierarchically into a requirement tree. For a software product prototype, for example, these are closely related to the software quality attributes. Similarly, they can be decomposed into components that can be individually measured and evaluated. Such a component is a performance variable. Each is represented by a linguistic term. Therefore, their evaluator functions, the elementary criteria, are perfectly suited for the application of membership functions. To illustrate this, we give a short example of an elementary criterion. Let us say we want to build a new school and are looking for the best possible location to do so. One of the considered performance variables is the level of sound that is allowed in the neighbourhood. We could rate anything over 80 dB as completely unacceptable and anything under 50 dB as ideal.
We can proceed to model our decreasing tolerance for increasing loudness using a linear downward slope. The resulting elementary criterion is a membership function of the family displayed in the middle graph of Figure 2.2, where C would be 50 dB and D would be 80 dB. This function models our preference for the range of acceptable values of this performance variable and is reused to evaluate each system, calculating its elementary preference score for the performance variable "loudness". Different systems will typically have different values on the X-axis. This process is applied for each performance variable, so there will be exactly as many elementary criteria as there are performance variables. Only after defining all of the elementary criteria does it make sense to evaluate a system. Doing so generates a vector of elementary preference scores. The next step is to combine them into one measure, the global preference score, which reflects the global ability of the evaluated system to satisfy the requirements of the multicriteria problem. In LSP, this is done using LSP aggregators.

LSP Aggregators

In LSP, the aggregation of elementary preference scores is heavily based on the GCD. The r-parameter in each step is based on the hierarchical performance variable tree. During its creation in the first phase of LSP, the performance variables can be annotated with mandatory and optional indicators, simplifying the aggregation later on. The remaining work is to choose weights specifying the relative importance of the attributes. An example illustrating this is displayed in Figure 2.6.

Figure 2.6: An example of GCD aggregators in an LSP aggregation structure.

Note that the conjunction is used to express the desire for simultaneous satisfaction of multiple requirements in an LSP problem. Dually, the disjunction is used to indicate that it suffices when any of the inputs is satisfied.

Compound Aggregators

Using the concept of LSP aggregators we can already model a large part of human evaluation logic. However, after dealing with many problems, it became apparent that similar structures kept reappearing. It is possible to define a sort of design pattern for specific constructs of reasoning. These are usually realized through the composition of LSP aggregators in a specific way and are called canonical aggregation structures (CAS, [DDT11]). A frequently used example is the Conjunctive/Disjunctive Partial Absorption (CPA/DPA, [DN05]). This aggregator respectively combines a mandatory/sufficient input x with a desired input y. In the mandatory case, this means that the resulting preference score will be 0 if the elementary preference of x is not satisfied, and x + xR or x − xP in case it is, where x is the elementary preference of the mandatory input and R and P are reward and penalty terms based on the elementary preference score of the desired input y. Similarly, for a sufficient input x and a desired input y, the output preference is (close to) 1 if the input is completely satisfied, x + xR if it is not but the desired input is satisfied, x − xP if it is not and the desired input is not satisfied either, and smaller than y when x is completely unsatisfied. The values of R and P are usually chosen in the ranges [0.05, 0.15] and [0.10, 0.30] respectively, based on the importance of the desired input.
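One common way to realize a conjunctive partial absorption is by nesting two GCD aggregators: a weighted neutrality (arithmetic mean) of x and y, followed by a hard partial conjunction of x with that intermediate result. The sketch below, reusing the weighted_power_mean function given earlier, follows that construction; the specific choice of the harmonic mean (r = −1) for the outer step and the weight values are assumptions made here for illustration, not the exact parametrization used in [DN05].

def conjunctive_partial_absorption(x, y, w_inner=0.4, w_outer=0.7):
    """Sketch of a conjunctive partial absorption of a mandatory input x
    and a desired input y, as two nested WPM aggregators: a weighted
    neutrality (r = 1) of x and y, then a hard partial conjunction
    (here the harmonic mean, r = -1) of x with that result. In practice
    the weights are tuned to obtain the desired reward R and penalty P.
    """
    inner = weighted_power_mean([x, y], [w_inner, 1.0 - w_inner], 1)
    return weighted_power_mean([x, inner], [w_outer, 1.0 - w_outer], -1)

# The mandatory input dominates: if x is 0 the output is 0, while the
# desired input y only rewards or penalizes the score of x.
print(conjunctive_partial_absorption(0.0, 0.9))  # -> 0.0
print(conjunctive_partial_absorption(0.7, 0.9))  # above 0.7 (rewarded)
print(conjunctive_partial_absorption(0.7, 0.2))  # below 0.7 (penalized)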
Figure 2.7: An illustration of the Disjunctive Partial Absorption aggregator, combining a Sufficient input and a Desired input with P = -0.1 and R = 0.25.

We can go even further and define compound aggregators using regular GCD and CPA/DPA aggregators as building blocks, yielding Mandatory/Desired/Optional (MDO) and Sufficient/Desired/Optional (SDO) aggregators ([DN05]). There are several implementations of these aggregators and they differ slightly in their properties. In these operators, the optional input is much like the desired input but has a lower compensational power.

Figure 2.8: A realization of the MDO operator, D-nested.

Figure 2.9: A realization of the MDO operator, M-nested.

2.3.3 Suitability Maps

A particularly interesting way of displaying the results of the previously mentioned system-preference mapping is through suitability maps (S-maps). These are applicable to problems whose candidate solutions can be displayed on a geographical map. A typical example is the solving of geographical problems, where the possible solutions are locations. First, a geographical map is meshed into a grid of locations, and for each location the performance variables are measured, such as altitude, slope, ground material, etc. ([Dujc]). After calculating the global preferences, they can be visually represented in a 3D plot where the global preference, referred to as the suitability, is displayed as a bar on the Z-axis and the locations are plotted in the x,y-plane based on their grid coordinates. Figure 2.12 shows an example.

Figure 2.10: A realization of the MDO operator using a weighted neutrality (A).

Figure 2.11: An example of an LSP aggregation structure with compound aggregators, outlined in gray.

Figure 2.12: An example S-map, where suitability is synonymous with global preference score.

2.4 Group Decision Support

The decision support systems discussed thus far all assume there is one central person running the operation. This person is supposed to enter both the aggregation structure and his or her personal preferences for the attributes in the form of elementary criteria, after first defining the performance variables of the problem. This person is often also the decision maker, because it is also his or her task to eventually make the decision based on the information that is calculated and presented by the DSS. GDSS extend regular DSS by trying to provide a solution for the presence of multiple DMs, which is a much more realistic scenario, especially in business environments. But also in a different context, that of social media, more people than just a board of experts can be involved in the decision making process. Where classic DSS support one DM, and GDSS support a group of experts (say anywhere between 2 and 15 DMs), we strive to find techniques to handle many more DMs, in the order of hundreds or thousands. To do this, certain factors start to play a role, such as the complexity of the calculations and the representation of the results, but most importantly ways to handle the large amounts of data, which we propose to do through aggregation. Therefore we need to split decision support algorithms into several aspects, which allows us to handle them separately. In the current literature on group decision support systems, the focus lies on the sharing of information among the members of a relatively small group of experts.
This research is an innovation in the sense that we want to broaden the spectrum and open up the possibility of using GDSS in the context of social media, effectively consulting large groups of experts.

2.4.1 Aspects of GDSS

Analysing the nature of decision support systems, we can identify several parts into which the process can be split. This is mainly done by looking at who does what, and by separating these tasks into roles. First, it is clear that we can separate the decision makers from the experts. This means that opinions from outside the board of executives making the decision are also heard. Second, the experts can be separated from the evaluators, the latter being the people who decide which performance variables of a problem are more important than others. Finally, the analysts can also be seen separately. They are the people who model the problem by deciding which performance variables there are and how they are hierarchically structured. Generally, a person can take up multiple roles, though the roles are independent of each other. This decomposition allows us to split a GDSS into several independent roles, such as problem modelling, expert opinion (data) gathering, aggregation and decision making based on the resulting system-preference mapping.

2.4.2 Fields of Application

Possible fields of application for decision support are versatile and each have their own requirements, so a flexible framework is necessary, with careful consideration by the user of when a certain technique is considered good or bad and for which contexts this holds. The most traditional field of application for GDSS is that of business decisions, where a group of experts is consulted in the decision making process. Their opinions need to be both respected and combined at the same time, as accurately as possible. The system should facilitate the sharing of information and compute the compatibility of their possibly diverging opinions. The GDSS serves as an aiding tool that helps keep a clear overview of all the information at hand and at the same time provides a mathematically grounded summary of all the input. Another possible field of application is the domain of medicine. Clinical decisions can in some way be compared to business decisions in the sense that a group of experts is consulted for their opinions on the best course of action for a given problem. Such problems can be determining the illness of a patient or the proper treatment for a tricky condition. This type of GDSS would be more probabilistically based, backed by other types of probabilistic tools such as neural networks, Bayesian networks and general genetics science. These types of problems are considered critical, as the health of a person is at stake, and the correct handling of information is a matter of great importance. Therefore, the correctness of the computations, the interpretation of the results and the use of the framework should be clear and precise. Another field of application is one that has received little attention in the current literature, as it is created by a recent trend. Including social media in the decision making process is becoming more and more valued due to the changing face of the world. Businesses can now reach far more people in a simple way through the internet, obtain valuable information on their clients, and use this information to produce a better product and gain a market advantage.
A better product is either a product that better satisfies the demands of the customer or a product of higher quality, based on the desired properties of a product. Because a client base can be very large, techniques to handle such vast quantities of information are needed, and this is what GDSS can provide. An interesting, recent and well-known example of a business including social media in decision making is Hasbro, the company behind the popular board game Monopoly. Hasbro held a public poll to gather information about user preferences for the player tokens, including the well-known tokens and some new ones. This led to the replacement of the iron by the cat token (http://www.bbc.co.uk/news/entertainment-arts-21356033).

Chapter 3
Research

It is clear that there is a large range of applications that would benefit from extended decision support. From the discussion on decision support systems, however, it is also apparent that finding a one-size-fits-all solution is not obvious. The performed research focusses on the terms aggregation and confidence. Aggregation is used to minimize the trouble of handling large amounts of input data, and confidence is used as a measure to minimize the loss of information. First, the different aspects of decision support systems are analysed more in depth than before, followed by a clear declaration of the scope of the research. Then, an aggregation technique is introduced and explained, followed by the definition of confidence and the calculation thereof. Finally, there is a section on how to interpret confidence by combining it with global preferences.

3.1 Aspects

Before going into the depth of the research it is interesting to analyse the different aspects of decision support systems. Step by step, we first have the problem analysis, in which a problem is decomposed into its performance variables. At this point, we can specify which are most important and which are mandatory. Then comes the definition of elementary criteria defining the preferred values for these attributes, obtained from experts. These are then used to evaluate several candidate systems. The elementary preference scores produced in this way are aggregated by an aggregation structure. This structure reflects the previously established relative importance of the attributes. The resulting global preference scores are gathered in a system-preference mapping. The decision maker(s) can now make a decision. In conventional decision support, all these roles are usually performed by a single person. In group decision support, this is extended in such a way that a group of experts "acts" as one person. The aim is to produce a system with a logical separation of the roles in the decision making process. For example, it is desirable that the experts are able to model their own opinion on what are good values for the performance variables individually and independently. Another point of difference is the aggregation structure. Some might find a certain attribute more important, even promoting it from optional or desired to mandatory. Some might distribute the weights differently, indicating a different relative importance between the attributes of the problem. It is apparent that there are different dimensions to decision support, and that a separation of the independent steps into the previously mentioned roles is possible. This allows us to consider that separate people are involved in the separate steps, or even scenarios where certain inputs are externally provided.
3.2 Scope

To further explain the developed techniques, an example case study is described in detail in chapter 4. We hereby limit ourselves to an investigation of the field of social media applications. Therein the "experts" consist of all the consulted people, which can be a very large group, ranging from hundreds to tens of thousands. Some important established organisations already consult their user base when making decisions, such as Google (http://www.google.com/about/company/philosophy/) and, until recently, Facebook (https://blog.facebook.com/blog.php?post=70896562130). The decision makers are then a group consisting of the members of a board of executives of the considered business. The core of the research is the aggregation of the large amounts of "expert" inputs. Their elementary criteria indicating their preferences are represented by generated membership functions. From here on, we consider the aggregation structure to be externally provided. Its actual origin is not important, as it can be independently developed by the decision makers. Alternatively, it can be the result of an aggregation technique for combining non-binary weighted trees, but this is outside the scope of this work. The next section focusses on the aggregation of membership functions, and thus on the clustering of elementary criteria.

3.3 Aggregation

The aggregation technique applied in this research is based on a recently developed technique [TRBDT12], [TRBDT13]. It happens in the preference expression step, directly after gathering all elementary criteria. This is before the elementary preference scores are calculated and long before the aggregation structure is used. As a consequence, the aggregation structure is oblivious to this aggregation, allowing it to remain largely unchanged from the aggregation structure in traditional LSP. The purpose of the extra early aggregation step is to combine the experts into a single, merged expert. This means that all experts will appear to the aggregation structure as a single person. Because this all happens at an early stage, it can also be referred to as pre-aggregation. The concept of aggregation is chosen to reduce the large amounts of information to a handleable portion, both logically and computationally. This brings with it an inevitable loss of information, which is mitigated by reintroducing it as the parameter called confidence. Aggregation is a multiple-tier process based on the similarity of membership functions. The purpose is to group experts with similar opinions into clusters, followed by selecting a representative cluster with a certain confidence. This representative is then used for further calculations.

3.3.1 Clustering

In order to cluster the experts into groups we need a way to compare membership functions. To do this we first translate them to an alternative representation based on their shape and the length of their components. This representation is called the shape-symbolic notation and consists of two parts, a shape-string and a feature-string. Because this is purely mathematical and not directly related to the concept of decision support, we here use the term membership functions instead of elementary criteria.

Definition 3.3.1 The shape-symbolic notation of a membership function is an alternative representation consisting of two parts: a shape-string describing its shape and a feature-string describing the relative lengths of its components.
Shape-string

First, the membership functions are described based on their shape. The corresponding representation is called a shape-string, in which each segment of the membership function uses a symbol: a sign {+, -} to represent an upward or downward slope, a value in [0, 1] to represent the level of preference (on segments without a slope), or a letter {L, I, H} to denote a point, where L means 0, H means 1 and I means a value in ]0,1[. This serves as an alphabet for a context-free grammar G(N, T, S, P) which we can use to generate shape-strings, where N = {<slope>, <preference level>, <point>, <segment>, <shape-string>} is the set of non-terminal symbols; T = {+, -, 0, 1, L, I, H} is the set of terminal symbols; S = {<shape-string>} is the starting symbol; and P is the following set of production rules:

<slope> ::= + | -
<preference level> ::= 0 | 1
<point> ::= L | I | H
<segment> ::= <slope> | <preference level> | <point>
<shape-string> ::= <segment> | <segment><shape-string>

Figure 3.1: Illustration of the shape-string alphabet based on a trapezoid-shaped membership function.

Figure 3.2: Examples of shape-strings for membership functions.

Feature-string

Second, each membership function is related to a feature-string. This linguistic description tries to capture the length of the segments on the X-axis of the membership function. This allows us to make a distinction between two functions with similar shape-strings. Here, soft computing is used for its flexibility. We define a set of relative lengths:

R = {ES = extremely short, VS = very short, S = short, M = medium, L = long, VL = very long, EL = extremely long}

Research has shown that around seven distinct linguistic terms is the optimum when weighing precision against correctness: it becomes hard to differentiate between more than seven different types of length, whereas fewer than seven terms do not provide enough descriptive power to be useful. For each membership function, the lengths of its characteristic segments are translated to their corresponding linguistic representation. To this end, all membership functions are first scaled on the X-axis to have the same length. This is harder than it sounds, as the membership functions represent expert opinions and it is difficult to scale the functions without harming their representativity. Hence a maximum value has to be found at which all functions are clipped, and this value needs to be at least strictly larger than the maximum of all d-values of all elementary criteria. When this known, fixed length has been determined, each part can be assigned a linguistic term describing its relative length. For this, we calculate the fraction of the length of the part relative to the known length of the X-axis. A possible mapping of these fractions to the terms is depicted in Figure 3.3, using membership functions for each relative length.

Figure 3.3: Possible representation of relative lengths using membership functions.

To determine the corresponding term, we look up the fraction on the X-axis and take the maximal membership function vertically above that point. Based on this principle we can determine the feature-string for each membership function.

The Shape-symbolic Notation

Combining both the shape-string and the feature-string, we can annotate each membership function with its shape-symbolic notation. This consists of a set of symbolic characters, one for each part of the membership function, each consisting of a shape character and a length indicator.
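As an illustration of this notation, the sketch below derives a shape-string and a rough feature-string for a membership function given as a list of linear segments. The segment encoding and the fraction thresholds for the length terms are simplified assumptions made here for illustration; they do not reproduce the exact rules of [TRBDT12], [TRBDT13].

# Illustrative thresholds for the seven relative-length terms (assumed values).
LENGTH_TERMS = [(0.05, "ES"), (0.10, "VS"), (0.25, "S"),
                (0.50, "M"), (0.75, "L"), (0.90, "VL"), (1.01, "EL")]

def length_term(fraction):
    """Map a relative segment length in [0, 1] onto one of the seven terms."""
    for upper, term in LENGTH_TERMS:
        if fraction <= upper:
            return term
    return "EL"

def shape_symbolic(points):
    """Derive (shape-string, feature-string) from a piecewise linear
    membership function given as a list of (x, y) breakpoints.

    Each segment is encoded as '+' (upward slope), '-' (downward slope),
    '1' (horizontal at preference 1) or '0' (horizontal at preference 0).
    """
    total = points[-1][0] - points[0][0]
    shape, feature = [], []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if y2 > y1:
            shape.append("+")
        elif y2 < y1:
            shape.append("-")
        else:
            shape.append("1" if y1 == 1.0 else "0")
        feature.append(length_term((x2 - x1) / total))
    return "".join(shape), " ".join(feature)

# The "slow" criterion from Figure 2.1 on a 0..100 km/h axis:
# flat at 1 until 30, a downward slope until 50, flat at 0 until 100.
print(shape_symbolic([(0, 1.0), (30, 1.0), (50, 0.0), (100, 0.0)]))
# -> ('1-0', 'M S M')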
Based on this representation, membership functions can be compared to each other. To this end, we use a similarity measure which respects the properties of reflexivity and symmetry and is based on the similarity between two shape-symbolic notations. It is calculated using a modified version of the Levenshtein distance [Gus97], based on a cost function that takes into account inserting, replacing and deleting shape-symbolic characters.

Figure 3.4: Example of a feature-string for a membership function.

Figure 3.5: Example of the shape-symbolic notation of a membership function, showing symbolic characters b1 and b2.

For inserting and deleting, the cost depends on the length of the inserted or deleted part, respectively. For replacing, the cost depends on the change of length. An extra penalty is added in case the new shape-symbolic character has a different shape component. For each pair of membership functions, the distance is calculated as a measure of similarity and stored in a matrix. This matrix is symmetric and always one on the main diagonal, respecting the reflexivity and symmetry requirements. After all similarities have been found, the membership functions are clustered hierarchically in a bottom-up manner based on a highest-similarity-first policy. This policy has the interesting property that it produces unique results. The entire process is captured in Figure 3.6.

Figure 3.6: Detailed breakdown of the shape-similarity measure used to aggregate expert opinions.

Note that this grouping process is in fact independent of the context and can be applied purely mathematically to group any set of membership functions based on their similarities. This technique can thus also be used in other scenarios or applications. In the context of GDSS and LSP, the elementary criteria are clustered: each expert is asked for his or her opinion on each attribute. Per attribute, the opinions are grouped into clusters. The experts in a single cluster generally have the same opinion on which values are desired for the considered performance variable. This also implies that clustering has to be done for each of the performance variables.

3.3.2 Further Tiers of Aggregation

After the experts are grouped into clusters, further steps of aggregation can be performed. Representing each cluster by a single membership function, the output of the first round of aggregation can be interpreted as the input of a small group of experts with dissimilar opinions. This is similar to the situation in the current literature, which proposes several techniques for dealing with small groups of experts, though a couple of things need to be kept in mind:

• In the literature, no assumptions are made on the similarity between experts, meaning that it is possible that multiple experts have similar opinions. In our case, this is no longer possible, because we have already grouped experts with similar opinions into clusters.
• As described further on, a new measure called confidence is introduced, which is not dealt with in existing techniques in the way it is used here. Not using confidence would increase the loss of information.

Therefore it is interesting to investigate new modes of aggregation after clustering, rather than applying an already existing technique. Assuming LSP is used as the decision support algorithm, a single elementary criterion is needed per performance variable. In the current situation, we have a group of clusters per attribute, from among which we need to find a representative; a sketch of the preceding clustering step is given below.
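The clustering step described above can be sketched as follows. This is a schematic illustration only: the similarity callable stands in for the modified Levenshtein-based measure on shape-symbolic notations, and the single-linkage choice and the stopping threshold are assumptions made here for brevity, not necessarily the exact procedure of [TRBDT12], [TRBDT13].

import itertools

def cluster_by_similarity(functions, similarity, threshold=0.8):
    """Greedy bottom-up clustering with a highest-similarity-first policy.

    `functions` is a list of membership functions (any representation),
    `similarity` a callable returning a value in [0, 1] for two of them,
    and `threshold` an assumed cut-off below which clusters are no
    longer merged. Returns clusters as lists of indices into `functions`.
    """
    clusters = [[i] for i in range(len(functions))]

    def cluster_similarity(a, b):
        # Single-linkage style: similarity of the closest pair of members.
        return max(similarity(functions[i], functions[j])
                   for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest similarity.
        (a, b), best = max(
            ((pair, cluster_similarity(*pair))
             for pair in itertools.combinations(clusters, 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters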
To find this single representative, we first need a representative for each cluster and then combine these into one representative elementary criterion. There are several possible approaches to finding these representatives. For the representative of a cluster, the suggested approach is based on characteristics such as the number of opinions (e.g., majority or minority) and the shape of the membership functions representing the expert opinions (e.g., small cores), among other meaningful cluster characteristics [TRBDT13]. When selecting the final representative, the cluster representatives have to be either merged or one of them has to be selected. Both have advantages and disadvantages: merging means less loss of information, whereas selection is easier. In the context of elementary criteria, however, merging membership functions is often meaningless. When trying to merge a function that represents "only low values" with one that prefers "only high values", one could quickly end up with a meaningless result that accepts all values, or that has a medium preference along the entire range. This means the process of merging might in fact be counter-productive and lead to a large loss of information. For this reason, selection is recommended to elect the final representative. To choose which cluster is elected as the final representative, taking the cluster with the highest confidence is recommended. Extensive research on this topic is beyond the scope of this paper, but it is not unimportant. We continue the research with a discussion of the concept of confidence.

3.4 Confidence as a Concept

To minimize the loss of information introduced by aggregation, we define a new measure, "confidence". This is a broad concept and its interpretation is not trivial. The term confidence appears at different levels throughout the system and a clear definition can only be given in the proper context. Depending on the level at which we view confidence, its interpretation is different. We will discuss three levels of confidence in order:

1. Cluster confidence, at cluster level
2. Elementary confidence, at elementary criterion level
3. Global confidence, at system level

At the system level, global confidence in itself implies several things. For one, it is an indicator of the certainty that the global preference of the system is representative for the bulk of the experts. It defines how closely it follows their opinions. It also represents the degree of difference between them: in case all experts agree closely, the global confidence will be higher than in case the experts disagree. Furthermore, it also depends on the measured values for the performance variables of the specific system being evaluated.

3.5 Defining and Calculating Confidence

Having outlined the confidence concept, there is still a need for a clear definition and a way to calculate it. Finding a final global confidence value per system is a multiple-step process. We start with confidence scores per cluster, which are then aggregated similarly to preference values. The first appearance of confidence is at the cluster level, after the experts are grouped into clusters. For each cluster, we calculate a value that indicates the degree to which it represents the average opinion of the population. Afterwards, a single representative from the clusters is elected for further evaluation, leaving us with one elementary criterion per performance variable.
We define a confidence measure for it to indicate its representativity for the opinion of all the participating experts. We call this the elementary confidence. Finally, these are propagated through the aggregation structure and produce a global confidence value per system. This represents the certainty of the correctness of the corresponding global preference score. Each of these levels is now further explained. For each level we will clearly define confidence and discuss how it can be calculated.

3.5.1 Confidence at Cluster Level

The calculation of cluster confidence is in itself a multiple-step process. The value is based on two aspects: the cluster itself and its similarity to the other clusters. First we calculate the cluster confidence based only on the cluster itself. Then, it can be readjusted based on the similarity of the cluster to the others. This may be done to increase the confidence we have in clusters with similar typical values. This might be the case when a larger portion of experts share similar opinions even though they were originally separated into different clusters. These recalculations keep this in mind and try to balance this out properly.

Definition of Cluster Confidence

The cluster confidence is calculated immediately after the clustering algorithm has finished. It serves as a measure of the magnitude and compactness of each cluster. To that end, it depends on two important factors:

• the relative frequency fr, the relative amount of experts in the cluster, and
• the compactness c, the degree in which the elementary criteria in the cluster are similar.

If we call the cluster confidence level γ, we desire the following relations with these parameters: γ ∼ fr and γ ∼ c. This leads to the following definition of cluster confidence:

Definition 3.5.1 The cluster confidence represents the importance of the cluster. It combines the relative size of a cluster and its compactness. Clusters containing a majority of the population, or whose members have very similar opinions, will often have a higher confidence value. Let γ be the cluster confidence, fr the weighted size of the cluster and c the degree of compactness of the cluster (see further). γ can then be calculated as follows:

γ = k1 · fr + k2 · c

In this definition we are free to choose k1 and k2 as normalisation constants. Note that these parameters can be interpreted as weights which can be modified to change the relative importance of the weighted relative frequency and the compactness. We can thus reduce them to one parameter by replacing k1 and k2 with α and (1 − α), introducing α as the weight coefficient. The cluster confidence γ can then be written as follows:

γ = α · fr + (1 − α) · c

We still need to define how to calculate the relative frequency fr and the compactness c. The former seems straightforward, but it is not: there is an important third factor to keep in mind which we have thus far not discussed, namely the possibility of weights among experts.

Relative Weighted Frequency

There are multiple scenarios in which it would be interesting to assign weights to experts. This occurs when the experts can be partitioned into groups of expertise, independently of their elementary criteria. An example would be a business decision where a board of experts is consulted aside from the social media polling.
As the director, you might value the opinion of a single expert more than that of an individual from the crowd, and this can be specified by adding weights. These weights can be normalized, but they do not have to be. In fact, when normalized, problems might occur when the number of experts becomes very large. When consulting tens of thousands of people alongside a few renowned experts, whose opinion you value five times as much, you might need to assign an individual a weight of one fifty-thousandth, which can cause a loss of precision in calculations. Much more natural is the approach where the weights are relative and only the ratio of two weights shows the relative importance of one individual compared to another, where two equal weights imply equal importance. In the case of social media, a suggested default is to assign all experts the same weight first, and possibly assign actual experts a (much) higher weight. The relative frequency then becomes a weighted relative frequency, defined as follows.

Definition 3.5.2 Let E be the set of all experts, n be the size of E, Ei be the i-th expert in E and let w(e) be the weight of expert e. Then the relative frequency fr,k of cluster k is calculated as follows:

fr,k = ( Σe∈k w(e) ) / ( Σi=1..n w(Ei) )

Compactness

The calculation of the compactness c is not as straightforward, because the concept of compactness represents the degree in which a cluster is internally coherent, or, equivalently, the inverse of a measure indicating how spread out it is. We propose two approaches for calculating the compactness: one based on typical values and one based on interval-valued fuzzy sets.

Typical Value Compactness

A first approach uses the typical value of a cluster. The typical value is often taken as the median or mean. For membership functions, we have already defined a similarity measure using the shape-symbolic notation. We can use this to select, from each cluster, the member that has the highest similarity to all other membership functions in it. From now on, we call this the most typical value. Note that the most typical value is always a member of the cluster.

Definition 3.5.3 The most typical value of a cluster is the membership function that has the highest similarity to all other membership functions in the cluster. It is necessarily a member of this cluster.

Opposed to the most typical value, there is also the least typical value. This is in itself also a typical value, but this time of the dissimilarity between the membership functions in a cluster, and it can be found as the element with the highest distance (i.e., the lowest similarity) to the others.

Definition 3.5.4 The least typical value of a cluster is the membership function that has the highest dissimilarity to all other membership functions in the cluster. It is necessarily a member of this cluster.

Based on the similarity between the most and the least typical values, we can define the compactness of the cluster as being proportional to their similarity. Their similarity has already been calculated during the clustering step and can be found in the similarity matrix. Because it is already normalized, it can directly serve as a measure for compactness.

Definition 3.5.5 The compactness based on the most and least typical values of a cluster is equal to the similarity between them.

However, keep in mind that a small cluster with few membership functions is more likely to have similar most typical and least typical values.
In fact, in the extreme case where a cluster consists of exactly one membership function, this function is both the most and the least typical value at the same time, and the compactness will be maximal because the similarity of a function with itself is 1. This is mathematically correct, but it could have unwanted repercussions: a cluster with only one expert in it would end up with a higher compactness than a cluster of five experts whose opinions somewhat differ. This is not really a problem for the confidence, though, as the weighted relative frequency also appears as a parameter in its definition. Another possible mitigation would be to use the relative frequency in the calculation of the compactness as well. The relation should then be inversely proportional, but would not have to be linear. A suggestion would be to divide the similarity by the square root of the relative frequency. An illustration of a cluster with its most and least typical values indicated is given in Figure 3.7. An obvious advantage of this approach is the fact that it is computationally fast, easy and cheap. However, selecting a representative is not exact, and defining a measure over a set of values based on two members of the set implies a loss of information. The measure does still depend on all members indirectly, as the most and least typical values are calculated based on the entire cluster.

Figure 3.7: Most and least typical values for a cluster; the most typical value is displayed in dark red, the least typical value in blue.

Interval-Valued Fuzzy Set Compactness

A second approach to calculating the compactness of a cluster is through the use of interval-valued fuzzy sets. An interval-valued fuzzy set is an extension of regular fuzzy sets in the sense that it defines an interval of possible membership degrees for each value in the domain of the set. Graphically, this can be seen as the composition of two membership functions, where one represents the upper bound of possible values and the other represents the lower bound.

Figure 3.8: Example of an interval-valued fuzzy set where the lower bound has a maximum value of λ.

In order to calculate the compactness from this, we need to find a bounding surface enclosing all membership functions in the cluster. Analytically this is easily done by taking the maximum and minimum of all membership functions in each point when plotting them together on a graph. Computationally, this is difficult; to be exact, the calculations would require an analytical engine and infinite precision. The idea is then to calculate the surface enclosed by the upper and lower bound of the interval-valued fuzzy set. A large surface indicates a large spread and a small surface indicates a compact cluster. Because we do not have infinite precision, we need an alternative way to find these bounds and approximate their enclosed surface. Through approximations it becomes possible to estimate the desired surface in a computationally much cheaper and easier way.

Figure 3.9: The upper and lower bounds of the encapsulating interval-valued fuzzy set; the upper bound is displayed in green, the lower bound in teal.

If we consider each membership function as a collection of four values a, b, c and d, then we can find an approximate upper bound by taking the minimum a and b and the maximum c and d. To know whether or not doing this introduces errors, it suffices to check if the minimum a and b belong to the same membership function (dually for the maximum c and d).
In case they do not, it is possible the estimate is wrong. This can be mitigated by either performing another iteration of calculations, further refining the surface, or by simply reflecting this uncertainty by lowering the confidence in the compactness, for example by increasing α by a certain percentage, lowering the weight of the compactness. The lower bound is harder to estimate. First we compute the maximum a and the minimum d. In case the a-value exceeds the d-value, the lower bound is simply the x-axis itself, which is the simplest case. If it does not, we need to compute the maximum b and the minimum c. Again we compare these, to check whether the maximum b does not exceed the minimum c. In that case, the lower bound estimate is given by the computed a, b, c and d values. Similarly to before, we can test the possibility of an error by checking if the maximum a and b (and dually the minimum c and d) belong to the same membership function. In case b does exceed c, however, the lower bound will not have a core and is much harder to specify. An additional parameter needs to be added to indicate the maximum height reached by the lower bound. Moreover, a new b and c value need to be computed, which will be equal to each other and equal to the point on the x-axis where the lower bound is maximal. This can be done heuristically by looking at the inclination between a and b of the membership function with maximum b and the inclination between c and d of the membership function with minimum c. The intersection of these lines can be used as an estimate for the maximal value of the lower bound, and its abscissa can be used as the new b and c value. Again we face the problem of uncertainty, which can be mitigated as before, either by reiterating with more precision or by lowering the confidence.

Definition 3.5.6 The compactness based on interval-valued fuzzy sets can be computed from the surface between the upper and lower bounds of the interval-valued fuzzy set enclosing all membership functions in the cluster. The surface has to be normalized, which can be done by rescaling the x-axis to map the maximum onto 1.

Figure 3.10: The lower bound of this interval-valued fuzzy set has no core and is clearly only an approximation of the real surface, as one of the membership functions lies partially below the lower bound.

Note that both approaches respect extreme cases such as one expert per cluster and scenarios with one huge cluster containing all experts.

Inter-cluster Confidence Redistribution

We now have a confidence level per cluster that depends only on the elements of that cluster. Sometimes it can be interesting to readjust this, for multiple reasons. For one, we might have a number of clusters that are unusable because they have a very low confidence. Alternatively, we might find a group of clusters that are very similar. In order to deal with this accordingly, we could do several things. A first possibility is to review the clustering step and reiterate until a desired amount of clusters is produced. This is not always optimal, however, because adjusting the parameters of the clustering algorithm to produce fewer clusters might have adverse effects on their compactness. Therefore we recommend a different approach, which relies on selecting certain clusters. This again can be done in different ways. Either we select the top-k clusters, after ranking them by decreasing confidence, or we simply set a threshold and ignore all clusters that fall below it.
The former has the side effect that k is fixed, so a relatively important cluster might be ignored in case there are more than k interesting ones. The latter has the problem that the threshold is not easy to find. Either we set a fixed value, which might in some extreme cases lead to the elimination of all clusters or none at all, or we calculate it dynamically based on the computed intra-cluster confidences. The approach of selecting clusters also has additional, possibly negative, effects. These come from the fact that selecting clusters directly implies that information is purposefully ignored and thus lost. This can be partially mitigated by redistributing the confidence after making the selection. After establishing the desired amount of clusters, we might be in a situation where we can use our insight to further aid the algorithm in its artificial human evaluation. We can do this by translating our knowledge into an adjustment of the parameters to influence the decision. To clarify why this can be interesting, take as an example a scenario where there are three clusters with similar confidence, yet two of them have a similar typical value while the third represents a quite different opinion and has a slightly higher confidence. To select a final representative, we as humans would likely choose one of the two clusters that are closely related, because even though they are individually less important than the third, the total picture shows they are in some way correlated, as they represent groups of experts with similar opinions. It can therefore be interesting to readjust the confidence after the initial calculation in order to achieve more "logical" results. This can be done by computing the pairwise distances between the clusters and boosting the confidence of those with low distances to others, while lowering that of clusters considered to be outliers, resulting in a net redistribution.

Note of Caution

The reader should be wary of the concept of inter-cluster confidence redistribution. This step is considered optional, as it can be beneficial but it can also be harmful. It is not always justified to readjust confidence levels, as this in fact twists the representation of the consulted experts. In social media applications this is probably not a problem, yet the decision maker should be careful not to end up tweaking the parameters in a way that steers the output in a certain direction. In critical applications, the manipulation and redistribution of confidence is generally discouraged, in order to present results that are as "clean" as possible, without distortion.

3.5.2 Elementary Confidence at Membership Function Level

The next level where we encounter confidence is when a representative for each cluster is selected for further calculation. This is done because the computation of elementary preferences requires a single elementary criterion per performance variable. This implies all clusters per attribute must be recombined into one representative. We call the confidence of this selected representative elementary criterion the elementary confidence. Again there are multiple approaches to calculate it, but first we need to find a representative per cluster. The most obvious way to do this is by selecting the most typical value. We choose the initial confidence of this representative to be the same as that of the cluster it represents.
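As a small illustration of how the definitions above fit together, the sketch below selects the most and least typical values of a cluster, derives the typical-value compactness of Definition 3.5.5, and elects the most typical value as the cluster representative. The data structures, function names and similarity values are assumptions made for the example only; it presumes the pairwise similarity matrix from the clustering step is available.

    def most_typical(cluster, sim):
        """Member with the highest total similarity to the other members (Definition 3.5.3)."""
        return max(cluster, key=lambda m: sum(sim[m][o] for o in cluster if o != m))

    def least_typical(cluster, sim):
        """Member with the highest dissimilarity to the other members (Definition 3.5.4)."""
        return min(cluster, key=lambda m: sum(sim[m][o] for o in cluster if o != m))

    def typical_value_compactness(cluster, sim):
        """Definition 3.5.5: similarity between the most and least typical values."""
        if len(cluster) == 1:          # degenerate cluster: most == least typical value
            return 1.0
        return sim[most_typical(cluster, sim)][least_typical(cluster, sim)]

    # Toy example with three experts; the similarities are made-up illustrative numbers.
    sim = {"e1": {"e1": 1.0, "e2": 0.9, "e3": 0.8},
           "e2": {"e1": 0.9, "e2": 1.0, "e3": 0.7},
           "e3": {"e1": 0.8, "e2": 0.7, "e3": 1.0}}
    cluster = ["e1", "e2", "e3"]
    representative = most_typical(cluster, sim)    # elected representative of the cluster
    print(representative, typical_value_compactness(cluster, sim))   # e1 0.8

The representative inherits the cluster confidence as its initial confidence, as described above.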
Next, we discuss how we can combine these cluster representatives into a single elementary criterion, which will be used to calculate the elementary preference score. First, we can try to merge the representatives into a single criterion. This is the path of least information loss, as each representative is taken into account, although the merging itself still implies some inevitable loss. The representatives can be combined in a weighted manner, where the confidence levels serve as their weights. These weights need not be normalized beforehand, as an extra normalization step can easily be introduced to achieve this. Merging, however, is rarely meaningful. If two elementary criteria representing opposing opinions with similar confidence, such as "only low values" and "only high values", are merged, the result would be "semi-preferable everywhere" in case normalization is used, and "all values everywhere" in case a pointwise maximum is chosen as merging strategy. Even "no values" is possible in case the strict pointwise minimum is taken. These functions might be mathematically merged, but they are logically void. Therefore we will not examine this approach further, as additional research is necessary to study the possibilities in this case. Second, the opposite extreme of merging is selection. This strategy is by far the simplest, but it also introduces the largest amount of information loss. Selection can be done based on confidence, for example taking the representative of the cluster with the highest confidence score. There are scenarios where selection is acceptable, namely when the clusters have clearly distinct confidence levels and one is obviously "ahead" of the others. In other cases, where the confidence levels are closer to each other, an extra selection criterion could be included to ease the process, such as the size of the cluster, the distance to other clusters, or others. The extra complication, however, lies in the fact that choosing from similar representatives means a lot of information is lost. This can be reflected by lowering the final representative's confidence before performing further computations. Note that in the rare case two clusters have the same confidence and selection is chosen, an additional tiebreaker is necessary. When possible, in case the competing representatives allow it, a merging step can be used to minimize the loss of information and to keep the confidence as high as possible. We will use the selection approach in the remainder of this research and in the case study.

Definition 3.5.7 The elementary confidence is the confidence of the elementary criterion selected for evaluation purposes. When using the selection approach, it equals that of the cluster with the highest cluster confidence.

3.5.3 Global Confidence at System Level

Finally there is the global confidence of a system. Its calculation occurs during the propagation through the aggregation structure, similarly to the global preference calculation. Indeed, there is a close resemblance between the desired behaviour of preference and confidence: in case of a complete conjunction, the lowest preference score will be dominant and produce a low output. Similarly, a low confidence in the preference of either input of a conjunction should produce a low confidence in the result. The exact value of the resulting confidence after an aggregator, LSP or compound, also depends on the preferences of the system being evaluated.
This is the main difference between confidence and preference propagation: the global preference only depends on the elementary preferences and the aggregation structure, but the global confidence depends on the elementary confidences, the aggregation structure and also the elementary preferences. In the case of a full disjunction with two inputs with similarly high confidence, the output will depend on which of the inputs has the highest preference score. Alternatively, in case both have a high preference but one of them has a significantly higher confidence, the result will have a high confidence too. Similar situations are found for the generalized version of the disjunction, and dually for the conjunction.

Definition 3.5.8 The global confidence at system level is an output parameter indicating the trust we can put in the representativity of the global preference of that system. It is calculated similarly to the global preference, through propagation and aggregation of the elementary confidence values through the aggregation structure.

Each system will then have two parameters indicating its preference and confidence. Both are necessary for the decision makers to perform their evaluation of the systems. The importance of the global confidence depends on the nature of the problem. In the case of social media polling, the correctness of the decision is rarely crucial. In fact, even when a mathematically wrong decision is made, there are often no real repercussions, though the sales numbers might not be as high as desired. In other fields of application, however, such as medical analysis based on decision support, the correctness is very important, and a high level of confidence is required.

3.6 Combining Confidence and Preference

In traditional decision support systems, the output is typically a list of the evaluated systems and their calculated global preference scores. The interpretation thereof is straightforward. This is largely due to the fact that each system is linked with only one parameter, making it trivial to rank them. This makes it simple to select the "best" system. Often, when the amount of evaluated systems is large, the ones with the highest global preference scores are selected by the DSS and only they are presented to the decision makers as contenders for the final solution. In our case, this is no longer such a trivial process because there are now two output parameters. The ordering of viable systems is complicated by the fact that there is no natural total ordering on the couple (preference, confidence). We could define a partial or even a total ordering, but this would be a strictly mathematical solution to a logical problem, and is hence not the preferred way to go. Instead, we look in the direction of comparing both confidence and preference in the context of the application. Much like the comparison between cost analysis and global preference is kept separate from the preference calculation, as explained by Jozo Dujmović, the combination of confidence and global preference is also best kept separate. The main question we are trying to answer is still "which system is the best", which cannot simply be translated to "which system has the highest global preference" or "which candidate has the highest confidence". Clearly, for the best system we desire a high degree of confidence. At the same time, we also want the solution to have a high preference score.
However, what conclusion should we draw in case the candidate with high confidence has a low preference, or vice versa? Depending on the context, and more importantly the criticality of the accuracy of the decision, a high confidence might play a more important role than a high preference. In the case of social media applications, the criticality is not very high, and thus the solution with the highest global preference can be considered a viable solution, given that its confidence at least surpasses a desired threshold. In other, more critical cases, a high level of confidence might be necessary. In that case, it is plausible to choose a candidate that does not have the highest preference but that has a high degree of confidence. Generally, we still want to find a solution with both a high preference and a high confidence. This is the best possible scenario. To facilitate the decision making process, we propose a technique to combine the two parameters into one, again allowing the evaluated systems to be ordered. The following properties must be respected:

• A minimum degree of confidence must be met.
• A high global preference is desired.
• A high global confidence is desired.
• Depending on the context, it must be possible to assign differing importance to the impact of preference versus confidence.

We call this combined new parameter the goodness ν of system ξ, and define it as follows:

Definition 3.6.1 The goodness ν of system ξ combines the global preference and global confidence of the system with weights and can be used to rank evaluated systems. Let pξ be the global preference score of the system and cξ be its global confidence, then we can find νξ as follows:

νξ = k1 · pξ + k2 · cξ

Again we see the combination of two weight parameters k1 and k2. Here too they share the property that increasing k1 implies lowering k2 (at least relatively). We reduce them to one parameter β and rewrite the equation as follows:

νξ = β · p + (1 − β) · c

where β is the parameter defining the importance of preference versus confidence. In critical applications, β would typically be below 0.5, indicating the desire for a high level of confidence. On the right-hand side of the equation, the subscript ξ on the preference and confidence is omitted because it is implied by the left-hand side. After computing this for each system to be evaluated, we can define a filtering rule selecting all those ξ with a confidence above a certain threshold θ, resulting in a filtered set of still viable systems Ξ:

Ξ = {ξ | cξ > θ}

These can then be ordered by an ordering function φ, defined as follows:

φ : {1, 2, ..., |Ξ|} → Ξ such that ∀ i < j : νφ(i) ≥ νφ(j)

This ordering function ranks the viable systems in descending order according to their goodness ν. The introduction of this parameter allows us to select good systems to solve the given problem which respect all required properties. It can be tuned by changing the β and θ parameters, allowing a degree of specification of the importance of confidence versus preference and limiting the results to a set of systems with a minimal degree of confidence. The choice of β can be made beforehand, but the choice of θ is more difficult, as it is not known beforehand what confidences the aggregation step will produce. Therefore, the choice of θ is best postponed until after the presentation of the global preference calculation results.
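The goodness, the θ-filter and the ordering function φ are small enough to sketch directly. The preference and confidence values in the example below are hypothetical placeholders, not results from the evaluation; only the mechanics matter here.

    def goodness(p, c, beta):
        """Definition 3.6.1 with k1 = beta and k2 = 1 - beta."""
        return beta * p + (1.0 - beta) * c

    def rank_systems(preference, confidence, beta=0.5, theta=0.0):
        """Keep systems whose confidence exceeds theta, then order the remaining
        ones by descending goodness (the ordering function phi)."""
        viable = [s for s in preference if confidence[s] > theta]
        return sorted(viable,
                      key=lambda s: goodness(preference[s], confidence[s], beta),
                      reverse=True)

    # Hypothetical example values.
    preference = {"S1": 0.70, "S2": 0.55, "S3": 0.85}
    confidence = {"S1": 0.75, "S2": 0.90, "S3": 0.80}
    print(rank_systems(preference, confidence, beta=0.4, theta=0.6))   # ['S3', 'S2', 'S1']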
Based on the output, θ can then be dynamically calculated. Alternatively, in case the results are unsatisfactory, for example when all confidence levels are significantly low, the aggregation calculations can be repeated with a different α, assigning different weights to the importance of the intra-cluster confidence parameters. However, this might also be an indication that the consulted experts have very diverging opinions. All in all, unsatisfactory results should be examined further.

Chapter 4 Case Study

What follows is an illustrative case study to show the discussed techniques in action. We have chosen a possible application using social media and simulate a problem with multiple candidate systems. We evaluate each of them using the techniques discussed above to calculate a global preference and a global confidence, and combine these into a goodness. All the phases are explained and their results briefly analysed. The remainder of this chapter is structured as follows. First, we sketch the background of the problem that we are trying to solve. Then, we go through the entire evaluation process, from selecting performance variables to calculating the goodness for each of the considered systems. Finally, there are some brief remarks on the results.

4.1 Background

An interesting application of social media interaction is found in the gaming industry. Here, game developers often interact closely with their gaming community through various channels. The most popular games nowadays are those whose developers have chosen a maintenance model based on community feedback. Certain game developers maintain their creations through the use of a feedback loop, listening to what their players have to say and updating their product accordingly. The most well-known examples of this are the biggest hit by Blizzard, World of Warcraft, which has the most hours played per day according to recent observations (http://www.xfire.com/cms/stats), and League of Legends, by Riot Games, which is paving the road for eSports in Europe and America, with the most active players at the moment (http://euw.leagueoflegends.com). Clearly, including users in their development choices pays off for them. Blizzard has held their dominant spot for almost nine years so far, almost dating back to the original release of their biggest hit. The gaming industry is a booming business with large companies and a lot of money. To be successful in this competitive world, it is of key importance to make exactly the product the customer wants. This, and the fact that the users, as gamers, are often also the most knowledgeable about the product itself, makes this sector ideal for GDSS. The users are the ideal experts for consultation in business decisions. This case study is about a fictitious game developer gathering information from its user base for its next game. Of course, the precise details are undisclosed, but some general facts are necessarily given to the community to get accurate feedback. To be as successful as possible with this game, the game developer decides to acquire input from its user base through social media techniques based on the methodologies described in this document. That information will be put together with the opinions of in-house, experienced experts that have worked there for many years and have already contributed to previously successful games.

4.2 Evaluation

The development of a game is a difficult process with a lot of phases, each with a lot of decisions to be made.
Some aspects of the game itself can be decomposed into a hierarchical tree of measurable variables, making it an excellent candidate for GDSS. The entire process of attribute decomposition is not relevant to the research and is not discussed further. However, some performance variables are selected and discussed in greater detail. One of them will be studied to elaborate on the clustering and confidence calculations. Then, the results of the other performance variables are shown and used as input to the propagation through the aggregation structure. Because the creation of a (possibly compound) aggregation structure is not in the scope of this work, a simple illustrative example is generated based on the chosen performance variables. Note that this is an interesting topic for further research; it is worth investigating the viability of making aggregation structures through social media. In this case study, however, we do not investigate the origin of the aggregation structure and treat it as if it were created externally. Keep in mind that the accuracy of the aggregation structure plays a big role in the calculation of the global preference and overall confidence, and thus also the goodness of the evaluated systems, but in this example we are more interested in showing the process of the calculations and the influence of the parameters than in the actual correctness of the results. The rest of the evaluation process is structured as follows: first, we limit ourselves to a couple of performance variables. A small discussion rationalizes the choices made. Second, we define the systems that we are going to evaluate. Next, we simulate the consultation of experts. Then the clustering and confidence calculations are illustrated by investigating one of the performance variables in detail. Afterwards, the results of the others are given and the propagation of confidence through the aggregation structure is studied. Finally, the resulting goodness calculations are explained based on the evaluation of three systems. The entire process can be roughly split into three big parts:

1. Gathering inputs
2. Performing calculations
3. Analysing results

We will handle each of these separately.

4.2.1 Required Inputs

First, we take a look at the inputs that are required to perform the calculations. Given a problem, we can define its performance variables. Then, we can generate a few candidate systems to solve the problem. At the same time we can gather information from experts on their elementary criteria. We also need to set up an aggregation structure that will be used to combine the elementary scores into a global score. After gathering all this data, we are ready to perform the evaluation.

Performance Variables

The entire decomposition of a game into performance variables is large and cumbersome. Generally, the highest tier of the tree contains quality attributes such as performance, usability, availability, security and scalability (see http://equis.cs.queensu.ca/~graham/cisc877/slides/CISC%20877%20%20Game%20Architecture.pdf). Some of these, mostly from the usability category, have a great influence on the end-user gaming experience and thus make good candidates for social media consultation. The performance variables that are chosen for further investigation are the following:

• Average loading time (ALT),
• Offline playthrough time (OPT),
• Ease of learning (EOL),
• Maximum amount of players per server (PPS).

These are all performance variables with a continuous range.
For some, it is obvious what kind of values will be preferred, like low loading screen times; however, it is still useful to gather information to gain insight into the average amount of time players are willing to wait. The offline playthrough time is an important factor in deciding the balance between offline and online gameplay material, where offline material strives to be continuously innovative and online material needs enough variation to have replay value. The ease of learning plays a big role in user experience and, conversely, user frustration. It is important to let the player grow in experience and discover the game piece by piece, but of course the player should not get the feeling that he only unlocks his full potential by the time the game is over. Therefore it is important to properly balance the learning curve: a good game should be both challenging and rewarding. In online gaming, there are always servers involved. An important performance variable for scalability is the maximal server load, but this also has an impact on the gaming experience, and player opinions should hence be kept in consideration. These performance variables capture some of the most important aspects of a game, going from offline to online experience and balancing reward and frustration.

Other Measures

Evidently, not every aspect can be decomposed into measurable components. Some choices are between discrete options, for which the public's preference can be better gauged through questionnaires than through the use of GDSS. Examples of such aspects are:

• Dedicated online servers versus private hosting on public servers,
• Target platforms (pc, console, ...),
• Target operating systems (Windows, Mac, Linux, iOS, Android, ...),
• Early release with lots of downloadable content (DLC) versus a late but full release.

These will not be investigated further, but they are also an important aspect of game development and are therefore mentioned.

Candidate Systems

Three possible game configurations are evaluated. This example is simplified, as only the four mentioned attributes are taken into consideration, but it suffices for the purpose of illustration. Candidate one is an example of a balanced game, with elements for both offline and online play. The playthrough time does not take replay value into account but simply indicates the time needed to play through the entire content once.

C1: {PPS = 32, EOL = 5, ALT = 25, OPT = 30}

Candidate two represents a single-player centered game, but has some online elements in it as well. This implies the game has a long learning curve, constantly adding new elements and unlocking the full arsenal bit by bit as the game progresses. There are relatively long loading screens because there is a lot of different scenery with few reused elements.

C2: {PPS = 12, EOL = 7, ALT = 30, OPT = 80}

Candidate three is based around multiplayer action and online play, with a low amount of offline playthrough content. The learning curve is short and fast, to allow players to fully dive into the game quickly. Here, the power of the player comes from repeating the same actions to get better at them, rather than spending more time to unlock features. Loading times are low so players don't have to wait long.

C3: {PPS = 64, EOL = 3, ALT = 10, OPT = 12}

Experts

To illustrate the clustering algorithm, the performance variable OPT is elected for further investigation.
N = 100 experts are generated, represented by their elementary criterion for the chosen performance variable. Alongside those, random weights between 1 and 5 are generated, where 5 indicates the highest level of expertise and 1 the lowest. Most experts get a weight of 1, representing regular gamers, the vast majority of the group of consulted people. Long-time gamers might get a 2, but the higher values in the spectrum are reserved for the in-house game development experts, who are also consulted. Some of the experts are displayed in Table 4.1.

ID    a    b    c    d    weight
0     0    0    65   98   1
1     0    0    48   84   1
2     0    0    36   82   2
...
49    71   77   77   90   4
50    39   71   71   84   1
51    10   44   44   76   3
...
97    26   55   56   81   1
98    22   35   43   45   4
99    16   36   82   86   1

Table 4.1: Some of the 100 generated elementary criteria, representing the already weighted experts.

These criteria can be converted into their shape-strings and feature-strings, giving their shape-symbolic representations. This is depicted in Table 4.2.

ID    Shape-string    Feature-string
0     1-0             L|E|ES
1     1-0             M|S|VS
2     1-0             S|M|VS
...
49    0+H-0           L|ES|ES|ES|ES
50    0+H-0           S|S|ES|ES|VS
51    0+H-0           ES|S|ES|S|VS
...
97    0+1-0           VS|S|ES|VS|VS
98    0+1-0           VS|ES|ES|ES|M
99    0+1-0           VS|VS|M|ES|ES

Table 4.2: The shape-string and feature-string for some of the experts.

Aggregation Structure

The aggregation structure is kept simple in this example, mainly because we limited ourselves to a small number of performance variables. The structure that is used is displayed in Figure 4.1.

Figure 4.1: The aggregation structure used in the case study.

We understand from this that at least one performance variable should be fulfilled because of the mandatory conjunction at the highest level. "Ease of learning" and "offline playthrough time" are considered disjunctive. Their requirement is non-mandatory conjunctive with "average loading time".

4.2.2 Calculations

Now that we have the required inputs we can move on to the actual evaluation process based on clustering, confidence calculations and the core of LSP.

Clustering

We can calculate the shape-similarity matrix using the distance measure for the shape-symbolic notations we have derived earlier. These calculations result in the similarity matrix of Table 4.3.

       0      1      2     ...   49     50     51    ...   97     98     99
0     1.000  0.925  0.825  ...  0.450  0.425  0.475  ...  0.450  0.375  0.525
1     0.925  1.000  0.925  ...  0.450  0.475  0.525  ...  0.500  0.425  0.525
2     0.825  0.925  1.000  ...  0.450  0.475  0.525  ...  0.500  0.425  0.475
...
49    0.450  0.450  0.450  ...  1.000  0.875  0.775  ...  0.825  0.850  0.825
50    0.425  0.475  0.475  ...  0.875  1.000  0.900  ...  0.950  0.875  0.850
51    0.475  0.525  0.525  ...  0.775  0.900  1.000  ...  0.950  0.825  0.800
...
97    0.450  0.500  0.500  ...  0.825  0.950  0.950  ...  1.000  0.875  0.850
98    0.375  0.425  0.425  ...  0.850  0.875  0.825  ...  0.875  1.000  0.825
99    0.525  0.525  0.475  ...  0.825  0.850  0.800  ...  0.850  0.825  1.000

Table 4.3: Similarity matrix showing the similarities for each pair of elementary criteria.

Note that this matrix is indeed symmetric around the main diagonal and everywhere 1 on it. Next, the elementary criteria are hierarchically clustered, most-similar first. The stopping criterion is based on a threshold, which is here set at 0.95. This is a rather high threshold, which results in a large amount of compact clusters. Increasing the threshold increases both the amount of clusters and their compactness, whereas lowering it decreases both. Running the clustering algorithm produces 42 clusters. Part of the results can be seen in Figure 4.2.
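For completeness, the clustering step itself can be put in runnable form. The following is a generic sketch of threshold-based, highest-similarity-first agglomerative clustering over a similarity matrix; the linkage rule (average pairwise similarity between clusters) is an assumption made for the illustration, as the text does not prescribe one. The toy input reuses the 3 × 3 corner of Table 4.3.

    def cluster(sim, threshold=0.95):
        """sim: dict-of-dicts similarity matrix; returns a list of clusters (lists of ids)."""
        clusters = [[i] for i in sim]                     # start with singletons
        def link(c1, c2):                                 # average pairwise similarity
            return sum(sim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))
        while len(clusters) > 1:
            # find the most similar pair of clusters (highest similarity first)
            best = max(((i, j) for i in range(len(clusters))
                               for j in range(i + 1, len(clusters))),
                       key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
            if link(clusters[best[0]], clusters[best[1]]) < threshold:
                break                                     # stopping criterion
            i, j = best
            clusters[i] = clusters[i] + clusters[j]       # merge the pair
            del clusters[j]
        return clusters

    # The 3x3 corner of Table 4.3 (experts 0, 1 and 2).
    sim = {0: {0: 1.0, 1: 0.925, 2: 0.825},
           1: {0: 0.925, 1: 1.0, 2: 0.925},
           2: {0: 0.825, 1: 0.925, 2: 1.0}}
    print(cluster(sim, threshold=0.95))   # nothing merges at this threshold: [[0], [1], [2]]
    print(cluster(sim, threshold=0.90))   # 0.925 >= 0.90, so 0 and 1 merge: [[0, 1], [2]]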
Purely based on the number of experts and clusters, there is a cluster for every two or three experts. However, we can see that most clusters contain one expert and some contain a larger amount. The largest cluster has 10 experts and the second largest has 8. The clusters with one expert in them are a direct result of the high clustering threshold. For each cluster we can calculate the most and least typical values and define the upper and lower bounds. Some of them are shown here. The colouring scheme is the same as before: green and teal are respectively the upper and lower bounds of the enclosing interval-valued fuzzy set.

Figure 4.2: An excerpt of the clustering algorithm result dendrogram.

Figure 4.3: Cluster 18.

Figure 4.4: Cluster 30.

Figure 4.5: Cluster 52.

Figure 4.6: Cluster 60.

After the clustering is done we can start the preference and confidence calculations.

Confidence Calculations

First we calculate the cluster confidence. The results of using interval-valued fuzzy sets are displayed for several values of α to study its impact. Based on the cluster confidence, the top five clusters are selected and shown.

Cluster ID    Cluster Confidence
68            0.597
76            0.590
36            0.589
2             0.582
18            0.575

Table 4.4: Configuration A1, α = 0.4, interval-valued fuzzy sets.

Cluster ID    Cluster Confidence
68            0.503
36            0.494
76            0.493
2             0.490
18            0.486

Table 4.5: Configuration A2, α = 0.5, interval-valued fuzzy sets.

Cluster ID    Cluster Confidence
30            0.412
68            0.409
36            0.399
18            0.397
76            0.397

Table 4.6: Configuration A3, α = 0.6, interval-valued fuzzy sets.

The calculations are illustrated for cluster 18. It contains five experts, with IDs 18, 19, 25, 26 and 27. Their respective weights are 1, 1, 3, 1 and 1. Their relative weighted frequency is thus equal to the sum of those weights divided by the total weight of all experts. This leads to a value of fr = 0.042. The normalized surface of the cluster is calculated by taking the minimum a and maximum d of all criteria, multiplying them and taking that as the maximal surface. The surface enclosed by the interval-valued fuzzy set for cluster 18 is then divided by the maximal surface. This results in a compactness c = 0.93. Combining both with α = 0.4 gives the listed cluster confidence of γ = 0.575. Increasing α increases the importance of the relative frequency of the cluster, which increases the confidence in clusters with more members. The reason the overall confidence seems to drop when increasing α is that the original clustering threshold resulted in a lot of compact clusters. The clusters all have a relatively small size, meaning the relative frequency is low, even for the largest cluster. Therefore, shifting the weight towards its importance lowers the confidence in general. In a scenario where the examined population of experts is much larger, in the order of thousands, and the clustering algorithm produces a higher average number of experts per cluster, this behaviour would not occur. Note that the top five clusters do not necessarily contain the largest clusters. Due to the random weights, and possibly the compactness of other clusters, some smaller clusters such as clusters 18 and 68 score best. Also note that the largest cluster does not have a much higher confidence than the other clusters. This is because the weights of the members in that larger cluster are low on average, as opposed to the smaller clusters that score well because the weights of their experts are high. This makes the relative weighted frequency of the top five clusters about the same.
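The reported confidence of cluster 18 can be verified by plugging the given fr and c into the cluster confidence formula; the snippet below does exactly that for the three configurations, taking the compactness as given rather than recomputing it from the interval-valued fuzzy set.

    f_r = 0.042     # weighted relative frequency of cluster 18 (weights 1+1+3+1+1 over the total weight)
    c = 0.93        # compactness derived from the interval-valued fuzzy set surface
    for alpha in (0.4, 0.5, 0.6):
        gamma = alpha * f_r + (1 - alpha) * c      # gamma = alpha * fr + (1 - alpha) * c
        print(alpha, round(gamma, 3))              # 0.575, 0.486 and 0.397, matching Tables 4.4-4.6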
For the remainder of the calculations, configuration A3 is used. The choice of α = 0.6 is deliberate, as we wish to give a larger impact to the relative size of the clusters to compensate for the high threshold during the clustering phase. In what follows, each cluster is represented by its most typical value, which will serve as elementary criterion.

Other Performance Variables

The results of the confidence calculations for the other performance variables are shown in Table 4.7.

Performance variable    a    b    c    d     Elementary Confidence
EOL                     0    1    4    8     0.830
ALT                     0    0    0    60    0.772
PPS                     8    16   64   128   0.917

Table 4.7: Confidence results for the other performance variables.

The ease of learning is rated on a scale from one to ten indicating how difficult it is to learn the aspects of the game. This can be interpreted as the amount of times an action should be repeated before it is considered an acquired skill. The average loading time is in seconds and shows the tolerance for sitting idle while the game loads various elements. The maximum players per server is fairly straightforward and expresses the amount of players that can be logged in and playing at the same time on one server.

Propagation Through the Aggregation Structure

The global preferences are derived by aggregating the elementary preferences of the systems using the given aggregation tree. This is the core of the decision support program that remains unchanged; we simply apply LSP here. In a similar but slightly different way, the global confidence is propagated as well. For any aggregator A, the output confidence is calculated as follows:

1. Calculate the output preference from the input preferences
2. Find the input with the preference closest to the output preference
3. Take that input's confidence as a starting point
4. Calculate the output confidence from the input confidences (similarly to preference)
5. Take the average of the calculated output confidence and the selected input's confidence

This implementation leads to logical results but is not part of the research. It also respects the dependency between the output and the input confidences, the input preferences and the aggregation parameter. Alternative implementations are worth examining further, as is mentioned later in the future work section on compound aggregation structures. The results are displayed in Table 4.8.

System ID    Global preference    Global confidence
1            0.711                0.727
2            0.532                0.876
3            0.856                0.806

Table 4.8: Global preference scores and global confidences per system after propagation through the aggregation structure.

At this point, the algorithm can be stopped and the results can be analysed by the decision maker(s). In case there are only a few systems being evaluated, as is the case here, this is possible as there is a clear overview of all systems. We see that candidate system 2 has a significantly lower preference than the others. At the same time, its confidence is the highest, meaning we can put a high amount of trust in the accuracy of these results. Apart from that, system 3 seems to beat system 1 in both preference and confidence, likely rendering it the most suitable system.

4.2.3 Results

If there are a lot of systems and there is no clear overview of the results, the next step in the process is executed to rank the systems. This is illustrated here for the sake of the example. The following shows the results of combining preference and confidence for several values of β. In case the accuracy of the results is crucial, we want to have a high confidence for the solution.
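The goodness values listed in Tables 4.9–4.11 below follow from applying Definition 3.6.1 to the scores of Table 4.8; the short check below reproduces them, with any deviation in the last digit being an artefact of the inputs in Table 4.8 already being rounded.

    # Recomputing the goodness values of Tables 4.9-4.11 from the (rounded) scores of Table 4.8.
    systems = {1: (0.711, 0.727), 2: (0.532, 0.876), 3: (0.856, 0.806)}   # (preference, confidence)
    for beta in (0.4, 0.5, 0.6):
        goodness = {s: beta * p + (1 - beta) * c for s, (p, c) in systems.items()}
        print(beta, {s: round(v, 3) for s, v in goodness.items()})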
As we can derive from configuration B1, with β = 0.4, this may lead to a higher goodness for a system with a relatively low preference, as long as the confidence in it is high, because the trust we have in the accuracy of this solution is important.

System ID    Goodness
1            0.720
2            0.739
3            0.827

Table 4.9: Configuration B1, global goodness, β = 0.4.

System ID    Goodness
1            0.719
2            0.704
3            0.832

Table 4.10: Configuration B2, global goodness, β = 0.5.

System ID    Goodness
1            0.717
2            0.669
3            0.837

Table 4.11: Configuration B3, global goodness, β = 0.6.

In a way, the confidence can be interpreted as a sort of variance on the preference, which can then be seen as the expected value. That is why a solution with high preference yet low confidence is sometimes less trustworthy. In this application, the criticality is not very high, and β values closer to one are also viable. Choosing β = 0.5 or β = 0.6, we find that system 2 drops to the bottom as confidence becomes less important relative to preference. The results affirm our previously made short evaluation, confirming that system 3 is the best. The semantic analysis of these results translates to the following: overall, the gaming population seems to prefer a multiplayer game, though a balanced game is also viable.

4.3 Final Remarks

In the previous sections, the proposed techniques are used to evaluate certain systems to estimate how viable they are. However, when the performance variables of the problem are sufficiently mutable and the set of all possible systems too large to enumerate, the methodologies explained in this work can also be used to generate good possible solutions with attribute values in the optimally preferred range. For example, instead of evaluating certain prototype games, the "perfect" game could be constructed, where the performance variables are allotted a value based on the results of the clustering step. It is then no longer necessary to perform further aggregation and goodness calculations, as the system is optimal by construction. For the chosen clustering configuration this would lead to a game with the following parameters:

Performance variable    amax    bmax    cmax    dmax    Optimal value
EOL                     0       1       4       8       2.5
ALT                     0       0       0       60      as low as possible
PPS                     8       16      64      128     32
OPT                     10      71      83      83      75

Table 4.12: The optimal game prototype, as derived from the results of the clustering step with α = 0.6.

This derived prototype clearly has a bit of everything. It should be easy to learn, with average-sized servers, quick loading times and still a good amount of playthrough time. Of course, this might not be feasible, as it means the game is in fact a bit of everything. In reality, this carries the risk that it has everything but excels at nothing. This immediately shows the drawback of constructing an optimal prototype versus evaluating multiple candidate systems: the systems are constructed with care and take into account factors that aren't incorporated in the decision support algorithm, such as time to market, available budget and resource requirements, whereas the optimally constructed prototype might be completely infeasible.

Chapter 5 Conclusions

So far we have shown the possibilities of the proposed framework. It allows us to gather information from a source that was previously unreachable. It proposes methodologies to analyse this data in a useful way. The used techniques are all recently developed and offer a high degree of flexibility in modelling human logic through the use of soft computing techniques and configurable parameters.
First, we gather weighted expert opinions based on soft computing techniques, where each expert can express his or her expertise or preferences through membership functions. Then we convert them to their corresponding shape-symbolic notations and cluster them hierarchically according to a shape-similarity measure based on the Levenshtein distance. Second, we calculate the confidence of the clusters based on their relative size and compactness. The balance of importance between both can be changed by tweaking the α parameter. Optionally, a second round of confidence calculations can be executed, carefully redistributing the confidence, boosting the confidence of clusters with similar cores and lowering that of the others to compensate. Third, the aggregation step occurs, selecting the best clusters and their most typical values as representatives. The confidence at this point reflects how representative the selected elementary criterion is for the experts' opinions. Fourth, each candidate system is evaluated through the traversal of the aggregation structure. A global preference and a global confidence per system are derived. The confidence level can now be interpreted as a degree of trust we can put in the accuracy of the global preference score. Finally, the resulting preference and confidence can be combined. The criticality of the problem can be reflected by altering the β parameter. The result is an orderable set of systems, from which the mathematically best can be selected for evaluation by the decision maker(s). It should be clear that the output depends heavily on the parameters. The most obvious ones are the clustering threshold, α and β, but the aggregation structure, θ and the weights of the experts also play an important role. A little change in any of these can cause a drastic change in the results. Hence it is important to realize that their correctness has a large influence on the accuracy of the final ranking of the evaluated systems. The accuracy of the parameters reflects the actual opinions of the experts and the decision makers. The results of the framework are only as accurate as the representation of the underlying human logic. Obviously, the framework is just a tool to aid humans in their selection process.

5.1 Future Work

Looking at possibilities to explore further, it is interesting to think of different aggregation techniques, as the confidence calculations proposed in this work are closely tied to the context of pre-aggregation and are not applicable in other situations. Another thing worth looking into is the construction of compound aggregation structures, as was briefly mentioned before.

Alternative Aggregation Techniques

Pre-aggregation has the requirement that all experts have to be consulted in advance. Their data needs to be collected before the algorithm can run in order to produce meaningful results, as the opinions are aggregated first of all to serve as an input to the decision support system. In case new experts are introduced later on, all calculations have to be redone and the results of a previous evaluation possibly discarded, as the extra information can and most likely will lead to different results. This also implies that all information on the previously consulted experts needs to be stored somewhere, even after their input has been used in a calculation. This is not always a problem, but it rules out the possibility of use in certain fields of application.
In case, for example, we want to set up a community-driven, on-demand suitability map generator for specific queries, the entire aggregation and confidence calculation has to be redone every time a new user (here considered an expert) enters his or her preferences. It is clear that with pre-aggregation there is a separate input gathering phase, a short calculation step and a long life phase, in which the gathered input is used to answer flexible queries, though the input can no longer be changed without recalculation. Most importantly, these phases are separated and do not overlap. It might be interesting to look for an aggregation technique that allows a certain form of continuous (re-)evaluation, where there is no separation of time in the phases, and where the cost of aggregation is amortized by splitting it into small chunks of short calculations each time an expert inputs his or her opinions. The final desired scenario is one in which experts can input their opinions and produce a resulting preference mapping. Then, a new expert can join in and enter his or her input, and the previous results are updated on-the-fly. In a way, the resulting preference and confidence scores per system can be seen as a sort of running average of the results of all experts. A possible approach to that end could be to postpone the aggregation step until after decision support has been done. The whole system would then consist of multiple independent single decision support calculations, one per expert, each resulting in a personalized output reflecting the opinions of that expert. Those outputs can then be aggregated in multiple ways. Note that there is no mention of confidence up to this point, as there has been no aggregation thus far. This also means the existing DSS can be used as-is, without modifications and extensions, as all aggregation is done on their outputs. The aggregation of the independent outputs can be done in different ways. Keep in mind that thus far we have an array of mappings between each system and its calculated global preference, one element per expert. The simplest way to aggregate these to a single preference score, reflecting their combined opinions, is by taking a weighted average. The weights can be distributed based on the expertise level of the experts, similarly to pre-aggregation. A simple measure for confidence can then be derived from the size of the interval of difference between the preference scores for each system. Simply subtracting the minimal preference from the maximal preference over all experts and rescaling the result to a normalized axis gives an insight into how divergent the opinions are. This confidence measure is, however, sensitive to outliers. Another possibility worth investigating is a more probabilistic approach. Note that we can interpret the array of global preference values per candidate as a probabilistic distribution, which would presumably converge to a Gaussian distribution in case the amount of experts is large enough. The overall combined global preference can then be derived by first finding the best matching Gaussian and taking its expected value. The confidence that goes with it can be seen as a measure of spread of this distribution and can thus be found from the standard deviation.

Compound Aggregation Structures

In the examined case study, the origin of the aggregation structure was not specified.
Compound Aggregation Structures

In the examined case study, the origin of the aggregation structure was not specified. In this case it was simply generated from my point of view, but it was meant to represent the opinions of a small group of decision makers who agree on which of the preference variables are most important and on how they should be aggregated. Similarly to the expert opinions, however, this structure could also be constructed from information gathered through social media. The experts could be asked to rank the performance variables by importance and to indicate which they deem necessary. Some effort could go into exploring how that information can be aggregated into a compound aggregation structure, to further include clients in the decision-making process.
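One conceivable way to turn such gathered rankings into the ingredients of an aggregation structure is sketched below, purely as an illustration: it assumes a simple Borda-style rank aggregation and a majority vote for the necessary variables, neither of which is proposed or evaluated in this work, and the data and names are hypothetical.

```python
# Illustrative sketch (assumed Borda-style scheme, hypothetical data): turning
# expert rankings of performance variables into normalized weights, and
# flagging as necessary the variables a majority of experts deems necessary.

# Each expert ranks the variables from most to least important and lists
# the ones he or she considers necessary.
rankings = [
    (["price", "quality", "delivery"], {"price"}),
    (["quality", "price", "delivery"], {"quality", "price"}),
    (["price", "delivery", "quality"], {"price"}),
]

variables = ["price", "quality", "delivery"]
scores = {v: 0 for v in variables}
necessary_votes = {v: 0 for v in variables}

for ranking, necessary in rankings:
    for position, variable in enumerate(ranking):
        scores[variable] += len(ranking) - position  # Borda points
    for variable in necessary:
        necessary_votes[variable] += 1

total = sum(scores.values())
weights = {v: scores[v] / total for v in variables}
required = {v for v in variables if necessary_votes[v] > len(rankings) / 2}

print(weights)   # relative importance per variable
print(required)  # variables a majority of the experts deems necessary
```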