LIU-ITN-TEK-A--15/034--SE

Utvärdering av användarupplevelsen av mobilspel med hjälp av sessionsinspelningsverktyg
(Evaluating the user experience in mobile games using session recording tools)

Master's thesis (Examensarbete) in Media Technology, carried out at the Institute of Technology, Linköping University

Veronica Börjesson
Karolin Jonsson

Supervisor: Camilla Forsell
Examiner: Katerina Vrotsou

Department of Science and Technology (Institutionen för teknik och naturvetenskap), Linköping University, SE-601 74 Norrköping, Sweden
Norrköping, 2015-06-12

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Veronica Börjesson, Karolin Jonsson

Evaluating the user experience in mobile games using session recording tools

A thesis presented for the degree of Master of Science in Media Technology and Engineering
Linköping University, Sweden

Veronica Börjesson
Karolin Jonsson

Supervisor: Camilla Forsell
Examiner: Katerina Vrotsou

Stockholm, 2015-06-20

Abstract

This thesis work examines how the user experience of mobile games can be evaluated with the use of session recording tools. The aim is to produce a workflow for user testing with session recording tools for mobile devices. In order to evaluate the tools and services, and to develop the workflow, several user tests have been conducted.
When using mobile session recording tools, it is possible to record the screen of the device and the microphone input while the user is playing the game. In some tools it is also possible to record the input from the front camera of the device, making it possible to capture the user's facial expressions and reactions during the test session. Recording the test session makes it easier to understand and evaluate the player experience of the game, and also to identify issues such as difficulties with the navigation in the application or annoyance due to non-intuitive interaction patterns. It is also a good way to get feedback about what the user likes and dislikes in the application. The fact that no additional equipment is needed for recording the test session, and that the user can perform the test comfortably on their own device in their own home, increases the chances that the test itself will have a minimal impact on the user experience, since the user can complete the test in their natural environment.

Session recording tools are appropriate when conducting remote user testing, since the users and the user experience researcher do not have to be at the same location. It is also a flexible approach, since the testing does not have to be carried out in real time. The test users can perform the test when they have time, and even simultaneously, while the user experience researcher can watch and analyse the recordings afterwards.

When conducting user testing with session recording tools, other parts besides the actual tool are also necessary. The test has to be set up (instructions, tasks, questions etc.), and both the test and the game application containing the integrated session recording tool need to be distributed to the test users in some way. The test users need to be recruited from somewhere, and they have to match the desired target group for the test session. There are test services which provide all of this: test set up, recruitment of test users, and distribution of the test and the game application; some even provide analysis of the recordings. When not using a test service, the test facilitator needs to take care of recruitment of test participants, test set up, distribution and analysis of test data by him or herself.

During this study, methods for conducting user testing using session recording tools, both with and without test services, have been tested and evaluated. The mobile game Ruzzle, developed by MAG Interactive, has been used as the test object. This thesis also covers how the user experience in mobile games differs from that in other software, and it investigates how the user experience can be analysed from the session recordings, i.e. how the user's emotions can be read from the recorded screen, voice and face.

As a part of the thesis work, a testing workflow has been developed for the commissioning company MAG Interactive. It contains guidelines for how to conduct user testing with session recording tools and which parts are necessary in order to carry out a successful test process. Tables with information regarding the tools and test services are also presented, in order to facilitate the decision on which tool or service is most suitable for a specific test objective.

Key words: User Experience, Player Experience, User Testing, Session Recording Tool, Testing Workflow
Acknowledgements

We would like to thank our families and friends for their love and support throughout our lives leading up to this moment. A big thank you to our supervisor Camilla, who has been an inspiration and given us lots of good feedback and advice. We would also like to thank all of the participating companies who have taken the time to answer all of our questions and let us test their tools and services. Finally, a big thank you to MAG Interactive, and especially to the Ruzzle team, for giving us this opportunity and making us a part of the team. We have learned so much and gained valuable experiences that we will carry with us for the rest of our lives. It has been super fun and we are so glad to have had the opportunity to get to know all of the amazing people at MAG Interactive; it has truly been a pleasure.

Karolin and Veronica, Stockholm, May 2015.

Contents

1 Background
  1.1 Introduction
  1.2 Motivation
  1.3 Aim
    1.3.1 Objectives
  1.4 The Test Object: Ruzzle
  1.5 Disposition
  1.6 Limitations
2 Theory
  2.1 User Experience
  2.2 Digital Games
    2.2.1 Relevant Genres of Digital Games
      2.2.1.1 Social Games
      2.2.1.2 Casual Games
      2.2.1.3 Mobile Games
  2.3 User Testing
    2.3.1 Different Testing Methods
    2.3.2 Testing Methods
    2.3.3 Remote User Testing
    2.3.4 Post Test Questionnaire
    2.3.5 Testing of Digital Games
    2.3.6 Testing of Mobile Games
    2.3.7 Test Users
  2.4 Session Recording Tool
    2.4.1 Metrics
    2.4.2 Facial Reactions
    2.4.3 Audio Recordings
  2.5 Workflows for User Testing
    2.5.1 Remote Usability Testing
    2.5.2 Mobile Applications
    2.5.3 Mobile Games
    2.5.4 Session Recording Tools
3 Approach
  3.1 Production of Testing Workflow
  3.2 Materials
  3.3 Research
  3.4 Evaluation of Session Recording Tools
  3.5 Integration
  3.6 Finding Test Users
  3.7 Creating the Test Plan
  3.8 Distribution
  3.9 Execution of User Tests
  3.10 Analysis of Session Recordings
  3.11 User Feedback on the Test Object
  3.12 Final Evaluation of Session Recording Tools and Test Services
4 Results
  4.1 Test Services and Session Recording Tools Initially Investigated
  4.2 Tested Session Recording Tools
    4.2.1 Lookback
    4.2.2 UXCam
    4.2.3 PlaytestCloud
    4.2.4 UserTesting
  4.3 Distribution and Test Set Up Services
    4.3.1 Beta Family
    4.3.2 PlaytestCloud
    4.3.3 UserTesting
    4.3.4 Distribution Without Test Set Up Service
  4.4 Comparison of Test Services and Session Recording Tools
  4.5 Outcome of Test Session Analysis
    4.5.1 Questionnaire Results
    4.5.2 Insights Gained from Screen, Facial and Voice Recordings
    4.5.3 The Test Object: Ruzzle
  4.6 Resulting Workflow
    4.6.1 Test Plan
    4.6.2 Test Objective
    4.6.3 Test Users
    4.6.4 Tool and Test Service
      4.6.4.1 Session Recording Tool
      4.6.4.2 Distribution and Test Set Up
    4.6.5 Time Plan
    4.6.6 Prepare Test Details
      4.6.6.1 Preparations
      4.6.6.2 Introduction
      4.6.6.3 Instructions
      4.6.6.4 Screener
      4.6.6.5 Pre Gameplay Questionnaire
      4.6.6.6 Tasks
      4.6.6.7 Post Gameplay Questionnaire
    4.6.7 Perform Test
      4.6.7.1 Pilot Test
      4.6.7.2 Actual User Test
    4.6.8 Analysis
    4.6.9 Summarise Results and Share with the Team
5 Discussion
  5.1 User Testing of Mobile Games
    5.1.1 Remote Testing
    5.1.2 Test Users
    5.1.3 Test Method and Procedure
    5.1.4 Social Aspects
    5.1.5 Post Gameplay Questionnaire
  5.2 Session Recording Tools
    5.2.1 Evaluation of tools and features
    5.2.2 Grading of SRTs and Test Services
      5.2.2.1 Website
      5.2.2.2 Easy to integrate
      5.2.2.3 Easy to set up test
      5.2.2.4 Customise test
      5.2.2.5 Demographics Specification
      5.2.2.6 Profile Information
      5.2.2.7 Researcher Environment
  5.3 Analysis of Recordings
    5.3.1 Voice
    5.3.2 Facial
    5.3.3 Read Emotions
  5.4 Workflow
    5.4.1 Planning the Test and Writing Instructions
    5.4.2 Pilot Testing
    5.4.3 Deciding on a Session Recording Tool
    5.4.4 Deciding on Recruitment, Distribution and Test Set Up
  5.5 Further Research
6 Conclusion
References
Appendix A - Initial Test Instructions
Appendix B - Declaration of Informed Consent
Appendix C - Test Procedure: Lookback
Appendix D - Test Procedure: UXCam
Appendix E - Pre Gameplay Questionnaire
Appendix F - Post Gameplay Questionnaire
Appendix G - Session Recording Tool Survey
Appendix H - Questions for the Pilot Test
Appendix I - Final Workflow

List of Figures

1 Ruzzle, a mobile game developed by MAG Interactive.
2 Illustration of the flow concept developed by Mihaly Csikszentmihalyi [6] (adapted from an illustration by Senia Maymin [32]).
3 Questions answered by different UX research methods (adapted from [48] and [15]).
4 Lookback's UX researcher environment.
5 UXCam's UX researcher environment.
6 PlaytestCloud's UX researcher environment.
7 UserTesting's UX researcher environment.
8 Beta Family's test set up service.
9 PlaytestCloud's test set up service.
10 UserTesting's test set up service.
11 Age and gender distribution from a total of 26 post gameplay questionnaire respondents.
12 Test users' preferences for starting and stopping the recording manually or automatically, and where they would have preferred to conduct the user test. There was a total of 15 survey participants, out of which 7 had completed the test session using UXCam and 8 using Lookback.
13 The test users' preferences regarding preview functionality in the session recording tool. There was a total of 15 survey participants, of which 7 had completed the test session using UXCam and 8 using Lookback.
14 Properties for test set up.
15 Using Lookback with the setting to show camera input in the lower right corner set to on.

List of Tables

1 Clarification of the differences between playability and usability according to Becerra and Smith [3].
2 Benefits and challenges with remote usability testing.
3 Session recording tools included in the initial investigation.
4 Test services which also provide SRTs and were included in the initial investigation.
5 Features available in the UX researcher environment (where recordings can be watched and annotated) for the respective services.
6 Properties for the SRTs.
7 Features for test set up and distribution services.
8 Grading of tools and services.
9 Choice of session recording tool.
10 Choice of test set up and distribution service.

Abbreviations

FACS  Facial Action Coding System
GEQ   Game Experience Questionnaire
GUI   Graphical User Interface
HCI   Human Computer Interaction
ISO   International Organization for Standardization
NDA   Non-Disclosure Agreement
NPS   Net Promoter Score
PX    Player Experience
QA    Quality Assurance
SDK   Software Development Kit
SRT   Session Recording Tool
UI    User Interface
UX    User Experience
VIS   Voice Interaction System

Definitions

Attitudinal methods: The attitudinal approach aims to collect data about "what people say".

Behavioural methods: The behavioural approach aims to answer "what people do".

Dashboard: In this thesis, the term dashboard refers to the session recording tool's website where the recordings are uploaded and organised and where it is possible to view the user tests.

Facial Action Coding System: The Facial Action Coding System (FACS) is a guide which provides a categorisation of the movements of facial muscles by assigning all facial muscles a number which is coded when the muscles move.

Gameplay: Gameplay is created by the game developer and the player together. It is the model developed through game rules, interaction with the player, challenges, the skills needed to overcome these challenges, themes and immersion.

Graphical User Interface: A user interface (see UI below) that includes graphical elements such as windows, icons and buttons [10].

Heuristic: A heuristic is a method or procedure where experience is used in order to learn and improve [14].

Heuristic Evaluation: A couple of experts review and examine the UI and decide how well it conforms to recognised usability principles called "heuristics".

Net Promoter Score: A metric for customer loyalty. The primary purpose is to evaluate the loyalty of the customers, and the Net Promoter Score (NPS) is based on the question "How likely is it that you would recommend [company X] to a friend or colleague?"

Non-disclosure Agreement: A legal contract through which the parties agree not to disclose any information covered by the agreement. This can cover, for example, confidential material, information or knowledge.

Playability: How fun the game is to play, how usable it is and how good the interaction style and plot quality are. Playability is affected by the quality of the storyline, responsiveness, pace, usability, customisability, control, intensity of interaction, intricacy, strategy, degree of realism and quality of graphics and sound.

Player Experience: The experience a player has when playing a game, i.e. the UX of games. PX targets the player and the interaction between the player and the game.

Screener: A pre-defined first question in the test set up, used to prevent people who are not in the target group from becoming test users and continuing with the test. The persons who give the correct answer can continue with the test while the others are denied; the users do not know beforehand what the correct answer is.

Session Recording Tool: In this thesis, a session recording tool refers to a digital recording tool for mobile devices, using the built-in camera of the device.

Test Service: In addition to session recording, this term also includes recruitment of test users, test set up and distribution of the test and the application. Some test services also offer analysis of the recordings.

Think aloud: Users vocally explain their thoughts while using the product.
Quantitative methods: Quantitative methods are good for answering questions like "how many" and "how much".

Qualitative methods: Qualitative studies collect data about behaviour or attitudes through direct observations.

User Interface: The interface features through which users interact with the hardware and software of computers and other electronic devices.

Usability: How effectively, efficiently and satisfactorily a user can achieve a specific goal in a product.

User Experience: The perception a user gets from using a product, or the anticipation of using it. This also includes the experience afterwards and subjective emotions.

UX Researcher Environment: In this thesis, the UX researcher environment refers to the session recording tool's online environment where it is possible to analyse a recording, i.e. watch a video and add annotations.

1 Background

The background chapter contains six sections, where the first two introduce the study (1.1) and motivate why it has been conducted (1.2). The third section (1.3) explains the aim of the study as well as the objectives, and the fourth section (1.4) introduces and describes the test object. Finally, the disposition of the thesis (1.5) as well as the limitations of the study (1.6) are described.

1.1 Introduction

This thesis is written as a part of the Master's programme in Media Technology and Engineering at the Department of Science and Technology, Linköping University, during spring 2015. The thesis work has been carried out in association with the mobile games company MAG Interactive at their headquarters in Stockholm, Sweden. The aim of this thesis work is to evaluate different tools for testing the user experience (UX) in mobile games and to produce a workflow for how to conduct user testing with session recording tools (SRTs). The workflow will be used as guidelines for MAG Interactive, describing how the process of user testing should be conducted. When performing user testing with an SRT, there are other necessary parts besides the SRT itself. It is important to find suitable methods for recruiting the test users and creating a test plan. Also, both the application and the test instructions need to be distributed to the test users if conducting remote user testing. Methods for all of these parts will be discussed in the thesis and suitable services will be evaluated. Since proper user tests, including analysis, have to be carried out in order to evaluate the tools, it has been decided that the user tests should also generate valuable feedback about the game which can be used in future iterations.

1.2 Motivation

Currently, there is little research available regarding the use of SRTs in UX evaluation of mobile games. Therefore, the aim of this thesis work is to compare and evaluate easily accessible tools and produce guidelines which can be applied during user testing with these tools. Mobile games development is a rapidly growing and changing area, but it is difficult to make accurate evaluations of the UX and to perform user tests which yield reliable results. It is generally preferable to test the UX in the player's natural environment. However, this usually means that the result can only be based on subsequent feedback, which can be problematic due to memory limitations of the player. Other factors, like personal preferences and difficulties explaining the experience in detail, can also affect the result.
It is possible to conduct observational tests in focus groups (a moderated test session where a group of players discuss the game), but this involves placing the player in an unnatural environment, which might affect the performance and behaviour of the player. Additionally, focus groups typically only test new players and not existing ones, even though testing existing players might be desirable. Another method to test the UX, and the player's understanding of the interface, is to use mobile SRTs. These tools can be used to record taps on a touch display along with everything the user sees on the screen, and in some cases even facial expressions and sound. The use of SRTs allows tests to be performed in the player's natural environment.

1.3 Aim

The aim of this thesis project is to produce a workflow for how to conduct user tests of mobile games using an SRT. Various mobile SRTs will be evaluated and compared against each other, and the top candidates will then be integrated into the mobile game application Ruzzle. User tests will be carried out in order to investigate the on-boarding process of the game, as well as to establish which tool (if any) is the most appropriate in this context. The resulting workflow will contain general guidelines for how to conduct user testing.

1.3.1 Objectives

The thesis aims to answer the following questions:

• How can UX be tested using mobile SRTs?
• Why is remote session recording a suitable approach and which available tools are the most appropriate for mobile games?
• How can recorded test data from SRTs be interpreted into information that can be used to address UX and usability issues?

1.4 The Test Object: Ruzzle

The test object, which the users have tested in the user tests during this study, is a mobile game called Ruzzle. Ruzzle is a social player vs. player word game developed by the Swedish mobile games company MAG Interactive. The players can choose to challenge friends or strangers, and the board consists of 16 letters in a 4x4 grid (see figure 1). The game was inspired by the board game Boggle. The aim of the game is to find as many words as possible in two minutes. A word is formed by dragging a finger between adjacent letters on the board, and a word has to consist of a minimum of two letters. It is not possible to use the same letter box more than once in a word, and each word is only awarded points once per round. One game consists of three rounds, each two minutes in duration, and the total score determines the winner. Additionally, different letters award different points, and the goal is to collect more points than your opponent before the game finishes. As the player gains experience, their level increases. There is also a tournament mode where the player, after reaching level six, takes part in weekly tournaments. The goal in the tournaments is to achieve as high a score as possible, and every player can play as many rounds as they like. Every week, the player competes against 19 opponents with the score from the best round played. Ruzzle requires a network connection and is available on iOS, Android, Windows Phone and Facebook. The test object is an established game which has been downloaded around 60 million times, according to MAG Interactive's observations in May 2015.

Figure 1: Ruzzle, a mobile game developed by MAG Interactive.
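To make the rules above concrete, the following minimal Swift sketch validates a traced word path against the rules just described: a word must contain at least two letters, every step must move to an adjacent cell (horizontally, vertically or diagonally), and no cell of the 4x4 board may be reused within one word. The code is purely illustrative and is not taken from MAG Interactive's implementation of Ruzzle; all names are hypothetical. Swift is used since iOS is the platform chosen for this study (see section 1.6).

import Foundation

/// A position on the 4x4 Ruzzle board (row and column in 0...3).
struct Cell: Hashable {
    let row: Int
    let col: Int
}

/// Checks whether a traced path of cells forms a valid word attempt according to the
/// rules described above: at least two letters, every step moves to an adjacent cell
/// (horizontally, vertically or diagonally), and no cell is used more than once.
func isValidPath(_ path: [Cell]) -> Bool {
    guard path.count >= 2 else { return false }                 // minimum two letters
    guard Set(path).count == path.count else { return false }   // no cell reused within the word
    guard path.allSatisfy({ (0...3).contains($0.row) && (0...3).contains($0.col) }) else { return false }
    for i in 1..<path.count {
        let dr = abs(path[i].row - path[i - 1].row)
        let dc = abs(path[i].col - path[i - 1].col)
        if max(dr, dc) != 1 { return false }                     // each step must reach an adjacent cell
    }
    return true
}

// Example: the top-left cell, its right neighbour and the cell diagonally
// below that neighbour form a valid three-letter path.
let path = [Cell(row: 0, col: 0), Cell(row: 0, col: 1), Cell(row: 1, col: 2)]
print(isValidPath(path))  // true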
1.5 Disposition

In order to gain a profound understanding of what is needed when evaluating the UX of mobile games, this report begins with a theoretical part in chapter 2. This chapter contains various theories and definitions, ranging from UX to player experience (PX), and finally addresses mobile games and testing workflows. Chapter 3, "Approach", introduces the method and steps for conducting the study, while chapter 4, "Results", presents the results of the study. In chapter 5, "Discussion", the results are discussed and evaluated based on the theoretical chapter and on knowledge gained from conducting the user tests. Finally, chapter 6, "Conclusion", aims to answer the initial objectives presented in the introduction (section 1.1).

1.6 Limitations

This research aims to find a testing workflow suitable for the specific target game and similar games; hence the aim is not to declare an entirely general method that will work for every type of game. The thesis project is conducted during the course of 20 weeks, which is the time limit for the master's thesis. When this study was initiated, several of the relevant testing tools were only available for iOS, hence iOS became the platform of choice for this study. Since session recording of user testing on mobile devices is a relatively new field, especially within the area of mobile games, many of the investigated test services and SRTs are under development. They are frequently being updated and new features and tools are becoming available. The tables and other details regarding the tools and test services in this report have been compiled from the information available at the time of writing; some information may therefore have become outdated.

2 Theory

The theory chapter consists of five sections. Section 2.1 explains what UX is, how it can be defined and what separates UX from usability. Section 2.2 defines what a digital game is and how the UX in games differs from that of regular software. Section 2.2.1.3 focuses on UX in mobile games and what separates it from UX in other digital games. Section 2.3 is about user testing: what it is, which different methods exist and which parts they consist of. Section 2.4 covers SRTs: what they are and how they can be used in UX testing. Finally, section 2.5 deals with workflows and which guidelines to follow when conducting user testing.

2.1 User Experience

The International Organization for Standardization (ISO) defines UX as "a person's perceptions and responses resulting from the use and/or anticipated use of a product, system or service" [19]. This includes the user's emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviours and accomplishments before, during and after use of the product. UX is also a consequence of interaction with a product or system, and of the internal and physical state of the user as a result of prior experiences, attitudes, skills, personality and context of use [19]. According to the ISO, usability can be defined as: "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." (ISO 9241-11) [56]. The usability criteria can be used to assess aspects of UX, for example functionality, system performance, interactivity, assistance from the system, etc. [19]. Usability refers to implementing an application in a way so that the user can perform a task effectively and efficiently, without having to put too much effort into it.
The result should be of good quality and the time it takes to perform the task should be minimised, i.e. the main focus is productivity [5]. Usability Partners [56] describe UX as a subjective extension of usability, which focuses on how satisfied the user is. In order to evaluate the UX it is important to go beyond the aspect of usability and evaluate the whole experience. UX is an important part of human-computer interaction (HCI), but initially the term was not associated with games development [2]. Nowadays, the fields of HCI and game research are learning from each other, and HCI UX evaluation methods are used in games development for improving the UX of the game [2]. Due to the "intentional challenge and emotion" in games, many HCI methods cannot be used in the same way as when evaluating the usability of productivity applications [29]. Creating a good (or bad) UX in games depends on the aspects immersion, fun, enjoyment, usability, aesthetics, pleasure and flow. It is also important how all of these aspects influence the user [5]. Therefore, these factors have to be considered when evaluating the UX. In HCI, focus is often on the end result of the experience. The current UX evaluation methods often offer insights about the experience, but not objective knowledge. As stated by Calvillo et al., "experience is both the process and outcome of the interaction" [5]. When interacting with an application, the user should feel that all "elements of the experience acted in symphony" [5], which gives rise to a positive experience. Hence, Calvillo et al. [5] argue that by evaluating the elements present in the interaction process, the experience can also be evaluated. An experience is both personal and subjective, but from a scientific point of view an evaluation of the general UX is needed. Even though the experience is personal, it is not often unique. It is possible to share an experience with others and to empathise with it. Even if an action is performed by an individual and gives rise to a personal experience, the same process of interaction when completing the task is used by many individuals. In the same way, the experience is often the same or similar [5], which makes it possible to get an idea of the general perception of the experience by observing or asking some of the users.

2.2 Digital Games

All games available on digital platforms such as PCs, consoles or mobile devices are digital games. A digital game can be distributed both online and offline, and can be available as both singleplayer and multiplayer. Games can be described as an activity one participates in for fun. Entertainment can, however, be difficult to define since it is subjective and depends on what the player experiences as fun [20]. For a game to be fun, it needs to be motivating, have rules and be engaging. It also needs to have a purpose and a feeling of presence. Rules can be displayed through success and failure rates in the graphical user interface (GUI), or be made up during the game process. Isbister states that specific usability measures are necessary for digital games. This is due to the fact that a game is complex software with different goals and areas of use compared to traditional task-oriented software, which most current usability evaluation methods target [17] (page 8). Researchers have different points of view, standpoints, methods and terminologies for developing good UX in games. According to Isbister et al.
[17] (page 5), there is a difference between testing the UX and the player experience (PX). The PX is the experience a player has during gameplay, i.e. the UX of the game. Testing the UX covers what it is like to interact with the software, including how engaging the experience is, regardless of the end goals. The focus of PX testing is to determine whether the game is enjoyable and fun to play, but also to find out where the players might encounter problems and get stuck. The playability determines the quality of the game, i.e. playability is a kind of usability measurement for games, including the UX quality and how enjoyable the game is. The definition of playability is: "... the degree to which a game is fun to play and usable, with an emphasis on the interaction style and plot-quality of the game; the quality of the gameplay ..." [55]. Playability is affected by the quality of the storyline, responsiveness, pace, usability, customisability, control, intensity of interaction, intricacy, strategy, degree of realism and quality of graphics and sound. Since games are supposed to give rich and meaningful experiences, where the gamer's personal feelings are also involved, the study of PX requires additional methods besides the usability methods used in the field of HCI. When playing a game, the player continuously evaluates his or her own performance in the game. This can be done both consciously and subconsciously. Is the player able to perform, meet challenges and attain the desired goals? After reaching the goals, the player will experience positive feelings and perceive him- or herself to be competent [51]. Immersion, fun, presence, involvement, engagement and flow are concepts that have been used to describe PX, and these terms are often broadly defined. The concepts can be related to various psychological compartments, i.e. the concepts overlap each other, making them more difficult to measure and understand. Using psychologically valid metrics when evaluating games makes it easier to measure the experience [51]. Becerra from AnswerLab and Smith from Electronic Arts state that "we don't use games, we play them" [3]. Therefore, the PX should be measured from a playability, and not a usability, perspective (see table 1). Becerra and Smith say that if a game were completely usable, it would be boring, since the players would understand everything instantly. Without the challenge, a game would not be exciting. However, menus, navigation and the like still have to be usable in order to make it possible to play the game. Becerra and Smith explain that there are two different types of motivation for playing a game: task-oriented and fun-oriented.

Playability                                                   | Usability
Challenges are good                                           | Challenges are bad
Surprising or unclear elements can be positive and enjoyable  | Localisation and understanding of all main elements should be instant
Motivation is focused on the fun                              | Motivation is focused on the tasks
Fun is a big factor of success                                | Ease of use is a big factor of success

Table 1: Clarification of the differences between playability and usability according to Becerra and Smith [3].

Csikszentmihalyi came up with the concept of "flow" [6], displayed in figure 2. It is about finding the balance between boredom and anxiety, and it can be applied to mobile games. If the player has good gaming skills and the game is found to be too easy, it gets boring. If the player, on the other hand, has poor gaming skills, the game can be too challenging, which leads to anxiety.
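The flow model is sometimes operationalised in adaptive difficulty systems by comparing an estimate of the player's skill with the current level of challenge. The following toy Swift sketch only illustrates the boredom/flow/anxiety regions of figure 2; the 0-1 scales and the tolerance band are arbitrary assumptions made for this illustration and are not prescribed by Csikszentmihalyi's model.

/// The three regions of the flow diagram in figure 2.
enum FlowState {
    case boredom   // skill clearly exceeds challenge
    case anxiety   // challenge clearly exceeds skill
    case flow      // skill and challenge are roughly in balance
}

/// Classifies a (skill, challenge) pair, both on an arbitrary 0...1 scale,
/// using an equally arbitrary tolerance band around the diagonal.
func flowState(skill: Double, challenge: Double, tolerance: Double = 0.2) -> FlowState {
    if skill - challenge > tolerance { return .boredom }
    if challenge - skill > tolerance { return .anxiety }
    return .flow
}

print(flowState(skill: 0.8, challenge: 0.3))  // boredom: the game is too easy
print(flowState(skill: 0.4, challenge: 0.9))  // anxiety: the game is too hard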
According to Lemay et al. [28], the flow model can help designers grasp the answer to a fundamental question about their work: "What motivates people to play?".

Figure 2: Illustration of the flow concept developed by Mihaly Csikszentmihalyi [6] (adapted from an illustration by Senia Maymin [32]).

Lazzaro studied why some players prefer action games, while others prefer word puzzles. The answer is that people play different games for different reasons; they have various goals and motivations. Lazzaro defined the Four Keys of Fun [26], which represent the four most important feelings that a game can generate. The first key is called hard fun and acts on the emotions frustration and relief, while the second key, easy fun, focuses on curiosity and an easy-going approach. The next key is called serious fun, which provides relaxation and excitement. The last key, people fun, provides amusement and is enjoyable since it focuses on social bonding and interaction with other people. Lazzaro defines PX as how the player interaction creates emotion, and how well the game provides and satisfies the sort of fun that the players want [27]. The most popular games utilise all of these emotions in order to draw attention and motivate the player.

2.2.1 Relevant Genres of Digital Games

Digital games can be divided into different genres. Below follows information about the genres of digital games which are of relevance in this study.

2.2.1.1 Social Games

A social game is one where more than one person is actively engaged in a game at the same time [18]. When evaluating the UX of a game, the social aspects of the experience also have to be taken into account. Isbister states that most games, both singleplayer and multiplayer, are usually played in a social context. It has been concluded that social play leads to more positive effects than solo play. Social games also provide a stronger sense of competence, while requiring less effort and inducing less frustration [18]. Isbister states that in order to thoroughly understand the user experience of games it is important to also consider their social nature. By adding people to the play session, the players will have an entirely different end experience. Isbister also points out that social games should be tested in a social context [18].

2.2.1.2 Casual Games

Furtugno describes casual games as games that can be played by anyone regardless of age or gender [8]. They are available on all kinds of platforms, even game consoles (such as the Nintendo Wii) which have previously been used mainly by hardcore gamers. The success of casual games has made games reach a wider audience, but it has also generated new design issues. Since there is no restricted target audience, designing a game that appeals to everyone can be difficult. There are various definitions of casual games in different parts of the games industry, depending on the game content, game medium, play length or the current market. All of these are factors in the design of casual games, but Furtugno claims that the most important factor is to start thinking about who is intended to play the game. Proceeding from this, decisions and assumptions can be made regarding the players' expectations and experiences and what they consider intuitive, challenging and fun.

2.2.1.3 Mobile Games

A mobile game is a digital game and, as the name of the genre implies, it is played on a specific platform. Gaming has become mainstream; it is no longer only hardcore gamers who play games as a leisure activity.
The use of smartphones and tablets has made games more accessible and available at a low cost, which has increased the number of so-called casual players (players who occasionally or frequently play easily accessible, easy-to-play games just for fun or in order to relax). Game designers face the challenge of creating appealing, accessible and usable games for players who are not typical hardcore gamers [28]. It is important to consider the general preferences of the target audience. Lemay investigated whether the typical hardcore gamer and the casual gamer experience games differently and concluded that different audiences are drawn to different games. There are no universal guidelines for what constitutes a good gaming experience that can be applied to all groups of players [28]. When analysing a mobile game, it is important to find a balance between flow and challenge [52]. The problem is to find a good difficulty level, so that the game is not too easy, since that usually becomes boring, but also not too difficult, since that becomes frustrating for the player. Regardless of how good the graphics quality of the game is, these kinds of problems can lower the quality of the entire PX. When analysing a game it is also necessary to examine and find a balance between playability, mobility and usability, as well as mobile device and touch screen properties. Korhonen argues that the usability of mobile games should be evaluated differently from that of other digital games, since they are mobile and are used in different contexts. Mobile games can be used at various locations with differing lighting conditions, noise levels etc., and the player might need to focus their attention on other things in their surroundings from time to time. Additionally, it is not possible to measure playability with the same heuristics as those developed for task-oriented software, since in a game many different parameters, paths and stories are created by each player, and hence the scenarios will differ from player to player [21]. Ülger also states that mobile games may encounter several playability issues. One of the issues is handling interruptions, for example when receiving a phone call during a game session. Additionally, control of sounds and environment can be restricted on the device. Another difficulty can be to help the player understand the game. Having a simpler user interface (UI) than in PC or video games is necessary, since the screen resolution and size are more restricted. The character status (if the game contains a character) and the game goals need to be clearly visible to the player. Touchscreen displays also present some specific issues, such as the distribution of game items on the screen. The distribution of items should suit both right-handed and left-handed people. Since the mobile device is portable and can be used everywhere, the environment, noise and light may vary. Distinctive is also the fact that traditional mobile devices have rather small screens, insufficient audio capabilities, limited processing power and battery limitations [52].
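As an illustration of the interruption issue mentioned above, the following Swift sketch shows one common way an iOS game can react when it is interrupted, for example by an incoming phone call: pause the game when the application resigns the active state and resume it (or show a pause menu) when it becomes active again. This is a generic sketch based on standard UIKit notifications; it is not code from Ruzzle or from any of the session recording tools evaluated in this thesis.

import UIKit

/// Pauses and resumes gameplay around system interruptions, such as an incoming
/// phone call, by observing the application's active state.
final class InterruptionHandler {
    private var observers: [NSObjectProtocol] = []

    init(pauseGame: @escaping () -> Void, resumeGame: @escaping () -> Void) {
        let center = NotificationCenter.default

        // Fired when a call, an alert or an app switch takes the game out of the foreground.
        observers.append(center.addObserver(forName: UIApplication.willResignActiveNotification,
                                            object: nil, queue: .main) { _ in pauseGame() })

        // Fired when the player comes back; a real game might show a pause menu here
        // instead of resuming the round timer immediately.
        observers.append(center.addObserver(forName: UIApplication.didBecomeActiveNotification,
                                            object: nil, queue: .main) { _ in resumeGame() })
    }

    deinit {
        observers.forEach { NotificationCenter.default.removeObserver($0) }
    }
}

// Usage with a hypothetical game object: keep a strong reference to the handler
// for as long as the game scene is alive.
// let handler = InterruptionHandler(pauseGame: { game.pause() },
//                                   resumeGame: { game.showPauseMenu() })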
2.3 User Testing

User testing, and measurement of the user experience, is conducted in order to improve the UX of existing UIs [1]. It is also conducted throughout the development process in order to collect feedback on concept ideas, game mechanics, design decisions, etc., which can be used to ensure that a project is heading in the right direction.

2.3.1 Different Testing Methods

There is a wide range of UX research methods; some are older and some are more modern. Rohrer states that it is nearly always best to use a combination of multiple methods, providing different insights. The methods can be separated into behavioural vs. attitudinal, and quantitative vs. qualitative (see figure 3). The attitudinal approach aims to collect data about "what people say", while the behavioural approach aims to answer "what people do". Usability studies are in between the attitudinal and behavioural methods, mixing the best from both areas by combining self-reported information with behavioural observations. According to Rohrer, it is generally recommended to go closer to the behavioural side when performing usability studies [48].

Figure 3: Questions answered by different UX research methods (adapted from [48] and [15]).

Qualitative studies aim to collect data about behaviour or attitudes through direct observations. Qualitative methods are recommended for answering why an issue occurs and how to fix it [48]. In quantitative studies, data is gathered indirectly using measurements, analytics tools or surveys. Quantitative methods are better for answering questions like "how many" and "how much". These methods can be used to grade issues according to severity, by concluding which issues are the most crucial ones.

2.3.2 Testing Methods

"When studying natural use of the product, the goal is to minimise interference from the study in order to understand behavior or attitudes as close to reality as possible" [48]. The natural use approach provides greater validity to the study, but it also offers less control for the facilitator. If the main focus of the study is to investigate what people do, why they do it and how to fix potential problems, the study will focus on qualitative behavioural methods, observing the players in their natural environment and their everyday use. Heuristic evaluation is a method that consists of having a few experts review and examine a UI and thereafter decide how well it conforms to recognised usability principles or guidelines called "heuristics" [39]. Observational methods for usability testing are used to gather information about the user's behaviour. This can include facial and body expressions, focus level, preferences, actions while performing tasks as well as when encountering errors, and also opinions and comments from the test participant. There are three different observational testing approaches: test monitoring, direct recording and think-aloud. In test monitoring, the observer directly observes, records and comments on the participant's behaviour throughout a test session. Typically, the practitioner follows a data sheet with guidelines containing checklists of tasks, test time, comments from the practitioner and the participant, as well as an explanatory statement of the event. It is also possible to use one-way mirrors, which means that the test user is observed through a one-way mirror from an adjacent room. The test monitoring approach is the most common method when only a few test users are needed. Direct recording is suitable when there are many test participants and when there is a need to eliminate potential bias from the test observer. There are different recording alternatives available, such as audio and video taping and screen recording. In the
In the 9 think-aloud approach, the respondents are asked to verbally explain what they are doing while using the product, and they are reminded to regularly verbalise their thoughts throughout the test session [53]. There is also a usability evaluation method called focus groups, where a group of users discuss a product while being moderated by a test facilitator. The benefits of this method is that the users can discuss their experiences and results, which can lead to many useful ideas and feedback. This is also a cheap testing method, especially if being carried out before developing a product. The drawback is that it is an unnatural environment for the test users and an experience in a focus group could usually not be compared to a real experience in the users everyday lives[9]. User tests can be carried out in different contexts and environments. There is natural use, where the test participant is carrying out the test in his or her normal environment, like they would during everyday activity. There are also scripted user tests, where the participant is following a script. In some studies, a mixture of different contexts is used, which is called a hybrid. A testing method which can be conducted from the user’s natural environment is called remote usability testing, which will be further discussed below in section 2.3.3. According to Krug and Sharon [24], it is important to ask the whole team (developers, designers, UX researchers etc.) what questions they want to have answered as a result of the testing process. They also point out that these questions are not necessarily the ones being asked to the users but rather what you want to know as a result of the study. They suggest that tests should be planned together with the entire team. The whole team should be involved in creating the test, choosing participants and they should be encouraged to provide feedback regarding the test process. 2.3.3 Remote User Testing An alternative to regular observation of test sessions is remote user testing. The main difference between remote user testing and regular user testing, is that the test participants and the test facilitators are at different physical locations. This approach makes it possible for the participants to carry out the test in their everyday environment, without being distracted by the facilitator or disturbing equipment. The test process is perceived to be more natural for the test user, while the UX expert can still watch and analyse the test procedure from a remote location [49]. In a webinar (web-based seminar) at UserTesting.com [24], Krug and Sharon recommend performing a pilot test on one person in order to test the test before using it for user testing. This should be done in order to make sure that everything is working, and that the users will understand the tasks and the questions given to them in the test. Sharon insists on the importance of pilot testing and states that he has never regretted doing a pilot test, only regretted not doing one. One drawback with the remote testing method is that the possibility to interpret the user’s body language is essentially lost. The possibility to ask follow up questions in the middle of the test process is also ruled out (if having an unmoderated test session). On the other hand, remote testing does not require any traveling, neither for the facilitators nor for the participants. This saves time and also makes it possible to have geographically dispersed user groups without requiring a larger budget. 
Some advantages of remote user testing are that tests can be carried out with a limited budget, under strict time constraints and without the use of additional testing facilities. It also offers a faster and easier recruitment process. The test group does not have to be collocated, and the test can be performed in a natural environment where the test participants feel comfortable [49].

Remote user testing can be either moderated or unmoderated. In a moderated test session, the facilitator and the user are both online when the test is being performed and they can interact with each other during the test process. When using the unmoderated test method, there is no real-time interaction between the test participants and the facilitators, and the test can be carried out asynchronously. Unmoderated user test studies provide flexibility, and the users can complete the test session when they have time and want to go through with it. Another advantage is that all test users can perform the test simultaneously, since the test data will be analysed retrospectively [49]. When carrying out unmoderated remote user testing, clear and specific instructions are especially important. The test facilitator cannot assume that the test user will understand how everything works without thorough and easily interpretable instructions. When performing moderated user tests, the moderator can make sure that the user stays on task; hence unmoderated tests place greater demands on instructions and preparations [24].

Usability.gov [54] summarises the main benefits and challenges of remote usability testing in table 2.

Benefits:
• Eliminates the need for a lab as well as the need to place the test participant in an unnatural environment, which is usually the case when performing tests in a lab.
• Supports a larger group of test users.
• Typically less expensive than in-lab testing.
• Possible to run multiple tests simultaneously.
• Unmoderated testing allows the test user to perform the test at any suitable time, increasing the possibility for all participants to complete the test.
• Usually possible to test a larger number of users than in a lab environment.

Challenges:
• Security could be compromised if the testing information is sensitive (since data might leak out).
• Restricted view of the user's body language.
• Technical problems are likely if the test users:
– are not comfortable with the technology (which can be likely if the target group is not gamers);
– have conflicting software or equipment installed on their device;
– have a slow Internet connection.

Table 2: Benefits and challenges with remote usability testing.

2.3.4 Post Test Questionnaire

A post test questionnaire is a survey consisting of questions that the user should answer after testing the product. When performing online surveys, Nielsen recommends short and well written surveys which are easy to answer. This will yield a higher response rate while avoiding misleading results [43]. According to Nielsen, the most important aspect regarding questionnaires is to maximise the response rate. Low response rates can be misleading since there is a chance they are based on a biased group of very committed users; the result can then not be viewed as representative of how most users experience the product. In order to get as many people as possible to respond to the surveys, they should be ”quick and painless” [43]. The best way to achieve this is, according to Nielsen, to reduce the number of questions.
The questions also have to be easy to understand and the survey easy to operate, in order to avoid misleading answers due to misunderstandings. The questions should only address the core needs. Nielsen refers to Reichheld's article ”The One Number You Need to Grow”, which states that only one question is needed in order to get insight into customer satisfaction. From this question the Net Promoter Score (NPS) can be calculated. NPS is a metric for customer loyalty which was developed by Satmetrix, Bain & Company and Fred Reichheld [38]. The primary purpose of the NPS methodology is not to evaluate the customers' satisfaction with the product, but to evaluate the loyalty of the customers towards the brand or the company. Reichheld researched the link between survey responses and actual customer behaviour, which resulted in one direct question: ”How likely is it that you would recommend [company X] to a friend or colleague?” [47] (company X can be replaced by, for example, a product or a game). Based on their replies, the respondents are divided into different categories where some are considered to be beneficial for the company and some can affect the brand negatively [37]; in the standard formulation, respondents answering 9-10 on the 0-10 scale are counted as promoters, those answering 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. Nielsen also suggests that an alternative approach to having all users participate in the same short survey is to ask different questions to different users. In this way more questions can be included in the study while the individual survey is still kept short. This might provide more insight into the UX [43].

Since digital games can differ a lot regarding application area, target players and game experiences, there is no universal measurement approach which fits all. This has led to an absence of coherent tools to measure entertainment experiences in a reliable manner [16]. In an attempt to develop a generally applicable measurement, IJsselsteijn et al. developed the Game Experience Questionnaire (GEQ). GEQ is a post-play questionnaire that takes into consideration seven dimensions of player experience: sensory and imaginative immersion, tension, competence, flow, negative affect, positive affect and challenge [16].

2.3.5 Testing of Digital Games

User testing of games can be performed using a range of different methods. They can be small one-to-one sessions or larger test sessions involving groups of people. The procedure can be everything from think-aloud, questionnaires and post-play interviews to an automated recording system which collects data about the player. Some user tests are performed in testing labs with special equipment, where the players can be observed and their actions and reactions documented. This is, however, a rather expensive procedure, which makes it available only to bigger companies with a large testing budget [4]. The data gathered during the test sessions focuses on both usability and playability issues. By observing the player playing the game, it is possible not only to deduce usability issues such as problems with navigation, menus and level difficulty, but also to establish whether the player had fun, and if so when and where in the game, as well as how much fun it was [4]. Reviews and game forums relying on user feedback are common post-launch evaluation methods in games development. However, these typically do not generate detailed information and it can be difficult to deduce the cause of problems [4]. Brown claims that the most powerful UX evaluation tools “offer insight not just into the user experience but into the exact elements influencing the experience“ [4].
The disadvantages of user testing of digital games are that it takes time and costs money, while the reward is the ability to create games that the players want to play, which generates more players and more money for the business [35].

Even though digital games are different from medical devices, both use a UI and hence similar testing methods can be applied. Oliveira et al. [45] studied usability testing of a respiratory UI in a medical environment, and investigated the possibility of using computer screen and facial expression recordings in usability testing of the UI. They suggest using computer screen recordings, rather than other qualitative methods like interviews and questionnaires, when performing UX evaluation of clinical information systems, because it provides greater objectivity while capturing UX problems in real time. Similarly, the use of questionnaires and post-play interviews has been questioned in the gaming industry. Since questionnaires and post-play interviews are carried out after the game experience, both methods have been criticised for not being able to capture the user's state during their engagement with the game. After the game, the test participants have to focus their attention on the evaluation, instead of on the experience they just had and are supposed to evaluate [34]. Another source of criticism is the fact that emotional experiences can be difficult to describe in words, since they are not primarily based on spoken or written language.

The use of screen recording tools alone will however not give enough insight into the UX; the user's emotions also have to be taken into account. In user testing of games, usability issues can be detected by analysing screen recordings, but the player's emotions when playing the game cannot be deduced from this alone. To reveal the player's feelings, physiological measurements can be recorded, survey methods can be used or facial recordings can be collected. Collecting physiological measurements provides high resolution data, but requires complex and expensive technology as well as a high level of expertise. As already concluded, surveys provide limited information. Recordings of facial expressions, on the other hand, can be collected even with a smaller budget [45] and can provide clues to the player's true emotions. As Darwin puts it in The Expression of the Emotions in Man and Animals: “They [the movements of expressions in the face and body] reveal the thoughts and intentions of others more truly than do words, which may be falsified.” [7]. Displaying facial expressions is a visible and distinctive way of communicating emotions [45], and by observing the user's facial expressions during the test session, additional information about the UX can be deduced. Oliveira et al. conclude in their study, which aims to improve user assessment during usability tests in order to increase efficiency and effectiveness in health information systems, that the combination of (computer) screen recordings and recordings of facial expressions can improve the evaluation of user interaction tests and can be used to determine the emotions of the test participant [45]. Since an essential part of a good gaming experience depends on the emotional response of the player (see section 2.2), one can assume that the approach could work well also in user testing of games.
Isbister states that the games development process consists of five stages [18] (page 347), which are tested and evaluated using different usability methods and tools:

1. The first stage is before the project has even started. The product will not be tested yet, but it is important to make time for user research in the development process and decide on what research to conduct. It is also necessary to add time to the planning schedule for addressing the usability issues discovered. A testing workflow needs to be developed, making it possible for everyone in the team to test consistently.
2. The second stage is the concept phase. Once the target audience, genre and platform have been identified, it is easier to make decisions about the usability testing, and specifically heuristic evaluation. The aim is also to design a game which is fun and to recognise which social and psychological aspects matter.
3. In the third stage, called the pre-production stage, it might be desirable to use expert evaluation in the team, to ensure that the UX objectives are fulfilled.
4. The fourth stage is the production phase, where mainly classic usability methods, such as think-aloud, are used. If time and resources are available, physiological measures can be used to verify the emotions evoked by the game.
5. In the post-production phase, continuous post-launch and playability testing will be needed as updates and new features are released.

2.3.6 Testing of Mobile Games

According to Isbister, it is preferable to perform user tests on mobile or handheld devices in a setting where the “players might engage in game play embedded in their daily lives” [18]. It is also important to test the game on a demographically appropriate group. For example, the people in the test group should have the same relationship to each other as the people in the target group have to each other in their natural environment. Both environmental and contextual factors are important for achieving a natural gaming experience also during the testing process [18].

Testing the experience that a mobile game provides to the players cannot be done using a computer [12]. Dragging fingers on a touchscreen display generates a different UX compared to computer mouse clicks. It is important to test the game on a real device in order to get realistic performance results. Events that occur in real use, such as interruptions, battery consumption, memory restrictions or charging of the mobile device, have a large impact on the overall UX and playability [13]. Therefore, the best way to understand the UX of a mobile game is to test on a real device and not on a simulator. It is also important to test on many different devices, since most of the end users do not use the same high-quality devices as the developing company might have access to. Isbister states that video recording and screen capture make it possible to do rich analyses of gameplay [18].

Inokon, a producer at a mobile games company, was interviewed by Isbister and Schaffer [17] (page 161). He highly recommends usability testing and describes it as ”removing the blinders”. Often the game developers get so close to the game that they lose the player perspective. The usability tests can in many cases reveal issues previously unnoticed, and it might be difficult for the developer to recognise these after spending time and energy working on the game, but it can be exactly these adjustments that help a game become a market success.
Inokon also explains that “usability can be lethal to a project if not used properly”, and lists his most important pieces of advice [17]:

• Take time to thoroughly observe the players and grasp the context of the notes.
• Make time in the schedule for solving emerged issues.
• Not every issue can be addressed; choose to fix the changes that are most important for the game vision.
• A game is not a snapshot; it is constantly changing, so do not postpone the evaluation for too long. Test when the game is in alpha stage and functional.

2.3.7 Test Users

Isbister investigates the question ”If the developers are also players, why can they not test the games themselves?”, and the answer is: because developers are not typical gamers [17] (page 350). They already have previous experience and knowledge of the game, resulting in biased opinions. It is necessary to test with both professional game testers and end users in order to get a valid player perspective.

Recruiting users is the most difficult but also the most important part of user testing [42]. It is important that the game is tested by test users with a demography similar to that of the users who will play the game, in order to guarantee that it meets their requirements. Despite this, not many companies have a procedure for regularly gathering test users. The traditional ways of finding test users are to recruit colleagues, family members and friends, or to persuade random people in the streets or in cafés to participate. It is also possible to reach out to players through online communities or social media. Another approach is to use an independent, specialised recruitment agency that finds test users for you, or to use an online test service that provides a community of test users. There are online services which offer both SRTs and user testing (such as PlaytestCloud, which is further discussed in section 4.2.3). The criteria for the test users should match the target audience regarding demographics such as age, gender, game expertise (casual or hardcore) and income [42]. However, Krug and Sharon insist that it is not a requirement to test on the exact target group, since valuable insights will be gained anyway [24]. Recruited test users, other than colleagues or family members, are often offered payment as compensation [42].

A common problem when performing user tests is that not all test users show up. According to Nielsen, the average no-show rate is 11 %, which is almost one in nine testers [42]. The no-show rates can be higher for remote user test studies than for studies carried out in person. Additionally, unmoderated sessions may vary greatly in quality; therefore Schade recommends that test facilitators recruit some extra test users just in case [49].

Test sessions should be carried out throughout the development process, and the number of test users in one session differs depending on the purpose. Nielsen claims that it is enough to test with only five users in one session when performing qualitative usability testing [41]. Using more than five users is a waste of resources; it is better to run as many small tests as possible with few test users. Nielsen and Landauer studied the number of usability problems found through usability testing [41], and discovered that about 80% of the usability problems can be found using five test users from the same target group.
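The five-user claim comes from Nielsen and Landauer's model, in which the share of problems found with n users is 1 - (1 - lambda)^n, where lambda is the probability that a single test user exposes a given problem. The sketch below, in Swift, uses lambda = 0.31, the average value Nielsen commonly cites; that constant is an assumption taken from his later writing rather than from this thesis.

    // Expected share of usability problems surfaced by n test users,
    // following the Nielsen-Landauer model: 1 - (1 - lambda)^n.
    // lambda = 0.31 is the commonly cited average and should be treated as an assumption.
    import Foundation

    func shareOfProblemsFound(users n: Int, lambda: Double = 0.31) -> Double {
        return 1.0 - pow(1.0 - lambda, Double(n))
    }

    for n in [1, 3, 5, 10, 15] {
        let percent = Int((shareOfProblemsFound(users: n) * 100).rounded())
        print("\(n) users -> about \(percent)% of the problems")
    }
    // With five users the model gives roughly 84%, the basis for the "about 80%" figure above.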
Testing the quality metrics in quantitative testing of a game, such as learning time, efficiency, memorability, user errors and satisfaction, requires four times as many users as qualitative user testing does [44]. Nielsen concludes that 20 is the optimal number of test users for quantitative studies. The result of the 2013 UX Industry Survey [59] shows that most companies (40% of the respondents) used an average of 6-10 test users per study, while the report from 2014 [58] declares that 1-5 users is a more common test group size: according to the 2014 survey, 40% of the participating companies now use test groups consisting of 1-5 users, and the percentage using 6-10 users has decreased slightly. However, according to Krug and Nielsen, testing on only one person is better than not user testing at all. Krug claims that even a bad test with the wrong test user will reveal possibilities to make important improvements [23] [41].

2.4 Session Recording Tool

A Session Recording Tool (SRT) is software which records the participant's screen while he or she uses the product during a test session. This can be done remotely or at a specific location, for example from the user's home or at the office. Session recording methods have been used for many years, traditionally using additional equipment (such as cameras and sleds) for recording the user. However, a new type of session recording, where software is used, has become available and popular in recent years. This software is called an SRT and can be integrated into an application; the UX can then be recorded directly on the mobile device or computer without additional equipment. There is a wide range of SRTs, each with their own benefits and drawbacks. Since the aim of this thesis is to evaluate the UX of mobile games, the focus will be on SRTs with support for mobile applications. Some of the tools support facial and audio recordings, which makes it possible to gather more data about the user experience while the user is using the application. Some tools store the recordings on an online dashboard, where multiple researchers can view the recordings and navigate along the timeline. An annotation feature is often also available, allowing the UX researcher to comment on specific parts of the recording, for example when the test user experiences difficulties in the application or when the test user's reactions are particularly distinctive in a specific part or view. This makes it possible to perform an extensive analysis which can be used in order to improve the UX.

SRTs are especially handy when conducting remote user tests, where it is often desirable to record the test session. The use of additional equipment, such as separate cameras and microphones, places high demands on the test user when participating in remote test sessions. Reducing the need for additional equipment and software facilitates the test process and makes it easier and more natural for the test user, which makes mobile SRTs a preferable alternative. It is also possible to hand out a questionnaire or carry out interviews after the test session. The user will then answer questions about their experience during the test session, but this method alone does not reveal much of the real UX and the user's reactions to the product.
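As an aside on how such in-device recording works technically: each of the SRTs evaluated in this thesis ships its own SDK with its own integration steps, but a rough illustration can be given with Apple's ReplayKit framework, which records the screen and microphone of an iOS device without any external equipment. The sketch below is generic and is not the API of any of the tools studied here.

    // Minimal in-device screen and microphone recording on iOS using ReplayKit.
    // Illustrative only; the SRTs in this study provide their own SDKs and upload pipelines.
    import ReplayKit

    final class SimpleSessionRecorder {
        private let recorder = RPScreenRecorder.shared()

        func start() {
            recorder.isMicrophoneEnabled = true   // also capture think-aloud audio
            recorder.startRecording { error in
                if let error = error {
                    print("Could not start recording: \(error)")
                }
            }
        }

        func stop() {
            recorder.stopRecording { previewController, error in
                // previewController lets the test user review and share the clip,
                // similar to the "preview before upload" option some SRTs offer.
                if let error = error {
                    print("Could not stop recording: \(error)")
                }
            }
        }
    }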
A combination of a questionnaire and an SRT can provide more information, since it enables the researcher to compare the experience as stated by the user in the questionnaire with the result of the analysis of the recordings. Some of the test services which were investigated in conjunction with the search for SRTs provide a full test service, including finding test users and performing the tests using SRTs. Some services also carry out analyses of the test data and summarise the results into a report which is submitted to the client who ordered the UX evaluation of their product.

Brown claims that the most powerful UX evaluation tools are the ones which also give insight into exactly which elements are influencing the UX of the game, as well as how the UX is perceived [4]. By using SRTs, the UX researcher gets a clear insight into what is happening in the application and how the test user reacts. Important aspects of good gameplay are both challenges and emotions [51]. However, the emotions of a player can only be derived from observations or by questioning the player. SRTs which record only the screen, and not the facial expressions of the player, will therefore provide less information about the player's emotions. Krug and Sharon [24], however, claim that when watching recordings from remote usability tests they do not need to know what the user says, and neither do they need to see their faces; all they need in order to be able to evaluate the user experience is the user's tone of voice. Krug and Sharon also give some advice regarding what to look for when choosing a tool: metrics, videos, turnaround times and the possibility to fast-forward recordings [24].

UserZoom is a platform for agile usability testing and UX analytics which uses remote session recording to collect qualitative and quantitative data. In a case study [61] performed by UserZoom, they used session recordings to evaluate the UX of a website and their conclusion was that “UX practitioners have been using remote usability testing mostly for collecting valuable quantitative data, something that was not possible to do in a lab. Now, combining remote unmoderated usability testing with videos of the user sessions gives you the best of both worlds. With this approach companies can gather the necessary quantitative data as well as qualitative data to better optimise online user experience on their site.”

2.4.1 Metrics

Graham McAllister at Player Research divides the player metrics into behaviour, rationale, perception and experience. Behaviour refers to what the player did during the gameplay, and rationale refers to why they did it. Perception comprises what the player thinks happened, and experience refers to how the player felt. The aspects to consider are thus: what did the player do, why did the player do it, how did the player perceive it and how did it make the player feel [33].

2.4.2 Facial Reactions

The classical way to read facial reactions is when two or more humans interact and intuition and experience are used to read and interpret the expressions of the other participants. In user testing, this has until recently been done face to face, but it is now becoming increasingly common to communicate online using video calls. There are also new systems available that interpret facial reactions automatically. Oliveira et al. [45] and Lankes et al. [25] mention the Facial Action Coding System (FACS).
FACS is a guide which categorises the movements of the facial muscles by assigning each facial muscle a number, which is coded when the muscle moves. This can be used to categorise the movements into facial expressions corresponding to the basic emotions: happiness, surprise, anger, contempt (with some uncertainty), disgust, sadness and fear [25]. With FACS the facial expressions can be measured objectively, and hence the test participants' emotions can be deduced. However, using FACS is time-consuming and prone to bias, since the analysis is subjective and different results can be obtained depending on who performs it. Additionally, extensive training is required in order to produce FACS ratings [11]. Therefore Hamm et al. developed an automated version for dynamic analysis of facial expressions (in neuropsychiatric disorders). Their system tracks faces in video footage automatically and extracts geometric and texture features which are used to produce temporal profiles of the movements of the facial muscles [11].

2.4.3 Audio Recordings

Similar to reading facial reactions, reading reactions from the voice has mainly been done when two persons communicate or interact face to face. Today, modern systems have been developed to interpret emotions from recorded voices. Kostov and Fukuda did a study called Emotion in User Interface, Voice Interaction System [22], where they researched and developed a UI for recognising emotions in voices, regardless of the speaker's age, gender or language. Eight emotional states (neutral/unemotional, anger, sadness, happiness, disgust, surprise, stressed/troubled and scared) were extracted from a speech database and examined for acoustic resemblance. A Voice Interaction System (VIS) was developed from these results, and it can be used to determine what emotions the voice reveals. The VIS was developed based on analysis of human speech factors such as pitch, formants, tempo and the power of the voice. After the speaker's natural voice properties have been analysed, the VIS interacts with the speaker and the voice emotions are extracted. The researchers have developed an emotion engine where a voice-based system gives an indication of what emotional state the speaker is in. Professional actors and actresses, as well as non-professional subjects such as students, were used to record speech for the database, in order to get a reliable basis for which acoustic patterns are present in the eight different emotional states. In order to develop a cross-cultural standard for voice emotion detection, students speaking Brazilian Portuguese, Italian, Spanish, Flemish, Japanese, Macedonian and English were analysed. The demand for devices with the ability to recognise “emotional messages” is increasing; many users want devices which understand what they want them to do without having to waste time getting them to do it. The ability to detect and interpret the user's emotional state, regardless of who the user is, will be of great benefit in HCI and in adaptive systems based on visual, kinesthetic and auditory input [22]. Although a VIS will not be used in this research, since the analysis of the user's vocal emotions will be done manually, Kostov and Fukuda's study can be a valuable starting point. Their study presents which emotional states to examine and which voice properties to investigate in order to achieve emotional interpretation.
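Both the facial and the vocal coding approaches above come down to mapping low-level observations onto a small set of emotion labels. As a minimal sketch of what such a label scheme can look like when preparing manual annotation work, the snippet below encodes a few FACS action-unit combinations; the specific combinations (e.g. AU6 + AU12 for happiness) are commonly cited examples from the FACS literature, not from this thesis, and real coding is considerably more nuanced.

    // Simplified mapping from FACS action-unit combinations to basic emotion labels.
    // The combinations are commonly cited examples, not a complete coding scheme.
    enum Emotion: String {
        case happiness, surprise, sadness, anger
    }

    // Each emotion is associated with a set of action units (numbered facial muscle movements).
    let emotionSignatures: [Emotion: Set<Int>] = [
        .happiness: [6, 12],          // cheek raiser + lip corner puller
        .surprise:  [1, 2, 5, 26],    // brow raisers + upper lid raiser + jaw drop
        .sadness:   [1, 4, 15],       // inner brow raiser + brow lowerer + lip corner depressor
        .anger:     [4, 5, 7, 23]     // brow lowerer + lid raiser/tightener + lip tightener
    ]

    // Given the action units observed in a frame, return the emotions whose signature
    // is fully present. A human coder would of course apply much softer judgement.
    func candidateEmotions(for observedUnits: Set<Int>) -> [Emotion] {
        return emotionSignatures
            .filter { $0.value.isSubset(of: observedUnits) }
            .map { $0.key }
    }

    print(candidateEmotions(for: [6, 12, 25]).map { $0.rawValue })   // ["happiness"]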
2.5 Workflows for User Testing

Various workflows are used for different types of development, depending on which testing methods and techniques are used. There is no general workflow that is suitable for all purposes; it has to be adjusted for the specific company, product and testing situation. It is, however, possible to use existing workflows as guidelines when developing a testing workflow. SRTs can advantageously be used in remote usability testing, and below follows a description of workflows designed for various testing methods and contexts, as well as other information of relevance to mobile games testing with session recording tools. These approaches focus on the usability aspect of UX and need to be combined with factors like the player's emotions and the flow of the gameplay in order to be used in evaluation of the PX.

2.5.1 Remote Usability Testing

According to Usability.gov [54], there are some important guidelines to consider when conducting remote usability testing:

• The tests should be about 15-30 minutes long and should consist of 3-5 tasks.
• The tasks should be straightforward and have well-defined end states.
• Include the minimum system requirements of both the tool and the product in the instructions.
• Make sure that the contact information for the test user is correct, allowing for follow-ups and reminders if needed.
• Instructions and test materials should be prepared so that the users know what is expected from them and also what they can expect from the practitioner.
• Consent forms for the test users should be prepared.
• If the participants are being compensated, make sure to have the compensation and receipts prepared.

The main difference with remote testing compared to traditional observation is the technology. Ensure that:

• Whatever product you are testing is available and accessible outside the network/firewall of the company.
• There are no firewall issues preventing the users from testing the tools and accessing the product.
• Participants can easily download or access the screen recording tool or service being used.

2.5.2 Mobile Applications

UserTesting has developed a checklist containing the four main steps to complete when conducting user testing of mobile applications [60]. These steps are:

1. Create a test plan
2. Organise the details
3. Run your tests
4. Analyse the results

The first part of the test process should focus on defining the testing objectives: what questions need to be answered? It is also necessary to know which parts of the application to test and to determine whom to test the application on. Another tip is to identify all requirements, such as specific software, operating systems or devices. The second part, organise the details, consists of making sure that all the details are correct, for trouble-free testing. This includes making sure that the application is available for free to the test user, and that the test participants know how to access it. There should be clear written instructions, so that the test participant understands what to do without unnecessary delays. It should also be clear how to share files between the test participant and the moderator, whether sound should be switched on or off, and whether landscape or portrait mode should be used. The third part involves running the test and, hopefully, meeting the objectives from part one. It should be specified in detail what actions the user should perform, for example whether they should sign up for a newsletter or explore the application on their own.
Metric questions can be used, such as task time (the time needed to complete a task) or task difficulty (how easy the task was to complete, on a scale), and both pre- and post-test questions can be asked, involving for example the user's previous experience with the product or similar products. Post-test questions can ask the user to describe how enjoyable, easy or difficult they found the experience. One tip is to ask the test users if the application is something they would recommend to friends and family (NPS, see section 2.3.4). This question can give interesting insights into how enjoyable the test participant found the UX of the application. The last part of the test process deals with analysing the resulting test data. A key indicator of whether the test has been completed correctly, and of whether the task was fun and engaging or difficult and confusing, is the task time. Try to determine what the user actually thought about the application by comparing ease-of-use questions with value-based questions. The participant may have written that the application is easy to use, but that does not mean that he or she would pay for it. Finally, share a summarised and readily comprehensible report of the test with the development team. In order to improve the test sessions for the next time, watch recorded test sessions together with the team, ask what user actions they are missing, and write it down in list form [60].

2.5.3 Mobile Games

The question “How do you conduct a good usability study for a mobile game?”, published on Quora (a question-and-answer website), was answered by David Suen from Betable [50]. Suen says that their usability tests are designed after Steve Krug's book “Don't Make Me Think”. The book is aimed mainly at web usability, but according to Suen it is still applicable to mobile testing. He mentions some key points they use at Betable:

• Do user tests in small batches of 3-5 participants. It is sufficient to use smaller batches in order to identify patterns for frequently recurring usability problems. Once the issues have been identified and resolved, do another batch with new testers and find new problems. Do not expect one test session to be enough to identify all of the usability issues.
• Getting test users. People in the near surroundings, such as family, friends or people at meet-ups, are often open to helping test the game. Make sure the experience is fun for them, and once they have completed the test, ask them for other people they know who may be willing to participate in a test. Usually no compensation is needed, but in many cases small things, like buying them a beer, can enhance the experience even more.
• Length of test session. A test should be about 15-30 minutes.

Nemberg wrote a blog post about five common mistakes in game usability testing and how to avoid them, based on a playtesting session held at the Gamefounders game accelerator where the research was conducted. The frequent game usability testing mistakes, as well as the solutions for avoiding them, were [36]:

1. Too much guidance. Talk as little as possible when moderating a test session. Being mute, and not explaining any background information about the game to the test user, is absolutely fine. Let the players find out how it works by themselves. They do, however, need to understand the game from installation to starting it; if they do not, it is necessary to solve the problem and make it clear.
2. Assuming too much. The player cannot be expected to understand everything in the in-game menu. Try to get the test users to speak about the menus and items in the game before starting the actual game. Do the players understand what all of the UI elements and buttons are meant for? Do they understand the navigation correctly? Many usability test moderators skip the part where the participant uses the start screen and menu; this is, however, not recommended, since usability issues may occur in the start menu as well.

3. Testing with just one demographic. Casual games usually have target groups consisting of players of various ages and genders, and if that is the case, the game should also be tested on all of these groups. When testing, for example, learning games that target young children, their parents, who actually buy the games, should also be included in the test session. If the parents do not like or understand the game concept, chances are high that they will not buy it.

4. Talking too much. The participant can get distracted if asked too many questions, and it removes their focus from the game world and the gaming experience. Instead, ask the players to play the game and verbalise what they are experiencing. Is there any part or element in the game that triggers a strong emotional response, or is anything making them frustrated? Try to avoid disturbing the player as far as possible, and limit the number of moderators to one per participant. The player can get confused if someone new asks questions during the gameplay. Observe the body language of the player: do they look relaxed or tense, does the body react to specific game actions? Also try to notice when they get really excited, for example if their eyes light up.

5. Not recording the sessions. Though taking notes is a good thing, it is not always enough. Try to also record the actual tests; for games the recommendation is to use a screen recorder, preferably along with an application that records the face of the player or an additional video camera directed towards their face. By combining these recording methods with a skin response sensor, the test results will gain even higher validity.

2.5.4 Session Recording Tools

Lookback has produced a guide on how to conduct user testing using their SRT [30]. Since Lookback uses the built-in camera of the mobile device, there is no need for additional testing equipment after integrating their Software Development Kit (SDK) into the mobile application. The testing steps which they recommend are the following:

• Decide on what to test. Is it, for example, the UX of an entirely new product or a specific feature?
• Decide who to test on. Should it, for example, be employees, people who do not know the product or existing users? Every group has different strengths and weaknesses, and their feedback will depend on existing experience and mindset.
• Send the game application to the testers or let them come into the office. If the recording is done remotely, the user behaviour may be more accurate but the process less organised. Sending the game application remotely can be done by using for example HockeyApp, which is a platform for distributing beta versions of iOS, Android or Windows Phone applications.
• Once it is decided what to test, compose instructions and questions (if desired) for the test. Do not forget to write clear instructions on how to open the SRT (for Lookback this is normally done by shaking the device, unless another method has been specified).
Some example questions which might be useful in UX testing are:

• How would you perform [a specific task]?
• What is the game application about?
• How would you create a new account in the game application?
• What parts of the UI are most important?

Lookback also recommends testing early and repeatedly; the earlier in the development process, the better. To stay on track with a user-centered approach, it is helpful to test repeatedly, in order to know that everything is still working properly. It is recommended to make a habit of testing at least once at every new release, and even more often if possible [30].

3 Approach

This chapter presents the work methodology used to evaluate the various SRT candidates and to produce the resulting workflow. The first section, 3.1, describes the production of the workflow, and section 3.2 presents the materials used. The research conducted is explained in section 3.3, and section 3.4 describes how the initial SRTs and services were evaluated. The following section, 3.5, explains how the tools were integrated into the game application. The procedure for finding test users is presented in section 3.6, followed by information regarding the creation of the test plan in section 3.7 and the distribution of the game application and the questionnaire in section 3.8. The last sections explain how the user tests were executed (section 3.9), how the analysis was conducted (section 3.10) and how the user feedback was managed (section 3.11).

3.1 Production of Testing Workflow

The aim of the thesis was to produce a comprehensive workflow for how the commissioning company should conduct UX testing of mobile games using SRTs. The workflow was based on the research in chapter 2 as well as the results in chapter 4, in combination with experiences gained during the testing process. The workflow contains tables displaying the properties of the SRTs, information about what should be considered when carrying out the tests, and guidance on how to find test users and write questionnaires. The workflow also clarifies how to proceed when analysing the results from the test sessions. The aim was to make it easier and faster for the company to find an appropriate testing tool and to develop a good testing process.

3.2 Materials

Since some tools supported only iOS when the project was initiated, the decision was made to integrate the tools into the iOS version of the game application. The computers used for research and tool integration were two MacBook Pros with OS X version 10.9.4. Xcode version 6.1.1 was used to integrate the iOS testing tools into the mobile game. The application with the integrated tools was tested on various iOS devices at MAG Interactive.

3.3 Research

Initially, research about user experience and testing methods was conducted in order to gain knowledge about the topics of the thesis. The gathered information also covered UX in games and mobile games and how it differs from regular software. A comparative study of some of the most popular session recording tools and services found online was carried out. The qualified tools/services and their properties were compiled into two tables, allowing a clear overview when comparing them (see tables 3 and 4, section 4.1). In order to ensure that the correct information had been collected and to fill in gaps in the property tables, e-mails were sent to the SRT and test service companies in question (Lookback, UXCam, Beta Family, UserTesting, AppSee, WatchSend, trymyUI, Userlytics, PlaytestCloud, UserZoom).
An interview and introduction to the tool/service was also conducted with one of the companies, using JoinMe. Existing methods and workflows were investigated, and interviews with the testers responsible at the commissioning company MAG Interactive were held in order to gain information about the current testing procedures.

3.4 Evaluation of Session Recording Tools

Interviewing the responsible Quality Assurance (QA) and UX testers at MAG Interactive was valuable in order to find out which properties in the SRT tables (see tables 3 and 4, section 4.1) were most important in the comparison process. A preliminary testing workflow was developed which contained the parts believed to be necessary in order for the test users to be able to conduct user testing with the tools. This workflow was then used for testing the tools considered to be of most relevance, based on the requirements from the company and how these correlated with the information collected from the websites of the tools and services. Based on these test results, additional tables containing more specific information about the tools and services were produced. The factors and properties included in these tables focus on which test users can be tested (demographics like age, location, and previous, current or new players, and also whether testing can be conducted at the office or not), which platforms are supported, how easy the tool or service is to work with (based on our own experiences) and which services are offered by the tools or test services.

The first draft of the workflow was used to test the tools and services, and it was then further refined in order to support the objectives of testing mobile games with the help of SRTs. User tests were conducted both with tools which provided recordings only (where it was up to the researcher to find test users, set up and distribute the test and analyse the results) and with test services which provided test set up, distribution, test users and recordings. The purpose of using these two different approaches was to compare the test process and the results in order to see if there were any differences, or if either of the two approaches would be preferable, but also to ensure that the workflow would fit both approaches.

The SRTs from table 3 which were selected for further evaluation and testing were Lookback and UXCam. The tools Appsee and TestFairy were integrated and tested by recording ourselves, but they were excluded since they did not fulfil all requirements for the test objectives. WatchSend did not offer an easy way to try out their tool and did not respond to e-mails regarding this, and was therefore also excluded from the study. The test services from table 4 which were tested and chosen for further investigation were PlaytestCloud and UserTesting. They both provide session recording, test set up, test user recruitment and distribution of the game application. The services Userlytics and UserZoom were excluded from the study because they did not provide a trial option where it was possible to test their services, and they did not respond to e-mails regarding this. UserZoom did offer a university programme where the tool could be used for free, but when a representative from the university tried to contact them, they did not reply.
Beta Family's SRT SuperRecorder was integrated into the game, but the tool made the game behave oddly and it was not possible to play the game correctly and record the session at the same time; hence only Beta Family's other test services were included in the study.

3.5 Integration

The integration was conducted by following the instructions available on each tool's website (see the bottom row of tables 3 and 4, section 4.1) for manually installing the SDKs of the tools into the test object, i.e. the game application Ruzzle.

3.6 Finding Test Users

When recruiting test users for the test sessions where Lookback's and UXCam's SRTs were used, a recruitment letter was written and sent out through social media. The message was posted on the Facebook walls of the authors and in various Facebook groups with a large number of members. However, only one participant was recruited through this channel. The rest of the participants for these test sessions were recruited by asking friends and family known to have iPhones, and by asking family members at family get-togethers to take part using a borrowed device. Test sessions with Lookback and UXCam were also carried out using Beta Family for recruitment of test users. When using UserTesting and PlaytestCloud, the test user recruitment was included in their service.

3.7 Creating the Test Plan

A test plan was created in order to inform the test users about how to perform the test. This plan contained a declaration of informed consent, instructions, tasks and pre- and post-gameplay questionnaires. Since the tools work differently from each other, specific instructions had to be written for each tool. See appendices C, D, E, F and G for the instructions and questionnaires used when carrying out the user tests. An effort was made to use the same pre-test instructions and post-gameplay questionnaire for all test sessions, but since the tools and services differ, this was not always possible. To ensure that the test plan and the questions were satisfactory and would yield the desired information about the test object, the test plan was sent out to the entire development team with a request for feedback. When this feedback had been taken into consideration, two pilot tests (one for Lookback and one for UXCam) were carried out by two test users at a remote location before the test was made accessible to the actual test users. See appendix H for the additional questions included in the pilot tests.

3.8 Distribution

The test services that provide test users often also provide distribution of the application, but if they do not, it is necessary to find a suitable distribution option. In order to distribute the game application to the test users, a distribution service such as Beta Family's SuperSend, Crashlytics or HockeyApp is needed. An additional service for distributing the questionnaire is also needed, such as Google Forms or a website containing a form for submitting user data. There are, however, session recording tools that also provide game application distribution, test set up and recruitment of test users. During the course of this project, Beta Family's SuperSend was used for distribution of the application, and Google Forms was used for distribution of the test instructions, the declaration of informed consent and the post-gameplay questionnaire. When using test services like PlaytestCloud, UserTesting and Beta Family, distribution of the application and the test was included in the service.
3.9 Execution of User Tests

In order to perform the user tests without a test recruitment and distribution service, the game application and test instructions needed to be distributed to the recruited test users. The test documents consisted of information regarding the test (see appendix A), a declaration of informed consent (see appendix B), a pre-gameplay questionnaire (see appendix E), instructions and tasks (see appendices C and D) and finally a post-gameplay questionnaire (see appendix F). The instructions also explained how to use the SRT: how to start the recording and how to upload the video. When the tool/service allowed an unlimited number of post-test questions, a survey regarding the session recording tool and the test user's testing preferences was included, see appendix G. This survey was conducted after the test users had completed the regular part of the test, including the post-gameplay questionnaire, because we did not want the survey to interfere with the PX or the result of the post-gameplay questionnaire.

In order to participate, the users had to download and install the game application on their mobile device. Since Google Forms or Beta Family was used for test set up in this part of the study, the users had to read the instructions in a web browser on a second device, such as a computer or a tablet. This was the only way to digitally enable the users to read the instructions while at the same time performing the tasks on their mobile device. When using test services which recruited test users and distributed both game application and test, the tasks and questionnaire were set up from the services' websites. It was also possible to specify demographics for the test users, only allowing users who matched the requirements to participate in the test. In this study all test users above the age of 20 who had not played the game Ruzzle before could participate, regardless of gender. The recruited test users were then able to take part in the test and complete (or at least try to complete) the tasks in the game application while the test session was being recorded, and then answer the questionnaire.

3.10 Analysis of Session Recordings

The recorded sessions from each of the services were uploaded and became available on their corresponding websites, where it was possible to analyse the recordings and create annotations. The majority of the researched SRTs and test services call the functionality for adding a comment at a specific time in a recording “annotations”. To avoid confusion, this is therefore the term used throughout this thesis, regardless of what the tool/service itself calls it. The analysis was carried out in the researcher environment of each tool, and annotations were created for every interesting observation in the recording. This could concern misunderstandings and difficulties in the navigation or when playing the game. It could also concern the mood of the test user, for example if he or she seemed to like something and seemed happy, or got annoyed or frustrated over something. Annotations were also made regarding whether the test user managed to complete the tasks and whether he or she had any difficulties completing them; feedback from the test user and other observations about their behaviour and reactions were noted as well. These annotations were later used when summarising the user experience.
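The annotation work described above amounts to attaching time-stamped, categorised notes to a recording, and some services can export them for further processing (PlaytestCloud, for instance, offers a CSV download, see section 4.2.3). The following is a minimal sketch of the kind of record involved; the field names and export format are our own illustration, not any tool's actual export.

    // Minimal representation of a session annotation and a CSV-style export of a list
    // of annotations. Field names and format are illustrative only.
    import Foundation

    struct Annotation {
        let timestamp: TimeInterval     // seconds into the recording
        let category: String            // e.g. "navigation issue" or "positive reaction"
        let note: String
    }

    func csvLines(for annotations: [Annotation]) -> [String] {
        let header = "timestamp_s;category;note"
        let rows = annotations
            .sorted { $0.timestamp < $1.timestamp }
            .map { String(format: "%.1f;%@;%@", $0.timestamp, $0.category, $0.note) }
        return [header] + rows
    }

    let example = [
        Annotation(timestamp: 42.0, category: "navigation issue",
                   note: "Could not find how to start a new game"),
        Annotation(timestamp: 12.5, category: "positive reaction",
                   note: "Smiled when the first round started")
    ]
    csvLines(for: example).forEach { print($0) }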
Since some of the test sessions were carried out with our friends and family, we made sure to also analyse session recordings where the test users were unfamiliar to us. This was in order to avoid being influenced by already knowing the users and how they express themselves, making sure that this would not affect the composition of the workflow or the evaluation of the tools. Normally, the test observer and the test users do not have a relationship with each other.

3.11 User Feedback on the Test Object

When the session recordings had been analysed, a document with feedback about the game was compiled for the commissioning company. The focus of the user testing was not only to gather information about the session recording tools, but also to collect feedback about the on-boarding process of the mobile game Ruzzle. It is important to compile the analysed results into tangible feedback in order to be able to address the issues.

3.12 Final Evaluation of Session Recording Tools and Test Services

Documents with strengths, weaknesses and the overall experience of the tools were compiled in order to be able to compare and evaluate the tools. The information which was of value for the tools and services themselves was sent to the respective companies by e-mail.

4 Results

This chapter contains the collected information about the SRTs and test services investigated in this study, as well as the resulting workflow, which has been produced based on observations, experiences and knowledge gained from this study. Section 4.1 provides information about the SRTs and test services which were initially investigated. Two tables are presented where the properties and features of the services can be compared. In section 4.2 the tested SRTs are compared in more depth, based on observations and experiences from using the tools; properties, strengths and weaknesses are presented. Section 4.3 focuses on comparing the properties, strengths and weaknesses of the test services which provide test set up, recruitment of test users and distribution of test and application. Since some of the tools/services provide both a test service and session recording, they have been included in both section 4.2 and 4.3. Section 4.4 contains a table where the tested SRTs and test services have been graded based on their performance in this study. The outcome of the user testing is presented in section 4.5, and the final workflow is displayed in section 4.6.

4.1 Test Services and Session Recording Tools Initially Investigated

The SRTs initially investigated during the research phase are displayed in table 3, and the test services which offer session recording and also provide test user recruitment, test set up and distribution of the game application are presented in table 4. The information about the tools and test services displayed in the tables has been gathered from their websites or by contacting them through e-mail (information collected in February 2015). Some of the tools and services in the tables have been excluded from further evaluation because they did not give access to try out the service, did not respond to our e-mails regarding this, or did not fulfil necessary requirements (e.g. a high enough frame rate). The tools and services which were selected for further investigation were: Lookback, UXCam, Beta Family, UserTesting and PlaytestCloud.

Table 3: Session recording tools included in the initial investigation.

Table 4: Test services which also provide SRTs and were included in the initial investigation.
4.2 Tested Session Recording Tools

The SRTs presented below have been integrated into the target game and tested. The strengths and weaknesses presented for each specific tool (Lookback in section 4.2.1, UXCam in section 4.2.2, PlaytestCloud in section 4.2.3 and UserTesting in section 4.2.4) are based on our experiences during this thesis work. In order to compare the UX researcher environment (where the recorded sessions can be analysed) for each of the SRTs, a table was compiled, see table 5. Some of the aspects considered are the organisation of tests and recordings (provided in the dashboard), downloading and sharing possibilities, and annotation features.

Table 5: Features available in the UX researcher environment (where recordings can be watched and annotated) for the respective services.

When conducting user testing with SRTs, it might be desirable to be able to customise the tool, i.e. to change its default settings. Table 6 displays whether it is possible to customise the tool settings, whether it is possible to preview the video, etc. The table also specifies whether the tool censors password fields and keyboards, whether advertisements are visible in the recordings, and whether the recordings are saved in the case of an application crash.

Table 6: Properties for the SRTs.

4.2.1 Lookback

Lookback [31] is a tool for recording feedback, user experiences and bug reports. The tool uses the front camera of the device and records facial reactions, audio and the screen. The website is modern and stylish and easy to navigate, but it lacks some functionality and does not seem completely stable. Lookback's UX researcher environment is displayed in figure 4.

Figure 4: Lookback's UX researcher environment.

Lookback's strengths:
• Clear progress bar which displays the name of the current view.
• Good quality of video, voice and game sounds.
• Censors password fields and the keyboard.
• Possible to click on an annotation and move to that time in the recording; also easy to share a link pointing to a specific timestamp with a colleague.
• Possible to add tags, add videos to projects, and favourite and organise the videos in an easy way.
• Possible to customise settings, e.g. giving the test user the opportunity to preview the recording before uploading.

Lookback's weaknesses:
• Items are spread out over the screen; both mouse and eyes have to move far between the progress bar, face recording, screen recording and annotations.
• Several issues with annotations: too many steps to create an annotation, not possible to edit, and ordered according to creation date instead of timestamp (which makes them confusing and disorganised). There is also no timestamp when clicking “post a comment” directly.
• Does not record game application crashes.
• Still not a completely reliable tool. The screen was not recorded in all sessions even though it was supposed to be, and some recordings were not uploaded at all.

4.2.2 UXCam

The advantage of using UXCam [62] is that it provides a lot of extra metrics, such as heat maps and the possibility to see which view of the game application the user is in and how the user swipes and navigates through the game. Another advantage is that it is possible to change the settings for the tool directly from the website, instead of changing the application code or asking the test user to change settings in a menu in the application itself.
The big drawback is that the website and the SDK are under development and currently unstable, frequently making it impossible to access and analyse the videos. The UX researcher environment is displayed in figure 5.

Figure 5: UXCam's UX researcher environment.

UXCam's strengths:
• The progress bar with timeline and annotations also displays in which direction the user swiped on the screen or whether they tapped it.
• UXCam has many extra features (heat map, navigational flow and statistics with visualisations for the tests).
• It is easy to share a recorded video or a specific clip from the video.
• The administrator can change settings (video quality, use of front camera, etc.) directly from the website.
• The user does not have to specify any options when starting the session recording; they just accept a pop-up asking if it is okay to record the camera input.

UXCam's weaknesses:
• The website and the tool are under development and currently unstable. Many functions are not working, sometimes it is not possible to view the uploaded sessions, and sometimes the facial recordings are not uploaded at all.
• Not possible to download recorded sessions or annotations.
• Does not upload the recording if the application crashes.
• The annotation feature is not working properly; sometimes it is not possible to add annotations, and sometimes they disappear.
• It is difficult to navigate in the video using the progress bar.
• It is not possible to start the tool again on the same device if the test user has declined the first pop-up asking to start the recording.

4.2.3 PlaytestCloud
PlaytestCloud [46] specialises in testing of games. The UX researcher environment can be seen in figure 6. The biggest advantage of PlaytestCloud's SRT is that it continues recording when the game is restarted after an application crash, which separates the tool from the others investigated in this study. A drawback is that it does not provide facial recordings.

Figure 6: PlaytestCloud's UX researcher environment.

PlaytestCloud's strengths:
• Continues recording after an application crash.
• No SDK has to be implemented.
• Possible to download sessions and a csv file with annotations.
• The dashboard and the researcher environment are simple and easy to use.
• The annotation function works smoothly (easy to add new annotations, the video pauses automatically, annotations are easy to edit, possible to add an annotation at -5 or -15 seconds, etc.).

PlaytestCloud's weaknesses:
• No facial recording.
• Does not censor sensitive input fields such as passwords.

4.2.4 UserTesting
UserTesting [57] has a lot of experience of conducting user tests and offers many services. Its biggest advantage is that it offers all of the desired functionality. However, in order to record the screen directly on the device it is necessary to integrate the SDK (which is currently available in beta); otherwise the user will record the test session using a web camera, which decreases the recording quality remarkably. The researcher environment is presented in figure 7.

Figure 7: UserTesting's UX researcher environment.

UserTesting's strengths:
• Possible to rotate the recording and to increase the playback speed. Also possible to jump -5 seconds in the recording.
• Smoothly working annotation functionality (the video automatically pauses when typing a new annotation, annotations are saved by pressing enter, the current annotation is highlighted while the video is playing, annotations can be edited, etc.).
• The user gets the test instructions and the tasks on their device screen, so no additional devices are needed to take part in the tests.
• Possible to create highlight reels directly on the website.

UserTesting's weaknesses:
• Annotations are not automatically scrolled when the video is played, which makes it a bit difficult to follow along in the annotations.
• Does not censor password fields and the keyboard.
• The recordings were of bad quality when the SDK was not implemented, since they were then recorded using the test user's web camera.
• The tasks are visible in the game application, which may withdraw attention from the game, especially since the users seemed to have some difficulties opening and closing the task menu; however, this depends on preferences.

4.3 Distribution and Test Set Up Services
The services which offer distribution and test set up and have been tested in this study are: Beta Family, PlaytestCloud and UserTesting. All of these services provide test users, although the recruitment process differs. PlaytestCloud and UserTesting also provide an SRT, while the Beta Family service used in this research has been used only for test set up and distribution, together with the SRTs Lookback and UXCam. Test set up properties regarding tasks and questionnaires can be seen in table 7. Factors such as demographics and the possibility to contact the test users are also displayed in table 7. The strengths and weaknesses presented for the test services (for Beta Family see section 4.3.1, PlaytestCloud section 4.3.2 and UserTesting section 4.3.3) are based on our observations and experiences during this thesis work.

Table 7: Features for test set up and distribution services.

4.3.1 Beta Family
Beta Family provides several test services. There is SuperSend, a free distribution service where it is possible to upload an application and send a description and a link to anyone by e-mail in an easy and quick way. SuperSend is suitable when creating a test without using a test set up service, which is explained in more detail below in section 4.3.4, Distribution Without Test Set Up Service. Beta Family also has an SRT called SuperRecorder, but it is still at an early development stage and has therefore not been investigated further in this study. It does contain several nice features, such as test instructions and tasks being displayed in the application, feedback on the uploading of recordings, a face positioning functionality, and the possibility to view all of the recordings directly from the SRT interface on the device. However, since the game could not be played properly while SuperRecorder was recording, and the researcher environment is missing functionality such as annotation possibilities, it could not be used in this study.

The service that has been investigated in this thesis work is Beta Family's standard test service, which is displayed in figure 8, where it is possible to do all parts of the user testing in the same place. It provides test set up services, recruitment of test users and distribution of the application. It is also possible to specify whether the test should be private (where the users are handpicked) or public (where all users can sign up), and whether the users should get paid or not; many users are willing to take the test anyway since they can get a higher ranking if they perform well.
It is also possible to invite colleagues, friends and family to participate in the test for free.

Figure 8: Beta Family's test set up service.

Beta Family's strengths:
• Good overview of each part necessary for setting up a test. Easy to create a new test, and possible to reuse previously created tests and questions. It is also possible to set test deadlines for how long the tests should be active.
• Easy for the test users to get an overview of the test tasks and give feedback in each section.
• The administrator has a lot of control over who takes the test (if it is a private test), since it is up to the test facilitator to invite users; it is also possible to specify demographics and devices and to contact the test users.
• Rating system - there is information available about how many reports the user has submitted and how they have been rated.
• Statistics functionality which gives an overview of the test users and their gender, age, country and device.

Beta Family's weaknesses:
• Not possible to specify a screener question to ensure that the test user belongs to the desired target group and fulfils specific requirements before taking the test. It is possible to specify test user requirements, but this does not provide the same assurance as a screener question, since users can ignore the text and participate in the test anyway.
• More alternatives for setting up test questions would be desirable; it is, for example, not possible to specify a multiple choice question like "pick 3 emotions" and limit the test users to ticking exactly 3 checkboxes, so the users can still give more or fewer answers.
• Not possible to see the specific date and time that a report was submitted.
• Difficult to know which test user should be connected to which session recording, since the test user information and their responses are uploaded to Beta Family while the recording is uploaded to the dashboard of the SRT.

4.3.2 PlaytestCloud
PlaytestCloud has a very easy test set up service, but the drawback is that it is not possible to specify any tasks, and only a maximum of five post gameplay questions can be added. However, the recruitment service is fast and the test users generally give honest and valuable feedback. Since the test users seem to play a lot of mobile games, they can give good usability feedback and compare the game and its concept to other similar games. This can be both positive and negative: it can yield a lot of valuable feedback, but sometimes it is desirable to test the game application on users who are not very technically skilled or who do not have much previous experience of games.

Figure 9: PlaytestCloud's test set up service.

PlaytestCloud's strengths:
• Finds test users within 48 hours.
• Creating a new test is straightforward and easy.
• It is possible to see what games the test user normally plays, which is valuable information about the test users.
• The users are really thorough and give interesting feedback.
• The video, annotations and survey results are displayed in the same view, giving a clear overview.

PlaytestCloud's weaknesses:
• When creating a test on the website, it is not possible to specify tasks, customised demographics, requirements (including devices), a screener or instructions without contacting the company. It would be faster and easier if this could be handled directly in the test set up.
• Not possible to contact the users directly without first contacting the company.
• Only a limited number of post test questions can be added.
• No ranking system; it is not possible to review users or see how many tests they have participated in.

4.3.3 UserTesting
UserTesting's test set up is very extensive and all desired features are available (depending on the subscription plan). The tests in this study were carried out using a trial of UserTesting's PRO subscription plan, and hence the following observations are based on the features included in that plan. The UI seems a bit outdated, and in the beginning it can be difficult to find everything. UserTesting's biggest advantage is that they are very fast: recordings can be collected within one hour. The demographics options and the possibility to choose a specific target group are also extensive, making it possible to specify exact demographics requirements and to retrieve a lot of information about each test user.

Figure 10: UserTesting's test set up service.

UserTesting's strengths:
• Has screener questions.
• Possible to run tests with several demographic groups simultaneously.
• Thorough information in the "User's profile" view.
• Recordings are available within an hour.
• Test instructions and tasks are displayed on the test user's device, so no additional device is needed to take part in the test.
• Possible to duplicate a previous test and alter it to fit the new test objectives.

UserTesting's weaknesses:
• The fact that the test tasks are visible in the app may withdraw attention from the game, especially since the users seemed to have some difficulties opening and closing the menu; however, this depends on preferences.
• The timestamps on the tests are not in local time, but this is not a major issue.

4.3.4 Distribution Without Test Set Up Service
When testing with SRTs like Lookback and UXCam, it is possible to create and distribute the test plan without using a third party test set up service. These tools do not currently offer any test set up or distribution services, and this can therefore be carried out according to preference. This means that it is possible to record test sessions performed by test users and colleagues at the office or another suitable location. When conducting remote user testing, it is necessary to send the test plan (containing test instructions, tasks, questionnaires, etc.) to the test users. This can be done using various approaches. In this study the test plan was created using Google Forms, but a customised website, for example, could work just as well. In order to distribute the application, however, it is easier and safer to use an additional service where the installation file for the application can be uploaded and from which the test user can download it. The service used in this study was Beta Family's SuperSend, where it is possible to specify a message to the test users in an e-mail along with the installation file for the application. There are also other distribution services available, such as HockeyApp and Crashlytics. In order to avoid using any additional test services, test users were recruited amongst friends and family.

Strengths:
• More control over the entire test process.
• The test plan can be specified and customised without any restrictions.
• Possible to handpick the test users.
• Suitable if the game is at a sensitive development stage, to avoid confidentiality issues (there is less risk that information about the game and the idea will be leaked when testing with people you trust or when having them sign an NDA, a non-disclosure agreement).

Weaknesses:
• Cannot do everything with one service; at least two services are needed, one for test set up and one for distribution of the application.
• More information is needed in the test plan, which places higher demands on the user; many things are handled automatically when using a test service.
• It takes time to create and set up a test.
• It takes time to recruit test users. Even though test users were recruited amongst friends and family in this study, it was time consuming, and some people were reluctant to participate since they did not like being recorded.
• It is difficult to connect a user to a survey response. The user has to speak a unique ID out loud, and this has to be connected to the correct survey. This means that it is not possible to send out the test to all test users at once (if the test users are supposed to be anonymous).

4.4 Comparison of Test Services and Session Recording Tools
In order to get a clearer overview, and to be able to decide which SRT and test set up service are most suitable, the experiences from this study have been summarised in a grading table. Based on our analysis, features of the SRTs and the test services were compared and graded on a scale of 1 to 5; the result is displayed in table 8. The grades are explained and justified in the discussion in section 5.

Table 8: Grading of tools and services.

4.5 Outcome of Test Session Analysis
In the following section, statistics from the test sessions are displayed in the form of charts. Additionally, insights gained during the test sessions have been documented. In total, 26 test users were recruited through the different channels, but only 13 complete recordings could be collected for analysis. Of these 13 recordings, two were filmed with a web camera and not with a built-in SRT and were therefore disregarded, and another recording was disregarded due to the wrong platform (tablet). Additionally, one recording which had a facial recording but lacked the screen recording was analysed. This means that in total 11 recordings were analysed. The remaining recordings failed to upload in some sense: the facial recording was not uploaded (only the screen recording), the recordings were not uploaded at all, only a couple of seconds of the recording was uploaded, or it was not possible to view the recording due to a file error. One user also had problems with the application crashing multiple times and did not want to complete the test session. Some additional videos were disregarded since the test users did not meet the test user requirements (they had previous experience of playing the game), and one recording was uploaded after the analysis deadline had passed and was therefore not analysed.

4.5.1 Questionnaire Results
A total of 26 survey responses were collected. These include survey results from all test sessions carried out with the SRTs specified in section 4.1. Hence, the responses regarding a specific tool are based on a smaller group of respondents. The age distribution among the participants is displayed in figure 11a, while the gender distribution is presented in figure 11b.

Figure 11: Age and gender distribution from a total of 26 post gameplay questionnaire respondents. (a) Age distribution (age groups 20-30, 31-45 and 46-65); total number of test participants: 26. (b) Gender distribution (male, female, other); total number of test participants: 26.

The test users who were recorded using UXCam or Lookback were asked some additional questions regarding the session recording and the testing methodology.
This was possible since Google Forms and Beta Family had no limitations on the number of post test questions. A total of 15 survey participants answered these additional questions; of these, 7 had completed the test session using UXCam and 8 using Lookback. When using PlaytestCloud and UserTesting, the post gameplay questionnaire focused only on the gaming experience.

The test users were asked whether they would prefer to be in control of starting and stopping the recording themselves, or whether they would prefer it to be handled automatically, removing one step from the test process. The test users have been divided into subgroups depending on which of the tools, Lookback or UXCam, they used. The total result is also displayed, see figure 12a.

The test users who were recorded using UXCam were asked if they would have preferred to be able to preview the recording of the test session before uploading it to the facilitator. The users who were recorded using Lookback, and thus had the possibility to preview the recording, were asked if they liked the possibility to preview the recording before uploading it and if they had used this functionality. The result is displayed in figure 13. The respondents who answered that they did not use the preview option were also the ones who replied "don't know" when asked whether they liked the possibility to preview or not.

In order to investigate whether the test users appreciated taking part in the test in their natural environment, they were also asked if they would have preferred to do the test at home on their own device or at a test facility where they could be observed in real-time. A majority of the respondents stated that they would prefer to participate in the test at home using their own device; the result is displayed in figure 12b.

Figure 12: Test users' preferences for starting and stopping the recording manually or automatically, and for where they would have preferred to conduct the user test. (a) Manual or automatic recording: the test users who were recorded using Lookback or UXCam were asked whether they would prefer the recording to start automatically or whether they would prefer to start it manually. (b) Preferred test location: 80% would have preferred to do the test at home on their own device and 20% at a test facility where they would be observed in real time. There was a total of 15 survey participants, of which 7 had completed the test session using UXCam and 8 using Lookback.

Figure 13: The test users' preferences regarding preview functionality in the session recording tool. (a) Whether the test users would have appreciated the possibility to preview the recording of the test session before uploading it to the facilitator. (b) Lookback preview functionality: the test users who were recorded using Lookback were asked if they previewed the recording before uploading it and if they liked the possibility to do so. There was a total of 15 survey participants, of which 7 had completed the test session using UXCam and 8 using Lookback.

4.5.2 Insights Gained from Screen, Facial and Voice Recordings
We did not notice any substantial differences between analysing sessions with facial recording and without. How much information could be extracted from the facial recording depended on the personality and body language of the test user. Some recordings provided information in the form of facial expressions, some of which could be related to the feeling stated in the post gameplay questionnaire, while other facial recordings did not seem to provide much additional information at all. In some cases it was valuable to see when the user was paying attention to the test session and when not. The facial recording also gave a better overall picture and subconsciously conveyed a sense of closeness to the UX researcher.

4.5.3 The Test Object: Ruzzle
The observations from the analysed recordings of the test sessions were compiled into a document containing feedback on the mobile game application Ruzzle. For confidentiality reasons, and since this is outside the scope of this thesis, it is not described in further detail in this report.

4.6 Resulting Workflow
Below is the developed workflow, which is part of the results of the thesis work. It has been developed as guidelines for MAG Interactive, with instructions for how to conduct user testing with session recording tools in order to evaluate the PX. In this section the workflow has been made concise and easy to read; it is further explained and discussed in section 5.4. The focus is on mobile games, but the workflow has been made general in the sense that it is applicable regardless of the test objectives and regardless of which test service and SRT are being used. This was a conscious decision, since the available tools are constantly being updated and new tools are becoming available, so the recommended tools can vary in the future. The tested tools and services are also suitable in different situations. Because of this, the workflow explains what to think about when choosing an SRT and a test service, instead of simply stating which SRT and test service to use and how. It is important to conduct the user testing often and iteratively, preferably for every new feature or release.

4.6.1 Test Plan
The list below displays eight main steps for creating a test plan. Consult with the team throughout the test period regarding what should be tested - what questions need to be answered - and ensure that the tasks and questions are formulated to achieve that. In order to correct the issues discovered during user testing, a summarised analysis and highlights from the recordings should be shared with the team.

1. Test Objective - Decide on what to test
2. Test Users - Decide on who the test users should be
3. Tool and Test Service - Decide on which tool and test service to use
4. Time Plan - Set a time frame for the entire test
5. Prepare Test Details
• Preparations
• Instructions
• Pre Gameplay Questionnaire
• Tasks
• Post Gameplay Questionnaire
6. Perform Test
• Pilot Test
• Actual User Tests
7. Analysis
8. Summarise Results and Share with the Team

A more detailed explanation of the different steps is given below.

4.6.2 Test Objective
First, decide on the test objective:
• What questions need to be answered?
• What part of the game should be tested?
4.6.3 Test Users
Decide on who should test the game: should it be the main target group of the game or a group of test users who are new to the game? Examples of factors to consider are:
• Age
• Nationality
• Gender
• Casual or hardcore gamers (i.e. how often they play mobile games)

4.6.4 Tool and Test Service
A test service refers to an online service where a user test can be created, the installation file for the application can be uploaded, and both the test and the application can be distributed to the test users. Some test services also provide a session recording tool, and some offer recruitment of test users. Tools (SRTs which are installed into the application and record the screen) can also be used without a test service, in which case the UX researcher takes care of test set up, distribution, recruitment and analysis personally. It is also possible to combine an SRT with a test service which provides test set up, recruitment of test users and distribution of the test and the game application (for example the SRT Lookback with the test service Beta Family).

Deciding on which test service to use depends on:
• Resources - How much time and money can be spent on the user test?
• Control - How much should the researcher be able to control and customise the test? What specifications are needed in the test set up, and how specific do the test user demographics need to be?
• Confidentiality - If the game or concept is not yet launched and should be kept private, it might be desirable not to use a third party company to handle the testing. Confidentiality aspects also affect whether the user testing should be conducted locally or remotely, i.e. for privacy reasons it might be conducted locally at the office.

4.6.4.1 Session Recording Tool
A session recording tool is integrated into the application using an SDK, and it is then possible to record the screen, the sound and, in some cases, also the input from the front camera of the device. The SRT also provides an online dashboard where it is possible to view the uploaded recordings and add annotations.

Table 9: Choice of session recording tool.

The choice of tool is affected by:
• Development Stage - Some SRTs will not continue recording after an application crash, resulting in the session recording not being uploaded or other recording issues. The risk of application crashes is generally higher at early development stages, so different tools might be more suitable depending on which development stage the game application is currently in. See table 9 for information about which tools support recording of sessions even if the application crashes.
• Platform - Make sure that the SRT supports the platform of the game application; see table 9 for information about which tools support the desired platform.
• Facial Recordings - Not all SRTs provide facial recordings. If there is a need to test the UX of the game using facial recordings, see table 9 for information about which tools provide this.

4.6.4.2 Distribution and Test Set Up
When conducting remote user testing, a test has to be set up and distributed to the test users together with the application. The SRTs Lookback and UXCam do not provide test set up and distribution, hence they need to be combined with an additional service such as Beta Family's SuperSend, where it is possible to upload an application with an integrated SDK from an independent SRT.
PlaytestCloud and UserTesting have their own session recording tools and can therefore not be used for distribution and test user recruitment in combination with an independent SRT. When using UXCam or Lookback, it is also necessary to use a separate service for test set up and distribution of test instructions, for example Google Forms, a custom made website or Beta Family. When using SRTs and test distribution services that are independent of each other, it is important to make sure that the recordings and the answers to the post gameplay questionnaire can somehow be related to each other. This can, for example, be solved by giving each test user a unique ID which they have to state in both the recording and the post gameplay questionnaire (a sketch of how such IDs can be matched afterwards is given at the end of this section).

Table 10: Choice of test set up and distribution service.

The choice of distribution service depends on the following factors:
• Time Frame - What is the time frame for the test, i.e. how much time is available? Some test services that provide test users and recordings are faster than others; see table 10 for information about the time it takes to gain access to the videos.
• Recruiting Test Users - Table 10 displays which test services provide recruitment of test users. In some services, it is possible to specify whether the test should be private or public, and also to invite your own test users or choose from the service's test user base. If there is a lot of time available, it is also possible for the researcher to handle the recruitment process personally and search for test users in the streets, in cafés, at social gatherings, or online through forums or social media.
• Specified Test User Demographics - How specific should the test user demographics be? All test services provide some kind of test user demographics, but it is not always possible to specify additional requirements. See table 10 for information about which services provide the possibility to specify special demographics requirements.
• Contacting Test Users - If there are any questions regarding what the test user experienced, for example if something happened that was not displayed in the recordings or explained in the questionnaire, it might be desirable to contact the user with follow-up questions. See table 10 for information about which services support contacting test users.
• Longitudinal Studies - Sometimes it is desirable to perform longitudinal studies by carrying out several tests with the same test users. The services which support this are displayed in table 10.
• Easy and Quick Test Set Up - As can be seen in table 10, all the investigated tools have a quick and easy test set up. If the goal is to create the test quickly, without the need for an advanced and customised test, then PlaytestCloud is a good option.
• Customisable Test Set Up - Sometimes there is a need to design the test exactly according to a detailed description, which might be problematic if the required functionality is not available from the test service, for example if a question should be answered with multiple choice alternatives. See table 10 for information about which test set up services provide a customisable test set up.
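As noted above, when the SRT and the distribution service are independent of each other, the recordings and the questionnaire answers have to be related through a unique tester ID. The short Python sketch below illustrates one way the matching could be scripted afterwards. It is only an illustration under assumed conventions - questionnaire responses exported as responses.csv with a tester_id column, and recordings downloaded as <tester_id>.mp4 files - and these names are not part of any of the tools or services discussed here.

import csv
from pathlib import Path

# Assumed layout for this sketch: survey answers exported from e.g. Google Forms
# as responses.csv with a "tester_id" column, and downloaded recordings named
# recordings/<tester_id>.mp4. Adjust the names to the actual export.
RESPONSES_CSV = Path("responses.csv")
RECORDINGS_DIR = Path("recordings")

def load_responses(csv_path):
    # Read all questionnaire rows, keyed by the unique tester ID.
    with csv_path.open(newline="", encoding="utf-8") as f:
        return {row["tester_id"].strip().upper(): row for row in csv.DictReader(f)}

def match_recordings(responses, recordings_dir):
    # Pair each recording file with its questionnaire row and report gaps.
    matched, unmatched = {}, []
    for video in recordings_dir.glob("*.mp4"):
        tester_id = video.stem.strip().upper()  # the ID the user also speaks out loud
        if tester_id in responses:
            matched[tester_id] = (video, responses[tester_id])
        else:
            unmatched.append(video)
    return matched, unmatched

if __name__ == "__main__":
    responses = load_responses(RESPONSES_CSV)
    matched, unmatched = match_recordings(responses, RECORDINGS_DIR)
    print(f"{len(matched)} recordings matched to survey responses")
    for video in unmatched:
        print(f"No questionnaire answer found for {video.name}")
    for tester_id in sorted(set(responses) - set(matched)):
        print(f"No recording found for tester {tester_id}")

Even a small script like this makes it easier to spot missing recordings or missing questionnaire answers before the analysis starts.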
4.6.5 Time Plan
Make a schedule and set a time limit for all parts of the user testing process:
• Plan and prepare test details (including preparations, instructions, tasks and questionnaires)
• Integrate the SRT (if using an SRT where integration is required)
• Distribute the application and the test to the pilot tester
• Time for the test user to conduct the pilot test
• Correct issues found in the pilot test
• Distribute the application and the test to the actual test users
• Time for the test users to perform the test
• Analyse recordings and questionnaires
• Summarise, and share the results with the team

4.6.6 Prepare Test Details
Prepare the background information necessary for the test users to perform the test. It is not always possible to specify background information when using a test set up service; see figure 14 for which services provide this.

Figure 14: Properties for test set up.

4.6.6.1 Preparations
• Declaration of Informed Consent - It can be useful to have the user sign a declaration of informed consent in order to avoid legal issues, by making sure the users are okay with being recorded.
• Non-Disclosure Agreement - If needed, prepare a non-disclosure agreement (NDA).
• Specify Technical Requirements - For example that a Wi-Fi connection is necessary for uploading the video, or that the user test should be performed on a specific device, etc.

4.6.6.2 Introduction
• Test information - What the test is about and what mindset the user should have.
• Specify limitations in the SRT - This depends on which tool is being used. If the application cannot be closed or sent to the background during the test, the test users should be informed about this. The test user should also be informed whether or not the recording can be paused and resumed during the test session.
• Specify limitations in the test object - Specify whether the game is live or not and whether there are any parts of the game which might not work. Be sure to specify if, for example, in-app purchases are not working in the test version of the game.

4.6.6.3 Instructions
• Length of game session - How long the test users should play the game.
• Start recording - How to start the recording in the SRT.
• How to upload the recordings - Write instructions for how the test user can stop and upload the recording with the SRT.

4.6.6.4 Screener
A screener is a question that the test user is asked before beginning the actual test; the researcher specifies the correct answer, so only test users who choose that specific alternative get to continue to the test. Should only specific test users be included in the study? For example, when testing how understandable a new feature in the latest release is, it might be desirable to test on users who have not played the game or have not updated to the latest version. However, not all test services provide a screener feature.

4.6.6.5 Pre Gameplay Questionnaire
A pre gameplay questionnaire contains questions that the test user should answer before playing the game. It is not always necessary to have pre gameplay questions when using a test service, since information about the test user (age, gender, etc.) is already available in their profile view. When the test set up and user recruitment are handled personally by the researcher, however, pre gameplay questions can be valuable. Try to keep the number of questions as small as possible, preferably 3-5 questions. Examples of pre gameplay questions:
• Age
• Gender
• How often do you play mobile games?
• What games do you usually play?
• Have you played [the name of the game] before?

4.6.6.6 Tasks
Tasks are pre-defined activities that the test user should perform while using the app. The recommended number of tasks is 3-5; here, too, it is important to keep it short and simple. It might also be desirable to have no tasks, or simply one task saying play the game, which is also fine. When testing a specific part of the game, however, it might provide more information if the user performs tasks related to that area. Examples of tasks:
• Create an account
• Start a new game
• Reach level 2
Sometimes it is okay if the tasks are a bit unclear; this can be used, for example, when testing whether the user easily understands how to start a new game. In other cases, it might be better to specify in detail how to perform a task.

4.6.6.7 Post Gameplay Questionnaire
A post gameplay questionnaire contains questions that the user answers after playing the game. When designing the questionnaire, it is important to keep the number of questions as small as possible, preferably 3-5. Use scale-formatted answers or other predefined answer alternatives as often as possible, in order to make it easier and faster for the test user to answer the questions. Some test set up services only allow a limited number of questions; see figure 14 to make sure that the test service allows you to ask enough questions. Examples of post gameplay questions:
• Would you recommend [the name of the game] to a friend or colleague? On a scale of 0-10. (Net Promoter Score; a sketch of how the score is computed is given at the end of section 4.6.7.)
• Name 3 emotions you experienced during gameplay.
• Was it easy to understand the game?
• What did you think about the game?
• Any suggestions, comments or recommendations?

4.6.7 Perform Test
In most SRTs, an SDK has to be implemented in the application in order to record the session (although no implementation is needed when using, for example, PlaytestCloud). The test details and the application with the integrated SRT can then be distributed to the test users so that they can start the test.

4.6.7.1 Pilot Test
Before starting the actual test, send the test with the instructions to a test user who carries out a pilot test. This is a way to check that everything in the test plan works and is understandable; if not, it is easy to modify it before sending the test to the actual test users. Preferably, one pilot test should be carried out while the facilitator observes the test user on site; in this way the test user is more prone to question the things that are unclear. Additionally, one pilot test should be carried out by a test user at a remote location. It is also possible to add a few additional questions in a survey at the end of the test (after the users have completed the test), asking what they thought about the test set up and the tool. Questions to ask the test user after completing the regular part of the test:
• Did you experience any difficulties with the recording tool during the test session?
• Were the instructions before the test sufficient? If not, what was missing?
• Was there anything you did not understand in the instructions or regarding the recording tool?
• Was it easy to understand the tasks? If not, how would you suggest they can be improved?
• Is there anything you think should be changed for future testing?

4.6.7.2 Actual User Test
When the pilot test has been conducted, and the test has been updated (if needed), send the game application and the test plan to the actual test users.
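The post gameplay questionnaire in section 4.6.6.7 suggests the 0-10 recommendation question used for the Net Promoter Score. As a minimal illustration of how such answers can be summarised once the responses have been collected, the Python sketch below computes the score; the example answers are invented for illustration and are not data from this study.

def net_promoter_score(answers):
    # NPS from 0-10 answers: promoters answer 9-10, detractors 0-6, passives 7-8.
    # The score is the percentage of promoters minus the percentage of detractors.
    if not answers:
        raise ValueError("no answers collected")
    promoters = sum(1 for a in answers if a >= 9)
    detractors = sum(1 for a in answers if a <= 6)
    return 100.0 * (promoters - detractors) / len(answers)

# Invented example: 5 promoters, 3 passives and 3 detractors out of 11 answers
# give (5 - 3) / 11 * 100, i.e. an NPS of roughly 18.
print(net_promoter_score([10, 9, 9, 10, 9, 8, 7, 7, 6, 5, 3]))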
4.6.8 Analysis
Analysis of the recorded test sessions can be performed by making annotations about interesting test user events throughout the video. Comments can be made regarding how the user experiences different parts of the game, for example whether the user gets annoyed at any part of the game or has difficulties completing a task. The annotations can then be used as an aid when creating a feedback document where all user experience and usability issues should be documented. It can also be interesting to compare the actual experience, as it is perceived in the recording, with what the test user has stated in the post gameplay questionnaire.
• Annotate - First, access the session recordings through the dashboard of the chosen SRT and analyse them. Create annotations for every event or piece of feedback, both good and bad, that can be deduced from the recordings.
• Create feedback document - Create an additional document and elaborate on the feedback based on the annotations. Write down both the negative and the positive feedback. Some questions to keep in mind when analysing the recordings:
– Can the user perform the tasks?
– Is the user getting annoyed at anything?
– Is there anything that the user cannot find, or has trouble finding?
– Is the user trying to click on buttons that are not clickable?
It can also be useful to add a link to the recording and specify at what time the event happened in the feedback documentation, or to take screenshots, to make it easier to share and explain the issues that were found.
• Draw conclusions and compare the analysis with the questionnaire responses - When the recordings have been analysed, see if the conclusions drawn from observing and listening to the recordings agree with what the user states in the post gameplay questionnaire. If they do not, try to understand the reason for this.

4.6.9 Summarise Results and Share with the Team
In order to correct the issues found, it is important that the team understands where and why usability and UX issues occur. To convey the most severe issues, it can be valuable to summarise the issues found during the analysis. This way, the team will not have to watch all the recordings and read every annotation, but can instead focus on the major issues. It can also be helpful and time saving to prepare proposed solutions for the issues.
• Summarise the feedback documents - Make a summary document with feedback based on the recordings from all of the test users. Be sure to emphasise the issues found in more than one of the videos, i.e. where several of the test users encountered the same problem.
• Proposed solution - For each issue found, add a proposed solution section in the feedback document with a suggestion for how to solve the problem.
• Compile statistics from questionnaires - Summarise the questionnaire responses in one document and use this data to compile statistics. To get a clearer view of the responses, graphs and charts can be drawn from the statistics.
• Share results with the team - Share the summarised feedback and statistics with the rest of the team. Show highlight reels of the recordings where the issues were discovered in order to create an understanding of the problems, and discuss how to solve the issues.

5 Discussion
This chapter discusses the benefits and challenges of remote user testing of mobile games in section 5.1, and motivates the use of session recording tools in section 5.2. It also discusses the test method and procedure for the user testing conducted in this study. The chapter furthermore contains one section discussing the analysis of the recordings (section 5.3) and one section on the workflow (section 5.4), where it is also discussed what aspects to consider when planning the test, conducting the pilot test and deciding on an SRT and a test set up and distribution service. Finally, section 5.5 discusses topics suitable for further research.

5.1 User Testing of Mobile Games
Below, different aspects of user testing of mobile games are discussed, such as remote testing, test users, and test methods and procedures. Social aspects and post gameplay questionnaires are also discussed.

5.1.1 Remote Testing
Remote testing differs from testing locally at the office or at a special testing facility. Remote testing can be performed at home, at the workplace or during the bus ride home from school, which makes a significant difference for the test users compared to having to be at a specific place at a certain time. Remote user testing with mobile SRTs does not require real-time testing, and the test does not have to be completed at a certain time. The users can carry out the test by themselves without being in contact with the test observer, and the observer does not need to be there asking questions and giving instructions. According to Nemberg [36], see section 2.5.3, many test moderators make the mistake of talking too much during the session. This is one argument for why it might be better not to observe the tests in real-time and instead ask questions afterwards, rather than in the middle of the test session. Another advantage of remote user testing is that the test can be carried out in the user's natural environment. When test users are invited to take part in a user test in a laboratory setting, they are put in an unfamiliar setting and sometimes introduced to another platform or operating system than the one they are used to. This gives rise to different player experiences.

During this study, a survey was conducted regarding user testing and SRTs, in addition to the post gameplay questionnaire. Due to limitations in the number of follow-up questions allowed during test set up for some of the test distribution services, these questions were only included when performing tests using Lookback or UXCam as the SRT and Google Forms or Beta Family for the test set up. According to the survey, 80% of the respondents would prefer to do the test at home on their own device, while 20% would prefer to do it at a test facility where they could be observed in real-time, see figure 12b. This strengthens the argument for choosing a remote approach over conducting test sessions in a laboratory setting. SRTs allow the user to carry out the test as part of their everyday life in their natural environment. They can also use their own device, which they already feel comfortable with, and no additional equipment is needed. But this also places higher demands on the test facilitator when planning the test. Test instructions need to be exceptionally clear, and it is also important to specify which OS versions are supported both by the application and by the SRT. While the application and the SRT work on an updated, high-end device at the office, the users might have older, outdated versions on which they will not work.
When conducting our tests, we were not entirely sure which versions of the operating system were supported by the session recording tools, and at least one user experienced trouble with the application slowing down the device noticeably, to the point that she did not want to continue the test session. The issue could, however, lie with the SRT, the device or the application. In this study the application was only tested on a couple of different devices to see whether the SRT worked after integration. As a pilot test it might be a good idea to test locally on several devices before sending out the test to remote users.

This study has been carried out without any expenses. Thanks to the purpose of the thesis, several SRT and test service companies were willing to contribute by offering free trials of their tools and test services, and no premises or devices were needed for the test sessions. However, a couple of iPhones were borrowed from the commissioning company in order for us to be able to try out the tools, and also to lend to friends and family members who wanted to take part in the tests but did not have a suitable device of their own. In total there were 6 test participants who did not perform the test on their own device. Eliminating the need to carry out the test sessions in a laboratory environment both saves expenses and allows the test user to carry out the test session in their natural environment. It was also very convenient to set up the test once, send it out and then just wait for multiple responses. There was no need to meet each and every test participant, walk them through the instructions and wait for them to complete the test before moving on to the next participant. This saved a lot of time, since we could simply work on something else while waiting for the test participants to take the test.

Usability.gov (see table 2, section 2.3.3) mentions that challenges in remote user testing include the security of sensitive information, the observer getting a restricted view of the user's body language, and technical problems. We noticed a worry among the employees at the commissioning company regarding uploading the game to a third party or to the test users' own devices. They were not willing to take that risk unless it was a game which had already been launched. Unreleased games, still in the prototype phase, could only be tested with co-workers and their families. There is a fear of compromising the security of products still at the prototype stage, and because of this the remote testing approach may be ruled out when testing games which are in a sensitive stage of the development process. Regarding the restricted view of the user's body language, we did not feel that this was a problem; the users' tone of voice and their interaction with the application said a great deal by themselves. One technical problem that was avoided by using mobile session recording tools was the risk of the user feeling uncomfortable with the technology. Since most of our test users performed the test on their own device, there were no problems related to this. However, other technical problems with the SRTs were revealed; we did, for example, notice problems related to slow internet connections. At the end of the test session the test participants were asked to uninstall the application from their device, to prevent them from opening it by mistake and uploading recordings which were not part of the test.
However, we believe that uninstalling the application too soon may have prevented some of the recorded sessions from being uploaded to the online dashboard, and hence they were lost. Videos not being uploaded in time can be a product of a slow internet connection, an unstable Wi-Fi connection or problems with the SRT itself. Based on this, it is important to make sure the test user has a stable Wi-Fi connection and also to tell them to wait for some time before uninstalling the application. In both of the tools we tried, Lookback and UXCam, there was no way for the test user to know whether the recording had been uploaded or not. All in all, the benefits and challenges of remote user testing mentioned by Usability.gov (table 2, section 2.3.3) conform to our observations and experiences during this study.

5.1.2 Test Users
The commissioning company decided that the test could be carried out by anyone from the age of 20 and up, regardless of gender. This was in order to gain more information about all types of players. The game is generally played by middle-aged women, but when testing the on-boarding process in this study, all kinds of adult users were of interest. When recruiting our own test users for the sessions with Lookback and UXCam, we had trouble finding volunteers. Even though volunteers were sought in large communities through social media, there were no voluntary participants. This could be because people were intimidated when we told them that they would be recorded, or because the recruitment message contained too much information. Another reason could be the lack of incentives, with no more compensation than a simple thank you.

In section 2.5.3, Suen explains how to conduct a good usability study for mobile games. He recommends doing tests in smaller batches of 3-5 persons, which was an approach that we followed, and it is also in agreement with Nielsen's statement in section 2.3.7 that five users are enough. Suen also states that people in the near surroundings are usually open to helping test the game, which was one of the approaches we used in order to find test users. People need some kind of motivation in order to take part in user test studies; this can be in the form of monetary compensation or some other incentive. One incentive can be the fact that they are helping someone else, which is often the case with friends and family. Due to the lack of voluntary participants, we too turned to our friends and families. Here, one of the main obstacles was the fact that most of our acquaintances do not own an iPhone. Since the study, apart from investigating the tools, also aimed to investigate the on-boarding process of the game, the test participants also had to be new to the game. These two requirements ruled out everyone at the commissioning company MAG Interactive as well as many of our friends and family members. Eventually, we ended up testing on friends, family and acquaintances using a borrowed device. This means that some of the participants were in fact partaking in the test using a device and operating system they were not used to and did not feel entirely comfortable with. This should not have affected the outcome of the evaluation of the session recording tools, but it might have affected their interaction with the game. This has, however, been overlooked since the only major difference between the two main mobile platforms, Android and iOS, is how the "back" option works (as far as the target game is concerned).
Still, one can argue that some of the benefits of remote testing with mobile SRTs were lost due to this, since one big advantage of remote user testing with mobile session recording tools is that the user can carry out the test on their own device, which they are confident in using. However, this should not have affected the outcome of the study.

An interesting aspect when recruiting test users is deciding whether they should be recruited through a test service or not, and whether they should be paid or asked to participate in the test without monetary (or other) compensation. Based on the user tests carried out in this study, it can be observed that users who were recruited through a test service were more likely to write detailed answers. During this study, we came across user recruitment systems which motivate test users with incentives like monetary compensation or ranking systems. These incentives seem to make the users more keen to answer the questions thoroughly. There was a clear distinction between the answers to the post gameplay questionnaire filled out by our friends and family and those filled out by unknown test users recruited through a third party test service. The test users previously unfamiliar to us generally left more elaborate and constructive answers, while the test users recruited amongst our friends and family mainly gave very short answers. Some test users even wrote nonsense in reply to non-optional questions requiring a text answer, since they did not have the patience or interest to continue. The written nonsense could also be pointing to issues with the test itself, which may be in need of modification; the user might have thought that the test was too long or that the questions were too complicated.

Another aspect that needs to be considered is the fact that the test instructions and the post gameplay questionnaire were written in English. The fact that the users recruited from test services often were native English speakers (UserTesting, PlaytestCloud) while our friends and family are not may have contributed to a difference in the results. The language barrier might have made them less comfortable with describing their thoughts and experiences, even though they were encouraged to speak and answer in their native language. Perhaps the instructions, tasks and questions should also have been presented in their native language rather than in a foreign language. However, several of the test users from Beta Family were non-native English speakers and still gave more elaborate answers. Based on these observations, it might be better to pay a small amount to get more elaborate feedback and more useful information. At the same time, it cannot be determined from this study whether it matters if the users get paid or not, since the data set is too small and the test users at Beta Family are also motivated by ranking and not always by monetary incentives. Further investigations could be made, for example, to see if users at Beta Family answer sooner if they are being paid (we did not notice any difference when setting up both paid and unpaid tests). When hiring test users from a test user network, it is important to keep in mind that the users might eventually become closer to "professional" testers than to the ordinary user (the ordinary user is probably closer to the target group of the application).
Users who are not accustomed to taking part in user test studies might be able to provide additional information which will not be revealed by an experienced (possibly tech-savvy) test user. This topic could be investigated further in order to see how it affects the result whether someone is a regular test user or not, but that is outside the scope of this thesis. PlaytestCloud claims that their testers get on average one test invitation per month. They also stress that since the users get no special instructions besides "play this game and think out loud", they will not fall into a testing schema and every test will therefore be different. They claim that this approach makes the test session more natural and that the game experience can be compared to the experience the player gets when playing a new game they have found on the App Store. Our observations, however, tell us that the test users we watched from PlaytestCloud seemed to be quite experienced. This may be because the test users play a lot of games during their spare time. Since PlaytestCloud is focused on games, the test users have fairly vast experience of playing mobile games and of spotting usability and UX issues specific to games. They also compare the test game application to other games they have played. This can be very good for discovering issues caused by deviating from the "norm" or for spotting concept errors due to similarities with other games on the market. On the other hand, they do not seem to represent the inexperienced casual player, and there is currently no way to know whether they have participated in many user tests or not. In some services this can be prevented by choosing a user with a lower ranking or a lower number of submitted reports. This is possible at, for example, Beta Family, where every user has a ranking and information about the number of submitted reports displayed in their profile, and it is possible to handpick the test users based on this. At PlaytestCloud and UserTesting there is currently no easy way of choosing test users according to this approach.

During our online meeting with a representative from UserTesting, they claimed that there was no risk that the test users become professional testers, since they do not get new tests all the time. In the webinar with Krug and Sharon [24], which was mentioned in section 2.3.3, they answer questions from the audience and customers of UserTesting. According to the author of one of these questions, the UserTesting test users are highly experienced participants. However, Krug and Sharon argue that in spite of having experienced users, the serious problems seem to be found anyway as long as you test. They also recommend UserTesting's rating feature, where it is possible to choose test users who have a low rating score. Another advantage of the scoring system is that the users want you to rate them higher, and therefore they are also good at answering follow-up questions. When trying out UserTesting, we only found one toggle option saying "Any (use highest rated testers)" for the demographics options gender, country and social networking. But since it is possible to customise the demographics by writing the requirements in plain text, it should be possible to specify that the test should be carried out by users with a lower rating, although it seems to be implied that the services assume you want highly rated test users and not the opposite. It is also important to consider the time frame of the study.
Different services have different obligations to provide results within a certain time limit. PlaytestCloud, for example, promises results within 48 hours, while UserTesting has a time limit of one hour. If you recruit your own test users from the street, the office, or friends and family, it can be difficult to predict how much time it will take to find test users and to have them go through with the test. Beta Family differs a bit from the other test services. Firstly, it is possible either to pay the test users or to ask them to take part in the test for free. Secondly, you can make a test public, allowing all Beta Family testers to take part in the test, or you can choose to make it private and handpick the test users yourself from their test user base. This gives the test facilitator more freedom, but also places higher demands on planning the test and considering the time frame of the study. It is possible to set the time limit of a user test to 1–21 days. When creating a private test, the facilitator has to wait for the test users to accept the invite, do the test and send in the report. There is no guarantee that the test users you invite will accept, or even that the test users who have accepted will complete the test and hand in the report. With this approach it is necessary to invite several test users and then hope that the desired number of test users will complete the test; using this service hence requires a looser time frame. When creating a public test, there is a higher chance of getting a larger number of test users within the time frame, although there is still no guarantee, even though Beta Family currently has 17,000 test users, 154 nationalities and 470 different devices [40]. When using, for example, PlaytestCloud or UserTesting, you are guaranteed to have the ordered recordings within the specified time frame. But if recruiting test users yourself or using a service like Beta Family, it is important to take into consideration a possible delay due to a high no-show rate. In section 2.3.7 it was mentioned that Nielsen [42] implies that the no-show rate for remote user tests might be higher than for user tests conducted locally. For this reason, Schade [49] recommends recruiting some extra test users just in case. This should also be applied when working with test services like Beta Family. However, the users at Beta Family can probably be assumed to want a good grade in order to improve their ranking, and hence they should be keen to complete the test. A test study can be delayed due to bad planning and other distractions or obstructions. If it is crucial to complete the user tests in time, it might be better to choose a service which provides results within a specified time frame. It is therefore also important to have a good test plan in order to make sure the test users are found in time. Another aspect to keep in mind is that it can sometimes take longer to find the right test users, even with services like UserTesting and PlaytestCloud, if the demographics are very specific and narrow.
5.1.3 Test Method and Procedure
Since this study investigates remote testing specifically for mobile games, it is not enough to simply use test guidelines for regular task-oriented software, as explained in section 2.2. This is because such software has different goals and usage compared to games and mobile applications. Korhonen also states that mobile games need to be evaluated differently from other digital games, see section 2.2.1.3.
Ülger also explains some of the playability issues that mobile games can encounter, for example handling interruptions such as receiving a phone call. When using a SRT with facial recording, the analysis of the recorded test sessions is similar to traditional observational user testing, which is explained in section 2.3.2. The main difference when using a mobile SRT is that the recording only displays facial reactions and does not reveal much about the rest of the test user's body language. Traditional observational testing methods are also often conducted in real-time, and the researcher is usually in the same location as the test user when the test is performed. The user testing method used in this study is called unmoderated remote testing, since the researcher can analyse the recordings retrospectively and there is no real-time interaction, as explained in section 2.3.3. When conducting user testing with SRTs, both attitudinal and behavioral information regarding each specific user and their actions is collected, making it an appropriate method for qualitative studies. Since the aim of this study was to investigate not only what people did but also how and why they did it, it is considered to be a qualitative study. The study generated information about both the test users' attitudes towards the game and their behavior when playing the game. See section 2.3.1 for further explanation of these concepts. The study also contains quantitative elements; information was collected regarding the users' preferences about how the test should be conducted (see figure 12b, section 4.5.1), their opinions about preview functionality (see figure 13a, section 4.5.1) and how to start and stop the recordings (see figures 12a and 13b, section 4.5.1). However, not enough answers were collected for the study to be a valid quantitative study. As mentioned in section 2.3.3, Nielsen recommends 20 participants in a quantitative study. In this study, there were 26 participants but only 15 of them were asked the questions mentioned above. The behavioral approach aims to answer what people do, while the attitudinal approach focuses on what people say. Both approaches are directly applicable when using a SRT to record the users. During the course of the test session, the test users were asked to think out loud, verbalising and explaining their behavior and attitude towards the elements of the game. However, in order to make the gaming experience as natural as possible, it was pointed out that the test users did not have to think out loud while playing the game. The test users were also asked to fill in a questionnaire outlining their thoughts, feelings and attitude towards the game. In addition to the self-reported information (think-aloud and questionnaire), behavioral information was collected from the recordings through observation.
5.1.4 Social Aspects
Isbister argues that social games (see section 2.2.1) should be tested in a social context in order to get correct results [18]. Since Ruzzle is a social (as well as a casual) game where you play against other people, both randomly chosen opponents and friends and family, it can be argued that it could be tested in a more social context.
Most of the users were mainly interested in playing against others, which is also part of the game concept itself. Some players expressed a wish not to have to challenge opponents, but most users want to compare scores and see how well they did against the opponent. One test user wrote: ”I think this is one of those games where it will me much more enjoyable when you know you are playing someone real at the other end, regardless of whether you are evenly matched or not. I was only able to play in practice mode today, unfortunately the ”find a random opponent” remained on search mode throughout the nearly 10 minutes recording today.” This experience was not unique amongst the participants in this study, and it highlights an issue with the random opponent functionality. The fact that it takes so long to find an opponent in a social game makes the experience less enjoyable. Perhaps this approach is not optimal when testing social games like Ruzzle, where gameplay is dependent on others; it might be better to organise test sessions where the opponent is predetermined in some way. One of the test users just happened to play against a family member who was in the same room. They communicated with each other while playing and, as a bonus, revealed more issues and information regarding the social aspects of the gaming experience. But how can the social aspect be made part of the test session? Organising test sessions with multiple participants might be a better way to test a player-versus-player game like Ruzzle. Our experience was that this test user became more relaxed compared to other test participants, talked more during gameplay and made honest comments about the game, even though they were being recorded on video. This is, however, based on a single observation and should be further investigated. In a game like Ruzzle, where the gameplay is dependent on two individuals playing the game almost simultaneously in order to get a good flow, it might be better to have a certain predetermined opponent to challenge. When testing the on-boarding process, at least two scenarios can be distinguished. In the first, the player finds the app somehow and downloads it independently; this requires the test user to investigate and discover the app on their own, and it is therefore not suitable to supply a predetermined opponent. In the second, the player is recommended the game by a friend who asks the player to challenge them; when testing this scenario it would be a good idea to supply a predetermined opponent. Both approaches could be tested, but they require different test set-ups. The problem with requiring several test users to take part in the test is that they have to be available at the same time, and this removes some of the advantages of remote testing. One also has to consider whether the test users should be co-located or not, and whether they should already have some kind of relationship to each other. It might also be difficult to test with multiple test users when hiring a recruitment service, depending on whether this feature is provided or not.
5.1.5 Post Gameplay Questionnaire
More detailed answers can be received when using text input instead of radio buttons, checkboxes etc. But as mentioned in section 5.1.2, it can be concluded that some users do not reply at all when asked to answer in text. The best approach would be to balance questions where answer alternatives are provided and questions which require text answers.
It can also be concluded that the balance might depend on whether the test users are compensated or not. If paying the test users, it might be a good idea to have more open questions asking for text answers in order to gain as much information as possible. But this also makes the questions open to misinterpretation, which is one of the reasons why the entire test, including the questionnaire, should be pilot tested. All the services included in this study provide post gameplay questionnaires (or post test session questions, since not all services are specialised in games). However, the number of questions it is possible to ask the test participant differs, as does the way the questions are presented. For example, PlaytestCloud offers five post gameplay questions, UserTesting offers four and Beta Family offers an unlimited number, see table 7 in section 4.3. Additionally, PlaytestCloud and UserTesting only offer plain text questions, while Beta Family provides radio buttons, checkboxes and selection alternatives which can be used to, for example, visualise scales. This functionality comes in handy if collecting quantitative data as well as qualitative. It can be discussed whether a post gameplay questionnaire is necessary at all, since a lot of information can be gained from the recordings and the test user can be urged to sum up their experience or answer questions in speech. But written responses give the user the opportunity to express themselves differently, as well as being a good base for statistical observations. It is also an opportunity for the UX researcher to ask the test users about their feelings for the game and the test session. The questionnaire can be used to compare the emotions perceived from the recordings with the feelings stated by the test users themselves. The answers can also be used as a complement for test sessions where the test users did not express any distinct emotions. Not everything can be observed; sometimes the player has to be asked certain questions directly. This can be a disadvantage of using SRTs only, but it can also be a motivation to use questionnaires as a complementary procedure. It might be possible to notice if a test user gets irritated just by watching the recording, but this depends on the player, and it is sometimes difficult to know exactly what makes the user feel the way they do. This is also one of the reasons why it might be necessary to be able to contact the user with follow-up questions. Emotions can also be difficult to read just from observing and listening to the player playing the game. During the execution of this study, an effort was made to formulate the questions similarly even though they were presented in different formats to the test user (depending on which test service was being used). It can be discussed whether a survey where the user is given a question with a visual scale of radio-button options can be compared to a survey where the same question is given in plain text, with the scale described in words instead of shown visually. In this study the two approaches have been treated equally, but this could be further investigated. The NPS, which is described in section 2.3.4, was investigated in this study, but due to confidentiality reasons the results will not be published.
5.2 Session Recording Tools
There are many factors to consider when evaluating and comparing session recording tools and test services and their features.
In this section some of these factors, and how they affect the testing process, are discussed. It is also explained how the tools have been graded (see table 8, section 4.4); this includes grading of website, integration, test set up, customisation, demographics specification, test user profile information and researcher environment.
5.2.1 Evaluation of tools and features
In both Lookback and UXCam, it is possible to customise settings such as how to start and stop the recording. In Lookback it is also possible to preview the recording before uploading it and to have the camera input visible on the screen. An interesting aspect to discuss in relation to this is how much the user should be involved in the recording process and how much freedom should be given to them. Could too much freedom and too much involvement affect the PX? Do more options affect the test results? Is it better if the recording starts automatically, or should the test users themselves be able to start and stop the recording? According to our test users (see figure 12a), the majority of the respondents preferred manual start and stop or were not sure what they would prefer. However, an interesting observation is that all of the respondents who replied ”automatic” had completed the test using UXCam, which automatically starts and stops the recording. This may imply that the result would have been different if the test users had had the possibility to try out both options. But even amongst the users who tried UXCam, most preferred to be able to manually start and stop the recordings. The data from the survey is limited due to the small number of participants and should only be seen as guidance, not fact. In Lookback's SRT, there is a possibility for the user to preview the recording before uploading it. This might be good, for example, if the user realises that their face was not visible to the camera during the session, giving them the possibility to record a new session. But it can also be a way for the user to censor the test session. What if the user does not like the way their face looks from the angle the device is recording? Or what if the user thinks they played too badly and is embarrassed to upload the recording? There is a risk that the test user misuses this functionality, and to be on the safe side, maybe this feature should be disabled when testing first-time players (if it is specifically their first time playing the game that is of interest). The limited number of survey responses makes the result a bit unreliable, but according to the collected responses, all of the test users who used the preview functionality in Lookback liked it (see figure 13b), and three out of seven of the test users who were recorded with UXCam would have preferred to be able to preview the recording before uploading it (see figure 13a). One problem we experienced when testing the tools with facial recording was positioning the device so that the entire face was visible to the front camera while playing the game in a natural and comfortable position. In Lookback's SRT there is a setting which, when enabled, shows the face recording in the lower right corner of the screen (see figure 15) while playing the game. The advantage is that the test user can make sure that the face is visible to the camera throughout the test session, but it may also cause distraction from the game and affect the PX.
The player might get self-conscious, and the picture might obstruct important parts of the UI, potentially removing focus from the gaming experience.
Figure 15: Using Lookback with the setting to show camera input in the lower right corner turned on.
When the integration phase of the study had already been initiated, we discovered that Beta Family had a SRT of their own called SuperRecorder. This tool was not included in the study because we were unaware of its existence when planning it, and also because the tool did not come with a dashboard with annotation possibilities, and at a first attempt at implementing the SDK it seemed to cause errors in the game application. However, SuperRecorder does have facial recording, and hence we found it extra interesting, but it was decided that analysing yet another tool was outside the time frame of this project. In SuperRecorder, however, the camera positioning issue has been solved by giving the user the opportunity to adjust the device position by showing the camera input for a couple of seconds before starting the test. This solution gives the user the possibility to adjust the device but prevents distraction. Even though we found it rather difficult to keep the camera positioned correctly during gameplay, we were pleasantly surprised to notice that this had not been a problem for our test users. In some recordings, the mouth was occasionally missing, but we realised that this was not a very big problem since the user experience could be deduced anyway. It is good if the tool has many options, so that it can be customised depending on what is desired for the specific test session. The drawback is that it can be time consuming to customise the settings if several of the options should be considered in the study. The main issue when carrying out tests using UXCam or Lookback is that there is no way of telling the test user whether the recording was uploaded or not. There should be a message which tells the user if the video was successfully uploaded, and an option to retry the upload if it fails. Otherwise the user will not know when they can delete the app. We consider this to be a big drawback for these tools. SuperRecorder is the only tool we came across during this study which gives feedback on whether the recording was successfully uploaded or not, and which also offers the possibility to upload a session recording multiple times.
5.2.2 Grading of SRTs and Test Services
The grading of the SRTs in table 8, see section 4.4, is based on the experiences from conducting this study. The things we consider to be of extra importance when choosing a tool have been included in the table and will be discussed here.
5.2.2.1 Website
The website grading has been based on how usable the website is, how well it works and its features. UXCam has many good features which the other services lack, but has been given a lower grade since the website does not work properly. The website is currently under development, and the site and the recordings load very slowly and often things are not loaded at all. This is a major drawback, and the poorly working website makes it almost impossible, from time to time, to use their SRT. Towards the end of this study it did not work at all, but when it is working properly we believe it will be very good. UserTesting has a simple and clear website which is reasonably easy to navigate and has hence been given a higher grade.
The site has many features and possibilities though, so it can be a bit difficult to find everything before getting used to it. Therefore, their website has not been given the highest grade. Similarly, Beta Family's website is fairly easy to navigate, but here as well one has to click around for some time in order to reach the correct view. PlaytestCloud lacks many features which the others have, but this also makes the navigation of the website very easy. The lack of features is the reason why it did not get grade five. Lookback also has a clear website which is easy to navigate, but the researcher environment can be a bit cumbersome to work with.
5.2.2.2 Easy to integrate
All tools were fairly easy to integrate. There were clear instructions, although sometimes a bit outdated. However, amongst the tools included in the final study, the integration instructions were up to date and all worked well. Some trouble was encountered later in the study when some of the SDKs were updated and had to be reintegrated.
5.2.2.3 Easy to set up test
All test services have been given the highest grade regarding how easy it is to set up the test. Both UserTesting and Beta Family provide example tasks, and both have a category of suitable tasks for game testing. At UserTesting it is also possible to specify whether the user should give a verbal response to something during gameplay or whether it is a ”do this” task. Both services allow for the creation of similar tests using an old test set-up as the base, and allow drafts to be saved. Since there is no possibility to add tasks or other test instructions at PlaytestCloud, a test can be set up very quickly. It is not possible to make a similar test based on a previous test, but on the other hand there is not much to copy from any old ones.
5.2.2.4 Customise test
The lack of an option to easily specify test instructions and tasks is the reason why PlaytestCloud got a low grade on customisability. It is possible to specify what type of players to test on, and there is an option to ask the player to fill in a gameplay questionnaire, but even though the demographics are somewhat customisable, there is no other way to specify test properties like instructions and tasks without getting in contact with the PlaytestCloud crew. They are very helpful and everything seems to be possible, but the fact that there is a need to contact them means it will take more time, and hence they have been given a lower grade. When using UserTesting or Beta Family it is possible to add initial instructions, introducing the test and telling the user which mindset they should have. It is also possible to specify tasks and post gameplay questionnaires. At UserTesting it is possible to add a screener question which further weeds out the test users you are not interested in. This is the reason why UserTesting has been given a high grade. Beta Family's tests lack this option but compensate for it by offering an unlimited number of questions, with the possibility to display them in scale, selection dropdown, multiple choice or text format, in comparison with UserTesting where it is only possible to ask four text-based questions at the end of the session. However, there are some more things missing in the test set-up at Beta Family as well, for example the possibility to specify a maximum number of checkboxes allowed to be checked, and the presentation of the scale questions is not very user friendly.
But this still gives more freedom than the other services; UserTesting only offers four post-test questions, and these have to be written and replied to in plain text.
5.2.2.5 Demographics Specification
The demographics grade has been based on how specific the demographics can be. Since UserTesting has an option where it is possible to specify other demographics requirements, the demographics can be specified in as much detail as needed, without any limitations. The screener question also makes it easier to reach the desired test users. Beta Family does not have a screener functionality, but it is possible to specify requirements in text. However, if using the approach where specific users are invited to take part in the test, there are only a few predefined specifications to search for. At Beta Family it is also possible to handpick test users and invite the same users on a regular basis, if longitudinal studies are required. Since PlaytestCloud is specialised in games, it is possible to specify the test users according to games they have played, or by age, gender and what gaming type they are (casual, midcore, hardcore). But it is not possible to specify both played games and other demographics at the same time. Additionally, there is no option to specify other requirements, or to use a screener question, which makes it difficult to specify more exact demographics. It is for example not possible to say “I want users who have not played this game before” without contacting them.
5.2.2.6 Profile Information
All test user recruitment services provide information about the users' gender, age and nationality or country. UserTesting has the most extensive profile information, including for example gaming genres, income, ranking, web expertise, social networks etc., and has hence been given the highest grade. PlaytestCloud's user profiles contain favorite games, hours spent playing games per week and currently played games. They have been given a higher grade due to the focus on mobile games. Beta Family's profiles are more sparse but contain ranking, submitted reports and feedback from previous test sessions. It is also possible to see which user tests the testers have already participated in. The number of submitted reports could be used as an indicator of how experienced the test users are.
5.2.2.7 Researcher Environment
Under this category, aspects like organisation of recordings (in the dashboard where all videos are available), properties of the progress bar, video viewing and annotation features have been summed up into one grade. UXCam has a combined progress bar and annotation timeline, see figure 5, section 4.2.2. The annotations are directly connected to a timestamp, and swipe directions and taps are displayed along the progress bar. It is also possible to reply to the annotations, which can be convenient if more than one UX researcher is working with the session. There are several nice features, but the whole annotation functionality is currently a bit unstable and this has lowered the grade. Lookback and UXCam both show the name of the current view in relation to the progress bar. This makes the test user's navigation in the application extra clear to the observer, and it is easier to navigate to the desired time in the recording. However, Lookback shows this in a much clearer way than UXCam, see figure 4, section 4.2.1. At PlaytestCloud (see figure 6, section 4.2.3) and UserTesting (see figure 7), the focus is instead on relating the annotations to the progress bar.
UserTesting only has timestamps. UserTesting and PlaytestCloud both provide a clearer overview of the recording and the annotations, and it is easier to work in their environments. Less scrolling and fewer mouse movements are required, and things are well organised. The annotation functionality in Lookback is especially cumbersome, since one has to click several times to create an annotation and then move across the entire screen in order to type it in. It is also difficult to get the correct timestamp, since there is no ”create annotation at -5 seconds” feature, and it is quite difficult to hit the correct time when moving backwards on the progress bar using the mouse. UXCam has the same problem, and sometimes comments disappear. It is much easier to get the annotation at the correct time when using UserTesting or PlaytestCloud. In UXCam it is not possible to name folders or to reorganise recordings. All recordings are collected in folders in the current project according to date. The recordings are named ”Session 1”, ”Session 2”, ”Session 3”, and so on. Name and type of the device, country, length of recording, date and time are also displayed. The inability to rename and reorganise the recordings has lowered the grade, but if working properly the researcher environment would overall be very nice and easy to use. In Lookback all recordings are uploaded to the main directory, where length, date and device are displayed. It is possible to create new folders and to rename the recordings, which can hence be organised according to preference. This works very smoothly and has contributed to a good overall grade. In UserTesting, all projects with their associated recordings can be seen in the same view, and the recordings can also be minimised, leaving only the title of the test. This allows for a clear overview, and at the same time there is no need to enter a folder in order to know which test sessions have been uploaded, which is very convenient. The recording summary contains date, time, name and ranking of the test user, and demographics. At PlaytestCloud there is one directory for every test, marked with test name and date, and the test sessions are named after the test user, also specifying the length of the recording and the age and country of the test user. Adding all this up, PlaytestCloud and UserTesting are superior as far as the researcher environment goes.
5.3 Analysis of Recordings
The SRT records the screen, making it possible to see how the user interacts with the application through the touch screen. The device microphone records what the user is saying, and some tools also provide facial recording using the front camera of the device, allowing the observer to see the user's face when playing the game. The initial thought about discerning an unknown person's emotions through facial, screen and voice recordings was that it would be difficult and would require expert experience. However, it proved to be fairly straightforward and relatively easy to understand whether the test user had a good or bad experience, regardless of the observer's previous experience. It was also, in most cases, easy to notice when the test users were annoyed. This can however depend on how honest the user is, how they show their feelings and how much he or she talks about them out loud. When analysing the UX of the test users, the aspects explained in section 2.1 were considered.
In most cases it is possible to determine whether the user is having fun and enjoying the game or not, and usability issues can be discovered if the user, for example, clicks on graphical elements that are not clickable or is unable to find what he or she is looking for in the game menu. But some factors can be vague to read. It can for example be difficult to conclude whether the test user is concentrating, stressed or experiencing flow, based only on observations from the recordings. There are also different motivations for playing a game, as described in section 2.2. It can be both task oriented and fun oriented; the menus and navigation need to be usable, but the game itself should also be playable (see table 1 for more information about usability vs playability). From the analysis of the recordings, it was discovered that in most cases the test users experienced some kind of flow (see section 2.2), and most users felt that the game was challenging rather than boring. It is also important to keep in mind that the test users were first-time players, which makes the game a bit more difficult.
5.3.1 Voice
The voice recordings proved to provide a lot of valuable information regarding the user experience. It was discovered that the session recordings from PlaytestCloud and UserTesting provided a lot of important information even though they did not record the face. The test users' voices conveyed insights about their emotions, actions and reactions. In this study, the voice analysis was conducted by two fairly inexperienced human beings. Another method would be to use a computer system to interpret the voice, similar to the VIS mentioned in section 2.4.3. The system might, however, pick up less information than a human researcher, since voice reactions are natural and well known to the researcher; he or she might find it easier to make correct interpretations than the computer would. The drawback of the researcher analysing the voice recording is that it is time consuming. One of the goals with conducting user testing and analysing the audio is that it should be effortless and not very time consuming. It is likely that a system similar to a VIS could be used in the future to automatically interpret the test users' emotions by reading their voice.
5.3.2 Facial
Since the facial reactions and expressions varied between the test users, it was difficult to know exactly what the users felt based only on analysis of the facial recordings. In this study, the analysis was based on regular human observation, instead of using, for example, the FACS, which was explained in section 2.4.2. An automatic system did not seem necessary in this study, since one of the goals of the research was to investigate how easy it was to conduct user testing with session recording without either an expert or an automatic system that interprets the facial expressions. One of the investigation points of this study was to discern whether facial recordings generate more valuable data compared to the use of screen and voice recordings only. Even though the screen and voice recordings provided enough information about the PX, the facial recording contributed to a greater overall impression. This should be taken advantage of, and the facial recordings should be used in order to get as much information as possible. If using a computer system for automatically evaluating the user experience, advanced algorithms and testing equipment would be necessary to achieve the same or similar results.
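As a small illustration of what a first, very modest step towards automated support could look like, the sketch below samples frames from an exported facial session recording and simply checks whether a face can be detected at all. It does not interpret emotions; it only flags recordings where the face was largely out of view, which was the most common practical problem in this study. The script uses OpenCV's bundled Haar cascade face detector; the file name and sampling interval are assumptions made for illustration and are not part of any of the evaluated tools.

import cv2

# Minimal sketch: estimate in how many sampled frames a face is visible
# in a facial session recording. The video file name and the sampling
# step are hypothetical placeholders.
VIDEO_PATH = "session_01.mp4"   # hypothetical exported recording
SAMPLE_EVERY_N_FRAMES = 30      # roughly once per second at 30 fps

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

capture = cv2.VideoCapture(VIDEO_PATH)
sampled, with_face = 0, 0
frame_index = 0

while True:
    ok, frame = capture.read()
    if not ok:
        break
    if frame_index % SAMPLE_EVERY_N_FRAMES == 0:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        sampled += 1
        if len(faces) > 0:
            with_face += 1
    frame_index += 1

capture.release()
if sampled:
    print(f"Face visible in {with_face}/{sampled} sampled frames "
          f"({100 * with_face / sampled:.0f}%)")

A recording with a low face-visibility percentage could then be followed up with the test user, or deprioritised, before any manual analysis time is spent on it.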
5.3.3 Read Emotions
The emotional part is essential to the entire PX, as was explained in section 2.2, where Lazzaro's four keys of fun were described and correlated with different emotions. Discerning the users' emotions is an important part of PX evaluation in games, and as was concluded by Oliveira et al. (see section 2.3.5), the combination of screen recordings and facial recordings can be used to determine the emotions of the test user and to improve the evaluation of a user test. Oliveira et al. were not specifically discussing emotions in games but rather emotions emerging when interacting with a medical interface. But since the user's emotions can be read by the same means, regardless of the purpose of the product being tested, their statements about how to capture these emotions are also relevant to session recording of mobile game user tests. One interesting investigation point in this study was to compare the annotations about the test users' actions and emotions during gameplay with the questionnaire where the users were asked to state their actual emotions. When analysing the facial recordings it was rather difficult to gain insight into the test user's emotions during gameplay just by watching their facial expressions, but this depended on who the test user was; some persons revealed a lot of information due to vivid facial expressions, while some did not express any visible emotions at all. It can be difficult to establish whether the perceived emotions match the user's actual emotions even when using an additional post gameplay questionnaire. The questionnaire was filled out after the test session and the answers were based on the gaming experience as a whole, while the observations were made during the test session, when the user's feelings were based on what had happened previously in the game and what was happening at that exact moment. Hence, more specific emotional data could theoretically be collected from the observations than from the post gameplay questionnaire. This also makes it difficult to compare the observations with the answers to the questionnaire, since the latter rather represent the player's overall emotions during the gaming experience. However, the recordings could be used for understanding the origin of the emotions stated in the post gameplay questionnaire, and at the same time the post gameplay questionnaire could be used to confirm the player's overall experience of the game. In this study, it was discovered that the test users were balancing between stress, frustration and challenge on the one hand, and interest, engagement and excitement on the other. Considering the fact that the test object was a stressful word game, this could be considered a fairly good result. One of the main reasons why the UX in games is more difficult to evaluate than the UX of regular software is that emotions that are usually considered negative can in fact be positive, see section 2.2. However, if challenge and stress were related to, for example, navigating the game menu, there would be a usability issue; it should not be a challenge to find the correct buttons.
5.4 Workflow
The resulting workflow developed in this thesis work is a set of guidelines for how to conduct user testing of mobile games using SRTs. This procedure should be carried out often and repeatedly, as explained in the resulting workflow in section 4.6.
This complies with Lookback's guide for user testing in section 2.5.4, as well as with Nielsen's statements mentioned in section 2.3.7. Inokon also recommends researchers not to postpone testing and analysis, since the game is constantly changing (see section 2.3.6). The workflow has also been developed based on the existing workflow for remote usability testing from Usability.gov [54], see section 2.5.1, and the checklist for how to conduct mobile application testing which was developed by UserTesting [60] and was covered in section 2.5.2.
5.4.1 Planning the Test and Writing Instructions
As mentioned in section 2.3.3, it is important to write short and clear instructions when preparing a remote user test. One should be extra careful with the choice of words and make sure it matches the words used in the actual application. For example, when writing instructions about the UI, it is important to refer to things by their proper names. We made the mistake that in task one (see step 6 in appendix C) we told the users to register, instead of asking them to ”create an account” as it was stated in the first level of the application menu. One important insight acquired during the execution of this study was that things we take for granted can be very difficult for someone else. For example, several of the test users did not understand how to start a new game, and one of them thought that an ad was part of the game. This is important to keep in mind when designing the test. Assume that the technical level is not very high and that everything that can be misinterpreted will be misinterpreted. It is also important to take into consideration how tasks influence the UX and the user's interaction with the application. Is the use of specific tasks limiting to the user, do they result in loss of information? Or are they an asset and a prerequisite, for example when wanting to test a specific part of the application? PlaytestCloud's approach is that the gaming experience becomes more natural when the player plays the game as they normally would. This can be a good starting point, but it is not really what actually happens on PlaytestCloud. For one thing, the test users are required to play for a specified amount of time, where the shortest test session is 15 minutes. Secondly, the players are still aware of the fact that they are participating in a UX test and use the think-aloud concept, where they are encouraged to talk out loud. This results in the test users exploring the entire application and commenting on everything from the game concept to the graphics and the navigation. This is all valuable information, but if there is some specific part which needs to be investigated (like for example the on-boarding process), time could be saved both for the test users and for the UX researcher who will watch the recordings, if it were possible to specify tasks or provide instructions. The use of tasks is a way of guiding the test user in the desired direction in order to gain insight into the relevant area. But of course, if the tasks are too limiting, information will be lost and that is no good either. For example, in our test, task number one was ”Register”. We thought this would be a clear but not too limiting way to investigate the on-boarding process. But when watching the recordings from PlaytestCloud (where it was not possible to specify tasks), we discovered that several users never even created an account but instead chose to play offline.
This revealed valuable information about the nature of new players, as well as both usability and UX issues in the offline gaming mode. Had we not also tested without specified tasks, these problems would not have been highlighted, unless players had skipped task one in our instructions. This is proof of how important it is to thoroughly consider what instructions to give the players, and sometimes it might be better to let them explore the game by themselves without any guidelines. The think-aloud approach, which is described in section 2.2.1.3, is another thing which needs to be carefully considered: should it be applied or not? Not only can it be difficult for some users to feel comfortable speaking their thoughts out loud and verbalising what they are doing, the think-aloud approach can also move focus away from the gameplay and distract the players. This is why we included in the instructions, see appendices C and D, that the users should speak out loud only when navigating the application and not during gameplay. But the think-aloud approach is a good way to get to know the user's thoughts, which cannot be seen in screen or face recordings. A lot of feedback regarding the game concept can also be collected in this way, and think-aloud during gameplay can give valuable information about the PX. The test users at Beta Family, PlaytestCloud and UserTesting are encouraged to speak their thoughts out loud during the test session, so if think-aloud is not required the test users have to be informed about this. However, it is unclear how this affects the gaming experience. When using recording tools like Lookback and UXCam, it is also important to specify in the instructions that the application cannot be closed or sent to the background during the test session. In the case of Lookback, using default settings, the recording will be lost if upload is not clicked, and when using the default settings in UXCam, the recording will be uploaded when the application is sent to the background and a new recording will start when the application is opened again. In hindsight, we should have emphasised this in the test instructions. Since the player is supposed to challenge opponents in the game, it can take some time before a round can be played, and some users got bored, closed the application and then resumed playing when someone had accepted their game request or when it was their turn to play. Since the recording stopped when the user left the game, parts of the sessions were lost or divided into several parts.
5.4.2 Pilot Testing
When preparing a UX test it is important to write clear test instructions, define tasks the test user should perform, compose a clear and relevant post gameplay questionnaire, and then pilot test these in order to make sure that everything is clear to the test user and that the test will generate the desired information. In the pilot tests carried out in this study, some additional questions were added after the post gameplay questionnaire and the SRT survey, see appendix H. The value of pilot testing was discovered when testing with Lookback and UXCam. The pilot test was conducted with a test user at a remote location. The first test that was sent out contained the wrong .ipa file (the iOS installation file for the application), containing another SRT which had different instructions, and the pilot test made us aware that we need to be more careful and make sure everything is correct before sending out the test.
However, it was not until after completing a couple more tests that we discovered that there was a problem with application crashes and also with uploading the recordings. Some users also found it difficult to know what to do next because they had navigated to the next page of the test instructions too quickly. To prevent this, a checkbox was added at the bottom of the instructions and tasks page to make the user confirm that all of the instructions and tasks had been carried out before clicking forward to the next page. This problem was only discovered because one of the test facilitators was present and could advise the test user to go back when they expressed their confusion. Furthermore, there were initially instructions about deleting the application right after the test tasks had been completed, but when not all recordings were received, this was changed to asking the user not to uninstall the application until someone from the research team had confirmed that the recording had been uploaded. This last change meant more work for the facilitators and could easily be avoided by implementing a confirmation message feature in the SRT, but that is currently not possible with either Lookback or UXCam. SuperRecorder does, however, have this functionality.
Lessons learned from initial testing:
• Be careful to send out the correct files and links
• Make sure the recordings upload and work
• Make sure the application is not prone to crashing, unless the recording tool can handle this
• Make sure there is no possibility for the user to move forward too quickly.
Changes after initial testing:
• Added a checkbox for the user to confirm that all the instructions or tasks had been carried out before they could click forward to the next page.
• Added instructions about not uninstalling the app until getting the green light from the test facilitator.
When carrying out the test sessions with friends and family members, being present with them, we realised that it would be a good idea to perform a local pilot test and observe the test user while they go through with the test. In this way, ambiguities in the instructions and in how to use the tool are easier to spot. It can also be a good idea to carry out more than one pilot test with different kinds of users if the target group is diverse. It could for example be good to test on both tech-savvy and less tech-savvy users, on different devices, and also to perform one pilot test locally and one remotely before making the test live. But of course, time and cost have to be taken into account. When using a test service you have less control over navigation throughout the test and also over formatting and page content. But it would still be a good idea to perform a light version of the test with friends, family or co-workers to make sure the instructions and the tasks are good enough and that the post gameplay questionnaire is easy to understand.
5.4.3 Deciding on a Session Recording Tool
The market for session recording tools is rapidly changing, and their features and websites are constantly being updated. While doing this study, new tools have become available, tools have been bought by other companies, features and platform support have been added, SDKs have been updated more than once, and websites and dashboards have been relaunched. Features have been released in beta and tools have gone from beta to full release. All of this has happened within just a couple of months.
The fact that things are changing so rapidly makes it difficult to compare and evaluate the tools. Our tables with tool/service properties will probably be outdated in a couple of months' time, and the initial tables used for deciding which tools to test (see tables 3 and 4, section 4.1) have already been updated several times during the course of the study. We can only assess what is available right now, and it is difficult to give the tools a fair judgment since they are all still in development, containing bugs and lacking functionality. For example, UXCam seemed to work fine at first when we tested it ourselves at the office, but towards the end of the study almost nothing on their website worked. During the three weeks we dedicated to user testing with the SRTs, a new website and a new SDK were released, and unfortunately the old SDK which we had implemented was no longer compatible with the website, so the recordings from half of the test sessions were lost (there was no indication or warning about this from the company). Having to upgrade the SDKs and redo the integration is time consuming, and this could be a stumbling block in the UX evaluation and development process if UX tests are performed frequently. Additionally, when being forced to reintegrate the SDK it is easy to miss small settings like, for example, turning on the front camera or changing the default start or stop procedure to custom settings. When choosing which tool to use, it is important to have a clear test plan in order to know what properties the tool should have (or one can do the opposite, and choose a tool and adjust the test to the limitations of the tool). Some important things to consider when choosing a tool are whether there is a possibility to specify test instructions, tasks, time limits etc. The time limit of the recordings is important to consider, since the user has to have plenty of time to complete the test session, but also because if the test tasks can be completed quickly the player should not have to play for longer than needed, since this will generate unnecessary material that no one will have time to go through. For example, one of the average sessions recorded with Lookback (where we provided our own test instructions) lasted for about 7 minutes, while one of the sessions provided by PlaytestCloud (where the minimum session time was 15 minutes and it was not possible to specify instructions or tasks) lasted for 40 minutes. This was, however, an extreme case with an enthusiastic player; most PlaytestCloud sessions lasted about 20 minutes. On one hand, it might be nice to see for how long the test users continue to play, but on the other hand, that could also be a task. Our last task was ”Play the game until you have leveled up to at least level 2”, but most of the players who succeeded in reaching level two stopped once they reached it. When deciding on a tool, it is also important to consider how high a frame rate is needed. This mainly depends on how swift movements the user is expected to make while navigating and playing the game. If the frame rate is too low, there is a risk that important information is lost; it also becomes less natural for the UX researcher to watch, and issues with the game application itself may pass unnoticed. AppSee's and TestFairy's SRTs were removed from the study since the frame rate was too low and they did not record all the movements of the player when the game was played. All the other tools we tried out had a high enough frame rate.
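A higher frame rate and a higher video quality also mean larger recordings to upload and store, which matters when test users are on their own network connections. The following back-of-envelope sketch illustrates the order of magnitude; the bitrates are illustrative assumptions only, not values measured from any of the tools, and actual sizes depend on the codec and on how much the screen content changes.

# Back-of-envelope estimate of recording size at different (assumed) bitrates.
# The bitrates are illustrative placeholders, not values measured from any SRT.
ASSUMED_BITRATES_MBPS = {
    "low quality / low frame rate": 0.5,
    "medium quality": 1.5,
    "high quality / high frame rate": 4.0,
}

def session_size_mb(bitrate_mbps: float, minutes: float) -> float:
    """Approximate video size in megabytes for a session of the given length."""
    return bitrate_mbps * 60 * minutes / 8  # Mbit/s * seconds / 8 = megabytes

for label, bitrate in ASSUMED_BITRATES_MBPS.items():
    print(f"{label:32s} 15 min ≈ {session_size_mb(bitrate, 15):6.0f} MB, "
          f"40 min ≈ {session_size_mb(bitrate, 40):6.0f} MB")

With these assumed numbers, a single 40-minute session at the highest bitrate already exceeds a gigabyte, which is one more argument for not letting sessions run longer than the test tasks require.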
In UXCam's SRT it is also possible to adjust the frame rate and the quality of the video. The need for a higher frame rate depends on which type of game is to be tested; if it is a stressful game requiring many swift movements, a higher frame rate is needed. But it is also important to keep in mind that a higher frame rate and a higher quality put higher demands on storage memory, processing power and bandwidth. It is also important to consider which development stage the game is currently in, since not all of the SRTs continue recording after an application crash. Application crashes were one of the main reasons why many of the session recordings from the test sessions conducted in this study were lost.
5.4.4 Deciding on Recruitment, Distribution and Test Set Up
When deciding on which distribution and test set up service to use, it is important to have a clear plan regarding who the test users should be. When that is clear, it is easier to see whether the test service fulfils the requirements or not. This is important, for example, when considering aspects like demographic specifications for the test users. If the test users should be from, for example, Australia, it is important to use a test service which can recruit test users from Australia. It is also necessary to know whether the test should be conducted remotely or locally. If the test should be conducted locally at the office (perhaps for confidentiality reasons), there is a need to be able to handpick and invite specific test users, and hence it is necessary to use a service where this is possible. Another factor to consider is the possibility to contact the test users with follow-up questions, which might be necessary if there is a need to supplement the answers to the questionnaire or if anything is unclear from the test session. Making sure the tool supports the platform of the game is also essential, since otherwise it is not possible to integrate the SRT into the game. Other important factors to consider are the price of the service, the time it takes to gain access to the recordings, the provision of crash logs, and how much time the researcher can invest in recruitment and test set up. When using separate services for distribution, test set up and SRT, it is important to consider how the recordings and the collected answers can be connected to each other (if there is a need for it). When conducting our tests using Lookback's and UXCam's SRTs in combination with Beta Family's SuperSend (as distribution service) and Google Forms (for presentation of test instructions and the post gameplay questionnaire), an issue occurred regarding how to relate the recordings to the questionnaire answers. This was solved by giving each test user a unique ID, which they had to type in at the beginning of the test and also speak out loud at the beginning of the recording. The names of the recordings on Lookback were thereafter manually changed to the correct ID, but at UXCam it was not possible to change the name of the recordings. This approach was quite cumbersome, since unique test instructions had to be sent out to each user in order to give them a unique ID. This meant that a record had to be kept of which of the pre-composed IDs had already been used, and when sending out the tests the information text had to be altered for every test user and the .ipa file had to be uploaded to SuperSend once for every test user, instead of just sending the same test to all the participants' e-mail addresses.
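Some of this bookkeeping can be scripted on the researcher side once the recordings have been downloaded and renamed. The sketch below is a minimal, hypothetical example of matching recordings to questionnaire answers via the participant ID; the directory layout, file naming convention and CSV column name are assumptions made for illustration and are not features of any of the tools or services discussed here.

import csv
from pathlib import Path

# Minimal sketch: relate session recordings to questionnaire answers via a
# participant ID. Assumes recordings have been renamed to "<ID>.mp4" and the
# questionnaire has been exported to a CSV file with a "participant_id"
# column; both are hypothetical conventions, not features of a specific tool.
RECORDINGS_DIR = Path("recordings")          # e.g. recordings/P07.mp4
QUESTIONNAIRE_CSV = Path("questionnaire.csv")

responses = {}
with QUESTIONNAIRE_CSV.open(newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        responses[row["participant_id"].strip()] = row

for video in sorted(RECORDINGS_DIR.glob("*.mp4")):
    participant_id = video.stem              # "P07" from "P07.mp4"
    if participant_id in responses:
        print(f"{participant_id}: recording + questionnaire OK")
    else:
        print(f"{participant_id}: recording without questionnaire answers")

missing = set(responses) - {v.stem for v in RECORDINGS_DIR.glob("*.mp4")}
for participant_id in sorted(missing):
    print(f"{participant_id}: questionnaire answers without recording")

Running a script like this after each testing round gives a quick overview of which participants are complete and which need to be followed up, which is exactly the bookkeeping that otherwise has to be tracked by hand.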
Furthermore, with the manual ID approach, the beginning of each session recording had to be watched before it could be named properly. Since the test user base was fairly small, it would probably have worked out well just to compare the timestamps of the survey and the session recording. But in practice it is possible to get several recordings and several surveys with the same or very similar timestamps. Furthermore, the questionnaire and the recording will probably not have exactly the same timestamp, hence it can be a problem to relate them to each other. When the testing period was initiated, UXCam only showed the date the recording was uploaded and not the time, hence we had to come up with another solution. The use of IDs is rather time consuming, and it also puts more responsibility on the test users, who have to remember to state their ID at the beginning of the recording and in the questionnaire. This method is also prone to errors, since both the test participant and the UX researcher might mix up the IDs. Furthermore, it might be a good idea to use a tool where the tasks and the instructions can be displayed in the application itself through the SRT, which is the case in UserTesting and SuperRecorder. This allows the test participants to use only one device. If using, for example, Lookback or UXCam, the test instructions, tasks and questionnaire need to be viewed on a second device or be sent to the test user in analogue form, or else the recording will stop each time the user needs to read the instructions. It is important to inform the test users that they cannot leave the application before the test session has been completed.
5.5 Further Research
An interesting investigation topic for future research is whether monetary incentives affect the test users and make them sign up for the test and submit the test report faster, on for example Beta Family. Another interesting aspect that can be investigated is how the test results are affected depending on whether the test user is a regular participant in user tests or not. It is also interesting to examine how freedom and options in the recording tools could affect the PX. This study has focused on the iOS platform only. It could be interesting to look into how well the SRTs work with other platforms and devices as well, for example Android, Unity and tablets. Maybe some tools are more appropriate than others depending on the platform used in the test. Furthermore, there are several tools available on the market which were not further investigated in this study; the market is also rapidly changing, and soon there might be even more tools and services available, so it could be interesting to take a closer look at these. Regarding the workflow, it could be further tested in practice in order to see if there is anything that could be improved. It would be interesting to iterate the process while testing on different kinds of games; in this way it would be possible to discover more issues and benefits with the various tools. Separate, less general, workflows could also be developed for certain tools, games, testing objectives or other contexts.
6 Conclusion
This chapter aims to answer the initial objectives of the thesis and to present the conclusions that can be drawn based on this study. Based on the results of the study, it has been made clear that it is possible to conduct user testing of mobile games with the use of SRTs. A workflow for how to conduct UX testing in mobile games is presented in section 4.6.
How the recordings can be analysed and interpreted into information that can be used to address UX and usability issues is also described in the workflow, in section 4.6.8. Remote testing with the use of an SRT is a suitable method for testing mobile games since the user can perform the test in a natural environment on a familiar device, which implies that the test should have a minimal effect on the PX compared to traditional testing methods. Another advantage is that the testing can be done unmoderated, meaning it does not have to take place in real time, which allows multiple test users to perform the test simultaneously. In order for the test to run smoothly it is, however, important to watch out for technical issues, which can be avoided by writing detailed instructions and doing thorough pilot testing.

Based on the study, it can be concluded that there is no single perfect method for conducting user tests with SRTs; no "one method fits all". There are many factors to consider: it is important to know what to test and who should test it, and then to decide on a test service and an SRT which suit the specific need. None of the investigated SRTs works perfectly, but each of them has its advantages as well as drawbacks, and the ideal tool would be a mixture of all of them. Since many of the SRTs are still in an early development stage, the tools and websites are constantly being updated and they are therefore not completely reliable. The ideal tool would be connected to a service which provides test user recruitment, test set up and distribution, since it would be convenient to store everything in the same place and not have to worry about connecting the test users' questionnaires with the corresponding recordings. Another valuable feature would be the possibility to summarise the results and receive auto-generated diagrams of statistics and quantitative data directly on the website. It would then be possible to share the results with the rest of the team instantly, without using multiple services for writing and storing documents, recordings and other data. This would both facilitate the organisation of test data and save time. The ideal SRT would also provide features and properties like:

• Tasks and questions being displayed directly in the application
• Feedback stating whether the recording was uploaded successfully or something went wrong
• Possibility to upload the recordings again
• Possibility to pause the recording (for example, when waiting for an opponent or if someone interrupts during the test session)
• Possibility to change settings from the online dashboard (and not only in the code)
• Preview possibility
• Statistics, metrics and auto-generated diagrams in the dashboard
• Possibility to adjust the camera position (e.g. seeing the face for a few seconds before starting the test, to make sure the face is visible to the camera)
• No loss of information due to application crashes

However, since the ideal tool with all the desired functionalities is not currently available on the market, the recommended SRT and test service depends on how much control the researcher wants to have over the test and also on which resources are available. If there is a short time frame, UserTesting is recommended since they provide test users within an hour.
If the game application is in an early development stage, PlaytestCloud is recommended since they continue recording after an application crash, making it possible to view the entire test session while also gaining insights into when and why application crashes occur. If a quick test set up is required and the only desire is to have the test users play the game, with no need for advanced test specifications, PlaytestCloud can also be used since its test set up process is very quick but also limited.

The SRTs without facial recordings can still provide a lot of information, and in many cases this is enough to identify UX and usability issues. However, facial reactions can help when analysing and identifying emotions. If there is a need for facial recordings, the SRTs Lookback and UXCam can be used. The recommended tool out of these two is, however, Lookback, since UXCam has been unstable during the last month and its dashboard and researcher environment have not been working properly (although this might change shortly). Lookback is also suitable when the game application needs to be tested at the office due to, for example, confidentiality reasons. If this is not an issue and the test can be performed by people outside of the office, we do not recommend independent recruitment of test users (unless there is already an existing test user base), since this is a time-consuming process which will require a larger time frame. Lookback can, however, advantageously be used together with Beta Family’s test user recruitment and test set up service. Beta Family offers the possibility to create a public test in which any of Beta Family’s test users can participate, and it is also possible to create a private test where the test users can be handpicked from their test user base or recruited by other means and invited through e-mail.

There is a lack of easily accessible, functioning and standardised methodologies for conducting user tests using SRTs. Many companies are reluctant to perform testing due to a lack of knowledge or resources, but by discovering UX and usability issues at an early stage, both time and money can be saved. It is also important to test often and to make testing a natural part of the development process. The workflow produced in this thesis work can be applied regardless of which session recording tool or test service is being used. User testing with the help of session recording tools is a rapidly changing area: tools and test services are constantly being updated and new tools and services emerge. Therefore, the workflow contains the most important parts involved in UX and usability testing and does not focus on any specific tool or service. This will hopefully make it applicable also in the long run, no matter which tools or test services are available on the market. In order to be able to decide on the most suitable tool or test service for the specific test object and testing objectives, important factors to consider have been included in the workflow. Tables displaying the current properties of the investigated tools have also been included in order to make it easier to compare the tools and to decide on which tool is most appropriate in the context; however, these only cover a handful of the tools and services available on the market. See appendix I for the final version of the workflow which was presented to MAG Interactive.
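To make the tool-selection step concrete, the comparison tables and the factors discussed above (platform support, facial recording, crash handling, recruitment, in-app instructions) could be expressed as a simple filter over a table of tool properties. The sketch below is not part of the thesis material; the ToolProfile fields and the example entries are simplified assumptions based on the discussion in this chapter, and the actual properties change as the tools are updated.

from dataclasses import dataclass


@dataclass
class ToolProfile:
    """Illustrative properties of an SRT/test service; the values used are assumptions."""
    name: str
    platforms: tuple
    facial_recording: bool
    survives_crash: bool        # keeps recording after an application crash
    provides_recruitment: bool
    in_app_instructions: bool   # tasks/questions shown inside the application


def matching_tools(tools, requirements):
    """Return the names of the tools that satisfy every required property."""
    platform = requirements.get("platform")
    needed = [key for key, value in requirements.items() if key != "platform" and value]
    selected = []
    for tool in tools:
        if platform and platform not in tool.platforms:
            continue
        if all(getattr(tool, key) for key in needed):
            selected.append(tool.name)
    return selected


# Example entries reflecting the discussion in this chapter (simplified, iOS-only view).
tools = [
    ToolProfile("Lookback", ("iOS",), True, False, False, False),
    ToolProfile("UXCam", ("iOS",), True, False, False, False),
    ToolProfile("PlaytestCloud", ("iOS",), False, True, True, False),
    ToolProfile("UserTesting", ("iOS",), False, False, True, True),
]

print(matching_tools(tools, {"platform": "iOS", "facial_recording": True}))
# -> ['Lookback', 'UXCam']

Such a filter only narrows the candidates; softer factors such as price, turnaround time and how much control the researcher wants over the test still have to be weighed manually, as described in the workflow.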
Appendix A - Initial Test Instructions

Testing the mobile game Ruzzle

This test is part of a Master’s Thesis study at Linköping University, spring 2015. The thesis aims to investigate session recording tools used in user experience and usability tests of mobile games. Initially, you will be asked to accept a declaration of informed consent and fill in some information about your background. Thereafter you will be asked to perform 3 tasks in the game application Ruzzle, while the test session is being recorded. After completing the test session you will be asked to complete a questionnaire about your experience of the game and the recording tool. The study will take approximately 20 minutes in total and you will need a stable WiFi connection. Also make sure to be in a brightly lit environment. Thank you for your participation in this user experience test!

Best wishes, Karro and Veronica

Appendix B - Declaration of Informed Consent

This test is part of a Master’s Thesis study at Linköping University, spring 2015. Observations from the test will be discussed in the thesis but no pictures of your face will be used. No names will be mentioned, all data collected during the test will be handled anonymously, and the test can be withdrawn from the study if the user wishes so.
The purpose of this study is to evaluate the built-in test session recording tool as well as the on-boarding process of the game application. The study will test the user experience of the game application and the efficiency of the session recording tool, not the user’s skills in using it. You will be performing some predefined tasks which will be recorded on video. Your face and voice will be recorded using the front camera and the microphone of the device. Additionally, the screen will be recorded in order to investigate the navigation in the app. The test facilitators will watch the recordings in order to evaluate the tool used for recording the test as well as the game application. There is no obligation to participate in this study and you are free to withdraw your participation at any time without further explanation. Thank you for your participation!

• I declare that I wish to take part in this study
• I have read and understood the statements in this informed consent document

Appendix C - Test Procedure: Lookback

Do not go to the next page before completing all the test instructions, thank you. Try to hold the device so that your entire face is visible to the front camera of the device, but please feel comfortable and try not to think about the fact that you are being recorded. The game and the session recording tools will be assessed, not your performance. We appreciate it if you are completely honest and speak your mind. If possible, try to verbalise your thoughts in your native language when navigating in the app (e.g. explain your thoughts and intentions when registering, using the menus/buttons etc.). Please comment on anything you find good or bad; all criticism will be of value. However, when playing the game, act naturally as you would when normally playing a game; there is no need to verbalise your thoughts. Cursing or laughter etc. is encouraged if it comes naturally.

Please take the following into consideration:
• It is not possible to do in-app purchases.
• The study will take approximately 20 minutes.
• Please play the game in your native language. The default setting of the game is the same as the language on your device. If you want to change the language of the game, click on ”Start a new game” and click on the button displaying a language in the upper right corner.
• If the app crashes during the game session, please start again and report the crash in the questionnaire (under the question ”Did you experience any difficulties with the recording tool during the test session?”).

This study is voluntary and your contribution may be withdrawn from the study if you wish so. Before starting the test, make sure your device fulfills the technical requirements: iOS 6.0 or later. Also make sure the device has enough memory to install the app and store the recordings.

1. If you have not already downloaded the application: download and install the game using the link in the e-mail. Open the link in your mobile browser or mobile e-mail client and click on ”Download app”. You can also open the link in your computer browser and download the app by scanning the QR code with the barcode reader on your device.
2. Read through the tasks (see below) and make sure you understand them.
3. Start the game application.
4. Shake the device and a menu will appear. Press record (the big, red, round button) to start recording. Make sure the camera is set to on (this is the default setting).
5. Say your test number out loud (this number was given to you in the e-mail).
6. Perform the tasks:
(a) Register.
(b) Play the game until you have levelled up to at least level 2.
(c) Please summarise your thoughts and experience of the game by speaking out loud.
7. Shake the device and a menu will appear. Press stop (the big, red, square button) to stop recording. A preview of your game session will appear on the screen. If you want to, feel free to watch it.
8. Click on upload in the upper right corner to upload the video.

Appendix D - Test Procedure: UXCam

Do not go to the next page before completing all the test instructions, thank you. Try to hold the device so that your entire face is visible to the front camera of the device, but please feel comfortable and try not to think about the fact that you are being recorded. The game and the session recording tools will be assessed, not your performance. We appreciate it if you are completely honest and speak your mind. If possible, try to verbalise your thoughts in your native language when navigating in the app (e.g. explain your thoughts and intentions when registering, using the menus/buttons etc.). Please comment on anything you find good or bad; all criticism will be of value. However, when playing the game, act naturally as you would when normally playing a game; there is no need to verbalise your thoughts. Cursing or laughter etc. is encouraged if it comes naturally.

Please take the following into consideration:
• It is not possible to do in-app purchases.
• The study will take approximately 20 minutes.
• Please play the game in your native language. The default setting of the game is the same as the language on your device. If you want to change the language of the game, click on ”Start a new game” and click on the button displaying a language in the upper right corner.
• If the app crashes during the game session, please start again and report the crash in the questionnaire (under the question ”Did you experience any difficulties with the recording tool during the test session?”).

This study is voluntary and your contribution may be withdrawn from the study if you wish so. Before starting the test, make sure your device fulfills the technical requirements: iOS 6.0 or later. Also make sure the device has enough memory to install the app and store the recordings.

1. If you have not already downloaded the application: download and install the game using the link in the e-mail. Open the link in your mobile browser or mobile e-mail client and click on ”Download app”. You can also open the link in your computer browser and download the app by scanning the QR code with the barcode reader on your device.
2. Read through the tasks (see below) and make sure you understand them.
3. Start the game application and click yes when the message ”Ruzzle would like to record your Camera Video” appears on the screen. You will now see a red dot in the upper right corner. If more messages appear, click ”ok” on all of them.
4. Say your test number out loud (this number was given to you in the e-mail).
5. Perform the tasks:
(a) Register.
(b) Play the game until you have levelled up to at least level 2.
(c) Please summarise your thoughts and experience of the game by speaking out loud.
6. Press the home button to close the app/minimise it/send it to the background.

Appendix E - Pre Gameplay Questionnaire

In order to collect statistical data, a pre gameplay questionnaire was applied when testing the tools/services which did not offer test user profile information.
In order to be able to weed out the test users who had played Ruzzle before, a question about this was added at the beginning of the test for the tools/services where it was not possible to specify exact demographic requirements.

Questions for the test user to answer before gameplay:
• Age
• Gender
• Have you played Ruzzle before?

Appendix F - Post Gameplay Questionnaire

The following questions were used in the post gameplay questionnaires. Some test set up services only offered a limited number of questions and therefore all questions could not be included. Where it was possible to present the questions using visual aids such as scale options or check boxes, this was applied. The first 4 questions were included in all the tests. Where it was possible to ask more questions, compound questions were split into several separate ones. The last question was only asked when using Lookback and UXCam in combination with Beta Family or Google Forms, where an unlimited number of questions could be specified.

• How likely is it that you would recommend Ruzzle to a friend or colleague (0=Not at all likely, and 10=Very Likely)?
• Did you understand how to play the game? What did you think about the tutorial? Did the tutorial help your understanding of the game or did you think it was unnecessary?
• Pick 3 emotions you felt during game play (e.g. engagement, positivity, challenge, stress, excitement, confusion, frustration, boredom, happiness, sadness, uselessness, mastery, effectiveness, meaningfulness, interest, tiredness, energetic)
• Would you like to play Ruzzle again?
• Have you played Ruzzle before?
• Do you have any recommendations, suggestions or comments regarding the game?

Appendix G - Session Recording Tool Survey

When it was possible to specify many post test questions, a survey regarding the session recording tool and the test user’s testing preferences was included. This survey was conducted after the test users had completed the regular part of the test, including the post gameplay questionnaire. This was because we did not want the survey to interfere with the PX or the results of the post gameplay questionnaire.

• Did you experience any difficulties with the recording tool during the test session? Were you disturbed, did you feel uncomfortable, or did any technical issues occur? Did the app crash during the test session? Please elaborate on your answer.
• Would you have preferred it if the tool had started and stopped recording automatically without you having to navigate its menu? / Would it have been better if you could start and stop the recording yourself instead of it being handled automatically when opening/closing the app?
• Why would you, or would you not, prefer to automatically start and stop the recording?
• Would you have preferred to be able to preview the recording before uploading it? / Did you like the possibility to preview the recording before uploading?
• How would you have preferred to perform the test? Would you rather do it at home on your own device, where you could choose the place and time yourself, or would you rather have visited a test facility where you would have been observed in person while playing the game?
• Do you have any recommendations, suggestions or comments regarding the session recording tool?

Appendix H - Questions for the Pilot Test

These questions were asked after the pilot test user had completed the entire regular test session.

• Were the instructions before the test sufficient? If not, what was missing?
• Was there anything you did not understand in the instructions or regarding the recording tool?
• Was it easy to understand the tasks? If not, how would you suggest we improve them? The tasks were: A. Register. B. Play the game until you have levelled up to at least level 2. C. Please summarise your thoughts and experience of the game by speaking out loud.
• Is there anything you think we should change for future testing? Did you think anything in the test was redundant or missing? Any additional comments on the declaration of consent, the instructions, the questions, etc.?

Appendix I - Final Workflow

The following pages contain the workflow which was produced for MAG Interactive.