
LIU-ITN-TEK-A--15/034--SE
Utvärdering av
användarupplevelsen av
mobilspel med hjälp av
sessionsinspelningsverktyg
Veronica Börjesson
Karolin Jonsson
2015-06-12
Department of Science and Technology
Linköping University
SE-601 74 Norrköping, Sweden
Institutionen för teknik och naturvetenskap
Linköpings universitet
601 74 Norrköping
LIU-ITN-TEK-A--15/034--SE
Utvärdering av
användarupplevelsen av
mobilspel med hjälp av
sessionsinspelningsverktyg
Examensarbete utfört i Medieteknik
vid Tekniska högskolan vid
Linköpings universitet
Veronica Börjesson
Karolin Jonsson
Handledare Camilla Forsell
Examinator Katerina Vrotsou
Norrköping 2015-06-12
Upphovsrätt
Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare –
under en längre tid från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.
Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner,
skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för
ickekommersiell forskning och för undervisning. Överföring av upphovsrätten
vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av
dokumentet kräver upphovsmannens medgivande. För att garantera äktheten,
säkerheten och tillgängligheten finns det lösningar av teknisk och administrativ
art.
Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i
den omfattning som god sed kräver vid användning av dokumentet på ovan
beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan
form eller i sådant sammanhang som är kränkande för upphovsmannens litterära
eller konstnärliga anseende eller egenart.
För ytterligare information om Linköping University Electronic Press se
förlagets hemsida http://www.ep.liu.se/
Copyright
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for your own use and to
use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses
of the document are conditional on the consent of the copyright owner. The
publisher has taken technical and administrative measures to assure authenticity,
security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be protected
against infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity,
please refer to its WWW home page: http://www.ep.liu.se/
© Veronica Börjesson, Karolin Jonsson
Evaluating the user experience in mobile
games using session recording tools
A thesis presented for the degree of Master of Science
in Media Technology and Engineering
Linköping University, Sweden
Veronica Börjesson
Karolin Jonsson
Supervisor: Camilla Forsell
Examiner: Katerina Vrotsou
Stockholm 2015–06–20
Abstract
This thesis work examines how the user experience of mobile games can be evaluated with the
use of session recording tools. The aim is to produce a workflow for user testing with session
recording tools for mobile devices. In order to evaluate the tools and services, and to develop
the workflow, several user tests have been conducted.
When using mobile session recording tools, it is possible to record the screen of the device
and the microphone input while the user is playing the game. In some tools it is also possible
to record the input from the front camera of the device, making it possible to capture the user’s
facial expressions and reactions during the test session. Recording the test session makes it
easier to understand and evaluate the player experience of the game, and also to identify issues
such as difficulties with the navigation in the application or annoyance due to non-intuitive
interaction patterns. It is also a good way to get feedback about what the user likes and
dislikes in the application. The fact that no additional equipment is needed for recording the
test session, and that the user can perform the test comfortably on their own device in their own
home, increases the chances for the test itself to have a minimal impact on the user experience,
since the user can complete the test in their natural environment. Session recording tools
are appropriate when conducting remote user testing since the users and the user experience
researcher do not have to be at the same location. It is also a flexible approach since the testing
does not have to be carried out in real-time. The test users can perform the test when they
have time and even simultaneously, while the user experience researcher can watch and analyse
the recordings afterwards. When conducting user testing with session recording tools, there
are also other parts necessary besides the actual tool. The test has to be set up (instructions,
tasks, questions etc.) and both the test and the game application containing the integrated
session recording tool need to be distributed to the test user in some way. The test users need
to be recruited from somewhere, and they have to match the desired target group for the test
session. There are test services which provide all of this: test set up, recruitment of test users, and distribution of the test and the game application; some even provide analysis of the recordings. When not using a test service, the test facilitator needs to take care of recruitment
of test participants, test set up, distribution and analysis of test data by him or herself. During
this study, methods for conducting user testing using session recording tools both with and
without test services have been tested and evaluated. The mobile game Ruzzle, developed by
MAG Interactive, has been used as test object. This thesis also covers how the user experience
in mobile games differs from other software, and it also investigates how the user experience
can be analysed from the session recordings, i.e. how the user’s emotions can be read from
the recorded screen, voice and face. As a part of the thesis work, a testing workflow has been
developed for the commissioning company MAG Interactive. It contains guidelines for how to conduct user testing with session recording tools and which parts are necessary in order to carry out a successful test process. Tables with information regarding the tools and test services are also presented, in order to facilitate the decision on which tool or service is most suitable for the specific test objective.
Key words:
User Experience, Player Experience, User Testing, Session Recording Tool, Testing Workflow
Summary
This thesis examines how the user experience of mobile games can be evaluated using session recording tools. The aim is to produce guidelines, in the form of a workflow, for how user tests with session recording tools for mobile devices can be carried out. In order to evaluate the tools and test services, and to acquire the knowledge and experience required to develop this workflow, several user tests have been conducted.
Mobile session recording tools make it possible to record the screen of the mobile device while the user is playing the game. Some tools can also record the front camera of the device, which makes it possible to capture the user's facial expressions during the test session. This helps when trying to understand the user experience and when evaluating the player experience, as well as when identifying problems such as difficulties navigating the application or annoyance over non-intuitive interaction patterns. It is also a good way to get feedback about what the user likes and dislikes in the application.
The fact that no additional equipment is needed to record the test session, and that the user can perform the test on a device they are comfortable with, in their own home, increases the chances that the test will have a minimal impact on the user experience, since the user can complete the test in their everyday environment. Session recording tools are well suited for remote user testing, since the users and the test facilitator do not need to be at the same geographical location. It is also a flexible approach, since the tests do not have to be carried out in real time. The test users can perform the test when they have time, even simultaneously, while the test facilitator can analyse the recordings afterwards. When conducting user tests with session recording tools, there are also other essential parts besides the tool itself. There has to be a method for creating a test (containing, among other things, test instructions, tasks and questions) and a way to distribute the test to the users. The users also need to be recruited from somewhere, and they have to match the desired target group for the test session. In addition, the application, with the integrated tool, needs to be distributed to the users so that they can download and install it in order to participate in the test. There are test services that offer test creation, recruitment of test users and distribution of the test and the application, and some even offer analysis of the session recordings. When no test service is used, it is up to the test facilitator to recruit test users, create the test, distribute the test and the application, and analyse the recordings. In this study, methods for conducting user tests with session recording tools, both with and without test services, have been tested and evaluated. The mobile game Ruzzle, developed by MAG Interactive, has been used as the test object. This thesis also covers how the user experience in mobile games differs from the user experience in conventional software, and it investigates how the user experience can be analysed from the session recordings, i.e. how the user's emotions can be read from the recorded screen, voice and face. As part of the thesis work, a testing workflow has been produced for the company MAG Interactive. It contains guidelines for how to conduct user tests with session recording tools and which parts are necessary in order to carry out a successful test process. Tables with information about the various tools and test services are presented in order to facilitate the choice of session recording tool or test service.
Key words:
User Experience, Player Experience, User Testing, Session Recording Tools, Testing Guidelines
Acknowledgements
We would like to thank our families and friends for their love and support, through all of our
lives which has led up to this moment. A big thank you to our supervisor Camilla who has
been an inspiration and given us lots of good feedback and advice. We would also like to thank
all of the participating companies who have taken the time to answer all of our questions and let us test their tools and services.
Finally, a big thank you to MAG Interactive, and especially to the Ruzzle team, for giving
us this opportunity and making us a part of the team. We have learned so much and gained
valuable experiences that we will carry with us for the rest of our lives. It has been super fun
and we are so glad for getting the opportunity to get to know all of the amazing people at
MAG Interactive, it has truly been a pleasure.
Karolin and Veronica, Stockholm May 2015.
Contents

1 Background
   1.1 Introduction
   1.2 Motivation
   1.3 Aim
       1.3.1 Objectives
   1.4 The Test Object: Ruzzle
   1.5 Disposition
   1.6 Limitations
2 Theory
   2.1 User Experience
   2.2 Digital Games
       2.2.1 Relevant Genres of Digital Games
           2.2.1.1 Social Games
           2.2.1.2 Casual Games
           2.2.1.3 Mobile Games
   2.3 User Testing
       2.3.1 Different Testing Methods
       2.3.2 Testing Methods
       2.3.3 Remote User Testing
       2.3.4 Post Test Questionnaire
       2.3.5 Testing of Digital Games
       2.3.6 Testing of Mobile Games
       2.3.7 Test Users
   2.4 Session Recording Tool
       2.4.1 Metrics
       2.4.2 Facial Reactions
       2.4.3 Audio Recordings
   2.5 Workflows for User Testing
       2.5.1 Remote Usability Testing
       2.5.2 Mobile Applications
       2.5.3 Mobile Games
       2.5.4 Session Recording Tools
3 Approach
   3.1 Production of Testing Workflow
   3.2 Materials
   3.3 Research
   3.4 Evaluation of Session Recording Tools
   3.5 Integration
   3.6 Finding Test Users
   3.7 Creating the Test Plan
   3.8 Distribution
   3.9 Execution of User Tests
   3.10 Analysis of Session Recordings
   3.11 User Feedback on the Test Object
   3.12 Final Evaluation of Session Recording Tools and Test Services
4 Results
   4.1 Test Services and Session Recording Tools Initially Investigated
   4.2 Tested Session Recording Tools
       4.2.1 Lookback
       4.2.2 UXCam
       4.2.3 PlaytestCloud
       4.2.4 UserTesting
   4.3 Distribution and Test Set Up Services
       4.3.1 Beta Family
       4.3.2 PlaytestCloud
       4.3.3 UserTesting
       4.3.4 Distribution Without Test Set Up Service
   4.4 Comparison of Test Services and Session Recording Tools
   4.5 Outcome of Test Session Analysis
       4.5.1 Questionnaire Results
       4.5.2 Insights Gained from Screen, Facial and Voice Recordings
       4.5.3 The Test Object: Ruzzle
   4.6 Resulting Workflow
       4.6.1 Test Plan
       4.6.2 Test Objective
       4.6.3 Test Users
       4.6.4 Tool and Test Service
           4.6.4.1 Session Recording Tool
           4.6.4.2 Distribution and Test Set Up
       4.6.5 Time Plan
       4.6.6 Prepare Test Details
           4.6.6.1 Preparations
           4.6.6.2 Introduction
           4.6.6.3 Instructions
           4.6.6.4 Screener
           4.6.6.5 Pre Gameplay Questionnaire
           4.6.6.6 Tasks
           4.6.6.7 Post Gameplay Questionnaire
       4.6.7 Perform Test
           4.6.7.1 Pilot Test
           4.6.7.2 Actual User Test
       4.6.8 Analysis
       4.6.9 Summarise Results and Share with the Team
5 Discussion
   5.1 User Testing of Mobile Games
       5.1.1 Remote Testing
       5.1.2 Test Users
       5.1.3 Test Method and Procedure
       5.1.4 Social Aspects
       5.1.5 Post Gameplay Questionnaire
   5.2 Session Recording Tools
       5.2.1 Evaluation of tools and features
       5.2.2 Grading of SRTs and Test Services
           5.2.2.1 Website
           5.2.2.2 Easy to integrate
           5.2.2.3 Easy to set up test
           5.2.2.4 Customise test
           5.2.2.5 Demographics Specification
           5.2.2.6 Profile Information
           5.2.2.7 Researcher Environment
   5.3 Analysis of Recordings
       5.3.1 Voice
       5.3.2 Facial
       5.3.3 Read Emotions
   5.4 Workflow
       5.4.1 Planning the Test and Writing Instructions
       5.4.2 Pilot Testing
       5.4.3 Deciding on a Session Recording Tool
       5.4.4 Deciding on Recruitment, Distribution and Test Set Up
   5.5 Further Research
6 Conclusion
References
Appendix A - Initial Test Instructions
Appendix B - Declaration of Informed Consent
Appendix C - Test Procedure: Lookback
Appendix D - Test Procedure: UXCam
Appendix E - Pre Gameplay Questionnaire
Appendix F - Post Gameplay Questionnaire
Appendix G - Session Recording Tool Survey
Appendix H - Questions for the Pilot Test
Appendix I - Final Workflow
List of Figures

1  Ruzzle, a mobile game developed by MAG Interactive.
2  Illustration of the flow concept developed by Mihaly Csikszentmihalyi [6] (adapted from an illustration by Senia Maymin [32]).
3  Questions answered by different UX research methods (adapted from [48] and [15]).
4  Lookback's UX researcher environment.
5  UXCam's UX researcher environment.
6  PlaytestCloud's UX researcher environment.
7  UserTesting's UX researcher environment.
8  Beta Family's test set up service.
9  PlaytestCloud's test set up service.
10 UserTesting's test set up service.
11 Age and gender distribution from a total of 26 post gameplay questionnaire respondents.
12 Test users' preferences for starting and stopping the recording manually or automatically, and where they would have preferred to conduct the user test. There was a total of 15 survey participants, of which 7 had completed the test session using UXCam and 8 using Lookback.
13 The test users' preferences regarding preview functionality in the session recording tool. There was a total of 15 survey participants, of which 7 had completed the test session using UXCam and 8 using Lookback.
14 Properties for test set up.
15 Using Lookback with the setting to show camera input in the lower right corner set to on.
List of Tables

1  Clarification of the differences between playability and usability according to Becerra and Smith [3].
2  Benefits and challenges with remote usability testing.
3  Session recording tools included in the initial investigation.
4  Test services which also provide SRTs and were included in the initial investigation.
5  Features available in the UX researcher environment (where recordings can be watched and annotated) for the respective services.
6  Properties for the SRTs.
7  Features for test set up and distribution services.
8  Grading of tools and services.
9  Choice of session recording tool.
10 Choice of test set up and distribution service.
Abbreviations

FACS   Facial Action Coding System
GEQ    Game Experience Questionnaire
GUI    Graphical User Interface
HCI    Human Computer Interaction
ISO    International Organization for Standardization
NDA    Non-Disclosure Agreement
NPS    Net Promoter Score
PX     Player Experience
QA     Quality Assurance
SDK    Software Development Kit
SRT    Session Recording Tool
UI     User Interface
UX     User Experience
VIS    Voice Interaction System
Definitions
Attitudinal methods: The attitudinal approach aims to collect data about “what people
say”.
Behavioural methods: The behavioral approach aims to answer “what people do”.
Dashboard: In this thesis, the term dashboard refers to the session recording tool’s website
where the recordings are uploaded and organised and where it is possible to view the user tests.
Facial Action Coding System: The Facial Action Coding System (FACS) is a guide which categorises the movements of the facial muscles by assigning each facial muscle a number, which is modified when the muscle moves.
Gameplay: Gameplay is created by the game developer and the player together. It is the
model developed through game rules, interaction with the player, challenges, skills to overcome
these challenges, themes and immersion.
Graphical User Interface: A User Interface (see UI below) that includes graphical elements
such as windows, icons and buttons [10].
Heuristic: A heuristic is a method or procedure where experience is used in order to learn
and improve [14].
Heuristic Evaluation: A few experts review and examine the UI and decide how well
it conforms to recognised usability principles called “heuristics”.
Net Promoter Score: A metric for customer loyalty. The primary purpose is to evaluate the loyalty of the customers, and the Net Promoter Score (NPS) is based on the question ”How likely is it that you would recommend [company X] to a friend or colleague?”
Non-Disclosure Agreement: A legal contract through which the parties agree not to disclose
any information covered by the agreement. This can cover for example confidential material,
information or knowledge.
Playability: How fun the game is to play, how usable it is, and the quality of the interaction style and plot. Playability is affected by the quality of the storyline, responsiveness, pace,
usability, customisability, control, intensity of interaction, intricacy, strategy, degree of realism
and quality of graphics and sound.
Player Experience: The experience a player has when playing a game, the UX of games. PX
targets the player and the interaction between the player and the game.
Screener: A pre-defined first question in the test set up, used to prevent people who are not in the target group from becoming test users and continuing with the test. The persons who give the correct answer can continue with the test while the others are denied; the users do not know beforehand what the correct answer is.
Session Recording Tool: In this thesis a session recording tool refers to a digital recording
tool for mobile devices, using the built in camera of the device.
Test Service: In addition to session recording, this term also includes recruitment of test
users, test set up and distribution of the test and the application. Some test services also offer
analysis of the recordings.
Think aloud: Users vocally explain their thoughts while using the product.
Quantitative methods: Quantitative methods are good for answering questions like “how
many” and “how much”.
Qualitative methods: Qualitative studies collect data about behavior or attitudes through
direct observations.
User Interface: The interface features through which users interact with the hardware and
software of computers and other electronic devices.
Usability: How effectively, efficiently and satisfactorily a user can achieve a specific goal with a product.
User Experience: The perception a user gets from using a product, or the anticipation of
using it. This also includes the experience afterwards and subjective emotions.
UX Researcher Environment: In this thesis, the UX researcher environment refers to the
session recording tool’s online environment where it is possible to analyse a recording, i.e. watch
a video and add annotations.
1 Background
The background chapter contains six sections: the first two introduce the study (1.1) and motivate why it has been conducted (1.2). The third section (1.3) explains the aim of the study as well as the objectives, and the fourth section (1.4) introduces and describes the test object. Finally, the disposition of the thesis (1.5) as well as the limitations of the study (1.6) are described.
1.1 Introduction
This thesis is written as a part of the Master’s program in Media Technology and Engineering at
the Department of Science and Technology, Linköping University during spring 2015. The thesis
work has been carried out in association with the mobile games company MAG Interactive at
their headquarters in Stockholm, Sweden.
The aim of this thesis work is to evaluate different tools for testing the user experience
(UX) in mobile games and to produce a workflow for how to conduct user testing with session
recording tools (SRTs). The workflow will be used as guidelines for MAG Interactive, describing
how the process of user testing should be conducted. When performing user testing with a SRT,
there are other necessary parts besides the SRT. It is important to find suitable methods for
recruiting the test users and creating a test plan. Also, both the application and the test
instructions need to be distributed to the test users if conducting remote user testing. Methods
for all of these parts will be discussed in the thesis and suitable services will be evaluated. Since
proper user tests, including analysis, have to be carried out in order to evaluate the tools, it has
been decided that the user tests should also generate valuable feedback about the game which
can be used in future iterations.
1.2 Motivation
Currently, there is little research available regarding the use of SRTs in UX evaluation of mobile
games. Therefore, the aim of this thesis work is to compare and evaluate easily accessible tools
and produce guidelines which can be applied during user testing with these tools. Mobile
games development is a rapidly growing and changing area, but it is difficult to make accurate
evaluations of the UX and to perform user tests which will yield reliable results. It is generally
preferable to test the UX in the player’s natural environment. However, this usually means that
the result can only be based on subsequent feedback, which can be problematic due to memory
limitations of the player. Other factors, like personal preferences and difficulties explaining the
experience in detail, can also affect the result. It is possible to conduct observational tests in
focus groups (a moderated test session where a group of players discuss the game), but this
involves placing the player in an unnatural environment which might affect the performance and
behavior of the player. Additionally, focus groups typically test only new players and not existing players, which might be desirable. Another method to test the UX, and the player’s understanding
of the interface, is to use mobile SRTs. These tools can be used to record taps on a touch display
along with everything the user sees on the screen, and in some cases even facial expressions and
sound. The use of SRTs allows for tests to be performed in the player’s natural environment.
1.3 Aim
The aim of this thesis project is to produce a workflow for how to conduct user tests of mobile
games using a SRT. Various mobile SRTs will be evaluated and compared against each other, and the top candidates will then be integrated into the mobile game application Ruzzle. User
tests will be carried out in order to investigate the on-boarding process of the game, as well as
to establish which tool (if any) is the most appropriate in the context. The resulting workflow
will contain general guidelines for how to conduct user testing.
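Although the exact steps differ between tools, integrating an SRT into an iOS game typically amounts to adding the vendor's SDK to the project and starting a recording session early in the application lifecycle. The Swift sketch below is purely illustrative: SessionRecorder, its start call and its parameters are hypothetical stand-ins, not the API of Lookback, UXCam, PlaytestCloud, UserTesting or any other tool evaluated in this thesis.

// Illustrative only: "SessionRecorder" is a hypothetical stand-in for an SRT SDK,
// not the API of any tool evaluated in this thesis.
import UIKit

enum SessionRecorder {
    // Stub standing in for a vendor SDK call that starts capturing the screen,
    // front camera and microphone and uploads the recording to a dashboard.
    static func start(apiKey: String, recordCamera: Bool, recordMicrophone: Bool) {
        print("Session recording started (stub), camera: \(recordCamera), microphone: \(recordMicrophone)")
    }
}

class AppDelegate: UIResponder, UIApplicationDelegate {
    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Start recording as early as possible so the whole test session,
        // including the on-boarding flow, is captured.
        SessionRecorder.start(apiKey: "YOUR-API-KEY", recordCamera: true, recordMicrophone: true)
        return true
    }
}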
1.3.1 Objectives
The thesis aims to answer the following questions:
• How can UX be tested using mobile SRTs?
• Why is remote session recording a suitable approach and which available tools are the
most appropriate for mobile games?
• How can recorded test data from SRTs be interpreted into information that can be used
to address UX and usability issues?
1.4 The Test Object: Ruzzle
The test object, which the users have tested in the user tests during this study, is a mobile game called Ruzzle. Ruzzle is a social player vs. player word game developed by the Swedish mobile games company MAG Interactive. The players can choose to challenge friends or strangers, and the board consists of 16 letters in a 4x4 grid (see figure 1). The game was inspired by the board game Boggle. The aim of the game is to find as many words as possible in two minutes. A word is formed by dragging a finger between adjacent letters on the board, and a word has to consist of a minimum of two letters. It is not possible to use the same letter box more than once in a word, and each word will only be awarded points once per round. One game consists of three rounds, each two minutes in duration, and the total score determines the winner. Additionally, different letters award different points, and the goal is to collect more points than your opponent before the game finishes. As you gain experience your level will increase. There is also a tournament mode where the player, after reaching level six, takes part in weekly tournaments. The goal in the tournaments is to get as high a score as possible, and every player can play as many rounds as they like. Every week, the player competes against 19 opponents with the scores from the best round played. Ruzzle requires a network connection and is available on iOS, Android, Windows Phone and Facebook. The test object is an established game which has been downloaded around 60 million times, according to MAG Interactive’s observations in May 2015.

Figure 1: Ruzzle, a mobile game developed by MAG Interactive.
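To make the board mechanics concrete, the rules for a legal letter trace can be expressed in a few lines of code. The Swift sketch below is an illustration written for this description, not MAG Interactive's implementation; it assumes, as in Boggle, that diagonal neighbours count as adjacent.

// Illustrative sketch of the trace rules described above; not MAG Interactive's code.
struct Cell: Hashable {
    let row: Int   // 0...3 on the 4x4 board
    let col: Int
}

// A trace is valid if it has at least two letters, never reuses a cell,
// and every step moves to a neighbouring cell (diagonals assumed allowed).
func isValidTrace(_ path: [Cell]) -> Bool {
    guard path.count >= 2 else { return false }                 // minimum word length
    guard Set(path).count == path.count else { return false }   // no letter box used twice
    for (from, to) in zip(path, path.dropFirst()) {
        let adjacent = from != to
            && abs(from.row - to.row) <= 1
            && abs(from.col - to.col) <= 1
        if !adjacent { return false }
    }
    return true
}

// Example: (0,0) -> (0,1) -> (1,2) is a legal three-letter trace.
let legal = isValidTrace([Cell(row: 0, col: 0), Cell(row: 0, col: 1), Cell(row: 1, col: 2)])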
1.5 Disposition
In order to gain a profound understanding of what is needed when evaluating the UX of mobile
games, this report begins with a theoretical part in chapter 2. This chapter contains various
theories and definitions, ranging from UX to Player Experience (PX) and finally addresses
mobile games and testing workflows. Chapter 3, “Approach”, introduces the method and steps for conducting the study, while chapter 4, “Results”, presents the results of the study. In chapter 5, “Discussion”, the results are discussed and evaluated based on the theoretical chapter and on knowledge gained from conducting the user tests. Finally, chapter 6, “Conclusion”, aims to answer the initial objectives presented in the introduction (section 1.1).
1.6 Limitations
This research aims to find a testing workflow suitable for the specific target game and similar
games, hence the aim is not to declare an entirely general method that will work for every type
of game. The thesis project is conducted during the course of 20 weeks, which is the time limit
for the master’s thesis. When this study was initiated, several of the relevant testing tools were
only available for iOS, hence iOS became the choice of platform for this study.
Since session recording of user testing using mobile devices is a relatively new field, especially
within the area of mobile games, many of the investigated test services and SRTs are under
development. They are frequently being updated and new features and tools are becoming
available. The tables and other details regarding the tools and test services in this report have been compiled from the information available at the time of writing. Therefore, we cannot rule out possible errors or outdated information.
2 Theory
The theoretical chapter consists of five sections. Section 2.1 explains what UX is, how it can
be defined and what separates UX from usability. Section 2.2 defines what a digital game is
and how the UX in games differentiates from regular software. Section 2.2.1.3 focuses on UX
in mobile games and what separates it from UX in other digital games. Section 2.3 is about
user testing: what it is, what different methods exist and what parts they consist of. Section 2.4 covers SRTs: what they are and how they can be used in UX testing. Finally, section 2.5 deals with workflows and what guidelines to follow when conducting user testing.
2.1 User Experience
The International Organization for Standardization (ISO) defines UX as: ”a person’s perceptions and responses resulting from the use and/or anticipated use of a product, system or
service” [19]. This includes the user’s emotions, beliefs, preferences, perceptions, physical and
psychological responses, behaviours and accomplishments before, during and after use of the
product. UX is also a consequence of interaction with a product or system, and the internal
and physical state of the user as a result from prior experiences, attitudes, skills, personality
and context of use [19]. According to the ISO, usability can be defined as: “The extent to which
a product can be used by specified users to achieve specified goals with effectiveness, efficiency
and satisfaction in a specified context of use.” (ISO 9241-11) [56]. The usability criteria can
be used to assess aspects of UX, for example functionality, system performance, interactivity,
assistance from the system, etc [19]. Usability refers to implementing an application in a way
so that the user can perform a task effectively and efficiently, without having to put too much
effort into it. The result should be of good quality and the time it takes to perform the task
should be minimised, i.e. the main focus is productivity [5]. Usability Partners [56] describes
UX as a subjective extension of usability, which focuses on how satisfied the user is. In order
to evaluate the UX it is important to go beyond the aspect of usability and evaluate the whole
experience. UX is an important part of human computer interaction (HCI), but initially the
term was not associated with games development [2]. Nowadays, the field of HCI and game
research are learning from each other and HCI UX evaluation methods are used in games development for improving the UX of the game [2]. Due to the ”intentional challenge and emotion”
in games, many HCI methods can not be used in the same way as when evaluating the usability of productivity applications [29]. Creating a good (or bad) UX in games depends on the
aspects: immersion, fun, enjoyment, usability, aesthetic, pleasure and flow. It is also important
how all of these aspects influence the user [5]. Therefore, these factors have to be considered
when evaluating the UX. In HCI, focus is often on the end result of the experience. The current
UX evaluation methods often offer insights about the experience, but not objective knowledge.
As stated by Calvillo et al.: “experience is both the process and outcome of the interaction” [5]. When interacting with an application, the user should feel that all “elements of the experience acted in symphony” [5], which will give rise to a positive experience. Hence, Calvillo et al. [5] argue that by evaluating the elements present in the interaction process, the experience itself can also be evaluated. An experience is both personal and subjective, but from a scientific point
of view an evaluation of the general UX is needed. Even though the experience is personal, it
is not often unique. It is possible to share an experience with others and to empathise with
it. Even if an action is performed by an individual and gives rise to a personal experience, the same process of interaction is used by many individuals when completing the task. Likewise, the experience is often the same or similar [5], which makes it possible to get an idea of the general perception of the experience by observing or asking some of the users.
2.2 Digital Games
All games available on digital platforms such as PCs, consoles or mobile devices are digital
games. A digital game can be distributed both online and offline, and can be available as both
singleplayer and multiplayer. Games can be described as an activity one is participating in for
fun. Entertainment can, however, be difficult to define since it is subjective and depends on
what the player is experiencing as fun [20]. For a game to be fun, it needs to be motivating,
have rules and be engaging. It also needs to have a purpose and a feeling of presence. Rules
can be displayed by success and failure rates in the graphical user interface (GUI), or be made
up during the game process. Isbister states that specific usability measures are necessary for
digital games. This is due to the fact that a game is complex software with different goals and areas of use, compared to the traditional task-orientated software that most current usability evaluation methods target [17] (page 8).
Researchers have different points of view, standpoints, methods and terminologies for developing good UX in games. According to Isbister et al. [17] (page 5), there is a difference between testing the UX and the player experience (PX). The PX is the experience a player has during
game play, i.e. the UX of the game. Testing the UX covers what it is like to interact with the
software, including how engaging the experience is, regardless of the end goals. The focus of
PX testing is to determine if the game is enjoyable and fun to play, but also to find out where
the players might encounter problems and get stuck. The playability determines the quality
of the game, i.e. playability is a kind of usability measurement for games, including the UX
quality and how enjoyable the game is. The definition of playability is: “.. the degree to which
a game is fun to play and usable, with an emphasis on the interaction style and plot-quality
of the game; the quality of the gameplay..” [55]. Playability is affected by the quality of the
storyline, responsiveness, pace, usability, customizability, control, intensity of interaction, intricacy, strategy, degree of realism and quality of graphics and sound. Since games are supposed
to give rich and meaningful experiences, where the gamer’s personal feelings are also involved,
the study of PX requires additional methods besides the usability methods used in the field
of HCI. When playing a game, the player continuously evaluates his/her own performance in
the game. This can be done both consciously and subconsciously. Is the player able to perform, meet challenges and attain the desired goals? After reaching the goals, the player will
experience positive feelings and perceive him- or herself to be competent [51]. Immersion, fun,
presence, involvement, engagement and flow are concepts that have been used to describe PX, and these terms are often broadly defined. The concepts can be related to various psychological compartments, i.e. the concepts overlap each other, which makes them more difficult to measure and understand. Using psychologically valid metrics when evaluating games makes it easier to measure the experience [51].
Becerra from AnswerLab and Smith from Electronic Arts state that “we don’t use games, we
play them” [3]. Therefore, the PX should be measured from a playability, and not a usability,
perspective (see table 1). Becerra and Smith say that if a game were completely usable, it would be boring, since the players would understand everything instantly. Without the challenge, a game would not be exciting. However, menus, navigation and the like still have to be usable in order to make it possible to play the game. Becerra and Smith explain that there are two different types of motivation for playing a game: task orientated and fun orientated.
Playability                                   | Usability
Challenges are good                           | Challenges are bad
Surprising or unclear elements can be         | Localisation and understanding of all
positive and enjoyable                        | main elements should be instant
Motivation is focused on the fun              | Motivation is focused on the tasks
Fun is a big factor of success                | Ease of use is a big factor of success

Table 1: Clarification of the differences between playability and usability according to Becerra and Smith [3].
Csikszentmihalyi came up with the concept “Flow” [6], displayed in figure 2. It is about finding the balance between boredom and anxiety and can be applied to mobile games. If the player has good gaming skills and the game is too easy, the game becomes boring. If the player, on the other hand, has poor gaming skills, the game can be too challenging, which leads to anxiety. According to Lemay et al. [28], the flow model can help designers grasp the answer to a fundamental question about their work: “What motivates people to play?”.

Figure 2: Illustration of the flow concept developed by Mihaly Csikszentmihalyi [6] (adapted from an illustration by Senia Maymin [32]).

Lazzaro studied why some players prefer action games, while others prefer word puzzles. The answer is that people play different games for different reasons; they have various goals and motivations. Lazzaro defined the Four Keys of Fun [26], which represent the four most important feelings that a game can generate. The first key, called hard fun, acts on the emotions frustration and relief, while the second key, easy fun, focuses on curiosity and an easy-going approach. The next key, serious fun, provides relaxation and excitement. The last key, people fun, provides amusement and is enjoyable since it focuses on social bonding and interaction with other people. Lazzaro defines PX as how the player interaction creates emotion, and how well the game provides and satisfies the sort of fun that the players want [27]. The most popular games utilize all of these emotions in order to draw attention and motivate the player.
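The relationship between skill and challenge in the flow model can be illustrated with a small classification sketch. The Swift code below is a toy example: the numeric scales and the tolerance value are invented for illustration, while the model itself only states that a game that is too easy for the player's skill causes boredom and one that is too hard causes anxiety.

// Toy illustration of the flow model; the 0-10 scales and the tolerance value
// are arbitrary choices made for this sketch.
enum PlayerState {
    case boredom, flow, anxiety
}

func flowState(skill: Double, challenge: Double, tolerance: Double = 1.5) -> PlayerState {
    if challenge > skill + tolerance { return .anxiety }   // too hard for the player's skill
    if challenge < skill - tolerance { return .boredom }   // too easy for the player's skill
    return .flow                                           // challenge roughly matches skill
}

// A skilled player (8) facing a trivial challenge (2) ends up bored.
let state = flowState(skill: 8, challenge: 2)   // .boredom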
2.2.1 Relevant Genres of Digital Games
Digital games can be divided into different kinds of genres. Below follows information about
genres of digital games which are of relevance in this study.
2.2.1.1 Social Games
A social game is where more than one person is simultaneously engaged actively in a game [18].
When evaluating the UX of a game, the social aspects of the experience also have to be taken
into account. Isbister states that most games, both singleplayer and multiplayer, are usually
played in a social context. It has been concluded that social play leads to more positive effects
than solo play. Social games also provide more competence, while requiring less effort and
inducing less frustration [18]. Isbister states that in order to thoroughly understand the user
experience of games it is important to also consider the social nature of it. By adding people to
the play session the player/players will have an entirely different end experience. Isbister also
points out that social games should be tested in a social context [18].
2.2.1.2 Casual Games
Furtugno describes casual games as games that can be played by anyone regardless of age or
gender [8]. They are available on all kinds of platforms, even game consoles (such as Nintento
Wii) which has previously been used mainly by hardcore gamers. The success of casual games
has made games reach a wider audience, but it has also generated new design issues. Since
there is no restricted target audience, designing a game that appeals to everyone can be difficult.
There are various definitions of casual games in different parts of the games industry, depending
on the game content, game medium and play length or the current market. All of these are
factors in the design of casual games, but Furtugno claims that the most important factor is
to start thinking about who is intended to play the game. Proceeding from this, decisions and
assumptions can be made regarding expectations, experiences and what they consider intuitive,
challenging and fun.
2.2.1.3 Mobile Games
A mobile game is a digital game and, as the name of the genre implies, it is played on a specific platform. Gaming has become mainstream; it is no longer only hardcore gamers who play games as a leisure activity. The use of smartphones and tablets has made games more accessible and available at a low cost, which has increased the number of so-called casual players (players who occasionally or frequently play easily accessible and easy-to-play games just for fun or in order to relax). Game designers are facing the challenge of creating appealing, accessible and usable games for players who are not typical hardcore gamers [28]. It is important to consider the general preferences of the targeted audience. Lemay investigated whether the typical hardcore gamer and the casual gamer experience games differently and concluded that different audiences are drawn to games in different ways. There are no universal guidelines for what constitutes a good gaming experience that can be applied to all groups of players [28].
When analysing a mobile game, it is important to find a balance between flow and challenge
[52]. The problem is to find a good difficulty level so that the game is not too easy since that
usually becomes boring, but also not too difficult since that becomes frustrating for the player.
Regardless of how good the graphics quality is in the game, these kinds of problems can lower
the quality of the entire PX. When analysing a game it is also necessary to examine and find
a balance between the playability, mobility and usability, as well as mobile device and touch
screen properties. Korhonen argues that the usability of mobile games should be evaluated differently from that of other digital games, since they are mobile and are used in different contexts. Mobile games can also be used at various locations with differing lighting conditions and noise levels, and the player might need to focus their attention on other things in their surroundings from time to time. Additionally, it is not possible to measure
playability with the same heuristics as those developed for task-orientated software, since in a game many different parameters, paths and stories are created by each player, and hence the scenarios will differ from player to player [21]. Ülger also states that mobile games may
encounter several playability issues. One of the issues is handling interruptions, for example
when receiving a phone call during a game session. Additionally, control of sounds and environment can be restricted on the device. Another difficulty can be to help the player understand
the game. Having a simpler user interface (UI) than in PC or video games is necessary since
the screen resolution and size are more restricted. The character status (if the game contains a character) and the game goals need to be clearly visible to the player. Touchscreen displays also present some specific issues, such as distribution of game items on the screen. The
distribution of items should suit both right-handed and left-handed people. Since the mobile
device is portable and can be used everywhere, the environment, noise and light may vary.
Also distinctive is the fact that traditional mobile devices have rather small screens, insufficient
audio capabilities, limited processing power and battery limitations [52].
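Interruption handling is one of the playability issues mentioned above that has a direct technical counterpart on iOS. The Swift sketch below shows one common way for a game to react when the application loses and regains the foreground, for example because of an incoming phone call; pausing and resuming the round timer is an illustrative choice made for this example rather than a requirement stated by the cited authors.

// Sketch of interruption handling on iOS (e.g. an incoming phone call).
// Pausing and resuming a round timer is an illustrative choice for this example.
import UIKit

final class GameSessionController {
    private var observers: [NSObjectProtocol] = []

    func startObservingInterruptions() {
        let center = NotificationCenter.default
        // App is about to lose the foreground (call, control centre, home button):
        // stop the two-minute round timer and pause the audio.
        observers.append(center.addObserver(forName: UIApplication.willResignActiveNotification,
                                            object: nil, queue: .main) { _ in
            self.pauseRound()
        })
        // App is active again: resume the countdown where it was paused.
        observers.append(center.addObserver(forName: UIApplication.didBecomeActiveNotification,
                                            object: nil, queue: .main) { _ in
            self.resumeRound()
        })
    }

    func pauseRound() { /* stop the countdown and pause audio */ }
    func resumeRound() { /* restart the countdown from where it stopped */ }
}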
2.3 User Testing
User testing and measurement of the user experience are conducted in order to improve the
UX of existing UIs [1]. It is also conducted throughout the development process in order to
collect feedback on concept ideas, game mechanics, design decisions, etc. which can be used to
ensure that a project is heading in the right direction.
2.3.1 Different Testing Methods
There is a wide range of UX research methods; some are older and some are more modern.
Rohrer states that it is nearly always best to use a combination of multiple methods, providing
different insights. The methods can be separated into behavioural vs. attitudinal, and quantitative vs. qualitative (see figure 3). The attitudinal approach aims to collect data about “what
people say”, while the behavioral approach aims to answer “what people do”. Usability studies
are in-between the attitudinal and behavioral methods, mixing the best from both areas by
combining self-reported information with behavioral observations. According to Rohrer, it is
generally recommended to go closer to the behavioral side when performing usability studies
[48].
Figure 3: Questions answered by different UX research methods (adapted from [48] and [15]).
Qualitative studies aim to collect data about behavior or attitudes through direct observations.
Qualitative methods are recommended to answer why an issue occurs and how to fix it [48]. In
quantitative studies, data is gathered indirectly using measurements, analytics tools or surveys.
Quantitative methods are better for answering questions like “how many” and “how much”.
These methods can be used to grade issues according to severity, by concluding which issues
are the most crucial ones.
2.3.2 Testing Methods
“When studying natural use of the product, the goal is to minimise interference from the study
in order to understand behavior or attitudes as close to reality as possible” [48]. The natural use
approach provides a greater validity to the study, but it also offers less control for the facilitator.
If the main focus of the study is to investigate what people do, why they do it and how to
fix potential problems, the study will focus on qualitative behavioral methods, observing the
players in their natural environment and their everyday use. Heuristic evaluation is a method
that consists of having a few experts review and examine a UI and thereafter decide how well it
conforms to recognised usability principles or guidelines called “heuristics” [39]. Observational
methods for usability testing are used to gather information about the user’s behavior. This
can be face and body expressions, focus level, preferences, actions while performing tasks as
well as when encountering errors, and also opinions and comments from the test participant.
There are three different observational testing approaches: test monitoring, direct recording
and think-aloud. In test monitoring, the observer directly observes, records and comments on
the participant’s behavior throughout a test session. Typically, the practitioner follows a data
sheet with guidelines containing checklists of tasks, test time, comments from the practitioner
and the participant, as well as an explanatory statement of the event. It is also possible to
use one-way mirrors which means that the test user is observed through a one-way mirror from
an adjacent room. The test monitoring approach is the most common method where only a
few test users are needed. Direct recording is suitable when having many test participants and
when there is a need to eliminate potential bias from the test observer. There are different
recording alternatives available, such as audio and video taping and screen recording. In the
think-aloud approach, the respondents are asked to verbally explain what they are doing while
using the product, and they are reminded to regularly verbalise their thoughts throughout the
test session [53]. There is also a usability evaluation method called focus groups, where a group
of users discuss a product while being moderated by a test facilitator. The benefit of this method is that the users can discuss their experiences and results, which can lead to many useful ideas and much feedback. It is also a cheap testing method, especially if carried out before developing a product. The drawback is that it is an unnatural environment for the test users, and an experience in a focus group can usually not be compared to a real experience in the users’ everyday lives [9].
User tests can be carried out in different contexts and environments. There is natural use,
where the test participant is carrying out the test in his or her normal environment, like they
would during everyday activity. There are also scripted user tests, where the participant is
following a script. In some studies, a mixture of different contexts is used, which is called a
hybrid. A testing method which can be conducted from the user’s natural environment is called
remote usability testing, which will be further discussed below in section 2.3.3.
According to Krug and Sharon [24], it is important to ask the whole team (developers,
designers, UX researchers etc.) what questions they want to have answered as a result of the
testing process. They also point out that these questions are not necessarily the ones being
asked to the users but rather what you want to know as a result of the study. They suggest
that tests should be planned together with the entire team. The whole team should be involved
in creating the test, choosing participants and they should be encouraged to provide feedback
regarding the test process.
2.3.3 Remote User Testing
An alternative to regular observation of test sessions is remote user testing. The main difference
between remote user testing and regular user testing, is that the test participants and the
test facilitators are at different physical locations. This approach makes it possible for the
participants to carry out the test in their everyday environment, without being distracted by
the facilitator or disturbing equipment. The test process is perceived to be more natural for
the test user, while the UX expert can still watch and analyse the test procedure from a remote
location [49]. In a webinar (web-based seminar) at UserTesting.com [24], Krug and Sharon
recommend performing a pilot test on one person in order to test the test before using it for
user testing. This should be done in order to make sure that everything is working, and that
the users will understand the tasks and the questions given to them in the test. Sharon insists
on the importance of pilot testing and states that he has never regretted doing a pilot test, only
regretted not doing one. One drawback with the remote testing method is that the possibility to
interpret the user’s body language is essentially lost. The possibility to ask follow up questions
in the middle of the test process is also ruled out (in an unmoderated test session). On
the other hand, remote testing does not require any traveling, neither for the facilitators nor for
the participants. This saves time and also makes it possible to have geographically dispersed
user groups without requiring a larger budget. Some advantages with remote user testing are
that tests can be carried out with a limited budget, strict time constraints and without the use
of additional testing facilities. It also offers a faster and easier recruitment process. The test
group does not have to be collocated, and the test can be performed in a natural environment
where the test participants feel comfortable [49]. There are both moderated and unmoderated
remote user testing. In a moderated test session, the facilitator and the user are both online
when the test is being performed and they can interact with each other during the test process.
When using the unmoderated test method, there is no real-time interaction between the test
participants and the facilitators, and the test can be carried out asynchronously. Unmoderated
user test studies provide flexibility, and the user can complete the test session when they have
time and want to go through with it. Another advantage is that all test users can perform the
test simultaneously, since the test data will be analysed retrospectively [49]. When carrying
out unmoderated remote user testing, clear and specific instructions are especially important. The test facilitator cannot assume that the test user will understand how everything works
without thorough and easily interpretable instructions. When performing moderated user tests,
the moderator can make sure that the user stays on the task, hence unmoderated tests place
greater demands on instructions and preparations [24].
Usability.gov [54] summarises the main benefits and challenges with remote usability testing in
table 2.
Benefits:
• Eliminates the need for a lab as well as the need to place the test participant in an unnatural environment, which is usually the case when performing tests in a lab.
• Supports a larger group of test users.
• Typically less expensive than in-lab testing.
• Possible to run multiple tests simultaneously.
• Unmoderated testing allows the test user to perform the test at any suitable time, increasing the possibility for all participants to complete the test.
• Usually possible to test a larger number of users than in a lab environment.

Challenges:
• Security could be compromised if the testing information is sensitive (since data might leak out).
• Restricted view of the user’s body language.
• Technical problems are likely if the test users:
  – Are not comfortable with the technology (which can be likely if the target group is not gamers).
  – Have conflicting software or equipment installed on their device.
  – Have a slow Internet connection.

Table 2: Benefits and challenges with remote usability testing.
2.3.4 Post Test Questionnaire
A post test questionnaire is a survey consisting of questions that the user should answer after
testing the product. When performing online surveys, Nielsen recommends short and well
written surveys which are easy to answer. This will yield a higher response rate while avoiding misleading results [43]. According to Nielsen, the most important aspect regarding questionnaires is to maximise the response rate. Low response rates can be misleading since there is a chance they are based on a biased group of very committed users; hence the result cannot be viewed as representative of how most users experience the product. In order to get as many people as possible to respond to the surveys, they should be “quick and painless” [43]. The best
way to achieve this is, according to Nielsen, to reduce the number of questions. The questions
also have to be easy to understand and the survey easy to operate, in order to avoid misleading
answers due to misunderstandings. The questions should only address the core needs. Nielsen refers to Reichheld’s article “The One Number You Need to Grow”, where it is stated that only one question is needed in order to gain insight into customer satisfaction. From this question the Net Promoter Score (NPS) can be calculated. NPS is a metric for customer loyalty which was developed by Satmetrix, Bain & Company and Fred Reichheld [38]. The primary purpose of the NPS methodology is not to evaluate the customers’ satisfaction with the product, but to
evaluate the loyalty of the customers towards the brand or the company. Reichheld researched
the link between survey responses and actual customer behaviour which resulted in one direct
question ”How likely is it that you would recommend [company X] to a friend or colleague?”
[47] (company X can be replaced by for example a product or a game). Based on their replies,
the respondents are divided into different categories where some are considered to be beneficial
for the company and some can affect the brand negatively [37].
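As an illustration of how the score is computed (this sketch is not taken from Reichheld’s article, but follows the standard NPS definition): respondents answering 9–10 are counted as promoters, 0–6 as detractors, and the score is the percentage of promoters minus the percentage of detractors.

```python
def net_promoter_score(responses):
    """NPS from 0-10 answers to the 'how likely are you to recommend...' question."""
    if not responses:
        raise ValueError("no responses")
    promoters = sum(1 for r in responses if r >= 9)   # answers 9-10
    detractors = sum(1 for r in responses if r <= 6)  # answers 0-6
    return 100.0 * (promoters - detractors) / len(responses)

# Example: ten answers give NPS = (5 promoters - 2 detractors) / 10 responses = +30
print(net_promoter_score([10, 9, 9, 8, 7, 7, 6, 5, 9, 10]))  # 30.0
```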
Nielsen also suggests that an alternative approach to having all users participate in the same
short survey, is to ask different questions to different users. In this way more questions can be
included in the study while it is still being kept short for the individual survey taker. This might
provide more insight into the UX [43]. Since digital games can differ a lot regarding application
area, target players and game experiences, there is no universal measurement approach which
fits all. This has led to an absence of coherent tools to measure entertainment experiences in a
reliable manner [16]. In an attempt to develop a generally applicable measurement, IJsselsteijn
et al. developed the Game Experience Questionnaire (GEQ). GEQ is a post play questionnaire
that takes into consideration seven dimensions of player experience: sensory and imaginative
immersion, tension, competence, flow, negative affect, positive affect and challenge [16].
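As a rough sketch of how GEQ-style data is typically processed (the item-to-dimension grouping below is a placeholder, not the official GEQ scoring key), each dimension score can be taken as the mean of the items assigned to it:

```python
# Minimal sketch: average the questionnaire items belonging to each dimension.
# The groupings below are hypothetical; the real GEQ key must be taken from
# the questionnaire documentation.
responses = {"item1": 3, "item2": 4, "item3": 2, "item4": 1, "item5": 0}  # 0-4 scale
dimensions = {
    "competence": ["item1", "item2"],
    "tension":    ["item3", "item4"],
    "challenge":  ["item5"],
}

scores = {dim: sum(responses[i] for i in items) / len(items)
          for dim, items in dimensions.items()}
print(scores)  # {'competence': 3.5, 'tension': 1.5, 'challenge': 0.0}
```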
2.3.5 Testing of Digital Games
User testing of games can be performed using a range of different methods. They can be
small one-to-one sessions or larger test sessions involving groups of people. The procedure
can be everything from think-aloud, questionnaires and post-play interviews to an automated
recording system which collects data about the player. Some user tests are performed in testing
labs with special equipment, where the players can be observed and their actions and reactions
documented. This is however a rather expensive procedure, which only makes it available to
bigger companies with a large testing budget [4]. The data gathered during the test sessions
focuses on both usability and playability issues. By observing the player playing the game,
not only usability issues like problems with navigation, menus and level difficulties etc. can
be deduced, it can also be established if the player had fun, and if so when and where in the
game, as well as how fun it was [4]. Reviews and game forums relying on user feedback are
common post-launch evaluation methods in games development. However, these typically do not generate detailed information, and it can be difficult to deduce the cause of problems [4]. Brown claims that the most powerful UX evaluation tools “offer insight not just into the user experience but into the exact elements influencing the experience” [4].
The disadvantages of user testing of digital games are that it takes time and costs money,
while the reward is the ability to create games that the players want to play. This generates
more players and more money for the business [35]. Even though digital games are different
from medical devices, they both use a UI and hence similar testing methods can be used.
Oliveira et al. [45] studied usability testing of a respiratory UI in a medical environment, and
investigated the possibility to use computer screen and facial expression recordings in usability
testing of the UI. They suggest using computer screen recordings, instead of
other qualitative methods like interviews and questionnaires, when performing UX evaluation
of clinical information systems. This is because it provides further objectivity while capturing
UX problems in real-time. Similarly, the use of questionnaires and post-play interviews have
been questioned in the gaming industry. Since questionnaires and post-play interviews are
carried out after the game experience, both methods have been criticised for not being able
to capture the user state during their engagement with the game. After the game, the test
participants have to focus their attention on the evaluation, instead of on the experience they just had and are supposed to evaluate [34]. Another source of criticism is the fact that emotional
experiences can be difficult to describe in words since they are not primarily based on spoken
or written language. The use of screen recording tools alone will however not give enough
insight into the UX, the user’s emotions also have to be taken into account. In user testing
of games, usability issues can be detected from analysing screen recordings, but the player’s
emotions when playing the game can not be deduced only from this. To reveal the player’s
feelings; physiological measurements can be recorded, survey methods can be used or facial
recordings can be collected. Collecting physiological measurements will provide high resolution
data, but requires complex and expensive technology and also a high level of expertise. As
already concluded, surveys provide limited information. Recordings of facial expressions on the
other hand, can be collected even with a smaller budget [45] and can provide clues to the player’s true emotions. As Darwin puts it in The Expression of the Emotions in Man and Animals:
“They [the movements of expressions in the face and body] reveal the thoughts and intentions
of others more truly than do words, which may be falsified.” [7]. Displaying facial expressions
is a visible and distinctive way of communicating emotions [45], and by observing the user’s facial expressions during the test session, additional information about the UX can be deduced. Oliveira et al. conclude in their study that the combination of (computer) screen recordings and recordings of facial expressions can improve the evaluation of user interaction tests. Their study aims to improve user assessment during usability tests in order to improve efficiency and effectiveness in health information systems. As stated by Oliveira et al., the combination of screen recordings and facial expression recordings can be used to determine the emotions of the test participant [45]. Since an essential part of a good gaming experience depends on the
emotional response of the player (see section 2.2), one can assume that the approach could
work well also in user testing of games.
Isbister states that the games development process consists of five stages [18] (page 347)
which are tested and evaluated using different usability methods and tools:
1. The first step is before the project has even started. The product will not be tested,
but it is important to make time for user research in the development process and decide
on what research to conduct. It is also necessary to add time to the planning schedule
for addressing the usability issues discovered. A testing workflow needs to be developed,
making it possible for everyone in the team to test consistently.
2. The second step is the concept phase. Once the target audience, genre and platform have
been identified, it is easier to make decisions about the usability testing and specifically
heuristic evaluation. The aim is also to design a game which is fun and to recognise which
social and psychological aspects matter.
3. In the third step, called the pre-production stage, it might be desirable to use expert
evaluation in the team, to ensure that the UX objectives are fulfilled.
4. The fourth step is the production phase, where mainly classic usability methods, such as
think-aloud, are used. If time and resources are available, physiological measures can be
used to verify the emotions evoked by the game.
5. In the post-production phase, continuous post-launch and playability testing will be
needed if updates and new features are released.
2.3.6 Testing of Mobile Games
According to Isbister, it is preferable to perform user tests on mobile or handheld devices in
a setting where the “players might engage in game play embedded in their daily lives” [18].
It is also important to test the game on a demographically appropriate group. For example,
the people in the test group should have the same relationship to each other as the people in the target group have to each other in their natural environment. Both environmental and
contextual factors are important for achieving a natural gaming experience also during the
testing process [18]. Testing the experience that a mobile game provides to the players can not
be done using a computer [12]. Dragging fingers on a touchscreen display generates a different
UX compared to computer mouse clicks. It is important to test the game on a real device in
order to get realistic performance results. Real-world events such as disruptions, battery
consumption, memory restrictions or charging of the mobile device have a large impact on the
overall UX and playability [13]. Therefore, the best way to understand the UX of a mobile
game is to test on a real device and not on a simulator. It is also important to test on many
different devices, since most of the end users do not use the same high quality products as
the developing company might have access to. Isbister states that video recording and screen
capture make it possible to do rich analyses of gameplay [18].
Inokon, a producer at a mobile games company, was interviewed by Isbister and Schaffer [17] (page 161). He highly recommends usability testing and describes it as “removing the blinders”. Often the game developers get so close to the game that they lose the player perspective. Usability tests can in many cases reveal previously unnoticed issues, and it might be difficult for the developer to recognise these after spending time and energy working on the game, but it can be precisely these adjustments that help a game become a market success. Inokon also
explains that “usability can be lethal to a project if not used properly”, and lists his most
important pieces of advice [17]:
• Take time to thoroughly observe the players and grasp the context of the notes.
• Make time in the schedule for solving emerged issues.
• Not every issue will be addressed; choose to fix the changes that are most important for the game vision.
• A game is not a snapshot; it is constantly changing, so do not postpone the evaluation for too long. Test it when the game is in the alpha stage and functional.
2.3.7 Test Users
Isbister investigates the question ”If the developers are also players, why can they not test the
games themselves?”, and the answer is: because developers are not typical gamers [17] (page
350). They already have a previous experience and knowledge of the game, resulting in biased
opinions. It is necessary to test with both professional game testers and end users in order to
get a valid player perspective. Recruiting users is the most difficult but also the most important
part of user testing [42]. It is important that the game is tested by users with similar demographics to those who will play the game, in order to guarantee that it meets their
requirements. Despite this, not many companies have a procedure for regularly gathering test
users. The traditional ways of finding test users are to recruit colleagues, family members and
friends, or persuade random people in the streets or cafés to participate. It is also possible to
reach out to players through online communities or social media.
Another approach is to use an independent, specialised recruitment agency that finds test users
for you, or to use an online test service that provides a community of test users. There are
online services which offer both SRTs and user testing (such as PlaytestCloud which is further
discussed below in section 4.2.3). The criteria for the test users should match the target
audience regarding demographics such as age, gender, game expertise (casual or hardcore) and
income [42]. However, Krug and Sharon insist that it is not a requirement to test on the exact
target group, since valuable insights will be gained anyway [24]. Test users who are recruited,
other than colleagues or family members, are often offered payment as compensation [42].
A common problem when performing user tests is that not all test users show up. According
to Nielsen, the average no-show rate is 11%, which is almost one in nine testers [42]. The no-show rates can be higher for remote user test studies than for studies carried out in person.
Additionally, unmoderated sessions may vary greatly in quality; therefore Schade recommends
test facilitators to recruit some extra test users just in case [49]. Test sessions should be carried
out throughout the development process, and the number of test users in one session differs
depending on the purpose. Nielsen claims that it is enough to test with only five users in one
session when performing qualitative usability testing [41]. Using more than five users is a waste of resources; it is better to run as many small tests as possible with few test users. Nielsen and
Landauer studied the number of usability problems found through usability testing [41], and
discovered that about 80% of the usability problems can be found using five test users from
the same target group. Testing the quality metrics in quantitative testing of a game, such as
learning time, efficiency, memorability, user errors and satisfaction, requires four times as many
users as qualitative user testing does [44]. Nielsen concludes that 20 is the optimal number of
test users for quantitative studies.
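The figures above follow the problem-discovery model reported by Nielsen and Landauer, which is commonly written as below, where N is the total number of usability problems and L is the proportion of problems found by a single test user (about 31% in their data):

```latex
\[
  \text{ProblemsFound}(n) = N\bigl(1 - (1 - L)^{n}\bigr)
\]
% With L = 0.31 and n = 5 users: 1 - (1 - 0.31)^5 \approx 0.84,
% i.e. roughly the ``about 80%'' figure cited above.
```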
The result from the 2013 UX Industry Survey [59] shows that most companies (40% of the respondents) used an average of 6-10 test users per study, while the report from 2014 [58] indicates that 1-5 users is a more common test group size. According to the 2014 survey, 40% of the participating companies now use test groups consisting of 1-5 users, and the percentage who use 6-10 users has slightly decreased. However, according to Krug and
Nielsen, testing on only one person is better than not user testing at all. Krug claims that even
a bad test with the wrong test user will reveal possibilities to make important improvements
[23] [41].
2.4 Session Recording Tool
A Session Recording Tool (SRT) is software which records the participant’s screen while
using the product during a test session. This can be done remotely or at a specific location, for
example from the user’s home or at the office. Session recording methods have been used for
many years, traditionally using additional equipment (such as cameras and sleds) for recording
the user. However, a new type of session recording, where software is used, has become available
and popular in recent years. This software is called an SRT and can be integrated into an application; the UX can then be recorded directly on the mobile device or computer without additional equipment. There is a wide range of SRTs, each with their own benefits and
drawbacks. Since the aim of this thesis is to evaluate the UX of mobile games, the focus
will be on SRTs with support for mobile applications. Some of the tools support facial and
audio recordings, which makes it possible to gather more data about the user experience while
the user is using the application. There are some tools which store the recordings on an
online dashboard, where multiple researchers can view the recordings and navigate along the
timeline. Often an annotation feature is also available, where it is possible for the UX researcher
to comment on specific parts of the recording, for example if the test user experiences any
difficulties in the application or if the test user’s reactions are extra distinctive in a specific
part or view. This makes it possible to perform an extensive analysis which can be used
in order to improve the UX. SRTs are especially handy when conducting remote user tests,
where it is often desirable to record the test session. The use of additional equipment, such as
separate cameras and microphones, places high demands on the test user when participating in
remote test sessions. Reducing the need for additional equipment and software will facilitate
the test process and make it easier and more natural for the test user, which makes mobile SRTs
a preferable alternative. It is also possible to hand out a questionnaire or carry out interviews
after the test session. The user will then answer questions about their experience during the
test session, but this method does not reveal much of the real UX and the user’s reactions
to the product. Combining a questionnaire with an SRT can provide more information, as it enables the researcher to compare the experience as stated by the user in the
questionnaire with the result from the analysis of the recordings. Some of the test services which
were investigated in conjunction with the search for SRTs provide a full test service, including finding test users and performing the tests using SRTs. Some services also carry out analyses of the test data and summarise the results into a report which is submitted to the
client who ordered the UX evaluation of their product.
Brown claims that the most powerful UX evaluation tools are the ones which also give insight into which exact elements influence the UX of the game, as well as how the
UX is perceived [4]. By using SRTs, the UX researcher will get a clear insight into what is
happening in the application and how the test user reacts. Important aspects for good gameplay
are both challenges and emotions [51]. However, the emotions of a player can only be derived
from observations or by questioning the player. SRTs which only record the screen, and not the facial expressions of the player, will provide less information about the emotions of the player. Krug and Sharon [24], however, claim that when watching recordings from remote usability tests they need neither to know what the user says nor to see their face; all they need in order to evaluate the user experience is the user’s tone of voice. Krug and Sharon also give some advice regarding what to look for when choosing a tool: metrics,
videos, turnaround times and the possibility to fast-forward recordings [24].
UserZoom is a platform for agile usability testing and UX analytics that uses remote session recording to collect qualitative and quantitative data. In a case study [61] performed by
UserZoom, they used session recordings to evaluate the UX of a website and their conclusion was
that “UX practitioners have been using remote usability testing mostly for collecting valuable
quantitative data, something that was not possible to do in a lab. Now, combining remote
unmoderated usability testing with videos of the user sessions gives you the best of both worlds.
With this approach companies can gather the necessary quantitative data as well as qualitative
data to better optimise online user experience on their site.”
2.4.1 Metrics
Graham McAllister at Player Research divides player metrics into behaviour, rationale, perception and experience. Behaviour refers to what the player did during the gameplay, and rationale refers to why they did it. Perception comprises what the player thinks happened, and experience refers to how the player felt. Aspects to consider are thus what the player did, why they did it, how they perceived it and how it made them feel [33].
2.4.2 Facial Reactions
The classical way to read facial reactions is when two or more humans interact, and intuition and experience are used to read and interpret the expressions of the other participants. In user testing, this has until recently been done face to face, but it is now becoming increasingly common to communicate online using video calls. There are also new systems available that interpret facial reactions automatically. Oliveira et al. [45] and Lankes et al. [25] mention the Facial Action Coding System (FACS). FACS is a guide which categorises the movements of the facial muscles by assigning each facial muscle a number which is modified when the muscle moves. This can be used to categorise the movements into facial expressions corresponding to the basic emotions: happiness, surprise, anger, contempt (with some uncertainty), disgust, sadness and fear [25]. With FACS the facial expressions can be measured objectively and hence the test participants’ emotions can be deduced. However, using FACS is time consuming and prone to bias, since the analyser is subjective and thus different results can be achieved depending on who is performing the analysis. Additionally, extensive training is required in order to produce FACS ratings [11]. Therefore, Hamm et al. developed an automated method for dynamic analysis of facial expressions (in neuropsychiatric disorders). Their
system tracks faces in video footage automatically and extracts geometric and texture features
which are used to produce temporal profiles of the movements of the facial muscles [11].
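To illustrate the idea behind FACS-based classification (the action-unit combinations below are simplified, commonly cited prototypes rather than the full FACS or EMFACS coding rules), detected action units can be matched against prototype combinations for the basic emotions:

```python
# Simplified sketch of matching detected FACS action units (AUs) against
# prototype AU combinations for basic emotions. Illustrative only.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},          # cheek raiser + lip corner puller
    "surprise":  {1, 2, 5, 26},    # brow raisers + upper lid raiser + jaw drop
    "sadness":   {1, 4, 15},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15, 16},
    "fear":      {1, 2, 4, 5, 20, 26},
}

def classify(active_aus):
    """Return the emotion whose prototype best overlaps the active AUs."""
    active = set(active_aus)
    emotion, prototype = max(EMOTION_PROTOTYPES.items(),
                             key=lambda kv: len(kv[1] & active) / len(kv[1]))
    return emotion if prototype & active else "neutral"

print(classify([6, 12]))     # happiness
print(classify([1, 4, 15]))  # sadness
```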
2.4.3 Audio Recordings
Similar to reading facial reactions, reading reactions from the voice has mainly been conducted
when two persons communicate or interact face to face. Today, modern systems have been
developed to interpret emotions from recorded voices.
Kostov and Fukoda did a study called Emotion in User Interface, Voice Interaction System
[22], where they researched and developed a UI for recognizing emotions in voices, regardless
of the speaker’s age, gender or language. Eight emotional states: neutral (unemotional), anger,
sadness, happiness, disgust, surprised, stressed/troubled and scared are extracted from a speech
database and examined for voice audio resemblance. A VIS (Voice Interaction System) was
developed from these results and it can be used to determine what emotions the voice reveals.
The VIS was developed based on analysis of human speech factors such as pitch, formants,
tempo and the power of the voice. After the speaker’s natural voice properties have been analysed, the VIS interacts with the speaker and the voice emotions are extracted. The researchers
have developed an emotion engine where a voice-based system reveals an indication of what
emotional state the speaker is in. Professional actors and actresses, as well as non-professional subjects such as students, were used to record speech for the database, in order to get a reliable basis for what similar acoustics are presented for the eight different emotional states. In order to develop a cross-cultural standard for voice emotion detection, students speaking Brazilian Portuguese, Italian, Spanish, Flemish, Japanese, Macedonian and English were analysed. The demand
for devices with the ability to recognise “emotional messages” is increasing; many users want devices which understand what they want them to do without having to waste time instructing them. The ability to perceive and interpret the user’s emotional state, regardless of who the user is, will be a big benefit in HCI and in adaptive systems based on visual, kinesthetic and auditory input [22]. Although a VIS will not be used in this research, since the analysis of the user’s acoustic emotions will be done manually, Kostov and Fukoda’s study
can be a valuable starting point. Their study presents what emotional states to examine and
what voice properties to investigate in order to achieve emotional interpretation.
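As a minimal, self-contained sketch of the kind of voice properties mentioned above (pitch, power and a crude tempo proxy), basic acoustic features can be extracted from a mono signal; the pitch estimator and the feature set are illustrative only and far simpler than a real VIS:

```python
import numpy as np

def voice_features(signal, sample_rate):
    """Crude pitch (autocorrelation), power (RMS) and speech-rate proxy (ZCR)."""
    signal = signal - np.mean(signal)

    # Power of the voice: root-mean-square energy.
    rms = float(np.sqrt(np.mean(signal ** 2)))

    # Pitch estimate: strongest autocorrelation peak in the 50-400 Hz range.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sample_rate / 400), int(sample_rate / 50)
    pitch_hz = sample_rate / (lo + int(np.argmax(corr[lo:hi])))

    # Tempo proxy: zero-crossing rate per second.
    crossings = np.sum(np.abs(np.diff(np.sign(signal)))) / 2
    zcr = float(crossings) / (len(signal) / sample_rate)

    return {"rms": rms, "pitch_hz": pitch_hz, "zcr_per_s": zcr}

# Example: a synthetic 200 Hz tone should give a pitch estimate near 200 Hz.
sr = 8000
t = np.linspace(0, 0.5, int(0.5 * sr), endpoint=False)
print(voice_features(0.5 * np.sin(2 * np.pi * 200 * t), sr))
```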
2.5 Workflows for User Testing
Various workflows are used for different types of development, depending on which testing methods and techniques are used. There is no general workflow that is suitable for all purposes; it has to be adjusted for the specific company, product and testing situation. It is however possible to use existing workflows as guidelines when developing a testing workflow. SRTs can advantageously be used in remote usability testing, and below follows a description of workflows designed for various testing methods and contexts, along with other information of relevance to mobile games testing with session recording tools. These approaches focus on the usability aspect of UX and need to be combined with factors like the player’s emotions and the flow of the gameplay in order to be used in evaluation of the PX.
2.5.1 Remote Usability Testing
According to Usability.gov [54], there are some important guidelines to consider when conducting remote usability testing:
• The tests should be about 15-30 minutes long and should consist of 3-5 tasks.
• The tasks should be straightforward and have well-defined end states.
• Include the minimum system requirements of both the tool and the product in the instructions.
• Make sure that the contact information for the test user is correct, allowing for follow-ups
and reminders if needed.
• Instructions and test materials should be prepared so that the users know what is expected
from them and also what they can expect from the practitioner.
• Consent forms for the test users should be prepared.
• If the participants are being compensated, make sure to have the compensation and
receipts prepared.
The main difference with remote testing compared to traditional observations is the technology.
Ensure that:
• Whatever product you are testing is available and accessible outside the network/firewall
of the company.
• There are no firewall issues preventing the users from testing the tools and accessing the
product.
• Participants can easily download or access the screen recording tool or service being used.
2.5.2 Mobile Applications
UserTesting has developed a checklist containing the four main steps to complete when conducting user testing of mobile applications [60]. These steps are:
1. Create a test plan
2. Organise the details
3. Run your tests
4. Analyse the results
The first part of the test process should focus on defining the testing objectives: what questions need to be answered? It is also necessary to know which parts of the application
to test and to determine whom to test the application on. Another tip is to identify all
requirements, such as specific software, operating systems or devices. The second part, organise the details, consists of making sure that all the details are correct, for trouble-free testing.
This includes making sure that the application is available for free to the test user, and that
the test participants also know how to access it. There should be clear written instructions, so
that the test participant understands what to do without unnecessary delays. It should also
be clear how to share files between the test participant and the moderator and if sound and
landscape or portrait mode should be switched on/off. The third part involves running the test and hopefully meeting the objectives from part one. It should be specified in detail what
actions the user should perform, if they for example should sign up for a newsletter, or if
they should explore the application on their own. Metric questions can be used, such as task
time (time for completing a task), or task difficulty (how easy the task was to complete on
a scale), and both pre and post test questions can be asked, involving for example the user’s previous experience with the product or similar products. Post questions can be about describing how
enjoyable, easy or difficult they thought the experience was. One tip is to ask the test users
if the application is something they would recommend to friends and family (NPS, see section
2.3.4). This question can give interesting insights into how enjoyable the test participant found
the UX of the application. The last part of the test process deals with analysing the resulting
test data. A key indicator to determine whether the test has been completed in a correct way
or not, and if the task was fun and engaging or difficult and confusing, is to investigate the task
time. Try to determine what the user actually thought about the application by comparing
ease-of-use questions with value-based questions. The participant may have written that the
application is easy to use, but that does not mean that he or she would pay for it. Finally,
share a summarised and readily comprehensible report of the test with the development team.
In order to improve the test sessions for the next time, watch recorded test sessions together with the team, ask what user actions they are missing, and write it down in list form [60].
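A hedged example of how such metric questions could be summarised once the sessions are done (the session records and field names below are made up for illustration):

```python
# Illustrative per-task summary of test-session metrics; the data is fictional.
sessions = [
    {"task": "create_account", "completed": True,  "seconds": 45,  "difficulty": 2},
    {"task": "create_account", "completed": False, "seconds": 120, "difficulty": 5},
    {"task": "create_account", "completed": True,  "seconds": 60,  "difficulty": 3},
]

completed = [s for s in sessions if s["completed"]]
completion_rate = len(completed) / len(sessions)
avg_time = sum(s["seconds"] for s in completed) / len(completed)
avg_difficulty = sum(s["difficulty"] for s in sessions) / len(sessions)

print(f"completion {completion_rate:.0%}, "
      f"avg time {avg_time:.0f} s (completed sessions only), "
      f"avg difficulty {avg_difficulty:.1f}/5")
```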
2.5.3 Mobile Games
The question “How do you conduct a good usability study for a mobile game?” published on
Quora (a question-and-answer website), was answered by David Suen from Betable [50]. Suen
says that their usability tests are designed after Steve Krug’s book “Don’t Make Me Think”.
The book was written mainly for web usability, but according to Suen it is still applicable
for mobile testing. He mentions some key points they are using at Betable:
• Do user tests in small batches of 3-5 participants
It is sufficient to use smaller batches in order to identify patterns for frequently recurring
usability problems. Once the issues have been identified and resolved, do another batch
with new testers and find new problems. Do not expect that one test session is enough
to identify all of the usability issues.
• Getting test users
People in the near surroundings such as family, friends or people at meet-ups, are often
open to help testing the game. Make sure the experience is fun for them, and once they
have completed the test, ask them for other people they know who may be willing to
participate in a test. Usually, no compensation is needed but in many cases small things,
like buying them a beer, can enhance the experience even more.
• Length of test session
A test should be about 15-30 minutes.
Nemberg wrote a blog post about five common mistakes in game usability testing and how to
avoid them. They held a playtesting session at the Gamefounders game accelerator, where the
research was conducted. The frequent game usability testing mistakes, as well as the solutions
of how to avoid them, were [36]:
1. Too much guidance
Talk as little as possible when being the moderator of a test session. Being mute, and not
explaining any background information about the game to the test user, is absolutely fine. Let the players find out how it works by themselves. They do, however, need to understand the game mechanics from installation to starting the game; otherwise it is necessary to
solve the problem and make it clear.
2. Assuming too much
The player can not be expected to understand everything in the in-game menu. Try to get
the test users to speak about the menus and items in the game, before starting the actual
game. Do the players understand what all of the UI elements and buttons are meant for? Do they understand the navigation correctly? Many usability test moderators skip the part where the participant uses the start screen and menu; this is however not recommended. Usability issues may also occur in the start menu.
3. Testing with just one demographic
Casual games usually have target groups consisting of players of various age and gender,
and if that is the case, the game should also be tested on all of these groups. When
testing for example learning games that are targeting young children, their parents, who
actually buy the games, should also be included in the test session. If the parents do not like or understand the game concept, chances are high that they will not buy it.
4. Talking too much
The participant can get distracted if asked too many questions, and it removes
their focus from the game world and the gaming experience. Instead, ask the players to
play the game and verbalise what they are experiencing. Is there any part or element
in the game that is triggering a strong emotional response, or is anything making them
frustrated? Try to avoid disturbing the player as far as possible, and limit the
number of moderators to one per participant. The player can get confused if someone
new asks questions during the gameplay. Observe the body language of the player; do
they look relaxed or tense, and does the body react to specific game actions? Also try to notice when they get really excited, for example if their eyes light up.
5. Not recording the sessions
Though taking notes is a good thing, it is not always enough. Try to also record the actual tests; for games the recommendation is to use a screen recorder, preferably along
with an application that records the face of the player or an additional video camera
directed towards their face. By combining these recording methods with a skin response
sensor, the test results will gain even higher validity.
2.5.4 Session Recording Tools
Lookback has produced a guide on how to conduct user testing using their SRT [30]. Since
Lookback uses the built-in camera of the mobile device, there is no need for additional testing
equipment after integrating their Software Development Kit (SDK) into the mobile application.
The testing steps which they recommend are the following:
• Decide on what to test. Is it for example the UX of an entirely new product or a specific
feature?
• Decide who to test on. Should it, for example, be employees, people who do not know
the product or existing users? Every group has different strengths and weaknesses, and
their feedback will depend on existing experience and mindset.
• Send the game application to the testers or let them come into the office. If the recording
is done remotely, the user behavior may be more accurate but the process less organised.
Sending the game application remotely can be done by using for example HockeyApp
which is a platform for distributing beta versions of iOS, Android or Windows Phone
applications.
• Decide on what to test, then compose instructions and questions (if desired) for the test.
Do not forget to write clear instructions on how to open the SRT (For Lookback it is
normally by shaking the device, unless another method has been specified).
Some example questions which might be useful in UX testing are:
• How would you perform [a specific task]?
• What is the game application about?
• How would you create a new account in the game application?
• What parts of the UI are most important?
Lookback also recommends testing early and repeatedly; the earlier in the development process, the better. For staying on track with a user-centered approach, it is helpful to test repeatedly,
in order to know that everything is still working properly. It is recommended to make a habit
of testing at least once at every new release, even more often if possible [30].
3 Approach
This chapter presents the work methodology used to evaluate the various SRT candidates and to produce the resulting workflow. The first section, 3.1, describes the production of the workflow; next, section 3.2 presents the materials used. The research conducted is explained in section 3.3, and section 3.4 describes how the initial SRTs and services were evaluated. The following section, 3.5, explains how the tools are integrated into the game application. Procedures for finding test users are presented in section 3.6, followed by information regarding the creation of the test plan in section 3.7 and the distribution of the game application and the questionnaire in section 3.8. The last sections explain how the user tests were executed (section 3.9), how the analysis was conducted (section 3.10) and how the user feedback was managed (section 3.11).
3.1 Production of Testing Workflow
The aim of the thesis was to produce a comprehensive workflow for how the commissioning
company should conduct UX testing of mobile games using SRTs. The workflow was based on
the research in chapter 2 as well as the results in chapter 4 in combination with experiences
gained during the testing process. The workflow contains: tables displaying the properties of
the SRTs, information about what should be considered when carrying out the tests and how
to find test users and write questionnaires. The workflow also clarifies how to proceed when
analysing the results from the test sessions. The aim was to make it easier and faster for the
company to find an appropriate testing tool and to develop a good testing process.
3.2 Materials
Since some tools supported only iOS when the project was initiated, the decision was made
to integrate the tools into the iOS version of the game application. The computers used for
research and tool integration were two MacBook Pros with OS X Version 10.9.4. Xcode Version
6.1.1 was used to integrate the iOS testing tools into the mobile game. The application with
the integrated tools was tested on various iOS devices at MAG Interactive.
3.3 Research
Initially, research about user experience and testing methods was conducted in order to gain
knowledge about the topics of the thesis. The gathered information also covered UX in games
and mobile games and how it differs from regular software. A comparative study of some of the most popular session recording tools and services found online was carried out. The qualifying tools/services and their properties were compiled into two tables, allowing a clear overview when comparing them (see tables 3 and 4, section 4.1). In order to ensure that the correct information had been collected and to fill in gaps in the properties tables, e-mails
were sent to the SRT and test service companies in question (Lookback, UXCam, Beta Family,
UserTesting, AppSee, WatchSend, trymyUI, Userlytics, PlaytestCloud, UserZoom). An interview and introduction to the tool/service was also conducted with one of the companies, using
JoinMe. Existing methods and workflows were investigated, and interviews with responsible
testers at the commissioning company MAG Interactive were held in order to gain information
about the current testing procedures.
3.4 Evaluation of Session Recording Tools
Interviewing responsible Quality Assurance (QA) and UX testers at MAG Interactive was
valuable in order to find out which properties in the SRT tables (see tables 3 and 4, section 4.1) were most important in the comparison process. A preliminary testing workflow was
developed which contained the parts believed to be necessary in order for the test users to be
able to conduct user testing with the tools. This workflow was then used for testing the tools
considered to be of most relevance based on the requirements from the company and how these
correlated to the information collected from the websites of the tools and services. Based on
these test results, additional tables containing more specific information about the tools and
services were produced. The factors and properties included in these tables focus on which
test users to test on (demographics like age, location and previous, current or new players etc.
and also if testing can be conducted at the office or not), which platforms are supported, how easy the tool or service is to work with (based on our own experiences) and which services are offered by the tools or test services. The first draft of the workflow was used to test the tools and
services and this was then further refined in order to support the testing objectives for testing
mobile games with the help of SRTs. User tests were conducted both with tools which provided
recordings only (here it was up to the researcher to find test users, set up and distribute the
test and analyse the results) and with test services which provided test set up, distribution,
test users and recordings. The purpose of using these two different approaches was to compare
the test process and the results in order to see if there were any differences, or if any of the two
approaches would be preferable, but also to ensure that the workflow would fit both approaches.
The SRTs from table 3 which were selected for further evaluation and testing were Lookback
and UXCam. The tools Appsee and TestFairy were integrated and tested by recording ourselves
but these were excluded due to not fulfilling all requirements for the test objectives. WatchSend
did not offer an easy way to try out their tool and did not respond to e-mails regarding this
and therefore they were also excluded from the study. The test services from table 4 which
were tested and chosen for further investigation were PlaytestCloud and UserTesting. They
both provide session recording, test set up, test user recruitment and distribution of the game
application. The services Userlytics and UserZoom were excluded from the study because
they did not provide a trial option where it was possible to test their services and they did
not respond to e-mails regarding this. UserZoom did offer a university programme where the
tool could be used for free but when a representative from the university tried to contact them,
they did not reply. Beta Family’s SRT SuperRecorder was integrated into the game but the
tool made it act oddly and it was not possible to play the game in a correct manner and record
the session at the same time, hence only Beta Family’s other test services were included in the
study.
3.5 Integration
The integration was conducted by following the instructions available on each tool’s website
(see bottom line in table 3 and 4, section 4.1) for manually installing the SDKs of the tools
into the test object, i.e. the game application Ruzzle.
3.6 Finding Test Users
When recruiting test users for the test sessions where Lookback’s and UXCam’s SRTs were
used, a recruitment letter was written and sent out through social media. The message was
posted on the Facebook walls of the authors and in various Facebook groups accommodating a vast number of users. However, only one participant was recruited through this channel. The
rest of the participants for these test sessions were recruited by asking friends and family known
to have iPhones and by asking family members at family get-togethers to take part using a
borrowed device. Test sessions with Lookback and UXCam were also carried out using Beta
Family for recruitment of test users. When using UserTesting and PlaytestCloud, the test user
recruitment was included in their service.
3.7 Creating the Test Plan
A test plan was created in order to inform the test users about how to perform the test. This
plan contained a declaration of informed consent, instructions, tasks and pre and post gameplay
questionnaires. Since the tools work differently from each other, specific instructions had to be
specified for each tool. See appendix C, D, E, F, G for instructions and questionnaires used when
carrying out the user tests. An effort was made to try to have the same pre test instructions
and post gameplay questionnaire for all test sessions, but since the tools and services differ this was not always possible. To ensure that the test plan and the questions were satisfactory and
would yield the desired information about the test object, the test plan was sent out
to the entire development team asking for their feedback. When this feedback had been taken
into consideration, two pilot tests (one for Lookback and one for UXCam) were carried out by
two test users at a remote location before it was made accessible to the actual test users. See
appendix H for the additional questions included in the pilot tests.
3.8 Distribution
The test services that provide test users often also provide distribution of the application, but if
they do not, it is necessary to find a suitable distribution option. In order to distribute the game
application to the test users, a test service such as Beta Family’s SuperSend, Crashlytics or
HockeyApp is needed. An additional service for distributing the questionnaire is also needed,
such as Google Forms or a website containing a form for submitting user data. There are
however session recording tools that also provide game application distribution, set up of test
and recruitment of test users. During the course of this project, Beta Family’s SuperSend
was used for distribution of the application and Google Forms was used for distribution of test
instructions, declaration of informed consent and the post gameplay questionnaire. When using
test services like PlaytestCloud, UserTesting and Beta Family, distribution of the application
and the test was included in the service.
3.9 Execution of User Tests
In order to perform the user tests without a test recruitment and distribution service, the game
application and test instructions needed to be distributed to the recruited test users. The
test documents consisted of information regarding the test (see appendix A), a declaration of
informed consent (see appendix B), pre gameplay questionnaire (see appendix E), instructions
and tasks (see appendix C and D) and finally a post gameplay questionnaire (see appendix F).
In the instructions it was also explained how to use the SRT; how to start the recording and
how to upload the video. When the tool/service allowed for an unlimited amount of post test
questions, a survey regarding the session recording tool and the test user’s testing preferences
was included, see appendix G. This survey was conducted after the test users had completed the
regular part of the test, including the post gameplay questionnaire. This was because we did
not want the survey to interfere with the PX or the result of the post gameplay questionnaire.
In order to participate, the users had to download and install the game application on their
mobile device. Since Google Forms or Beta Family was used for test set up in this part of the
study, the users had to read the instructions in a web browser on a second device like a computer or a tablet. This was the only way to digitally enable the users to read the instructions while
at the same time performing the tasks on their mobile device.
When using test services which recruited and distributed both game application and test,
the tasks and questionnaire were set up from the services’ websites. It was also possible to
specify demographics for the test users, only allowing users which matched the requirements to
participate in the test. In this study all test users above the age of 20, which had not played
the game Ruzzle before, could participate, regardless of their gender. The recruited test users
were then able to partake in the test and complete (or at least try to complete) the tasks in the
game application while the test session was being recorded, and then answer the questionnaire.
3.10 Analysis of Session Recordings
The recorded sessions from each of the services were uploaded and became available at their
corresponding websites, where it was possible to analyse the recordings and create annotations. The majority of the researched SRTs and test services call the functionality for adding a comment at a specific time in a recording “annotations”. To avoid confusion, annotations is therefore the term used throughout this thesis, regardless of what the tool/service itself calls it. The analysis was carried out in the researcher environment for each
tool and annotations were created for every interesting observation in the recording. This could
be regarding misunderstandings and difficulties in the navigation or when playing the game.
It could also be in regard to the mood of the test user, if he or she seemed to like something
and seemed happy or if they got annoyed or frustrated over something. Annotations were also
made regarding whether the test user managed to complete the tasks or not and whether he/she had any difficulties completing them; feedback from the test user and other observations about their behaviour and reactions were also noted. These annotations were later used when summarising the
user experience.
Since some of the test sessions were carried out with our friends and family, we made sure
to analyse the session recordings where the test users were unfamiliar to us. This was in
order to avoid being influenced by already knowing the users and how they express themselves,
making sure this would not affect the composition of the workflow or the evaluation of the
tools. Normally, the test observer and the test users do not have a relation to each other.
3.11 User Feedback on the Test Object
When the session recordings had been analysed, a document with feedback about the game
was compiled for the commissioning company. The focus of the user testing was not only to get
information about the session recording tools but also to collect feedback about the on-boarding
process of the mobile game Ruzzle. It is important to compile the analysed results into tangible
feedback in order to be able to address the issues.
3.12 Final Evaluation of Session Recording Tools and Test Services
Documents with strengths and weaknesses and the overall experience of the tools were compiled
in order to be able to compare and evaluate the tools. The information which was of value for the tools and services themselves was sent to the respective companies by e-mail.
4 Results
This chapter contains the collected information about the SRTs and test services investigated
in this study and also the resulting workflow which has been produced based on observations,
experiences and knowledge gained from this study. Section 4.1 provides information about the
SRTs and test services which were initially investigated. Two tables are presented where the properties and features of the services can be compared. In section 4.2 the tested SRTs are compared on a deeper level based on observations and experiences from using the tools. Properties, strengths and weaknesses are presented. In section 4.3 the focus is on comparing properties, strengths and weaknesses of the test services which provide test set up, recruitment of test users and distribution of the test and the application. Since some of the tools/services provide
both test service and session recording, they have been included in both section 4.2 and 4.3.
Section 4.4 contains a table where the tested SRTs and test services have been graded based
on their performance in this study. The outcome of the user testing is presented in section 4.5,
and the final workflow is displayed in section 4.6.
4.1 Test Services and Session Recording Tools Initially Investigated
The SRTs initially investigated during the research phase are displayed in table 3, and the test services which offer session recording and also provide test user recruitment, test set up and distribution of the game application are presented in table 4. The information about the tools and test services displayed in the tables has been gathered from their websites or by contacting them through e-mail (information collected in February 2015). Some of the tools and services in the tables have been excluded from further evaluation due to them not giving access to try out the service, not responding to our e-mails regarding this, or not fulfilling necessary requirements (e.g. a high enough frame rate). The tools and services which were selected for further investigation were: Lookback, UXCam, Beta Family, UserTesting and PlaytestCloud.
Table 3: Session recording tools included in the initial investigation.
Table 4: Test services which also provide SRTs and were included in the initial investigation.
4.2
Tested Session Recording Tools
The SRTs presented below have been integrated into the target game and tested. The strengths and weaknesses presented for each specific tool (for Lookback see section 4.2.1, UXCam see section 4.2.2, PlaytestCloud see section 4.2.3 and UserTesting see section 4.2.4) are based on our experiences during this thesis work. In order to compare the UX researcher environment (where the recorded sessions can be analysed) for each of the SRTs, a table was compiled, see table 5. Some of the aspects considered are the organisation of tests and recordings (which is provided in the dashboard), downloading and sharing possibilities, and annotation features.
Table 5: Features available in the UX researcher environment (where recordings can be watched
and annotated) for the respective services.
When conducting user testing with SRTs, it might, for example, be desirable to customise the tool, i.e. to change its default settings. Table 6 displays whether it is possible to customise the tool settings and whether it is possible to preview the video, etc. The table also specifies whether the tool censors password fields and keyboards, whether advertisements are visible in the recordings and whether the recordings are saved in the case of an application crash.
Table 6: Properties for the SRTs.
4.2.1
Lookback
Lookback [31] is a tool for recording feedback, user experiences and bug reports. The tool uses the front camera of the device and records facial reactions, audio and the screen. The website is modern and stylish and easy to navigate, but it lacks some functionality and does not seem completely stable. Lookback’s UX researcher environment is displayed in figure 4.
Figure 4: Lookback’s UX researcher environment.
Lookback’s strengths:
• Clear progress bar which displays the name of the current view.
• Good quality of video, voice and also game sounds.
• Censors password fields and keyboard.
• Possible to click on an annotation and move to that time in the recording; also easy to share a link directing to a specific timestamp with a colleague.
• Possible to add tags, add videos to projects, and to favourite and organise the videos in an easy way.
• Possible to customise settings, e.g. giving the test user the opportunity to preview the recording before uploading.
Lookback’s weaknesses:
• Items are spread out over the screen. It is far to move both mouse and eyes between the progress bar, face recording, screen recording and annotations.
• Several issues with annotations: too many steps to create an annotation, not possible to edit, and they are ordered according to creation date and not according to timestamps (which makes it confusing and disorganised). There is also no timestamp when clicking “post a comment” directly.
• Does not record game application crashes.
• Still not a completely reliable tool. The screen was not recorded for all sessions even though it was supposed to be, and some recordings were not uploaded at all.
4.2.2
UXCam
The advantages of using UXCam [62] are that it provides a lot of extra metrics, such as heat maps, the possibility to see which view in the game application the user is in, and how the user swipes and navigates through the game. Another advantage is that it is possible to change the settings for the tool directly from the website, instead of changing the application code or asking the test user to change settings in a menu in the application itself. The big drawback is that the website and the SDK are under development and currently unstable, frequently making it impossible to access and analyse the videos. The UX researcher environment is displayed in figure 5.
Figure 5: UXCam’s UX researcher environment.
UXCam’s strengths:
• The progress bar with timeline and annotations also displays in which direction the user swiped on the screen or if they tapped it.
• UXCam has many extra features (heatmap, navigational flow and statistics with visualisations for the tests).
• It is easy to share a recorded video or a specific clip from the video.
• The administrator can change settings (video quality, use of front camera etc.) directly from the website.
• The user does not have to specify any specific options when starting the session recording; they just accept a pop-up asking if it is okay to record the camera input.
UXCam’s weaknesses:
• The website and the tool are under development and are currently unstable. Many functions are not working, sometimes it is not possible to view the uploaded sessions and sometimes the facial recordings are not uploaded at all.
• Not possible to download recorded sessions or annotations.
• Does not upload the recording if the application crashes.
• The annotation feature is not working properly; sometimes annotations cannot be added and sometimes they disappear.
• It is difficult to navigate in the video using the progress bar.
• It is not possible to start the tool again on the same device if the test user has declined the first pop-up asking to start the recording.
4.2.3
PlaytestCloud
PlaytestCloud [46] specialises in the testing of games. The UX researcher environment can be seen in figure 6. The biggest advantage of PlaytestCloud’s SRT is that it continues recording when the game is restarted after an application crash. This separates the tool from the others investigated in this study. A drawback is that it does not provide facial recordings.
Figure 6: PlaytestCloud’s UX researcher environment.
PlaytestCloud’s strengths:
• Continues recording after an application crash.
• No SDK has to be implemented.
• Possible to download sessions and a CSV file with annotations.
• The dashboard and the researcher environment are simple and easy to use.
• The annotation function works smoothly (easy to add new annotations, the video pauses automatically, easy to edit, possible to add an annotation at -5 or -15 seconds, etc.).
PlaytestCloud’s weaknesses:
• No facial recording.
• Does not censor sensitive input fields such as passwords.
4.2.4
UserTesting
UserTesting [57] has a lot of experience of conducting user tests and offers many services. Their biggest advantage is that they offer all of the desired functionality. However, in order to record the screen directly on the device it is necessary to integrate the SDK (which is currently available in beta). Otherwise the user will record the test session using a web camera, which decreases the recording quality considerably. The researcher environment is presented in figure 7.
Figure 7: UserTesting’s UX researcher environment.
UserTesting’s strengths:
• Possible to rotate the recording and to increase playback speed. Also possible to jump back 5 seconds in the recording.
• Smoothly working annotation functionality (the video automatically pauses when typing a new annotation, annotations are saved by pressing enter, the current annotation is highlighted when the video is playing, annotations can be edited, etc.).
• The user gets the test instructions and the tasks on their device screen, so they need no additional devices to take part in the tests.
• Possible to create highlight reels directly on the website.
UserTesting’s weaknesses:
• Annotations are not automatically scrolled when the video is played, which makes it a bit difficult to follow along in the annotations.
• Does not censor password fields and keyboard.
• The recordings were of bad quality when the SDK was not implemented, since they were recorded using the test user’s web camera.
• The tasks are visible in the game application, which may draw attention away from the game, especially since there seemed to be some difficulties for the user to open and close the menu; however, this depends on preferences.
4.3
Distribution and Test Set Up Services
The services which offer distribution and test set up and have been tested in this study are: Beta Family, PlaytestCloud and UserTesting. All of these services provide test users, although the recruitment process differs. PlaytestCloud and UserTesting also provide a SRT, while the Beta Family service used in this research has been used only for test set up and distribution, together with the SRTs Lookback and UXCam. Test set up properties regarding tasks and questionnaires can be seen in table 7. Factors such as demographics and the possibility to contact the test users are also displayed in table 7. The strengths and weaknesses presented for the test services (for Beta Family see section 4.3.1, PlaytestCloud see section 4.3.2 and UserTesting see section 4.3.3) are based on our observations and experiences during this thesis work.
Table 7: Features for test set up and distribution services.
4.3.1
Beta Family
Beta Family provides several test services. There is SuperSend, a free distribution service where it is possible to upload an application and send a description and a link to anyone by e-mail in an easy and quick way. SuperSend is suitable when creating a test without using a test set up service, which is explained in more detail below in section 4.3.4, Distribution Without Test Set Up Service. Beta Family also has a SRT called SuperRecorder, but this is still at an early development stage and has therefore not been further investigated in this study. It does contain several nice features, such as test instructions and tasks being displayed in the application, feedback on the uploading of recordings, a face positioning functionality, and the possibility to view all of the recordings directly from the SRT interface on the device. However, since the game could not be played properly while SuperRecorder was recording, and the researcher environment is missing functionality such as annotation possibilities, it could not be used in this study.
The service that has been investigated in this thesis work is Beta Family’s standard test service, which is displayed in figure 8, where it is possible to do all parts of the user testing in the same place. They provide test set up services, recruitment of test users and distribution of the application. It is also possible to specify whether the test should be private (where the users are handpicked) or public (where all users can sign up). It is also possible to choose whether the users should get paid or not; many users are willing to take the test anyway, since they can get a higher ranking if they perform well. It is possible to invite colleagues, friends and family to participate in the test for free.
Figure 8: Beta Family’s test set up service.
Beta Family’s strengths:
• Good overview of each part necessary for setting up a test. Easy to create a new test, and possible to reuse previously created tests and questions. It is also possible to set test deadlines for how long the tests should be active.
• Easy for the test users to get an overview of the test tasks and give feedback in each section.
• The administrator has a lot of control over who takes the test (if it is a private test), since it is up to the test facilitator to invite users; it is also possible to specify demographics and devices and to contact the test users.
• Rating system - there is information available about how many reports the user has submitted and how they have been rated.
• Statistics functionality which gives an overview of the test users and their gender, age, country and device.
Beta Family’s weaknesses:
• Not possible to specify a screener question to ensure the test user is of the desired target group and make sure that the user fulfills specific requirements before taking the test. It is possible to specify test user requirements, but this does not provide the same assurance as a screener question, since users can ignore the text and participate in the test anyway.
• Would like more alternatives for setting up various test questions; it is not possible to specify, for example, a multiple choice question like ”pick 3 emotions” and limit the test users to ticking exactly 3 checkboxes. The users can still add more or fewer answers.
• Not possible to see the specific date and time that a report was submitted.
• Difficult to know which of the test users should be connected to which session recording, since the test user information and their responses are uploaded to Beta Family while the recording is uploaded to the dashboard of the SRT.
4.3.2
PlaytestCloud
PlaytestCloud has a very easy test set up service, but the drawback is that it is not possible to specify any tasks, and only a maximum of five post gameplay questions can be added. However, the recruitment service is fast and the test users generally give honest and valuable feedback. Since the test users seem to play a lot of mobile games, they can give good usability feedback and compare the game and its concept to other similar games. This can be both positive and negative: it can provide a lot of valuable feedback, but sometimes it is desirable to test the game application on users who are not very technically skilled or do not have that much previous experience of games.
Figure 9: PlaytestCloud’s test set up service.
PlaytestCloud’s strengths:
• Finds test users within 48 hours.
• Creating a new test is straightforward and easy.
• It is possible to see what games the test user normally plays, which is valuable information about the test users.
• The users are really thorough and give interesting feedback.
• The video, annotations and survey results are displayed in the same view, giving a clear overview.
PlaytestCloud’s weaknesses:
• When creating a test on the website, it is not possible to specify tasks, customised demographics, requirements (including devices), a screener or instructions without contacting the company. It would be faster and easier if this could be handled directly in the test set up.
• Not possible to contact the users directly without first contacting the company.
• Only a limited number of post test questions can be added.
• No ranking system; not possible to review users or see how many tests they have participated in.
4.3.3
UserTesting
UserTesting’s test set up is very extensive and all desired features are available (depending on the subscription plan). The tests in this study were carried out using a trial of UserTesting’s PRO subscription plan, and hence the following observations are based on the features included in that subscription plan. The UI seems a bit outdated, and in the beginning it can be difficult to find everything. UserTesting’s biggest advantage is that they are very fast; recordings can be collected within one hour. The demographics and the possibility to choose a specific target group are also extensive, making it possible to specify exact demographic requirements and to retrieve a lot of information about each test user.
Figure 10: UserTesting’s test set up service.
UserTesting’s strengths:
• Has screener questions.
• Possible to run tests with several demographic groups simultaneously.
• Detailed information in the “User’s profile” view.
• Recordings are available within an hour.
• Test instructions and tasks are displayed on the test user’s device, so there is no need for an additional device to be able to take part in the test.
• Possible to duplicate a previous test and alter it to fit the new test objectives.
UserTesting’s weaknesses:
• The fact that the test tasks are visible in the app may draw attention away from the game, especially since there seemed to be some difficulties for the user to open or close the menu. But this depends on preferences.
• The timestamps on the tests are not in local time; however, this is not a major issue.
4.3.4
Distribution Without Test Set Up Service
When testing with SRTs like Lookback and UXCam, it is possible to create and distribute the test plan without using a third party test set up service. These tools do not currently offer any test set up or distribution services, and therefore this can be carried out according to preference. This means that it is possible to record test sessions performed by test users and colleagues at the office or another suitable location. When conducting remote user testing, it
is necessary to send the test plan (containing test instructions, tasks, questionnaires, etc.) to
the test users. This can be done using various approaches. In this study the test plan was
created using Google Forms but for example a customised website could work just as well. In
order to distribute the application, however, it is easier and safer to use an additional service
where the installation file for the application can be uploaded and the test user can download
it. The service used in this study was Beta Family’s SuperSend, where it is possible to specify a
message to the test users in an e-mail along with the installation file for the application. There
are also other distribution services available, like for example HockeyApp and Crashlytics. In
order to avoid using any additional test services, test users were recruited amongst friends and
family.
Strengths:
• More control over the entire test process.
• The test plan can be specified and customised without any restrictions.
• Possible to handpick the test users.
• Suitable if the game is at a sensitive development stage, to avoid confidentiality issues (there is less risk that information about the game and the idea will be leaked if testing on people you trust or if having them sign an NDA, a non-disclosure agreement).
Weaknesses:
• Cannot do everything with one service. At least two services are needed, one for test set up and one for distribution of the application.
• More information is needed in the test plan, which places higher demands on the user. Many things are handled automatically when using a test service.
• It takes time to create and set up a test.
• It takes time to recruit test users. Even though test users were recruited amongst friends and family in this study, it was time consuming and some people were reluctant to participate since they did not like to be recorded.
• Difficult to connect a user to a survey response. The user has to speak a unique ID out loud and this has to be connected to the correct survey. This means that it is not possible to send out the test to all test users at once (if the test users are supposed to be anonymous).
4.4
Comparison of Test Services and Session Recording Tools
In order to get a clearer overview, and to be able to decide which SRT and test set up service are most suitable, the experiences from this study have been summarised into a grading table. Based on our analysis, features of the SRTs and the test services were compared and graded on a scale of 1 to 5; the result is displayed in table 8. The grades are explained and justified in the discussion in section 5.
Table 8: Grading of tools and services.
4.5
Outcome of Test Session Analysis
In the following section, statistics from the test sessions are displayed in the form of charts. Additionally, insights gained during the test sessions have been documented.
In total, 26 test users were recruited through the different channels, but only 13 complete recordings could be collected for analysis. Of these 13 recordings, two were filmed with a web camera and not with a built-in SRT and were therefore disregarded, and another recording was disregarded due to the wrong platform (tablet). Additionally, one recording which had a face recording but lacked the screen recording was analysed. This means that in total 11 recordings were analysed. The rest of the recordings failed to upload in some sense: the facial recording was not uploaded (only the screen recording), the recordings were not uploaded at all, only a couple of seconds of the recording were uploaded, or it was not possible to view the recording due to a file error. One user also had problems with the application crashing multiple times and did not want to complete the test session. Some additional videos were disregarded since the test users did not meet the test user requirements (they had previous experience of playing the game), and one recording was uploaded after the analysis deadline had passed and was therefore not analysed.
4.5.1
Questionnaire Results
A total of 26 survey responses were collected. These include survey results from all test sessions
which were carried out with the SRTs specified in section 4.1. Hence, the responses regarding
a specific tool are based on a smaller group of respondents. The age distribution among the
participants is displayed in figure 11a, while the gender distribution is presented in figure 11b.
(a) Age distribution (age groups 20-30, 31-45 and 46-65). Total number of test participants: 26.
(b) Gender distribution (male, female, other). Total number of test participants: 26.
Figure 11: Age and gender distribution from a total of 26 post gameplay questionnaire respondents.
The test users who were recorded using UXCam or Lookback were asked some additional questions regarding the session recording and the testing methodology. This was possible since Google Forms and Beta Family had no limitations on the number of post test questions. A total of 15 survey participants answered these additional questions; of these, 7 had completed the test session using UXCam and 8 using Lookback. When using PlaytestCloud and UserTesting, the post gameplay questionnaire focused only on the gaming experience.
The test users were asked if they would prefer to be in control of starting and stopping the recording themselves, or if they would prefer that it is handled automatically, removing one step from the test process. The test users have been divided into subgroups depending on which of the tools Lookback and UXCam they used. The total result is also displayed, see figure 12a. The test users who were recorded using UXCam were asked if they would prefer to be able to preview the recording of the test session before uploading it to the facilitator. The users who were recorded using Lookback, and who had the possibility to preview the recording, were asked if they liked the possibility to preview the recording before uploading it and if they had used this functionality. The result is displayed in figure 13. The respondents who answered that they did not use the preview option were also the ones who replied ”don’t know” when asked if they liked the possibility to preview or not.
In order to investigate whether the test users appreciated taking part in the test in their natural environment, they were also asked if they would have preferred to do the test at home on their own device or at a test facility where they could be observed in real-time. A majority of the respondents stated that they would prefer to participate in the test at home using their own device; the result is displayed in figure 12b.
(a) Manual or automatic recording: the number of respondents (per tool, Lookback and UXCam, and in total) who would prefer the recording to start automatically, manually, or who were not sure. The test users who were recorded using Lookback or UXCam were asked whether they would prefer the recording to start automatically or whether they would prefer to start it manually.
(b) Where the test users would have preferred to conduct the test session, remotely at home or locally at a test facility: 80% answered at home on their own device and 20% at a test facility where they would be observed in real time.
Figure 12: Test users’ preferences for starting and stopping the recording manually or automatically, and where they would have preferred to conduct the user test. There was a total of 15 survey participants, of whom 7 had completed the test session using UXCam and 8 using Lookback.
(a) ”Would you have preferred to be able to preview the recording before uploading it?” The number of respondents answering Yes, No or I don’t know, i.e. whether the test users would appreciate the possibility to preview the recording of the test session before uploading it to the facilitator.
(b) Lookback preview functionality: the number of respondents who liked the preview functionality, were not sure, or did not like it, and who did or did not preview the recording before uploading. The test users who were recorded using Lookback were asked if they previewed the recording before uploading it and if they liked the possibility to do so.
Figure 13: The test users’ preferences regarding preview functionality in the session recording
tool. There was a total of 15 survey participants, of which 7 had completed the test session
using UXCam and 8 using Lookback.
4.5.2
Insights Gained from Screen, Facial and Voice Recordings
We did not notice any substantial difference between analysing sessions with facial recording and without. How much information could be extracted from the facial recording depended on the personality and the body language of the test user. Some recordings provided information in the form of facial expressions, some of which could be related to the feeling stated in the post gameplay questionnaire, while other facial recordings did not seem to provide much additional information at all. In some cases it was valuable to see when the user was paying attention to the test session and when not. The facial recording also gave a better overall picture and subconsciously conveyed a sense of closeness to the UX researcher.
4.5.3
The Test Object: Ruzzle
The observations from the analysed recordings from the test sessions were compiled into a document containing feedback on the mobile game application Ruzzle. For confidentiality reasons, and since this is outside the scope of this thesis, the details are not presented in this report.
4.6
Resulting Workflow
Below is the developed workflow, which is part of the results of the thesis work. It has been developed as guidelines for MAG Interactive, with instructions for how to conduct user testing with the use of session recording tools in order to evaluate the PX. In this section, the workflow has been made concise and easy to read, but it is further explained and discussed in section 5.4. The focus is on mobile games, but the workflow has been made general in the sense that it is suitable regardless of what the test objectives are and what test service and SRT are being used. This was a conscious decision, since the available tools are constantly being updated and new tools are becoming available; thus the recommended tools can vary in the future. The tested tools and services are also suitable in different situations. Because of this, the workflow explains what to think about when choosing a SRT and a test service, instead of simply stating which SRT and test service to use and how. It is important to conduct the user testing often and iteratively, preferably for every new feature or release.
4.6.1
Test Plan
The list below displays eight main steps for creating a test plan.
Consult with the team throughout the test period regarding what should be tested - what questions need to be answered - and ensure that the tasks and questions are designed to achieve that. In order to correct the issues discovered during user testing, a summarised analysis and highlights from the recordings should be shared with the team.
1. Test Objective - Decide on what to test
2. Test Users - Decide on whom the test users should be
3. Tool and Test Service - Decide on which tool and test service to use
4. Time Plan - Set a time frame for the entire test
5. Prepare Test Details
• Preparations
• Instructions
• Pre Gameplay Questionnaire
• Tasks
• Post Gameplay Questionnaire
6. Perform Test
• Pilot Test
• Actual User Tests
7. Analysis
8. Summarise Results and Share with the Team
A more detailed explanation of the different steps is displayed below.
4.6.2
Test Objective
First, decide on the test objective:
• What questions need to be answered?
• What part of the game should be tested?
4.6.3
Test Users
Decide on who should test the game: should it be the main target group of the game or a group of test users who are new to the game?
Examples of factors to consider are:
• Age
• Nationality
• Gender
• Casual or Hardcore gamers (i.e. how often they play mobile games)
4.6.4
Tool and Test Service
A test service refers to an online service where a user test can be created, the installation file for the application can be uploaded, and both the test and the application can be distributed to the test users. Some test services also provide a session recording tool, and some offer recruitment of test users. Tools (SRTs which are integrated into the application and record the screen) can also be used without a test service, where the UX researcher takes care of test set up, distribution, recruitment and analysis personally. It is also possible to combine a SRT with a test service which provides test set up, recruitment of test users and distribution of the test and the game application (for example the SRT Lookback with the test service Beta Family).
Deciding on which test service to use depends on:
• Resources
How much time and money can be spent on the user test?
• Control
How much should the researcher be able to control and customise the test? What specifications are needed in the test set up, and how specific do the test user demographics need to be?
• Confidentiality
If the game or concept is not yet launched and should be kept private, it might be desirable not to use a third party company to handle the testing. Confidentiality aspects also affect whether the user testing should be conducted locally or remotely, i.e. for privacy reasons it might be conducted locally at the office.
4.6.4.1
Session Recording Tool
A session recording tool is integrated into the application using a SDK and it is then possible to
record the screen, sound and in some cases also the input from the front camera of the device.
The SRT also provides an online dashboard where it is possible to view the uploaded recordings
and add annotations.
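As a rough illustration of what such an integration can look like, the sketch below shows an SRT SDK being started when an iOS game launches. The framework name SessionRecorderSDK, the start call and its parameters are hypothetical placeholders and do not correspond to the API of any specific tool evaluated in this study; each vendor documents its own initialisation call and app key.

```swift
import UIKit
// import SessionRecorderSDK   // hypothetical SRT framework; replace with the chosen tool's SDK

@main
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Start the session recording as soon as the game launches, so that the
        // whole on-boarding flow is captured. The call below is a placeholder:
        // the real method name, app key and options depend on the selected SRT.
        // SessionRecorder.start(appKey: "YOUR-APP-KEY",
        //                       recordFrontCamera: true,
        //                       allowPreviewBeforeUpload: true)
        return true
    }
}
```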
Table 9: Choice of session recording tool.
The choice of tool is affected by:
• Development Stage
Some SRTs will not continue recording after an application crash, resulting in the session recording not being uploaded or other recording issues. The risk of application crashes is generally higher at early development stages. Different tools might therefore be more suitable depending on which development stage the game application is currently in. See table 9 for information about which tools support recording of sessions even if the application crashes.
• Platform
Make sure that the SRT supports the platform of the game application; see table 9 for information about which tools support the desired platform.
• Facial Recordings
Not all SRTs provide facial recordings. If there is a need to test the UX of the game using facial recordings, see table 9 for information about which tools provide this.
4.6.4.2
Distribution and Test Set Up
When conducting remote user testing, a test has to be set up and distributed to the test users
together with the application. The SRTs Lookback and UXCam do not provide test set up
and distribution, hence they need to be combined with an additional service such as Beta
Family’s SuperSend where it is possible to upload the application with an integrated SDK
from an independent SRT. PlaytestCloud and UserTesting have their own session recording tools, and can therefore not be used for distribution and test user recruitment in combination with an independent SRT. When using UXCam or Lookback, it is also necessary to use a separate service for test set up and distribution of the test instructions, for example Google Forms, a custom-made website or Beta Family. When using SRTs and test distribution services which are independent of each other, it is important to make sure that the recordings and the answers to the post gameplay questionnaire can somehow be related to each other. This can, for example, be solved by giving each test user a unique ID which they have to state in both the recording and the post gameplay questionnaire.
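One simple way to realise such an ID, sketched below under the assumption that the researcher generates the codes and sends one to each test user together with the test plan, is a short random code that is easy to read aloud and unlikely to collide for a handful of participants. The function name and code format are our own and not part of any SRT or test service.

```swift
import Foundation

/// Generates a short, human-readable participant ID (e.g. "K7F3QZ") that the
/// test user can read aloud at the start of the recording and enter in the
/// post gameplay questionnaire, so the two can be matched afterwards.
func makeParticipantID(length: Int = 6) -> String {
    // Characters chosen to avoid easily confused glyphs such as 0/O and 1/I.
    let alphabet = Array("ABCDEFGHJKLMNPQRSTUVWXYZ23456789")
    var id = ""
    for _ in 0..<length {
        id.append(alphabet[Int.random(in: 0..<alphabet.count)])
    }
    return id
}

// Example: one ID per recruited test user, generated before the test plan is sent out.
let participantIDs = (1...5).map { _ in makeParticipantID() }
print(participantIDs)
```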
Table 10: Choice of test set up and distribution service
The choice of distribution service depends on the factors:
• Time Frame
What is the time frame for the test, i.e. how much time is available? Some test services that provide test users and recordings are faster than others; see table 10 for information about the time it takes to gain access to the videos.
• Recruiting Test Users
Table 10 displays which test services provide recruitment of test users. In some services it is possible to specify whether the test should be private or public, and also to invite one’s own test users or choose from the service’s test user base. If there is a lot of time available, it is also possible for the researcher to handle the recruitment process personally and search for test users on the streets, in cafés, at social gatherings or online through forums or social media.
• Specified Test User Demographics
How specific should the test user demographics be? All test services provide some kind of test user demographics, but it is not always possible to specify additional requirements. See table 10 for information about which services provide the possibility to specify special demographic requirements.
• Contacting Test Users
If there are any questions regarding what the test user experienced, for example if something happened that was not visible in the recordings or explained in the questionnaire, it might be desirable to contact the user with follow-up questions. See table 10 for information about which services support contacting test users.
• Longitudinal Studies
Sometimes it is desirable to perform longitudinal studies by carrying out several tests
with the same test users. The services which support this are displayed in table 10.
• Easy and Quick Test Set Up
As can be seen in table 10, all the investigated tools have a quick and easy test set up.
If the goal is to create the test quickly, without a need for specifying an advanced and
customised test, then PlaytestCloud is a good option.
• Customisable Test Set Up
Sometimes there is a need to design the test exactly according to a detailed description, which might be problematic if the required functionality is not available from the test service, for example if a question should be answered with multiple choice alternatives. See table 10 for information about which test set up services provide a customisable test set up.
4.6.5
Time Plan
Make a schedule and set a time limit for all parts of the user testing process:
• Plan and prepare test details (including preparations, instructions, tasks and questionnaires)
• Integrate SRT (if using a SRT where integration is required)
• Distribute application and test to the pilot tester
• Time for test user to conduct pilot test
• Correct issues found in the pilot test
• Distribute application and test to actual test users
• Time for test users to perform the test
• Analyse recordings and questionnaires
• Summarise and share the results with the team
4.6.6
Prepare Test Details
Prepare the background information necessary for the test users to perform the test. It is not always possible to specify background information when using a test set up service; see figure 14 for which services provide this.
Figure 14: Properties for test set up.
4.6.6.1
Preparations
• Declaration of Informed Consent
It can be useful to have the user sign a declaration of informed consent in order to avoid
legal issues by making sure the users are okay with being recorded.
• Non-Disclosure Agreement
If needed, prepare a non-disclosure agreement (NDA).
• Specify Technical Requirements
For example that a Wi-Fi connection is necessary for uploading the video or if the user
test should be performed on a specific device etc.
4.6.6.2
Introduction
• Test information
What the test is about and what mindset the user should have.
• Specify limitations in the SRT
This depends on which tool is being used. If the application cannot be closed or sent to the background during the test, the test users should be informed about this. The test user should also be informed whether or not the recording can be paused and resumed during the test session.
• Specify limitations in the test object
Specify whether the game is live or not and if there are any parts of the game which
might not work. Be sure to specify if, for example, in-app purchases are not working in
the test version of the game.
4.6.6.3
Instructions
• Length of game session
How long the test users should play the game.
• Start recording
How to start the recording of the SRT.
• How to upload the recordings
Write instructions for how the test user can stop and upload the recording with the SRT.
4.6.6.4
Screener
A screener is a question that the test user is asked before beginning the actual test. The researcher specifies the correct answer, so only test users answering a specific alternative get to continue to the test. Should only specific test users be included in the study? For example, when testing how understandable a new feature in the latest release is, it might be desirable to test on users who have not played the game or have not updated to the latest version. However, not all test services provide a screener feature.
4.6.6.5
Pre Gameplay Questionnaire
A pre gameplay questionnaire contains questions that the test user should answer before playing the game. It is not always necessary to have pre gameplay questions when using a test service, since information about the test user (age, gender, etc.) is already available in their profile view. When the test set up and user recruitment are handled personally by the researcher, however, pre gameplay questions can be valuable. Try to keep the number of questions as small as possible, preferably 3-5 questions.
Examples of pre gameplay questions:
• Age
• Gender
• How often do you play mobile games?
• What games do you usually play?
• Have you played [the name of the game] before?
4.6.6.6
Tasks
Tasks are pre-defined activities that the test user should perform while using the app. The recommended number of tasks is 3-5; here too it is important to keep it short and simple. It might also be desirable to have no tasks, or simply one task saying play the game, which is also fine. When testing a specific part of the game, however, it might provide more information if the user performs tasks related to that area.
Examples of tasks:
• Create an account
• Start a new game
• Reach level 2
Sometimes it is okay if the tasks are a bit unclear; this can be used, for example, when testing whether the user easily understands how to start a new game. In other cases, it might be better to specify in detail how to perform a task.
4.6.6.7
Post Gameplay Questionnaire
A post gameplay questionnaire contains questions that the user answers after playing the game. When designing the questionnaire, it is important to keep the number of questions as small as possible, preferably 3-5. Use scale-formatted answers or other predefined answer alternatives as often as possible in order to make it easier and faster for the test user to answer the questions. Some test set up services only allow a limited number of questions; see figure 14 to make sure that the test service allows you to ask enough questions.
Examples of post gameplay questions:
• Would you recommend [the name of the game] to a friend or colleague? On a scale of 0-10. (Net Promoter Score; see the calculation sketch after this list.)
• Name 3 emotions you experienced during gameplay.
• Was it easy to understand the game?
• What did you think about the game?
• Any suggestions, comments or recommendations?
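For the Net Promoter Score question above, the 0-10 answers are commonly summarised as the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A minimal sketch of that calculation is shown below; the example answers are made-up values, not data from this study.

```swift
/// Computes the Net Promoter Score from a list of 0-10 answers:
/// the share of promoters (9-10) minus the share of detractors (0-6),
/// expressed as a value between -100 and 100.
func netPromoterScore(_ scores: [Int]) -> Double {
    guard !scores.isEmpty else { return 0 }
    let promoters = scores.filter { $0 >= 9 }.count
    let detractors = scores.filter { $0 <= 6 }.count
    return 100.0 * Double(promoters - detractors) / Double(scores.count)
}

// Example with made-up answers from ten respondents:
// 5 promoters and 2 detractors give (5 - 2) / 10 * 100 = 30.0.
let answers = [10, 9, 8, 7, 9, 6, 10, 4, 8, 9]
print(netPromoterScore(answers))
```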
4.6.7
Perform Test
For most SRTs, a SDK has to be integrated into the application in order to record the session (although no integration is needed when using, for example, PlaytestCloud). The test details and the application with the integrated SRT can then be distributed to the test users so that they can start the test.
4.6.7.1
Pilot Test
Before starting the actual test, send the test with the instructions to a test user who will perform the pilot test. This is a way to check that everything in the test plan works and is understandable; if not, it is easy to modify it before sending the test to the actual test users. Preferably, one pilot test should be carried out while the facilitator observes the test user on site, since the test user is then more prone to question the things that are unclear. Additionally, one pilot test should be carried out by a test user at a remote location. It is also possible to add a few extra questions in a survey at the end of the test (after the users have completed the test), asking what they thought about the test set up and the tool.
Questions to ask the test user after completing the regular part of the test:
• Did you experience any difficulties with the recording tool during the test session?
• Were the instructions before the test sufficient? If not, what was missing?
• Was there anything you did not understand in the instructions or regarding the recording
tool?
• Was it easy to understand the tasks? If not, how would you suggest they can be improved?
• Is there anything you think should be changed for future testing?
4.6.7.2
Actual User Test
When the pilot test has been conducted, and the test has been updated (if needed), send the
game application and the test plan to the actual test users.
4.6.8
Analysis
Analysis of the recorded test session can be performed by making annotations regarding interesting test user events throughout the video. Comments can be made regarding how the user is
experiencing different parts of the game, for example if the user is getting annoyed at any part
of the game or is experiencing any difficulties in fulfilling a task. The annotations can then
be used as an aid when creating a feedback document where all user experience and usability
issues should be documented. It can also be interesting to compare the actual experience,
as it is perceived in the recording, with what the test user has stated in the post gameplay
questionnaire.
• Annotate
First, access the session recording through the dashboard of the chosen SRT and analyse the recordings. Create annotations for every event or piece of feedback, both good and bad, that can be deduced from the recordings.
• Create feedback document
Create an additional document and elaborate on the feedback based on the annotations. Write down both the negative and the positive feedback.
Some questions to keep in mind when analysing the recordings:
– Can the user perform the tasks?
– Is the user getting annoyed at anything?
– Is there anything that the user cannot find, or has trouble finding?
– Is the user trying to click on buttons that are not clickable?
It can also be useful to add a link to the recording and specify at what time the event happened in the feedback documentation, or to take screenshots, to make it easier to share and explain the issues that were found.
• Draw conclusions and compare the analysis with the questionnaire responses
When the recordings have been analysed, see if the conclusions drawn from observing and listening to the recording agree with what the user states in the post gameplay questionnaire. If they do not, try to understand the reason for this.
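As an illustration of how annotations can be organised when the feedback document is compiled, the sketch below assumes a simple in-memory representation with the session it belongs to, a timestamp and a free-text note. The type and field names are our own and do not correspond to the export format of any particular SRT.

```swift
import Foundation

/// A single observation made while watching a recorded session.
struct Annotation {
    let session: String        // e.g. "user-07"
    let timeSeconds: Int       // position in the recording
    let isIssue: Bool          // true if this notes a usability/UX problem
    let note: String
}

// Made-up example annotations from two recordings.
let annotations = [
    Annotation(session: "user-03", timeSeconds: 42,  isIssue: true,  note: "Tried to tap a non-clickable banner"),
    Annotation(session: "user-07", timeSeconds: 118, isIssue: true,  note: "Could not find how to start a new game"),
    Annotation(session: "user-07", timeSeconds: 305, isIssue: false, note: "Smiled when completing the first round"),
]

// Group the problems per session and sort them by time, which makes it easier
// to write the feedback document and to link back to the right video position.
let issuesBySession = Dictionary(grouping: annotations.filter { $0.isIssue }, by: { $0.session })
for (session, items) in issuesBySession.sorted(by: { $0.key < $1.key }) {
    print("Issues in \(session):")
    for annotation in items.sorted(by: { $0.timeSeconds < $1.timeSeconds }) {
        print("  [\(annotation.timeSeconds)s] \(annotation.note)")
    }
}
```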
4.6.9
Summarise Results and Share with the Team
In order to correct the issues found, it is important that the team understands where and why
usability and UX issues occur. In order to convey the most severe issues, it can be valuable to
summarise the issues found during the analysis. This way, the team will not have to watch all
the recordings and read every annotation, but can instead focus on the major issues. It can
also be helpful and time saving to prepare proposed solutions for the issues.
• Summarise the feedback documents
Make a summary document with feedback based on the recordings from all of the test
users. Be sure to emphasise the issues found in more than one of the videos, i.e. where
several of the test users encountered the same problem.
• Proposed solution
For each issue found: add a proposed solution section in the feedback document with a
suggestion of how to solve the problem.
• Compile statistics from questionnaires
Summarise the questionnaire responses into one document and use this data to compile
statistics. To get a clearer view of the responses, graphs and charts can be drawn from
the statistics.
• Share results with the team
Share the summarised results of feedback and statistics with the rest of the team. Show highlight reels of the recordings where the issues were discovered in order to gain an understanding of the problems, and discuss how to solve the issues.
5
Discussion
This chapter discusses the benefits and challenges of remote user testing of mobile games in section 5.1, and motivates the use of session recording tools in section 5.2. It discusses the test method and procedure for the user testing conducted in this study. The chapter also contains one section which discusses the analysis of the recordings (section 5.3) and one section for the workflow (section 5.4). The latter also discusses what aspects to consider when planning the test, conducting the pilot test and deciding on a SRT and a test set up and distribution service. Finally, section 5.5 discusses topics suitable for further research.
5.1
User Testing of Mobile Games
Below, different aspects of user testing of mobile games, such as remote testing, test users, and test methods and procedures, are discussed. Social aspects and post gameplay questionnaires are also discussed.
5.1.1
Remote Testing
Remote testing differs from testing locally at the office or at a special testing facility. Remote testing can be performed at home, at the workplace or during the bus ride home from school, which makes a significant difference for the test users compared to having to be at a specific place at a certain time. Remote user testing with mobile SRTs does not require real-time testing, and the test does not have to be completed at a certain time. The users can carry out the test by themselves without being in contact with the test observer, and the observer does not need to be there asking questions and giving instructions. According to Nemberg [36], see section 2.5.3, many test moderators make the mistake of talking too much during the session. This is one argument for why it might be better not to observe the tests in real-time and to ask questions afterwards rather than in the middle of the test session. Another advantage of remote user testing is that the test can be carried out in the user’s natural environment. When inviting test users to take part in a user test in a laboratory setting, they are put in an unfamiliar setting and sometimes introduced to another platform or operating system than what they are used to. This gives rise to different player experiences.
During this study, a survey was conducted regarding user testing and SRTs, in addition
to the post gameplay questionnaire. Due to limitations in the number of follow-up questions
allowed during test set up for some of the test distribution services, these questions were only
included when performing tests using Lookback or UXCam as SRT and Google Forms or Beta
Family for test set up. According to the survey, 80% of the respondents would prefer to do the
test at home on their own device, while 20% would prefer to do it at a test facility where they
could be observed in real-time, see figure 12b. This strengthens the arguments for choosing a
remote approach over conducting test sessions in a laboratory setting.
SRTs allow the user to carry out the test as part of their everyday life in their natural environment. They can also use their own device, which they already feel comfortable with, and no additional equipment is needed. But this also places higher demands on the test facilitator when planning the test. Test instructions need to be exceptionally clear, and it is also important to specify which OS versions are supported both by the application and by the SRT. While the application and the SRT work on an updated high-end device at the office, the users might have older, outdated versions on which they will not work.
When conducting our tests, we were not entirely sure which versions of the operating system were supported by the session recording tools, and at least one user experienced trouble with the application noticeably slowing down the device, to the point that she did not want to continue the test session. The issue could, however, lie with the SRT, the device or the application. In this study the application was only tested on a couple of different devices to see if the SRT worked after integration. As a pilot test it might be a good idea to test locally on several devices before sending out the test to remote users.
This study has been carried out without any expenses. Thanks to the purpose of the thesis, several SRT and test service companies were willing to contribute by offering free trials of their tools and test services, and no premises or devices were needed for the test sessions. However, a couple of iPhones were borrowed from the commissioning company in order for us to be able to try out the tools, and also to lend to our friends and families who wanted to take part in the tests but did not have their own device. In total, 6 test participants did not perform the test on their own device. Eliminating the need to carry out the test sessions in a laboratory environment both saves expenses and allows the test user to carry out the test session in their natural environment. It was also very convenient to set up the test once, and then send it out and simply wait for multiple responses. There was no need to meet each and every test participant, walk them through the instructions and wait for them to complete the test before moving on to the next participant. This saved a lot of time, since we could work on something else while waiting for the test participants to take the test.
Usability.gov (see table 2, section 2.3.3) mentioned that some of the challenges in remote user testing include the security of sensitive information, the observer getting a restricted view of the user’s body language, and technical problems. We noticed a worry among the employees at the commissioning company regarding uploading the game to a third party or to the test users’ own devices. They were not willing to take that risk unless it was a game which had already been launched. Unreleased games, still in the prototype phase, could only be tested with coworkers and their families. There is a fear of compromising the security of products still at the prototype stage, and because of this the remote testing approach may be ruled out when testing games which are at a sensitive stage of the development process. Regarding the restricted view of the user’s body language, we did not feel this was a problem. Their tone of voice and their interaction with the application said a great deal by themselves. One technical problem that was avoided by using mobile session recording tools was the risk of the user feeling uncomfortable with the technology; since most of our test users performed the test on their own device, there were no problems related to this. However, other technical problems with the SRTs were revealed. We did, for example, notice a problem with slow internet connections. At the end of the test session the test participant was asked to uninstall the application from their device. This was to prevent them from opening it by mistake and uploading recordings which were not part of the test. However, we believe that uninstalling the application too soon may have prevented some of the recorded sessions from uploading to the online dashboard, and hence they were lost. The videos not being uploaded in time can be a product of a slow internet connection, an unstable Wi-Fi connection or problems with the SRT itself. Based on this, it is important to make sure the test user has a stable Wi-Fi connection and also to tell them to wait for some time before uninstalling the application. In both of the tools we tried, Lookback and UXCam, there was no way for the test user to know whether the recording had been uploaded or not. All in all, the benefits and challenges of remote user testing which were mentioned by Usability.gov (table 2, section 2.3.3) conform to our observations and experiences during this study.
5.1.2
Test Users
The commissioning company decided that the test could be carried out by anyone from the age
of 20 and up, regardless of gender. This was in order to gain more information about all types
of players. The game is generally being played by middle-aged women, but when testing the
on-boarding process in this study, all kinds of adult users were of interest. When recruiting our
own test users for the sessions with Lookback and UXCam, we had trouble finding volunteers.
Even though volunteers were searched for in large communities through social media, there
were no voluntary participants. This could be due to the fact that people got intimidated when
we told them that they would be recorded or it could also be due to the recruitment message
containing too much information. Another reason could also be the lack of incentives and no
more compensation than a simple thank you.
In section 2.5.3, Suen explains how to conduct a good usability study for mobile games. He recommends doing tests in smaller batches of 3-5 persons, which was an approach that we followed, and it is also in agreement with Nielsen’s statement in section 2.3.7 that five users are enough. Suen also states that people in the near surroundings are usually open to helping test the game, which was one of the approaches that we used in order to find test users.
People need some kind of motivation in order to take part in user test studies; this can be in the form of monetary compensation or some other incentive. One incentive can be the fact
that they are helping someone else, which is often the case with friends and family. Due to the
lack of voluntary participants, we too turned to our friends and families. Here one of the main
obstacles was the fact that most of our acquaintances do not own an iPhone. Since the study,
apart from investigating the tools, also aimed to investigate the on-boarding process of the
game, the test participants also had to be new to the game. These two requirements ruled out
everyone at the commissioning company MAG Interactive as well as many of our friends and
family members. Eventually, we ended up testing on friends, family and acquaintances using a
borrowed device. This means that some of the participants were in fact partaking in the test
using a device and operating system they were not used to and did not feel entirely comfortable
with. This should not have affected the outcome of the evaluation of the session recording tools; however, it might have affected their interaction with the game. This has nevertheless been considered acceptable, since the only major difference between the two main mobile platforms, Android and iOS, is how the ”back” option works (as far as the target game is concerned). Still, one can
argue that some of the benefits of remote testing with mobile SRTs were lost due to this, since
one big advantage of remote user testing with mobile session recording tools is that the user
can carry out the test on their own device, which they are confident in using. However, this
should not have affected the outcome of the study. An interesting aspect when recruiting test
users is to decide whether they should be recruited through a test service or not, and if they
should be paid or asked to participate in the test without monetary (or other) compensation.
Based on the user tests carried out in this study, it can be observed that users who were
recruited through a test service were more likely to write more detailed answers. During this
study, we came across user recruitment systems which motivate test users with incentives like monetary compensation or ranking systems. These incentives seem to make the users keener to answer the questions thoroughly. There was a clear distinction between the answers to the post gameplay questionnaire filled out by our friends and family and the ones filled out by unknown test users recruited through a third party test service. The test users previously unfamiliar to us generally left more elaborate and constructive answers, while the test users recruited amongst our friends and family mainly gave very short answers. Some test users even wrote nonsense as a reply to non-optional questions requiring a text answer, since they did not have the patience or interest to continue. The written nonsense could also be addressing issues with the test itself, which may be in need of modification. The user might
have thought that the test was too long or that the questions were too complicated. Another
aspect that needs to be considered is the fact that the test instructions and post gameplay
questionnaire were written in English. The fact that the users recruited form test services often
were native English speakers (UserTesting, PlaytestCloud) while our friends and family are
not, may have contributed to a difference in the result. The language barrier might have made
them less comfortable with describing their thoughts and experiences, even though they were
encouraged to speak and answer in their native language. Maybe should the instructions, tasks
and questions also have been presented in their native language and not in a foreign language.
However, several of the test users from Beta Family were non-native English speakers and still
gave more elaborate answers. Based on these observations, it might be better to pay a small
amount to get more elaborated feedback and more useful information. Additionally, it can not
be made certain based on this study if it matters if the users get paid or not, since the data
base is too small and test users at Beta Family also are motivated with ranking and not always
monetary incentives. Further investigations could be made for example to see if users at Beta
Family answers sooner if they are being paid (we did not notice any difference when setting up
both paid and unpaid tests).
When hiring test users from a test user network, it is important to keep in mind that the users might eventually become closer to "professional" testers than to the ordinary user (the ordinary user is probably closer to the target group of the application). Users who are not accustomed to taking part in user test studies might be able to provide additional information which will not be revealed by an experienced (and possibly tech-savvy) test user. This topic could be further investigated in order to see how the result is affected by whether someone is a regular test user or not, but that is outside the scope of this thesis. PlaytestCloud claims that their testers get on average one test invitation per month. They also stress that since the users get no special instructions besides "play this game and think out loud", they will not fall into a testing schema and every test will therefore be different. They claim that this approach makes the test session more natural and that the game experience can be compared to the experience the player gets when playing a new game they have found on the App Store. Our observations, however, tell us that the test users we watched from PlaytestCloud seemed to be pretty experienced. This may be due to the fact that the test users play a lot of games during their spare time. Since PlaytestCloud is focused on games, the test users have a pretty vast experience of playing mobile games and spotting usability and UX issues specific to games.
They also compare the test game application to other games they have played. This can be very good for discovering issues caused by deviating from the "norm", or for spotting concept errors caused by similarities with other games on the market. On the other hand, they do not seem to represent the inexperienced casual player, and there is currently no way to know whether they have participated in many user tests or not. In some services this can be prevented by choosing a user with a lower ranking or a lower number of submitted reports. This is possible at, for example, Beta Family, where every user has a ranking and information about the number of submitted reports displayed in their profile, and it is possible to handpick the test users based on this. At PlaytestCloud and UserTesting there is currently no easy way of choosing test users according to this approach. During our online meeting with a representative from UserTesting, they claimed that there was no risk of the test users becoming professional testers since they did not get new tests all the time. In the webinar with Krug and Sharon [24], which was mentioned in section 2.3.3, they answer questions from the audience and customers of UserTesting. According to the author of one of these questions, the UserTesting test users are highly experienced participants.
However, Krug and Sharon argue that in spite of having experienced users, the serious problems seem to be found anyway as long as you test. They also recommend UserTesting's rating feature, where it is possible to choose test users who have a low rating score. Another advantage of the scoring system is that the users want you to rate them higher, and therefore they are also good at answering follow-up questions. When trying out UserTesting, we only found one toggle option saying "Any (use highest rated testers)" for the demographics options: gender, country and social networking. But since it is possible to customise the demographics by writing the requirements in plain text, it should be possible to specify that the test should be carried out by users with a lower rating, although it seems to be implied that the services assume you want highly rated test users and not the opposite. It is also important to consider the time frame of the study. Different services have different obligations to provide results within a certain time limit. PlaytestCloud, for example, promises results within 48 hours, while UserTesting has a time limit of one hour. If you recruit your own test users from the street, the office, friends and family, it can be difficult to predict how much time it will take to find test users and to have them go through with the test. Beta Family differs a bit from the other test services. Firstly, it is possible either to pay the test users or to ask them to partake in the test for free. Secondly, you can make a test public, allowing all Beta Family testers to partake in the test, or you can choose to make it private and handpick the test users yourself from their test user base. This gives the test facilitator more freedom, but also places higher demands on planning the test and considering the time frame of the study. It is possible to set the time limit of a user test to 1-21 days. When creating a private test, the facilitator has to wait for the test users to accept the invite, do the test and send in the report. You are not guaranteed that the test users you invite will accept, and not even that the test users who have accepted will complete the test and hand in the report. With this approach it is necessary to invite several test users and then hope that the desired number of test users will complete the test; using this service hence requires a looser time frame. When creating a public test, there is a higher chance of getting a larger number of test users within the time frame, although there is still no guarantee, even though Beta Family currently has 17000 test users, 154 nationalities and 470 different devices [40]. When using, for example, PlaytestCloud or UserTesting, you are guaranteed to have the ordered recordings within the specified time frame. But if recruiting test users yourself or using a service like Beta Family, it is important to take into consideration a possible time delay due to a high no-show rate. In section 2.3.7 it was mentioned that Nielsen [42] implies that the no-show rate for remote user tests might be higher than for user tests conducted locally. For this reason, Schade [49] recommends recruiting some extra test users just in case. This should also be applied when working with test services like Beta Family. However, the users at Beta Family can probably be assumed to want a good grade in order to improve their ranking, and hence they should be keen to complete the test. A test study can be delayed due to bad planning and other distractions or obstructions. If it is crucial to complete the user tests in time, it might be better to choose a service which provides results within a specified time frame. It is therefore also important to have a good test plan in order to make sure the test users are found in time. Another aspect to keep in mind is that it can sometimes take longer to find the right test users, even with services like UserTesting and PlaytestCloud, if the demographics are very specific and narrow.
5.1.3
Test Method and Procedure
Since this study investigates remote testing specific to mobile games, it is not enough to simply use test guidelines for regular task-oriented software, as explained in section 2.2. This is because such software has different goals and usage compared to games and mobile applications. Korhonen also states that mobile games need to be evaluated differently compared to other digital games, see section 2.2.1.3. Ülger also explains some of the playability issues that mobile games can encounter, for example handling interruptions such as receiving a phone call.
When using an SRT with facial recordings, the analysis of the recorded test sessions is similar to traditional observational user testing, which is explained in section 2.3.2. The main difference when using a mobile SRT is that the recording only displays facial reactions and does not reveal much about the rest of the test user's body language. Traditional observational testing methods are also often conducted in real time, and the researcher is usually located at the same location as the test user when the test is performed. The user testing method used in this study is called unmoderated remote testing, since the researcher can analyse the recordings retrospectively and there is no real-time interaction, as explained in section 2.3.3.
When conducting user testing with SRTs, both attitudinal and behavioral information
regarding each specific user and their actions are collected, making it an appropriate method for qualitative studies. Since the aim of this study was to investigate not only what people did but also how and why they did it, it is considered to be a qualitative study. The study generated information about both the test users' attitudes towards the game and their behavior when playing the game. See section 2.3.1 for further explanation of these concepts. The study also contains quantitative elements; information was collected regarding the users' preferences about how the test should be conducted (see figure 12b, section 4.5.1), their opinions about preview functionalities (see table 13a, section 4.5.1) and how to start and stop the recordings (see figure 12a and 13b, section 4.5.1). However, not enough answers were collected for the study to be a valid quantitative study. As mentioned in section 2.3.3, Nielsen recommends 20 participants in a quantitative study. In this study, there were 26 participants but only 15 of them were asked the questions mentioned above. The behavioral approach aims to answer what people do, while the attitudinal approach focuses on what people say. Both approaches are directly applicable when using an SRT to record the users. During the course of the test session, the test users were asked to think out loud, verbalising and explaining their behavior and attitude towards the elements of the game. However, in order to make the gaming experience as natural as possible, it was pointed out that the test users did not have to think out loud while playing the game. The test users were also asked to fill in a questionnaire outlining their thoughts, feelings and attitude towards the game. Apart from the self-reported information (think-aloud and questionnaire), behavioral information was collected from the recordings through observations.
5.1.4
Social Aspects
Isbister argues that social games (see section 2.2.1) should be tested in a social context in order
to get correct results [18]. Since Ruzzle is a social (as well as a casual) game where you play against other people, both randomly chosen opponents and friends and family, it can be argued that it could be tested in a more social context. Most of the users were mainly interested in playing against others, which is also part of the game concept itself. Some players, though, expressed a wish not to have to challenge opponents, but most users wanted to compare scores and see how well they did against the opponent. One test user wrote: ”I think this is one
of those games where it will me much more enjoyable when you know you are playing someone
real at the other end, regardless of whether you are evenly matched or not. I was only able to
play in practice mode today, unfortunately the ”find a random opponent” remained on search
mode throughout the nearly 10 minutes recording today.” This experience was not unique
amongst the participants in this study, and it highlights an issue with the random opponent functionality. The fact that it takes so long to find an opponent in a social game makes the experience less enjoyable. Maybe this approach is not optimal when testing social games like Ruzzle, where gameplay is dependent on others. Maybe it would be better to organise test sessions where the opponent is predetermined in some way. One of the test users just happened to play against a family member who was in the same room. They communicated with each other while playing and, as a bonus, revealed more issues and information regarding the social aspects of the gaming experience. But how can the social aspect be made part of the test session? Organising test sessions with multiple participants might be a better way to test a player vs player game like Ruzzle. Our experience was that this test user became more relaxed compared to other test participants, talked more during gameplay and made honest comments about the game, even though they were being recorded on video. This is, however, based on one single observation and should be further investigated. In a game like Ruzzle, where the gameplay is dependent on two individuals playing the game almost simultaneously in order to get a good flow, it might be better to have a certain predetermined opponent to challenge. When testing the on-boarding process, at least two scenarios can be distinguished. In the first, the player finds the app somehow and downloads it independently; this requires the test user to investigate and discover the app by themselves, and hence it is not suitable to supply a predetermined opponent. In the second, the player has been recommended the game by a friend who asks the player to challenge them; when testing this scenario it would be a good idea to supply a predetermined opponent. Both these approaches could be tested but require different test set-ups. The problem with requiring several test users to take part in the test is that they have to be available at the same time, and this removes some of the advantages of remote testing. One also has to take into consideration whether the test users should be co-located or not, and whether they should already have some kind of relationship to each other. It might also be difficult to test on multiple test users when hiring a recruitment service, depending on whether this feature is provided or not.
5.1.5
Post Gameplay Questionnaire
More detailed answers can be obtained by using text input instead of radio buttons, checkboxes etc. But as mentioned in section 5.1.2, it can be concluded that some users do not reply at all when asked to answer in text. The best approach would be to balance questions where answer alternatives are provided with questions which require text answers. It can also be concluded that the balance might depend on whether the test users are compensated or not. If paying the test users, it might be a good idea to have more open questions asking for text answers in order to gain as much information as possible. But this also makes the questions open to misinterpretation, which is one of the reasons why the entire test, including the questionnaire, should be pilot tested.
All the services included in this study provide post gameplay questionnaires (or post test session questions, since not all services are specialised in games). However, the number of questions it is possible to ask the test participant differs, as does the way the questions are presented. For example, PlaytestCloud offers five post gameplay questions, UserTesting offers four and Beta Family offers an unlimited number, see table 7 in section 4.3. Additionally, PlaytestCloud and UserTesting only offer plain text questions, while Beta Family provides radio buttons, check boxes and selection alternatives which can be used to, for example, visualise scales. This functionality comes in handy when collecting quantitative data as well as qualitative.
It can be discussed whether a post gameplay questionnaire is necessary at all, since a lot of information can be gained from the recordings and the test user can be urged to sum up their experience or answer questions verbally. But written responses give the user the opportunity to express themselves differently, as well as being a good base for statistical observations. It is also an opportunity for the UX researcher to ask the test users about their feelings about the game and the test session. The questionnaire can be used to compare the emotions perceived from the recordings with the feelings stated by the test users themselves. The answers can also be used as a complement for test sessions where the test users did not express any distinct emotions. Not everything can be observed; sometimes the player has to be asked certain questions directly. This can be a disadvantage of using SRTs only, but it can also be a motivation to use questionnaires as a complementary procedure. It might be possible to notice that a test user gets irritated just by watching the recording, but this depends on the player, and it is sometimes difficult to know exactly what makes the user feel the way they do. This is also one of the reasons why it might be necessary to be able to contact the user with follow-up questions. Emotions can also be difficult to read just from observing and listening to the player playing the game.
During the execution of this study, an effort was made to formulate the questions similarly even though they were presented in different formats to the test user (depending on which test service was being used). It can be discussed whether a survey where the user is given a question with a visual scale with radio button options can be compared to a survey where the same question is given in plain text with the scale described in words instead of with a visual scale. In this study the two approaches have been treated equally, but this could be further investigated. The NPS, which is described in section 2.3.4, was investigated in this study, but due to confidentiality reasons the results will not be published.
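For reference, the NPS is conventionally computed from a single 0-10 recommendation question; this is the standard definition rather than anything specific to this study:

\[
\mathrm{NPS} = \frac{\#\text{promoters (score 9--10)} - \#\text{detractors (score 0--6)}}{\#\text{respondents}} \times 100
\]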
5.2
Session Recording Tools
There are many factors to consider when evaluating and comparing session recording tools and test services and their features. In this section some of these factors, and how they affect the testing process, are discussed. It is also explained how the tools have been graded (see table 8, section 4.4); the grading covers website, integration, test set up, customisation, demographics specification, test user profile information and researcher environment.
5.2.1
Evaluation of tools and features
In both Lookback and UXCam, it is possible to customise the settings for how to start and stop the recording, etc. In Lookback it is also possible to preview the recording before uploading it and to have the camera input visible on the screen. An interesting aspect to discuss in relation to this is how much the user should be involved in the recording process and how much freedom should be given to them. Could too much freedom and too much involvement affect the PX? Do more options affect the test results? Is it better if the recording starts automatically, or should the test users themselves be able to start and stop the recording? According to our test users (see figure 12a), the majority of the respondents preferred manual start and stop or were not sure what they would prefer. However, an interesting observation is that all of the respondents who replied "automatic" had completed the test using UXCam, which automatically starts and stops the recording. This may imply that the result would have been different if the test users had had the possibility to try out both options. But even amongst the users who tried UXCam, most preferred to be able to manually start and stop the recordings. The data from the survey is limited due to the small number of participants and should only be seen as guidance, not fact.
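The sketch below illustrates, with entirely hypothetical types (it is not Lookback's or UXCam's real API), the kind of configuration switch this question boils down to when setting up a session recording:

import Foundation

// Hypothetical configuration sketch; neither Lookback nor UXCam is represented here.
enum RecordingStartMode {
    case automatic   // recording starts as soon as the app is launched (UXCam-style default)
    case manual      // the test user starts and stops the recording themselves (Lookback-style)
}

struct SessionRecordingConfig {
    var startMode: RecordingStartMode = .automatic
    var showCameraPreviewOverlay = false   // show the front camera input in a corner of the screen
    var allowPreviewBeforeUpload = false   // let the user review the recording before uploading it
}

// Example: a first-time-player test where we want manual start and stop
// but no preview step (to avoid the self-censoring discussed below).
let config = SessionRecordingConfig(startMode: .manual,
                                    showCameraPreviewOverlay: false,
                                    allowPreviewBeforeUpload: false)

switch config.startMode {
case .automatic: print("Recording starts automatically")
case .manual:    print("The test user starts and stops the recording manually")
}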
In Lookback's SRT, there is a possibility for the user to preview the recording before uploading it. This can be good, for example, if the user realises that their face was not visible to the camera during the session, giving them the possibility to record a new session.
But it can also be a way for the user to censor the test session. What if the user does not like the way their face looks from the angle the device is recording? Or what if the user thinks they were playing too badly and is embarrassed to upload the recording? There is a risk that the test user misuses this functionality, and to be on the safe side, maybe this feature should be disabled when testing first-time players (if it is only their first time playing the game which is of interest). The limited number of survey responses makes the result a bit unreliable, but according to the collected responses, all of the test users who used the preview functionality in Lookback liked it (see figure 13b), and three out of seven of the test users who were recorded with UXCam would have preferred to be able to preview the recording before uploading it (see figure 13a). One problem we experienced when testing the tools with facial recording was positioning the device so that the entire face was visible to the front camera while playing the game in a natural and comfortable position. In Lookback's SRT there is a setting which can be enabled that shows the face recording in the lower right corner (see figure 15) while playing the game. The advantage is that the test user can make sure that the face is visible to the camera throughout the test session, but it may also cause distraction from the game and affect the PX. The player might get self-conscious, and the picture might obstruct important parts of the UI, which could potentially remove focus from the gaming experience.
Figure 15: Using Lookback with the setting to show camera input in the lower right corner set
to on.
When the integration phase of the study had already been initiated, we discovered that Beta Family had an SRT of their own called SuperRecorder. This tool was not included in the study because we were unaware of its existence when planning it, and also because the tool did not come with a dashboard with annotation possibilities, and at a first attempt at implementing the SDK it seemed to cause errors in the game application. However, SuperRecorder does have facial recording and hence we found it extra interesting, but it was decided that analysing yet another tool was outside the time frame for this project. In SuperRecorder, the camera positioning issue has been solved by giving the user the opportunity to adjust the device position by showing the camera input for a couple of seconds before starting the test. This solution gives the user the possibility to adjust the device but prevents distraction. Even though we found it rather difficult to keep the camera positioned correctly during gameplay, we were pleasantly surprised when we noticed that this had not been a problem for our test users. In some recordings, the mouth was occasionally missing, but we realised that this was not a very big problem since the user experience could be deduced anyway. However, it is good if the tool has many options, so that it can be customised depending on what is desired for the specific test session. The drawback is that it can be time consuming to customise the settings if several of the options should be considered in the study. The main issue when carrying out tests using UXCam or Lookback is that there is no way of telling the test user whether the recording was uploaded or not. There should be a message which tells the user whether the video was successfully uploaded, and an option to retry the upload if it fails. Otherwise the user will not know when they can delete the app. We consider this to be a big drawback of these tools. SuperRecorder is the only tool we came across during this study which gives feedback on whether the recording was successfully uploaded and which also offers the possibility to upload a session recording multiple times.
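As a sketch of the upload feedback we missed, the code below retries a failed upload and reports the outcome through a completion handler that the host app could surface to the test user. All names are hypothetical; neither Lookback nor UXCam currently exposes anything like this.

import Foundation

// Hypothetical sketch of upload confirmation with retries; not an existing SDK API.
enum UploadResult {
    case success
    case failure(Error)
}

func uploadRecording(to endpoint: URL, data: Data,
                     attempt: Int = 1, maxAttempts: Int = 3,
                     completion: @escaping (UploadResult) -> Void) {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    URLSession.shared.uploadTask(with: request, from: data) { _, _, error in
        if let error = error {
            if attempt < maxAttempts {
                // Retry instead of silently dropping the session recording.
                uploadRecording(to: endpoint, data: data,
                                attempt: attempt + 1, maxAttempts: maxAttempts,
                                completion: completion)
            } else {
                completion(.failure(error))
            }
            return
        }
        // At this point the app can tell the test user that the recording has been
        // received, so they know when it is safe to delete the app.
        completion(.success)
    }.resume()
}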
5.2.2
Grading of SRTs and Test Services
The grading of the SRTs in table 8, see section 4.4, is based on the experiences from conducting this study. The things we consider to be of extra importance when choosing a tool have been included in the table and are discussed here.
5.2.2.1
Website
The website grading has been based on how usable the website is, how well it works and its features. UXCam has many good features which the other services lack, but has received a lower grade since the website does not work properly. The website is currently under development; the site and the recordings load very slowly, and often things are not loaded at all. This is a major drawback, and the poorly working website makes it almost impossible, from time to time, to use their SRT. Towards the end of this study it did not work at all, but when it is working properly we believe it will be very good. UserTesting has a simple and clear website which is reasonably easy to navigate and has hence received a higher grade. The site has many features and possibilities, though, which can make it a bit difficult to find everything before getting used to it; therefore, their website has not received the highest grade. Similarly, Beta Family's website is pretty easy to navigate, but here too one has to click around for some time in order to reach the correct view. PlaytestCloud lacks many features which the others have, but this also makes the navigation of the website very easy. The lack of features is the reason why it did not receive grade five. Lookback also has a clear website which is easy to navigate, but the researcher environment can be a bit cumbersome to work with.
5.2.2.2
Easy to integrate
All tools were fairly easy to integrate. There were clear instructions, although sometimes a bit outdated. However, amongst the tools included in the final study, the integration instructions were up to date and all worked well. Some trouble was encountered later in the study when some of the SDKs were updated and had to be reintegrated.
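For a sense of what the integration step typically involves, the sketch below shows the usual pattern the vendors describe: start the SDK with an app key when the application finishes launching. The framework name and the start call are placeholders (commented out); each vendor's own, up-to-date instructions should be followed.

import UIKit
// import SomeSessionRecordingSDK   // placeholder framework name, not a real product

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Typical integration pattern (placeholder call, not a real API):
        // SomeSessionRecordingSDK.start(withKey: "YOUR-APP-KEY", recordFrontCamera: true)
        return true
    }
}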
5.2.2.3
Easy to set up test
All test services have received the highest grade regarding how easy it is to set up the test. Both UserTesting and Beta Family provide example tasks, and both have a category for suitable tasks for game testing. At UserTesting it is also possible to specify whether the user should give a verbal response to something during gameplay or whether it is a "do this" task. Both services allow for the creation of similar tests using an old test set up as the base, and allow for saving drafts. Since there is no possibility to add tasks or other test instructions at PlaytestCloud, a test can be set up very quickly. It is not possible to create a similar test based on a previous test, but on the other hand there is not much to copy from any old ones.
5.2.2.4
Customise test
The lack of an option to easily specify test instructions and tasks is the reason why PlaytestCloud received a low grade on customisability. It is possible to specify what type of players to test on and there is an option to ask the player to fill in a gameplay questionnaire, but even though the demographics are somewhat customisable, there is no other way to specify test properties like instructions and tasks without getting in contact with the PlaytestCloud crew. They are very helpful and everything seems to be possible, but the fact that there is a need to contact them means it will take more time, and hence they have received a lower grade. When using UserTesting or Beta Family it is possible to add initial instructions, introducing the test and telling the user which mindset they should have. It is also possible to specify tasks and post gameplay questionnaires. At UserTesting it is possible to add a screener question which further weeds out the test users you are not interested in. This is the reason why UserTesting has received a high grade. Beta Family's tests lack this option but make up for it by offering an unlimited number of questions with the possibility to display them in scale, selection dropdown, multiple choice or text format, in comparison with UserTesting where it is only possible to ask four text-based questions at the end of the session. However, some things are missing in the test set up at Beta Family as well: for example, it is not possible to specify a maximum number of checkboxes the user may check, and the presentation of the scale questions is not very user friendly. Still, this gives more freedom than the other services, whose post-test questions have to be written and replied to in plain text.
5.2.2.5
Demographics Specification
The demographics grade has been based on how specific the demographics can be. Since UserTesting has an option where it is possible to specify other demographics requirements, the demographics can be specified in as much detail as needed, without any limitations. The screener question also makes it easier to reach the desired test users. Beta Family does not have a screener functionality, but it is possible to specify requirements in text. However, when using the approach where specific users are invited to take part in the test, there are only a few predefined specifications to search for. At Beta Family it is also possible to handpick test users and invite the same users on a regular basis, if longitudinal studies are required. Since PlaytestCloud is specialised in games, it is possible to specify the test users according to games they have played or by age, gender and what gaming type they are (casual, midcore, hardcore). But it is not possible to specify both played games and other demographics at the same time. Additionally, there is no option to specify other requirements, or to use a screener question, which makes it difficult to specify more exact demographics. It is, for example, not possible to say "I want users who have not played this game before" without contacting them.
5.2.2.6
Profile Information
All test user recruitment services provide information about the user's gender, age and nationality or country. UserTesting has the most extensive profile information, including, for example, gaming genres, income, ranking, web expertise and social networks; hence they have received the highest grade. PlaytestCloud's user profiles contain favourite games, hours spent playing games per week and currently played games. They have received a higher grade due to their focus on mobile games. Beta Family's profiles are more sparse but contain ranking, submitted reports and feedback from previous test sessions. It is also possible to see which user tests they have already participated in. The number of submitted reports could be used as an indicator of how experienced the test users are.
5.2.2.7
Researcher Environment
Under this category, aspects like organisation of recordings (in the dashboard where all videos are available), properties of the progress bar, video viewing and annotation features have been summed up into one grade. UXCam has a combined progress bar and annotation timeline, see figure 5, section 4.2.2. The annotations are directly connected to a timestamp, and swipe directions and taps are displayed along the progress bar. It is also possible to reply to the annotations, which can be convenient if there is more than one UX researcher working with the session. There are several nice features, but the whole annotation functionality is currently a bit unstable and this has lowered the grade. Lookback and UXCam both show the name of the current view in relation to the progress bar. This makes the test user's navigation in the application extra clear to the observer, and it is easier to navigate to the desired time in the recording. However, Lookback shows this in a much clearer way than UXCam, see figure 4, section 4.2.1. At PlaytestCloud (see figure 6, section 4.2.3) and UserTesting (figure 7, section 2.3), the focus is instead on relating the annotations to the progress bar. UserTesting only has timestamps. UserTesting and PlaytestCloud both provide a clearer overview of the recording and the annotations, and it is easier to work in their environments. Less scrolling and fewer mouse movements are required, and things are well organised. The annotation functionality in Lookback is especially cumbersome, since one has to click several times to create an annotation and then move across the entire screen in order to type it in. It is also difficult to get the correct timestamp, since there is no "create annotation at -5 seconds" feature, and it is quite difficult to hit the correct time when moving backwards on the progress bar using the mouse. UXCam has the same problem, and sometimes comments disappear. It is much easier to get the annotation at the correct time when using UserTesting or PlaytestCloud.
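To illustrate the missing "create annotation at -5 seconds" feature, the sketch below backdates a new annotation relative to the current playback position, since the observer usually reacts a few seconds after the event has occurred. The types are hypothetical and not part of any of the tools' dashboards.

import Foundation

// Hypothetical sketch of backdated annotations; not a feature of the evaluated tools.
struct Annotation {
    let timestamp: TimeInterval   // seconds from the start of the recording
    let text: String
}

func annotate(atPlaybackPosition position: TimeInterval,
              text: String,
              backdatedBy offset: TimeInterval = 5) -> Annotation {
    // Never place the annotation before the start of the recording.
    return Annotation(timestamp: max(0, position - offset), text: text)
}

// The observer notices at 2:37 into the recording that the user could not find a button.
let note = annotate(atPlaybackPosition: 157, text: "Could not find the create account button")
print("Annotation placed at \(note.timestamp) seconds")   // 152 seconds, i.e. five seconds earlier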
At UXCam it is not possible to name folders or to reorganise recordings. All recordings are collected in folders in the current project according to date. The recordings are named "Session 1", "Session 2", "Session 3", and so on. The name and type of the device, country, length of recording, date and time are also displayed. The inability to rename and reorganise the recordings has lowered the grade, but if it worked properly the researcher environment would overall be very nice and easy to use. In Lookback all recordings are uploaded to the main
directory, where length, date and device are displayed. It is possible to create new folders and to rename the recordings, which can hence be organised according to preference. This works very smoothly and has contributed to a good overall grade. In UserTesting, all projects with their associated recordings can be seen in the same view, and the recordings can also be minimised, leaving only the title of the test. This allows for a clear overview while at the same time there is no need to enter a folder in order to know which test sessions have been uploaded, which is very convenient. The recording summary contains date, time, name and ranking of the test user, and demographics. At PlaytestCloud there is one directory for every test, marked with test name and date, and the test sessions are named after the test user, also specifying the length of the recording and the age and country of the test user. Adding all of this up, PlaytestCloud and UserTesting are superior as far as the researcher environment goes.
5.3
Analysis of Recordings
The SRT records the screen, making it possible to see how the user interacts with the application through the touch screen. The device microphone records what the user is saying, and some tools also provide facial recording using the front camera of the device, allowing the observer to see the user's face when playing the game. The initial thought about discerning an unknown person's emotions through facial, screen and voice recordings was that it would be difficult and would require expert experience. However, it proved to be pretty straightforward and relatively easy to understand whether the test user had a good or bad experience, regardless of the observer's previous experience. It was also, in most cases, easy to notice when the test users were annoyed. This can, however, depend on how honest the user is, how they show their feelings and how much he or she talks about them out loud. When analysing the UX of the test users, the aspects explained in section 2.1 were considered. In most cases it is possible to determine whether the user is having fun and enjoying the game or not, and usability issues can be discovered if the user, for example, clicks on graphical elements that are not clickable or is unable to find what he or she is looking for in the game menu. But some factors can be difficult to read. It can, for example, be difficult to conclude whether the test user is concentrated, stressed or experiencing flow based only on observations from the recordings. There are also different motivations for playing a game, as described in section 2.2. Playing can be both task oriented and fun oriented; the menus and navigation need to be usable, but the game itself should also be playable (see table 1 for more information about usability vs playability). From the analysis of the recordings, it was discovered that in most cases the test users experienced some kind of flow (see section 2.2) and most users felt that the game was challenging rather than boring. It is also important to keep in mind that the test users were first-time players, which makes the game a bit more difficult.
5.3.1
Voice
The voice recordings proved to provide a lot of valuable information regarding the user experience. It was discovered that the session recordings from PlaytestCloud and UserTesting provided a lot of important information even though they did not record the face. The test users' voices conveyed insights about their emotions, actions and reactions. In this study, the voice analysis was conducted by two fairly inexperienced human researchers. Another method would be to use a computer system to interpret the voice, similar to the VIS mentioned in section 2.4.3. The system might, however, pick up less information than a human researcher, since voice reactions are natural and well known to the researcher, and he or she might find it easier to make correct interpretations than the computer would. The drawback of the researcher analysing the voice recording is that it is time consuming. One of the goals of conducting user testing and analysing the audio is that it should be effortless and not very time consuming. It is likely that a system similar to a VIS could be used in the future to automatically interpret the test users' emotions by reading their voices.
5.3.2
Facial
Since the facial reactions and expressions varied depending on the test user, it was difficult to know exactly what the users felt based only on analysis of the facial recordings. In this study, the analysis was conducted based on regular human observations, instead of by using, for example, the FACS, which was explained in section 2.4.2. An automatic system did not seem necessary in this study, since one of the goals of the research was to investigate how easy it was to conduct the user testing with session recording without an expert or an automatic system that interprets the facial expressions. One of the investigation points of this study was to discern whether facial recordings generate more valuable data compared to the use of screen and voice recordings only. Even though the screen and voice recordings provided enough information about the PX, the facial recording contributed to a greater overall impression. We should take advantage of this and use the facial recordings in order to get as much information as possible. If using a computer system for automatically evaluating the user experience, advanced algorithms and testing equipment would be necessary to achieve the same or similar results.
5.3.3
Read Emotions
The emotional part is essential to the entire PX, as was explained in section 2.2, where Lazzaro's four keys of fun were described and correlated to different emotions. Discerning the users' emotions is an important part of PX evaluation in games, and as was concluded by Oliveira et al. (see section 2.3.5), the combination of screen recordings and facial recordings can be used to determine the emotions of the test user and to improve the evaluation of a user test. Oliveira et al. were not specifically discussing emotions in games but rather emotions emerging when interacting with a medical interface. But since the user's emotions can be read by the same means, regardless of the purpose of the product that is being tested, their statements about how to capture these emotions are also relevant to session recording of mobile game user tests. One interesting investigation point in this study was to compare the annotations about the test users' actions and emotions during gameplay with the questionnaire where the users were asked to state their actual emotions. When analysing the facial recordings it was rather difficult to gain insight into the test user's emotions during gameplay just by watching their facial expressions, but this depended on who the test user was; some persons revealed a lot of information due to vivid facial expressions, while others did not express any visible emotions at all. It can be difficult to establish whether the perceived emotions match the user's actual emotions even when using an additional post gameplay questionnaire. The questionnaire was filled out after the test session and the answers were based on the gaming experience as a whole, while the observations were made during the test session, when the user's feelings were based on what had happened previously in the game and what was happening at that exact moment. Hence, more specific emotional data could theoretically be collected from the observations than from the post gameplay questionnaire. This also makes it difficult to compare the observations with the answers to the questionnaire, since the answers rather represent the player's overall emotions during the gaming experience. However, the recordings could be used for understanding the origin of the emotions stated in the post gameplay questionnaire, and at the same time the post gameplay questionnaire could be used to confirm the player's overall experience of the game.
In this study, it was discovered that the test users were balancing between feelings of stress, frustration and challenge, but also interest, engagement and excitement. Considering the fact that the test object was a stressful word game, this could be considered a pretty good result. One of the main reasons why the UX in games is more difficult to evaluate compared to the UX of regular software is that emotions that are usually considered to be negative can in fact be positive, see section 2.2. However, if challenge and stress were related to, for example, navigating the game menu, there would be a usability issue; it should not be a challenge to find the correct buttons.
5.4
Workflow
The resulting workflow developed in this thesis work is a set of guidelines for how to conduct user testing of mobile games using SRTs. This procedure should be carried out often and repeatedly, as explained in the resulting workflow in section 4.6. This complies with Lookback's guide for user testing in section 2.5.4, as well as with Nielsen's statements which were mentioned in section 2.3.7. Inokon also recommends researchers not to postpone testing and analysis, since the game is constantly changing (see section 2.3.6). The workflow has also been developed based on the existing workflow for remote usability testing from Usability.gov [54], see section 2.5.1, and the checklist for how to conduct mobile application testing which was developed by UserTesting [60] and was covered in section 2.5.2.
5.4.1
Planning the Test and Writing Instructions
As mentioned in section 2.3.3, it is important to write short and clear instructions when preparing a remote user test. One should be extra careful with the choice of words and make sure it matches the words used in the actual application. For example, when writing instructions about the UI, it is important to refer to things by their proper names. We made the mistake that in task one (see step 6 in appendix C) we told the users to register instead of asking them to "create an account", as it was stated in the first level of the application menu. One important insight acquired during the execution of this study was that things we take for granted can be very difficult for someone else. For example, several of the test users did not understand how to start a new game, and one of them thought that an ad was part of the game. This is important to keep in mind when designing the test. Assume that the technical level is not very high and that everything that can be misinterpreted will be misinterpreted. It is also important to take into consideration how tasks influence the UX and the users' interaction with the application. Is the use of specific tasks limiting to the user, resulting in a loss of information? Or are tasks an asset and a prerequisite, for example when wanting to test a specific part of the application? PlaytestCloud's approach is that the gaming experience becomes more natural when the player plays the game like they normally would. This can be a good starting point, but it is not really what actually happens on PlaytestCloud. For one, the test users are required to play for a specified amount of time, where the shortest test session is 15 minutes. Secondly, the players are still aware of the fact that they are participating in a UX test and use the think-aloud concept, where they are encouraged to talk out loud. This results in the test users exploring the entire application and commenting on everything from the game concept to the graphics and the navigation. This is all valuable information, but if there is some specific part which needs to be investigated (like, for example, the on-boarding process), time could be saved both for the test users and for the UX researcher who will watch the recordings, if it were possible to specify tasks or provide instructions. The use of tasks is a way of guiding the test user in the desired direction in order to gain insight into the relevant area. But of course, if the tasks are too limiting, information will be lost, and this is no good either. For example, in our test, task number one was "Register". We thought this would be a clear but not too limiting way to investigate the on-boarding process. But when watching the recordings from PlaytestCloud (where it was not possible to specify tasks), we discovered that several users never even created an account but instead chose to play offline. This revealed valuable information about the nature of new players, as well as both usability and UX issues in the offline gaming mode. Had we only tested with specified tasks, these problems would not have been highlighted, unless players had skipped task one in our instructions. This is proof of how important it is to thoroughly consider what instructions to give the players, and sometimes it might be better to let them explore the game by themselves without any guidelines.
The think-aloud approach, which is described in section 2.2.1.3, is another thing which needs to be carefully considered before deciding whether it should be applied or not. Not only can it be difficult for some users to feel comfortable speaking their thoughts out loud and verbalising what they are doing; the think-aloud approach can also move focus away from the gameplay and distract the players. This is why we included in the instructions (see appendix C and D) that the users should speak out loud only when navigating the application and not during gameplay. But the think-aloud approach is a good way to get to know the users' thoughts, which cannot be seen in the screen or face recordings. A lot of feedback regarding the game concept can also be collected in this way, and think-aloud during gameplay can give valuable information about the PX. The test users at Beta Family, PlaytestCloud and UserTesting are encouraged to speak their thoughts out loud during the test session, so if think-aloud is not required the test users have to be informed about this. However, it is unclear how this affects the gaming experience.
When using recording tools like Lookback and UXCam, it is also important to specify in the instructions that the application cannot be closed or sent to the background during the test session. In the case of Lookback, using default settings, the recording will be lost if upload is not clicked; with the default settings in UXCam, the recording will be uploaded when the application is sent to the background and a new one will start when the application is opened again. In hindsight, we should have emphasised this in the test instructions. Since the player is supposed to challenge opponents in the game, it can take some time before a round can be played, and some users got bored, closed the application and then resumed playing when someone had accepted their game request or when it was their turn to play. Since recording stopped when the user left the game, parts of the sessions were lost or divided into several parts.
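The sketch below is a minimal illustration (not the Lookback or UXCam implementation) of where in the iOS app life cycle a recording is interrupted when the player sends the game to the background, which is why the test instructions need to warn about it.

import UIKit

// Minimal life cycle observer; it only logs the points where a UXCam-style
// recording would be finalised and restarted when the app is backgrounded.
final class RecordingLifecycleObserver {
    private var observers: [NSObjectProtocol] = []

    init() {
        let center = NotificationCenter.default
        observers.append(center.addObserver(forName: UIApplication.didEnterBackgroundNotification,
                                            object: nil, queue: .main) { _ in
            // The current session ends here and is uploaded.
            print("App backgrounded: current session recording finalised")
        })
        observers.append(center.addObserver(forName: UIApplication.willEnterForegroundNotification,
                                            object: nil, queue: .main) { _ in
            // A new session starts when the player returns, so one test is split in two.
            print("App foregrounded: a new session recording starts")
        })
    }

    deinit {
        observers.forEach { NotificationCenter.default.removeObserver($0) }
    }
}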
5.4.2
Pilot Testing
When preparing a UX test it is important to write clear test instructions, define tasks the test user should perform, compose a clear and relevant post gameplay questionnaire and then pilot test these in order to make sure that everything is clear to the test user and that the test will generate the desired information. In the pilot tests carried out in this study, some additional questions were added after the post gameplay questionnaire and the SRT survey, see appendix H. The value of pilot testing was discovered when testing with Lookback and UXCam. The pilot test was conducted with a test user located at a remote location. The first test that was sent out contained the wrong .ipa file (the iOS installation file for the application), containing another SRT which had different instructions, and the pilot test made us aware that we needed to be more careful and make sure everything is correct before sending out the test. However, it was not until after completing a couple more tests that we discovered that there was a problem with application crashes and also with uploading the recordings. Some users also found it problematic to know what to do next because they had navigated to the next page of the test instructions too quickly. To prevent this, a checkbox was added at the bottom of the instructions and tasks page to make the user confirm that all of the instructions and tasks had been carried out before clicking forward to the next page. This problem was only discovered because one of the test facilitators was present and could advise the test user to go back when they expressed their confusion. Furthermore, there were initially instructions about deleting the application right after the test tasks had been completed, but when not all recordings were received, this was changed to asking the user not to uninstall the application until someone from the research team had confirmed that the recording had been uploaded. This last change meant more work for the facilitators and could easily have been avoided by a confirmation message feature in the SRT, but that is currently not possible with either Lookback or UXCam. SuperRecorder does, however, have this functionality.
Lessons learned from initial testing:
• Be careful to send out the correct files and links
• Make sure the recordings upload and work
• Make sure the application is not prone to crashing, unless the recording tool can handle this
• Make sure there is no possibility for the user to move forward too quickly.
Changes after initial testing:
• Added a checkbox for the user to confirm that all the instructions and tasks had been carried out before they could click forward to the next page.
• Added instructions about not uninstalling the app until getting the green light from the test facilitator.
When carrying out the test sessions with friends and family members, being present with them, we realised that it would be a good idea to perform a local pilot test and observe the test user while they go through the test. In this way, ambiguities in the instructions and in how to use the tool are easier to spot. It can also be a good idea to carry out more than one pilot test with different kinds of users if the target group is diverse. It could, for example, be good to test on both tech-savvy and less tech-savvy users, on different devices, and also to perform one pilot test locally and one remotely before making the test live. But of course, time and cost have to be taken into account. When using a test service you have less control over navigation throughout the test and also over formatting and page content. But it would still be a good idea to perform a light version of the test with friends, family or co-workers to make sure the instructions and the tasks are good enough and that the post gameplay questionnaire is easy to understand.
5.4.3
Deciding on a Session Recording Tool
The market for session recording tools is rapidly changing, and their features and websites are constantly being updated. While conducting this study, new tools have become available, tools have been bought by other companies, features and platform support have been added, SDKs have been updated more than once, and websites and dashboards have been relaunched. Features have been released in beta and tools have gone from beta to full release. All of this has happened within just a couple of months. The fact that things are changing so rapidly makes it difficult to compare and evaluate the tools. Our tables with tool/service properties will probably be outdated in a couple of months' time, and the initial tables used for deciding which tools to test (see tables 3 and 4, section 4.1) have already been updated several times during the course of the study. We can only assess what is available right now, and it is difficult to give the tools a fair judgment since they are all still in development, containing bugs and lacking functionality. For example, UXCam seemed to work fine at first when we tested it ourselves at the office, but towards the end of the study almost nothing on their website worked. During the three weeks we dedicated to user testing with the SRTs, a new website and a new SDK were released, and unfortunately the old SDK which we had implemented was no longer compatible with the website, so the recordings from half of the test sessions were lost (there was no indication or warning about this from the company). Having to upgrade the SDKs and redo the integration is time consuming, and this could become a bottleneck in the UX evaluation and development process if UX tests are performed frequently. Additionally, when being forced to reintegrate the SDK it is easy to miss small settings like, for example, turning on the front camera or changing the default start or stop procedure to custom settings.
When choosing which tool to use, it is important to have a clear test plan in order to know what properties the tool should have (or one can do the opposite: choose a tool and adjust the test to the limitations of the tool). Some important things to consider when choosing a tool are whether there is a possibility to specify test instructions, tasks, time limits, etc. The time limit of the recordings is important to consider: the user has to have plenty of time to complete the test session, but if the test tasks can be completed quickly, the player should not have to play for longer than needed, since this will generate unnecessary material that no one will have time to go through. For example, one of the average sessions recorded with Lookback (where we provided our own test instructions) lasted for about 7 minutes, while one of the sessions provided by PlaytestCloud (where the minimum session time was 15 minutes and it was not possible to specify instructions or tasks) lasted for 40 minutes. This was, however, an extreme case with an enthusiastic player; most PlaytestCloud sessions lasted about 20 minutes. On one hand, it might be nice to see for how long the test users continue to play, but on the other hand, that could also be a task. Our last task was "Play the game until you have leveled up to at least level 2", but most of the players who succeeded in reaching level two stopped as soon as they reached it.
When deciding on a tool, it is also important to consider how high a frame rate is needed.
This mainly depends on how swift the movements are that the user is expected to make while
navigating and playing the game. If the frame rate is too low, there is a risk that important
information is lost, the recording becomes less natural for the UX researcher to watch, and
issues with the game application itself may pass unnoticed. AppSee's and TestFairy's SRTs
were removed from the study since their frame rate was too low and they did not record all
the movements of the player when the game was played. All the other tools we tried out had
a high enough frame rate. In UXCam's SRT it is also possible to adjust the frame rate and
the quality of the video. The need for a higher frame rate depends on the type of game that
is to be tested; a stressful game requiring many swift movements needs a higher frame rate.
It is, however, important to keep in mind that a higher frame rate and a higher quality also
put higher demands on storage, processing power and bandwidth. It is also important to
consider which development stage the game is currently in, since not all of the SRTs continue
recording after an application crash. Application crashes were one of the main reasons why
many of the session recordings from the test sessions conducted in this study were lost.
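As a rough illustration of the storage and bandwidth tradeoff mentioned above, the following back-of-the-envelope sketch estimates how much video data a session might generate at different frame rates. The resolution, session length and compression ratio are assumptions chosen purely for the example, not measured properties of any SRT.

```swift
import Foundation

// Rough, illustrative estimate of how frame rate and resolution drive the
// amount of video data a session recording generates. The compression ratio
// is an assumed figure for the example only.
func estimatedSessionMegabytes(width: Int, height: Int,
                               framesPerSecond: Double,
                               sessionMinutes: Double,
                               assumedCompressionRatio: Double = 50) -> Double {
    let bytesPerRawFrame = Double(width * height * 3)   // 24-bit colour, uncompressed
    let rawBytes = bytesPerRawFrame * framesPerSecond * sessionMinutes * 60
    return rawBytes / assumedCompressionRatio / 1_000_000
}

// A 20-minute session on an assumed 640x1136 screen: tripling the frame rate
// from 10 to 30 fps roughly triples the data to store and upload.
print(estimatedSessionMegabytes(width: 640, height: 1136, framesPerSecond: 10, sessionMinutes: 20))
print(estimatedSessionMegabytes(width: 640, height: 1136, framesPerSecond: 30, sessionMinutes: 20))
```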
5.4.4 Deciding on Recruitment, Distribution and Test Set Up
When deciding on which distribution and test set up service to use, it is important to have a
clear plan regarding who the test users should be. When that is clear, it is easier to see whether
the test service fulfills the requirements or not. This matters, for example, when considering
aspects like demographic specifications for the test users. If the test users should be from,
say, Australia, it is important to use a test service which can recruit test users from Australia.
It is also necessary to know whether the test should be conducted remotely or locally. If the
test should be conducted locally at the office (perhaps for confidentiality reasons), there is a
need to handpick and invite specific test users, and hence it is necessary to use a service where
this is possible. Another factor to consider is the possibility to contact the test users with
follow-up questions, which might be necessary if the questionnaire answers need to be
supplemented or if anything is unclear from the test session. Making sure the tool supports
the platform of the game is also essential, since otherwise it is not possible to install the SRT
in the game. Other important factors are the price of the service, the time it takes to gain
access to the recordings, the provision of crash logs, and how much time the researcher can
invest in recruitment and test set up. When using separate services for distribution, test set
up and the SRT, it is important to consider how the recordings and the collected answers can
be connected to each other (if there is a need for it).
When conducting our tests using Lookback's and UXCam's SRTs in combination with
Beta Family's SuperSend (as distribution service) and Google Forms (for presentation of test
instructions and the post gameplay questionnaire), an issue occurred regarding how to relate
the recordings to the questionnaire answers. This was solved by giving each test user a unique
ID which they had to type in at the beginning of the test and also say out loud at the beginning
of the recording. The names of the recordings in Lookback were thereafter manually changed
to the correct ID, but in UXCam it was not possible to rename the recordings. This approach
was quite cumbersome since unique test instructions had to be sent out to each user in order
to give them a unique ID. This meant that a record had to be kept of which of the pre-composed
IDs had already been used, and when sending out the tests, the information text had to be
altered for every test user and the .ipa-file had to be uploaded to SuperRecorder once for every
test user, instead of just sending the same test to all the participants' e-mail addresses.
Furthermore, with this approach, the beginning of the session recording had to be watched
before it could be named properly. Since the test user base was fairly small, it would probably
have worked well to simply compare the timestamps of the survey and the session recording.
In practice, however, it is possible to get several recordings and several surveys with the same
or very similar timestamps. Moreover, the questionnaire and the recording will probably not
have the exact same timestamp, so it can still be a problem to relate them to each other. When
the testing period was initiated, UXCam only showed the date the recording was uploaded
and not the time, so we had to come up with another solution. The use of IDs is rather
time-consuming and it also puts more responsibility on the test users, who have to remember
to state their ID at the beginning of the recording and in the questionnaire. This method is
also prone to errors since both the test participant and the UX researcher might mix up the IDs.

Furthermore, it might be a good idea to use a tool where the tasks and the instructions can
be displayed in the application itself through the SRT, which is the case in UserTesting and
SuperRecorder. This allows the test participants to use only one device. If using, for example,
Lookback or UXCam, the test instructions, tasks and questionnaire need to be viewed on a
second device or be given to the test user in analogue form (e.g. on paper), or else the recording
will stop each time the user needs to read the instructions. It is important to inform the test
users that they cannot leave the application before the test session has been completed.
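For a small test user base, the timestamp-matching alternative discussed above could be partly automated. The sketch below is only an illustration of the idea; the data model, the tolerance and the way upload times are obtained are assumptions, and ambiguous or unmatched cases would still have to be resolved manually (for example by listening for the spoken ID at the start of the recording).

```swift
import Foundation

// Hypothetical data model for matching questionnaire answers to recordings
// by timestamp; in practice the timestamps would be exported from the survey
// tool and the SRT dashboard.
struct Recording { let name: String; let uploadedAt: Date }
struct SurveyResponse { let respondent: String; let submittedAt: Date }

// Pair each questionnaire response with the recording whose upload time is
// closest. Pairs outside the tolerance are left unmatched so that they can be
// reviewed manually instead of being related to the wrong recording.
func match(responses: [SurveyResponse], to recordings: [Recording],
           maxGapMinutes: Double = 30) -> [(SurveyResponse, Recording?)] {
    return responses.map { response -> (SurveyResponse, Recording?) in
        let closest = recordings.min(by: {
            abs($0.uploadedAt.timeIntervalSince(response.submittedAt)) <
            abs($1.uploadedAt.timeIntervalSince(response.submittedAt))
        })
        if let recording = closest,
           abs(recording.uploadedAt.timeIntervalSince(response.submittedAt)) <= maxGapMinutes * 60 {
            return (response, recording)
        }
        return (response, nil)
    }
}
```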
5.5 Further Research
An interesting topic for future research is whether monetary incentives affect the test users
and make them sign up for the test and submit the test report faster on, for example, Beta
Family. Another interesting aspect that could be investigated is how the test results are
affected by whether or not the test user is a regular participant in user tests. It is also
interesting to examine how freedom and options in the recording tools could affect the PX.
This study has focused on the iOS platform only. It could be interesting to look into how
well the SRTs work with other platforms and devices as well, such as Android, Unity and
tablets. Some tools may be more appropriate than others depending on the platform used in
the test. Furthermore, there are several tools available on the market which were not further
investigated in this study. Since the market is rapidly changing, even more tools and services
may soon be available, and it could be interesting to take a closer look at these.
Regarding the workflow, it could be further tested in practice in order to see if there is
anything that could be improved. It would be interesting to iterate the process while testing
different kinds of games; in this way it would be possible to discover more issues and benefits
of the various tools. Separate, less general workflows could also be developed for specific
tools, games, testing objectives or other contexts.
6 Conclusion
This chapter aims to answer the initial objectives of the thesis and to present the conclusions
that can be drawn from this study. Based on the results of the study, it has become clear
that it is possible to conduct user testing of mobile games with the use of SRTs. A workflow
for how to conduct UX testing of mobile games is presented in section 4.6. The course of
action for how the recordings can be analysed and interpreted into information that can be
used to address UX and usability issues is also described in the workflow, in section 4.6.8.
Remote testing with the use of an SRT is a suitable method for testing mobile games since
the user can perform the test in a natural environment on a familiar device, which implies
that the test should have a minimal effect on the PX compared to traditional testing methods.
Another advantage is that the testing can be done unmoderated, meaning it does not have to
happen in real time, which allows multiple test users to perform the test simultaneously. In
order for the test to run smoothly it is, however, important to watch out for technical issues,
which can be avoided by writing detailed instructions and doing thorough pilot testing. Based
on the study, it can be concluded that there is no single perfect method for conducting user
tests with SRTs; no ”one method fits all”. There are many factors to consider: it is important
to know what to test and who should test it, and then decide on a test service and an SRT
which suit the specific need. None of the investigated SRTs works perfectly, but each of them
has its advantages as well as drawbacks, and the ideal tool would be a mixture of all of them.
Since many of the SRTs are still in an early development stage, the tools and websites are
constantly being updated and they are therefore not completely reliable. The ideal tool would
be connected to a service which provides test user recruitment, test set up and distribution,
since it would be convenient to store everything in the same place and not have to worry
about connecting the test users' questionnaires with the corresponding recordings. Another
valuable feature would be the possibility to summarise the results and receive auto-generated
diagrams of statistics and quantitative data directly on the website. It would then be possible
to share the results with the rest of the team instantly, without using multiple services for
writing and storing documents, recordings and other data. This would both facilitate the
organisation of test data and save time. The ideal SRT would also provide features and
properties like:
• Tasks and questions displayed directly in the application
• Feedback stating whether the recording was uploaded successfully or whether something went wrong
• Possibility to upload the recordings again
• Possibility to pause the recording (for example, when waiting for an opponent or if someone interrupts during the test session)
• Possibility to change settings from the online dashboard (and not only in the code)
• Preview possibility
• Statistics, metrics and auto-generated diagrams in the dashboard
• Possibility to adjust the camera position (e.g. seeing the face for a few seconds before starting the test, to make sure the face is visible to the camera)
• No loss of information due to application crashes
However, since the ideal tool with all the desired functionalities is not currently available on the
market, the recommended SRT and test service depend on how much control the researcher
wants to have over the test and on which resources are available. If the time frame is short,
UserTesting is recommended since they provide test users within an hour. If the game
application is in an early development stage, PlaytestCloud is recommended since they continue
recording after an application crash, making it possible to view the entire test session while
also gaining insights into when and why application crashes occur. If a quick test set up is
required and the only desire is to have the test users play the game, with no need for advanced
test specifications, PlaytestCloud can be used since its test set up process is very quick but
also limited. The SRTs without facial recordings can still provide a lot of information, and in
many cases this is enough to identify UX and usability issues. However, facial reactions can
help when analysing and identifying emotions. If there is a need for facial recordings, the
SRTs Lookback and UXCam can be used. The recommended tool out of these two is, however,
Lookback, since UXCam has been unstable during the last month and its dashboard and
researcher environment have not been working properly (although this might change shortly).
Lookback is also suitable when the game application needs to be tested in the office due to,
for example, confidentiality reasons. If this is not an issue, however, and the test can be
performed by people outside of the office, we do not recommend independent recruitment of
test users (unless there is already an existing test user base) since this is a time-consuming
process which requires a larger time frame. Lookback can, however, advantageously be used
together with Beta Family's test user recruitment and test set up service. Beta Family offers
the possibility to create a public test in which any of Beta Family's test users can participate;
it is also possible to create a private test where the test users can be handpicked from their
test user base or recruited by other means and invited through e-mail.
There is a lack of easily accessible, functioning, standardised methodologies for conducting user
tests using SRTs. Many companies are reluctant to perform testing due to a lack of knowledge
or resources, but by discovering UX and usability issues at an early stage both time and money
can be saved. It is also important to test often and to make testing a natural part of the
development process. The workflow produced in this thesis work can be applied regardless of
which session recording tool or test service is being used. User testing with the help of session
recording tools is a rapidly changing area: tools and test services are constantly being updated
and new tools and services emerge. Therefore, the workflow contains the most important parts
involved in UX and usability testing and does not focus on any specific tool or service. This
will hopefully make it applicable also in the long run, no matter which tools or test services
are available on the market. In order to make it possible to decide on the most suitable tool
or test service for the specific test object and testing objectives, important factors to consider
have been included in the workflow. Tables displaying the current properties of the investigated
tools have also been included in order to make it easier to compare the tools and to decide
which tool is most appropriate in a given context; however, these only cover a handful of the
tools and services available on the market. See appendix I for the final version of the workflow
which was presented to MAG Interactive.
Appendix A - Initial Test Instructions
Testing the mobile game Ruzzle
This test is part of a Master’s Thesis study at Linköping University, spring 2015. The thesis
aims to investigate session recording tools used in user experience and usability tests of mobile
games.
Initially, you will be asked to accept a declaration of informed consent and fill in some information about your background. Thereafter you will be asked to perform 3 tasks in the
game application Ruzzle, while the test session is being recorded. After completing the test
session you will be asked to complete a questionnaire about your experience of the game and
the recording tool. The study will take approximately 20 minutes in total and you will need a
stable WiFi connection. Also make sure to be in a brightly lit environment.
Thank you for your participation in this user experience test!
Best Wishes
Karro and Veronica
Appendix B - Declaration of Informed Consent
This test is part of a Master’s Thesis study at Linköping University, spring 2015. Observations
from the test will be discussed in the thesis but no pictures of your face will be used. No names
will be mentioned, all data collected during the test will be handled anonymously, and the test
can be withdrawn from the study if you wish so.
The purpose of this study is to evaluate the built in test session recording tool as well as the
on-boarding process of the game application. The study will test the user experience of the
game application and the efficiency of the session recording tool and not the user’s skills in
using it.
You will be performing some predefined tasks which will be recorded on video. Your face and
voice will be recorded using the front camera and the microphone of the device. Additionally
the screen will be recorded in order to investigate the navigation in the app. The test facilitators will watch the recordings in order to evaluate the tool used for recording the test as well
as the game application.
There is no obligation to participate in this study and you are free to withdraw your participation at any time without further explanation.
Thank you for your participation!
I declare that I wish to take part in this study
X I have read and understand the statements in this informed consent document
Appendix C - Test Procedure: Lookback
Do not go to the next page before completing all the test instructions, thank you.
Try to hold the device so that your entire face will be visible to the front camera of the device,
but please feel comfortable and try not to think about the fact that you are being recorded. The
game and the session recording tools will be assessed, not your performance. We appreciate it
if you are completely honest and speak your mind.
If possible, try to verbalise your thoughts in your native language when navigating in the app
(e.g. explain your thoughts and intentions when registering, using the menus/buttons, etc.).
Please comment on anything you find good or bad; all criticism is of value. However, when
playing the game, act naturally, as you would when normally playing a game; there is no need
to verbalise your thoughts. Cursing, laughter, etc. are encouraged if they come naturally.
Please take the following into consideration:
• It is not possible to do in-app purchases.
• The study will take approximately 20 minutes.
• Please play the game in your native language. The default setting of the game is the
same as the language on your device. If you want to change the language of the game,
click on ”Start a new game” and click on the button displaying a language in the upper
right corner.
• If the app crashes during the game session, please start again and report the crash in the
questionnaire (under the question ”Did you experience any difficulties with the recording
tool during the test session?”).
This study is voluntary and your contribution may be withdrawn from the study if you wish
so. Before starting the test, make sure your device fulfills the technical requirements: iOS 6.0 or
later. Also make sure the device has enough memory to install the app and store the recordings.
1. If you have not already downloaded the application: Download and install the game using
the link in the e-mail. Open the link in your mobile browser or mobile e-mail client and
click on ”Download app”. You can also open the link in your computer browser and
download the app by scanning the QR-code with the barcode reader on your device.
2. Read through the tasks (see below) and make sure you understand them.
3. Start the game application.
4. Shake the device and a menu will appear. Press record (the big, red, round button) to start
recording. Make sure the camera is set to on (this is the default setting).
5. Say your test number out loud (this number was given to you in the e-mail).
6. Perform the tasks:
(a) Register
(b) Play the game until you have levelled up to at least level 2
(c) Please summarise your thoughts and experience of the game by speaking out loud.
7. Shake the device and a menu will appear. Press stop (the big, red, square button) to stop
recording. A preview of your game session will appear on the screen. If you want to, feel
free to watch it.
8. Click on upload in the upper right corner to upload the video.
Appendix D - Test Procedure: UXCam
Do not go to the next page before completing all the test instructions, thank you.
Try to hold the device so that your entire face will be visible to the front camera of the device,
but please feel comfortable and try not to think about the fact that you are being recorded. The
game and the session recording tools will be assessed, not your performance. We appreciate it
if you are completely honest and speak your mind.
If possible, try to verbalise your thoughts in your native language when navigating in the app
(e.g. explain your thoughts and intentions when registering, using the menus/buttons, etc.).
Please comment on anything you find good or bad; all criticism is of value. However, when
playing the game, act naturally, as you would when normally playing a game; there is no need
to verbalise your thoughts. Cursing, laughter, etc. are encouraged if they come naturally.
Please take the following into consideration:
• It is not possible to do in-app purchases.
• The study will take approximately 20 minutes.
• Please play the game in your native language. The default setting of the game is the
same as the language on your device. If you want to change the language of the game,
click on ”Start a new game” and click on the button displaying a language in the upper
right corner.
• If the app crashes during the game session, please start again and report the crash in the
questionnaire (under the question ”Did you experience any difficulties with the recording
tool during the test session?”).
This study is voluntary and your contribution may be withdrawn from the study if you wish so.
Before starting the test, make sure your device fulfills the technical requirements: iOS 6.0 or
later. Also make sure the device has enough memory to install the app and store the recordings.
1. If you have not already downloaded the application: Download and install the game using
the link in the e-mail. Open the link in your mobile browser or mobile e-mail client and
click on ”Download app”. You can also open the link in your computer browser and
download the app by scanning the QR-code with the barcode reader on your device.
2. Read through the tasks (see below) and make sure you understand them.
3. Start the game application and click yes when the message ”Ruzzle would like to record
your Camera Video” appears on the screen. You will now see a red dot in the upper right
corner. If more messages appear, click ”ok” on all of them.
4. Say your test number out loud (this number was given to you in the e-mail).
5. Perform the tasks:
(a) Register
(b) Play the game until you have leveled up to at least level 2
(c) Please summarise your thoughts and experience of the game by speaking out loud.
6. Press the home button to close the app/minimise it/send it to the background.
Appendix E - Pre Gameplay Questionnaire
In order to collect statistical data, a pre gameplay questionnaire was used when testing the
tools/services which did not offer test user profile information. In order to be able to weed
out the test users who had played Ruzzle before, a question about this was added at the
beginning of the test for the tools/services where it was not possible to specify exact
demographic requirements.
Questions for the test user to answer before gameplay:
• Age
• Gender
• Have you played Ruzzle before?
Appendix F - Post Gameplay Questionnaire
The following questions were used in the post gameplay questionnaires. Some test set up
services only offered a limited number of questions, and therefore not all questions could be
included. Where it was possible to present the questions using visual aids such as scale options
or check boxes, this was done. The first 4 questions were included in all the tests. Where it
was possible to ask more questions, compound questions were split into several separate ones.
The last question was only asked when using Lookback and UXCam in combination with
Beta Family or Google Forms, where an unlimited number of questions could be specified.
• How likely is it that you would recommend Ruzzle to a friend or colleague (0=Not at all
likely, and 10=Very Likely)?
• Did you understand how to play the game? What did you think about the tutorial? Did
the tutorial help your understanding of the game or did you think it was unnecessary?
• Pick 3 emotions you felt during game play (e.g. engagement, positivity, challenge, stress,
excitement, confusion, frustration, boredom, happiness, sadness, uselessness, mastery,
effectiveness, meaningfulness, interest, tiredness, energetic)
• Would you like to play Ruzzle again?
• Have you played Ruzzle before?
• Do you have any recommendations, suggestions or comments regarding the game?
Appendix G - Session Recording Tool Survey
When it was possible to specify many post test questions, a survey regarding the session recording tool and the test user’s testing preferences was included. This survey was conducted after
the test users had completed the regular part of the test, including the post gameplay questionnaire. This was because we did not want the survey to interfere with the PX or the result
of the post gameplay questionnaire.
• Did you experience any difficulties with the recording tool during the test session?
Did you get disturbed, felt uncomfortable or did any technical issues occur? Did the app
crash during the test session? Please elaborate your answer.
• Would you have preferred if the tool had started and stopped recording automatically
without you having to navigate its menu? / Would it have been better if you could
start and stop the recording yourself instead of it being handled automatically when
opening/closing the app?
• Why would you, or why would you not, prefer the recording to start and stop automatically?
• Would you have preferred to be able to preview the recording before uploading it? / Did
you like the possibility to preview the recording before uploading?
• How would you have preferred to perform the test?
Would you rather do it at home on your own device, where you could choose the place and
time yourself, or would you rather have visited a test facility where you would have been
observed in person while playing the game?
• Do you have any recommendations, suggestions or comments regarding the session recording tool?
Appendix H - Questions for the Pilot Test
These questions were asked after the pilot test user had completed the entire regular test session.
• Were the instructions before the test sufficient? If not, what was missing?
• Was there anything you did not understand in the instructions or regarding the recording
tool?
• Was it easy to understand the tasks? If not, how would you suggest we improve them?
The tasks were: A. Register. B. Play the game until you have leveled up to at least level
2. C. Please summarise your thoughts and experience of the game by speaking out loud.
• Is there anything you think we should change for future testing?
Did you think anything in the test was redundant or missing? Any additional comments
on the declaration of consent, the instructions, the questions, etc.?
Appendix I - Final Workflow
The following pages contain the workflow which was produced for MAG Interactive.