Feasibility Study

Statistics in Mobile Computing
Lecture 3: Feasibility Study
23rd February 2015
Assistant Professor Dr. Bert ARNRICH
Previous lecture: We will use our mobile phones in our
course
Apply the lessons learned in practice by conducting
empirical experiments with our mobile phones
Previous lecture: Preferred data collection app: “Sensor
Log”
Developed by Boğaziçi University student Hasan Faik Alan
Recommended sensor data logging app for this course
Available in Google Play
Current version: 1.0.9
Requires Android 2.2 and above
Direct link:
https://play.google.com/store/apps/details?id=com.hfalan.activitylog
Previous lecture: Sensor Log in a Nutshell
Objective: Log, label and export
sensory data
Motivation: Ease the process of
collecting and labeling sensory
data from smart phones.
Data are stored in sqlite
database and can be exported in
CSV format
[Hasan Faik Alan et al., 2014] Sensor Log: A Mobile Data Collection and Annotation Application
Previous lecture: Install Sensor Log
Search for “Sensor Log hfalan” in Google Play
Direct link:
https://play.google.com/store/apps/details?id=com.hfalan.activitylog
Previous lecture: Install Sensor Log
App requires many permissions
Don’t worry, you have full control
over your data
You control which data is collected at
which time
You control which data is exported to
whom
Previous lecture: Start Sensor Log
Sensor Log screen when started
for the first time
Running timer named Unknown
Three pre-defined activity labels
Walk, Run and Sit
Previous lecture: Sensor Log Screen
Label Editor
Settings Menu
Recording History
Timer
Pre-defined activity labels
Previous lecture: Sensor Log
Settings Menu
As a first step, select Sensors from the settings menu in
order to explore phone sensors
Previous lecture: Sensor List
Usually you should get a list of
10 to 20 modalities
Some modalities rely on the
same sensor hardware but differ
in signal pre-processing, e.g.
calibration
The list will vary for different
phone models
Select a sensor modality for
more sensor information
Previous lecture: Choose sensors from the list and
explore
Previous lecture: What is R?
R can be considered as a different implementation of S
[ http://www.r-project.org/ ]
Previous lecture: R Download and Installation
Previous lecture: Getting Started with R
Previous lecture: Start with some simple calculations
Enter an arithmetic expression and receive a result
> 2 + 2
[1] 4
Here 4 is the result and [1] is the index of the first number on the result line; we will see later that
the index is helpful when we receive more than a single result
Previous lecture: Assign values to variables
Assign a value to the variables x and y by using the assignment
operator <-
> x <- 2
> y <- 3
Check the values of the variables
> x
[1] 2
> y
[1] 3
Previous lecture: Getting Help and Documentation
Open a Web browser interface to R help
> help.start()
Previous lecture: R Editor
Previous lecture: Alternative Notepad++ Editor
Previous lecture: RStudio Integrated development
environment
Previous lecture: Homework
Install R and get used to it
Install the sensor logging app on your mobile device
Get used to the sensor logging app and record some sensor
data
Program Today
Design and conduct a feasibility study
Collect experimental data
Import collected data into R
Explore collected data and define research hypotheses
Feasibility Study
Our aim today is to investigate the acceleration patterns of
two activities: walking and running
For the feasibility study we collect and inspect our own
data
We use Sensor Log for data collection
We take the phone in our right hand when performing both
activities
Prepare Sensor Log for data recording
The Label Editor lets us add new
labels or remove old ones
For each label, a set of sensors
will be assigned
Only data from the assigned
sensors will be collected for a
specific label
Label editor
Tap on an existing label to
change sensor assignment
Tap and hold on an existing label
to remove label
Tap add button for creating a
new label
Sensor assignment
Assign 3d accelerometer sensor
to label Walk
Usually, we will assign the same
set of sensors to all activities of
an experiment
Proceed with Save
Sensor assignment
Assign 3d accelerometer sensor
to label Run
Proceed with Save
Get ready for starting data collection
After label definition and sensor
assignment, Sensor Log is ready
for data collection
The running timer named Unknown
shows us that the app is in stand-by
mode
We now get ready for the first
activity
Start data recording
Tapping a label button starts the
data collection
Only data from the previously
assigned sensors will be
collected
Timer shows us the recording
time
Stop data recording
Tapping an active label button
stops data recording
The running timer named Unknown
shows us that the app is in stand-by
mode again
We now get ready for the second
activity
Continuing data recording
Tapping a label button starts a
new data collection
Timer shows us the recording
time
We stop data recording by
tapping the active label button
Data management
Data recording history
List of data recording sessions
Inspect recording duration and
number of samples
Delete or export data
Data recording history
Overview of data recording
sessions
Delete or export sessions
Tap on a session to get more
detailed information
Data recording history
Data export
Data is usually exported to
/storage/emulated/0/com.hfalan.activitylog/sensor.db
Export path might differ from
phone to phone
In addition, data can be exported
as a comma-separated values (CSV)
file log.txt, which is stored
in the same location as above
Data export
Select the app for exporting the
CSV file
Data export might not work with
all apps
Alternative: direct export as
shown in the next slide
Data export
Data can be directly accessed
from the usual storage path
/storage/emulated/0/com.hfalan.activitylog/
The CSV format file log.txt can
be selected as an attachment
from the storage path
As an alternative, connect phone
to a computer via USB, locate
storage path and copy file from
there
Recorded sensor data in file log.txt
...
Consists of two sections:
(1) Assignment of a status identifier called statusId for each label
(2) Recorded sensor data
Recorded sensor data in file log.txt
Sensor data consist of 5 columns
Status identifier representing the label
Sensor name
Sensor value(s)
Timestamp: number of milliseconds that have elapsed since 1st Jan 1970
In case of multi-dimensional sensor values, data is contained
in square brackets and separated by comma
Data preprocessing
In order to simplify data import and data handling we perform
some data preprocessing in the following
We remove unnecessary file headers, format syntax and
useless data entries
We transform sensor value format to simplify data import into
R
All preprocessing steps can be performed with a standard
text editor like Notepad++
Data preprocessing
We move the first section (assignment of a status identifier
called statusId for each label) to some other place for
documentation
The resulting file now consists of a file header in the first line
and the recorded data in the remaining lines
Data preprocessing
Sensor name is identical in every line
We remove it since it is useless in our case
Data preprocessing
We need to access the 3 acceleration values separately
We first remove the square brackets [ and ]
Data preprocessing
Second, we replace , with |
Afterwards our data entries are in the right format
Data preprocessing
Finally, we correct the header in the first line:
We remove sensorName
Instead of value we use x|y|z
Now we are ready to import our recorded data into R
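The same preprocessing steps can also be scripted in R instead of being done by hand in a text editor. The following is only a minimal sketch: it assumes that each raw data line has the form statusId|sensorName|[x,y,z]|timestamp, and the sample lines, the sensor name string and the timestamp values are made up for illustration. The real layout of log.txt may differ, so check the file first.

```r
# Hypothetical raw lines; the exact layout of log.txt is an assumption.
raw <- c(
  "statusId|sensorName|value|timestamp",
  "1|Accelerometer|[0.22984336,5.2863975,7.5082164]|1424540000000",
  "1|Accelerometer|[0.61291564,5.401319,6.167464]|1424540000011"
)

body <- raw[-1]                               # drop the old header line
body <- gsub("[", "", body, fixed = TRUE)     # remove opening square brackets
body <- gsub("]", "", body, fixed = TRUE)     # remove closing square brackets
body <- gsub(",", "|", body, fixed = TRUE)    # replace , with |

# Remove the sensor name (second field) from every line
fields <- strsplit(body, "|", fixed = TRUE)
body <- vapply(fields, function(f) paste(f[-2], collapse = "|"), character(1))

# Corrected header: sensorName removed, value replaced by x|y|z
clean <- c("statusId|x|y|z|timestamp", body)

# Read the cleaned lines directly from memory
my.data <- read.table(text = clean, header = TRUE, sep = "|")
print(my.data)
```

With a real export, raw could be filled with readLines on the exported file after the label section has been moved elsewhere.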
Recap R Basics: Vectors
Construct vector height from an array of numbers
> height <- c(1.75, 1.8, 1.65, 1.9, 1.74, 1.91)
Element-wise operations
> height^2
[1] 3.0625 3.2400 2.7225 3.6100 3.0276 3.6481
Construct a second vector weight and compute bmi
> weight <- c(60, 72, 57, 90, 95, 72)
> bmi <- weight/height^2
> bmi
[1] 19.59184 22.22222 20.93664 24.93075 31.37799 19.73630
Recap R Basics: Summary Statistics
Compute mean, standard deviation, variance and median
> mean(weight)
[1] 74.33333
> sd(weight)
[1] 15.42293
> var(weight)
[1] 237.8667
> median(weight)
[1] 72
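To see how these functions relate, the variance and standard deviation can be recomputed by hand. This is a small sketch using the weight vector from the recap; the names n, var.manual and sd.manual are introduced here only for illustration. Note that var divides by n - 1 (sample variance), not by n.

```r
weight <- c(60, 72, 57, 90, 95, 72)
n <- length(weight)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var.manual <- sum((weight - mean(weight))^2) / (n - 1)

# Standard deviation is the square root of the variance
sd.manual <- sqrt(var.manual)

print(var.manual)   # compare with var(weight)
print(sd.manual)    # compare with sd(weight)
```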
Recap R Basics: Simple Scatter Plot
> plot(height,weight,pch=2)
Recap R Basics: Random Numbers
Generate 1000 numbers at random from the normal distribution
and plot them
> plot(rnorm(1000))
Recap R Basics: Histogram
> hist(rnorm(1000))
Recap R Basics: Overlaying Histogram with Normal
Density
> x <- rnorm(1000)
> hist(x, freq=FALSE)
> curve(dnorm(x), add=TRUE)
Data Import into R
The function read.table is the most convenient way to read
in a rectangular grid of data from a text file
> help(read.table)
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE,
fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)
Arguments of read.table
file
Name of the file which the data are to be read from
Each row of the table appears as one line of the file
If it does not contain an absolute path, the file name is
relative to the current working directory
Can also be a complete URL
Arguments of read.table
header
Logical value indicating whether the file contains the names
of the variables as its first line
If header information is available in the file it will be used for
variable names
Arguments of read.table
sep
Field separator character
Values on each line of the file are separated by this
character
Default value sep = "" means that the separator is ‘white
space’: one or more spaces, tabs, newlines or carriage
returns
Output of read.table
> my.data <- read.table(...)
Output of read.table is a data frame which contains a
representation of the data in the file
A data frame corresponds to what other statistical packages
call a “data matrix” or a “data set”
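The effect of the header and sep arguments can be tried without any file by passing the lines through the text argument of read.table. This is a small sketch; the name demo is chosen here for illustration and the two data lines are shortened values in the format of our preprocessed log.

```r
# Two data lines plus a header line, fields separated by |
lines <- c("statusId|x|y|z|timestamp",
           "1|0.2298434|5.286397|7.508216|0",
           "1|0.6129156|5.401319|6.167464|11")

# header = TRUE: the first line provides the variable names
# sep = "|":     fields are separated by the pipe character
demo <- read.table(text = lines, header = TRUE, sep = "|")

is.data.frame(demo)   # the result is a data frame
str(demo)             # one numeric/integer variable per column
```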
Import
First, we check the current working directory to make sure
which path to use when importing the data file
> getwd()
[1] "C:/Users/Bert/Documents"
Next, we change working directory to the path where our
data file is located in order to simplify data import
Change working directory
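Besides the menu of the R GUI, the working directory can be changed from the console with setwd. The sketch below uses tempdir() only so that it runs on any machine; data.dir and old.dir are names chosen here, and in practice data.dir would be the folder that holds your data file.

```r
# Placeholder folder; replace with the folder that contains your data file
data.dir <- tempdir()

old.dir <- setwd(data.dir)   # setwd returns the previous working directory
getwd()                      # confirm the change
list.files()                 # check that the data file is visible from here

setwd(old.dir)               # switch back when done
```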
Import
Now, we import the data into the data frame my.data by
using the function read.table
> my.data <- read.table("log2.txt", header=TRUE, sep="|")
> my.data
     statusId           x          y          z   timestamp
1           1  0.22984336 5.28639750 7.50821640 1.42454e+12
2           1  0.61291564 5.40131900 6.16746400 1.42454e+12
3           1  0.11492168 6.35899970 6.51222850 1.42454e+12
...
2138        2 -0.91937345 6.35899970 6.43561400 1.42454e+12
2139        2 -0.49799395 5.86100600 7.24006600 1.42454e+12
2140        2 -0.26815060 5.93762000 8.69574100 1.42454e+12
Data frame my.data
Individual variables can be accessed using the $ notation:
> my.data$x
[1] 0.22984336 0.61291564 0.11492168 0.42137950
[5] 0.61291564 0.38307226 0.30645782 0.34476504
...
> my.data$y
[1] 5.28639750 5.40131900 6.35899970 6.81868650
[5] 6.05254170 5.40131900 5.40131900 5.51624060
...
Data frame indexing
Notation for columns:
[,1] first column
[,2] second column
…
Notation for rows:
[1,] first row
[2,] second row
…
Single element: [row,column]
Data frame indexing
Select the third row, second column
> my.data[3,2]
[1] 0.1149217
All variables from third row
> my.data[3,]
  statusId         x     y        z   timestamp
3        1 0.1149217 6.359 6.512228 1.42454e+12
The second column
> my.data[,2]
[1] 0.22984336 0.61291564 0.11492168 0.42137950
[5] 0.61291564 0.38307226 0.30645782 0.34476504
...
Head and tail
First n cases of a data set (default n=6)
> head(my.data, n=3)
  statusId         x        y        z   timestamp
1        1 0.2298434 5.286397 7.508216 1.42454e+12
2        1 0.6129156 5.401319 6.167464 1.42454e+12
3        1 0.1149217 6.359000 6.512228 1.42454e+12
Last n cases in a data set (default n=6)
> tail(my.data, n=3)
     statusId          x        y        z   timestamp
2138        2 -0.9193735 6.359000 6.435614 1.42454e+12
2139        2 -0.4979940 5.861006 7.240066 1.42454e+12
2140        2 -0.2681506 5.937620 8.695741 1.42454e+12
Data frame filtering
The idea is to apply a Boolean evaluation function
and to use the results of the evaluation for the
filtering
For each single element, the Boolean evaluation function
returns TRUE if the condition is satisfied and FALSE
otherwise
> head(my.data$x)
[1] 0.2298434 0.6129156 0.1149217 0.4213795 0.6129156 0.3830723
> head(my.data$x > 0.5)
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE
Data frame filtering
First, we create a logical evaluation vector sel with the value
TRUE for all cases which satisfy the condition
> sel <- my.data$x > 0.5
> head(sel)
[1] FALSE TRUE FALSE FALSE
TRUE FALSE
Second, we select the corresponding rows that satisfy the
condition
> head(my.data[sel,])
   statusId         x        y        z   timestamp
2         1 0.6129156 5.401319 6.167464 1.42454e+12
5         1 0.6129156 6.052542 6.090849 1.42454e+12
16        1 0.5363012 5.669470 7.776367 1.42454e+12
17        1 0.9193735 5.707777 7.967903 1.42454e+12
18        1 0.7661445 5.975927 8.810662 1.42454e+12
Data frame filtering
We can perform evaluation and filtering in one line
> head(my.data[my.data$x > 0.5,])
   statusId         x        y        z   timestamp
2         1 0.6129156 5.401319 6.167464 1.42454e+12
5         1 0.6129156 6.052542 6.090849 1.42454e+12
16        1 0.5363012 5.669470 7.776367 1.42454e+12
17        1 0.9193735 5.707777 7.967903 1.42454e+12
18        1 0.7661445 5.975927 8.810662 1.42454e+12
19        1 0.6129156 5.171476 8.734048 1.42454e+12
Data frame filtering
Select data with statusId equal to 2
> head(my.data[my.data$statusId==2,])
     statusId          x        y        z   timestamp
1177        2 -0.3064578 4.596867 7.508216 1.42454e+12
1178        2 -0.1149217 5.056554 8.350976 1.42454e+12
1179        2 -0.4979940 4.903325 9.423578 1.42454e+12
1180        2  0.1149217 5.171476 8.542512 1.42454e+12
1181        2 -0.5363012 5.171476 8.159439 1.42454e+12
1182        2  0.6895301 5.439626 7.393295 1.42454e+12
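Two conditions can be combined into one logical vector with the element-wise operator &. The sketch below uses a tiny stand-in data frame named demo (an illustrative name, filled with a few x values taken from the output shown in these slides) and also shows subset(), a base R function that expresses the same filter without repeating the data frame name.

```r
# Small stand-in for my.data with two of its columns
demo <- data.frame(statusId = c(1, 1, 2, 2),
                   x = c(0.2298434, 0.6129156, -0.3064578, 0.6895301))

# Combine two conditions with & (element-wise AND)
sel <- demo$statusId == 2 & demo$x > 0.5
print(sel)

# Rows that satisfy both conditions
by.index <- demo[sel, ]
print(by.index)

# The same filter written with subset()
by.subset <- subset(demo, statusId == 2 & x > 0.5)
print(by.subset)
```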
Add a new feature column
We are interested in the feature
magnitude of acceleration
Having 3d acceleration data (x, y, z), we
compute the magnitude as
$\mathrm{mag} = \sqrt{x^2 + y^2 + z^2}$
Add a new feature column
Compute acceleration magnitude and save it into a new column
> my.data$mag <- sqrt(my.data$x^2 + my.data$y^2 + my.data$z^2)
> head(my.data)
  statusId         x        y        z   timestamp      mag
1        1 0.2298434 5.286397 7.508216 1.42454e+12 9.185431
2        1 0.6129156 5.401319 6.167464 1.42454e+12 8.221163
3        1 0.1149217 6.359000 6.512228 1.42454e+12 9.102703
4        1 0.4213795 6.818687 6.167464 1.42454e+12 9.203785
5        1 0.6129156 6.052542 6.090849 1.42454e+12 8.608564
6        1 0.3830723 5.401319 6.627150 1.42454e+12 8.558044
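As a quick sanity check, the magnitude of a single sample can be recomputed by hand. The sketch below uses the x, y and z values of the first row of my.data as printed in the slides; the result should be close to the first entry of the mag column, up to rounding of the printed values.

```r
# x, y, z of the first recorded sample (rounded values as printed by R)
x <- 0.2298434
y <- 5.286397
z <- 7.508216

# Euclidean norm of the 3d acceleration vector
mag <- sqrt(x^2 + y^2 + z^2)
print(mag)
```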
Transform timestamp
Our timestamp represents the number of milliseconds that have
elapsed since 1st January 1970
For practical reasons we transform the timestamp so that it
starts with 0
We subtract the smallest timestamp from each entry
> mt <- min(my.data$timestamp)
> mt
[1] 1.42454e+12
> my.data$timestamp <- my.data$timestamp - mt
Transform timestamp
> head(my.data)
  statusId         x        y        z timestamp      mag
1        1 0.2298434 5.286397 7.508216         0 9.185431
2        1 0.6129156 5.401319 6.167464        11 8.221163
3        1 0.1149217 6.359000 6.512228        22 9.102703
4        1 0.4213795 6.818687 6.167464        41 9.203785
5        1 0.6129156 6.052542 6.090849        53 8.608564
6        1 0.3830723 5.401319 6.627150        71 8.558044
timestamp now starts with 0 and represents milliseconds
after experiment start
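Once the timestamps start at 0, the time between consecutive samples is easy to inspect with diff. The sketch below uses only the first six transformed timestamps from the slides; the names ts and dt are chosen here for illustration. Six samples are far too few for a reliable estimate, so treat the resulting rate as a rough indication only.

```r
# First six transformed timestamps in milliseconds
ts <- c(0, 11, 22, 41, 53, 71)

# Time between consecutive samples in milliseconds
dt <- diff(ts)
print(dt)

# Average sampling interval and the corresponding rough sampling rate
mean.dt <- mean(dt)
print(mean.dt)          # milliseconds per sample
print(1000 / mean.dt)   # samples per second
```

The intervals are not constant, which suggests that the phone does not deliver samples at a perfectly regular rate.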
Data summary
> summary(my.data)
    statusId          x                 y
 Min.   :1.00   Min.   :-11.760   Min.   :-20.073
 1st Qu.:1.00   1st Qu.:  1.264   1st Qu.: -3.256
 Median :1.00   Median : 10.515   Median :  1.073
 Mean   :1.45   Mean   :  9.047   Mean   :  1.824
 3rd Qu.:2.00   3rd Qu.: 15.285   3rd Qu.:  6.483
 Max.   :2.00   Max.   : 19.575   Max.   : 19.115
       z              timestamp          mag
 Min.   :-10.4579   Min.   :    0   Min.   : 2.85
 1st Qu.: -0.8428   1st Qu.: 8446   1st Qu.:10.03
 Median :  0.3448   Median :16886   Median :14.05
 Mean   :  1.8152   Mean   :23828   Mean   :15.09
 3rd Qu.:  6.0908   3rd Qu.:40745   3rd Qu.:19.64
 Max.   : 14.7483   Max.   :49180   Max.   :30.97
1st Qu. and 3rd Qu. refer to empirical quartiles: 0.25 and 0.75 quantiles
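The quartiles reported by summary can be obtained directly with quantile. The sketch below uses the small weight vector from the recap so that the numbers are easy to follow; it relies on the default quantile definition of R, which interpolates linearly between neighbouring sorted values.

```r
weight <- c(60, 72, 57, 90, 95, 72)

# 0.25 and 0.75 quantiles, i.e. the first and third quartile
q <- quantile(weight, probs = c(0.25, 0.75))
print(q)

# summary reports the same two values as 1st Qu. and 3rd Qu.
print(summary(weight))
```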
Plotting data
[Figure: line plot of my.data$mag (y-axis) against my.data$timestamp/1000 (x-axis)]
We create a scatter plot for timestamp (in seconds) and
magnitude using type line
> plot(my.data$timestamp/1000, my.data$mag, type="l")
Plotting data
[Figure: line plot of Acceleration Magnitude (y-axis) against Seconds (x-axis)]
We improve our plot by providing better axis labels
> plot(my.data$timestamp/1000, my.data$mag, type="l",
xlab="Seconds", ylab="Acceleration Magnitude")
Plotting data
[Figure: line plot of Acceleration Magnitude (y-axis) against Seconds (x-axis)]
We can clearly identify the two experimental conditions Walk
and Run separated by a break in between.
Plotting data
[Figure: line plot of Acceleration Magnitude (y-axis) against Seconds (x-axis)]
We observe almost no change in acceleration during the first and
last seconds of each condition.
Plotting data
[Figure: line plot of Acceleration Magnitude (y-axis) against Seconds (x-axis)]
The Walk condition shows less frequent acceleration changes and a
lower mean acceleration.
Some data characteristics
From the plots we have observed differences in the mean
values and the variances between both conditions
We now compute mean and variance for both conditions
> mean(my.data[my.data$statusId==1,]$mag)
[1] 12.01597
> mean(my.data[my.data$statusId==2,]$mag)
[1] 18.84866
> var(my.data[my.data$statusId==1,]$mag)
[1] 10.67059
> var(my.data[my.data$statusId==2,]$mag)
[1] 43.32184
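Instead of filtering once per condition, a statistic can be computed for all conditions in one call with tapply. The sketch below is an illustration on a six-row stand-in named demo, built from the first three Walk rows and the first three Run rows printed in the slides, so its results differ from the values for the full data set.

```r
# Stand-in for my.data: three samples per condition
demo <- data.frame(
  statusId = c(1, 1, 1, 2, 2, 2),
  x = c(0.2298434, 0.6129156, 0.1149217, -0.3064578, -0.1149217, -0.4979940),
  y = c(5.286397, 5.401319, 6.359000, 4.596867, 5.056554, 4.903325),
  z = c(7.508216, 6.167464, 6.512228, 7.508216, 8.350976, 9.423578)
)
demo$mag <- sqrt(demo$x^2 + demo$y^2 + demo$z^2)

# Apply mean and var to mag, grouped by statusId
means <- tapply(demo$mag, demo$statusId, mean)
vars  <- tapply(demo$mag, demo$statusId, var)
print(means)
print(vars)
```

The result is a named vector with one entry per statusId, which scales better than separate filter expressions once more activities are added.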
Working hypotheses from feasibility study
From the feasibility study we might derive the following
working hypotheses
Mean acceleration during walking is less than 15 m/s²
Mean acceleration during running is less than 20 m/s²
Mean acceleration during walking is lower than during running
Variation of acceleration during walking is lower than during running
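The four working hypotheses can be written as simple comparisons against the means and variances computed from the feasibility data. This is only a descriptive check on one recording, not a statistical test; the variable names are chosen here for illustration.

```r
# Values computed from the feasibility data (statusId 1 = Walk, 2 = Run)
mean.walk <- 12.01597
mean.run  <- 18.84866
var.walk  <- 10.67059
var.run   <- 43.32184

# One logical value per working hypothesis
checks <- c(
  walk.mean.below.15  = mean.walk < 15,
  run.mean.below.20   = mean.run < 20,
  walk.mean.below.run = mean.walk < mean.run,
  walk.var.below.run  = var.walk < var.run
)
print(checks)
```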
From the feasibility study to the real experiment
We need to record data from a set of persons
In order to obtain results that can be generalized to the
general population, we need to get heterogeneous samples
that reflect the diversity of the general population
Young, mid-age, elderly
Male, female
Variation in body height and body weight
…
We should add more activities like climbing stairs, etc.
Homework
Perform your own feasibility study
Import data into R
Explore data and define your research hypotheses