Statistics in Mobile Computing
Lecture 3: Feasibility Study
23rd February 2015
Assistant Professor Dr. Bert ARNRICH

Previous lecture: We will use our mobile phones in our course
Apply the lessons learned in practice by conducting empirical experiments with our mobile phones

Previous lecture: Preferred data collection app: “Sensor Log”
Developed by Boğaziçi University student Hasan Faik Alan
Recommended sensor data logging app for this course
Available on Google Play
Current version: 1.0.9
Requires Android 2.2 and above
Direct link: https://play.google.com/store/apps/details?id=com.hfalan.activitylog

Previous lecture: Sensor Log in a Nutshell
Objective: Log, label and export sensor data
Motivation: Ease the process of collecting and labeling sensor data from smartphones
Data are stored in an SQLite database and can be exported in CSV format
[Hasan Faik Alan et al., 2014] Sensor Log: A Mobile Data Collection and Annotation Application

Previous lecture: Install Sensor Log
Search for “Sensor Log hfalan” in Google Play
Direct link: https://play.google.com/store/apps/details?id=com.hfalan.activitylog

Previous lecture: Install Sensor Log
The app requires many permissions
Don’t worry, you have full control over your data
You control which data is collected at which time
You control which data is exported to whom

Previous lecture: Start Sensor Log
Sensor Log screen when started for the first time
Running timer named Unknown
Three pre-defined activity labels: Walk, Run and Sit

Previous lecture: Sensor Log Screen
Screen elements: Label Editor, Settings Menu, Recording History, Timer, pre-defined activity labels

Previous lecture: Sensor Log Settings Menu
As a first step, select Sensors from the settings menu in order to explore the phone sensors

Previous lecture: Sensor List
Usually you should get a list of 10 to 20 modalities
Some modalities rely on the same sensor hardware but differ in signal pre-processing, e.g. calibration
The list will vary for different phone models
Select a sensor modality for more sensor information

Previous lecture: Choose sensors from the list and explore

Previous lecture: What is R?
R can be considered a different implementation of S
[ http://www.r-project.org/ ]

Previous lecture: R Download and Installation

Previous lecture: Getting Started with R

Previous lecture: Start with some simple calculations
Enter an arithmetic expression and receive a result
    > 2 + 2
    [1] 4
4 is the result
[1] is the index of the first number on a result line: we will see later that the index is helpful when we receive more than a single result

Previous lecture: Assign values to variables
Assign a value to the variables x and y by using the assignment operator <-
    > x <- 2
    > y <- 3
Check the values of the variables
    > x
    [1] 2
    > y
    [1] 3

Previous lecture: Getting Help and Documentation
Open a Web browser interface to R help
    > help.start()

Previous lecture: R Editor

Previous lecture: Alternative Notepad++ Editor

Previous lecture: RStudio
Integrated development environment

Previous lecture: Homework
Install R and get used to it
Install the sensor logging app on your mobile device
Get used to the sensor logging app and record some sensor data

Program Today
Design and conduct a feasibility study
Collect experimental data
Import collected data into R
Explore collected data and define research hypotheses

Feasibility Study
Our aim today is to investigate the acceleration patterns of two activities: walking and running
For the feasibility study we collect and inspect our own data
We use Sensor Log for data collection
We hold the phone in our right hand when performing both activities

Prepare Sensor Log for data recording
The Label Editor allows us to add new labels or remove old ones
A set of sensors is assigned to each label
Only data from the assigned sensors will be collected for a specific label

Label editor
Tap on an existing label to change its sensor assignment
Tap and hold on an existing label to remove it
Tap the add button to create a new label

Sensor assignment
Assign the 3D accelerometer sensor to the label Walk
Usually, we will assign the same set of sensors to all activities of an experiment
Proceed with Save

Sensor assignment
Assign the 3D accelerometer sensor to the label Run
Proceed with Save

Get ready for starting data collection
After label definition and sensor assignment, Sensor Log is ready for data collection
The running timer named Unknown shows us that the app is in stand-by mode
We now get ready for the first activity

Start data recording
Tapping a label button starts the data collection
Only data from the previously assigned sensors will be collected
The timer shows us the recording time

Stop data recording
Tapping an active label button stops data recording
The running timer named Unknown shows us that the app is in stand-by mode again
We now get ready for the second activity

Continuing data recording
Tapping a label button starts a new data collection
The timer shows us the recording time
We stop data recording by tapping the active label button

Data management
Data recording history
List of data recording sessions
Inspect recording duration and number of samples
Delete or export data

Data recording history
Overview of data recording sessions
Delete or export sessions
Tap on a session to get more detailed information

Data export
Data is usually exported to /storage/emulated/0/com.hfalan.activitylog/sensor.db
The export path might differ from phone to phone
In addition, data can be exported as a comma-separated values (CSV) file named log.txt, which is stored in the same location

Data export
Select the app for exporting the CSV file
Data export might not work with all apps
Alternative: direct export as shown in the next slide

Data export
Data can be directly accessed from the usual storage path /storage/emulated/0/com.hfalan.activitylog/
The CSV file log.txt can be selected as an attachment from the storage path
As an alternative, connect the phone to a computer via USB, locate the storage path and copy the file from there

Recorded sensor data in file log.txt
The file consists of two sections:
(1) Assignment of a status identifier called statusId for each label
(2) Recorded sensor data

Recorded sensor data in file log.txt
Sensor data consist of 5 columns
Status identifier representing the label
Sensor name
Sensor value(s)
Timestamp: number of milliseconds that have elapsed since 1st Jan 1970
In case of multi-dimensional sensor values, data is contained in square brackets and separated by commas

Data preprocessing
In order to simplify data import and data handling, we perform the following preprocessing steps
We remove unnecessary file headers, format syntax and useless data entries
We transform the sensor value format to simplify data import into R
All preprocessing steps can be performed with a standard text editor like Notepad++

Data preprocessing
We move the first section (assignment of a status identifier called statusId for each label) to some other place for documentation
The resulting file now consists of a file header in the first line and the recorded data in the remaining lines

Data preprocessing
The sensor name is identical in every line
We remove it since it carries no information in our case

Data preprocessing
We need to access the 3 acceleration values separately
First, we remove the square brackets [ and ]

Data preprocessing
Second, we replace , with |
Afterwards our data entries are in the right format

Data preprocessing
Finally, we correct the header in the first line:
We remove sensorName
Instead of value we use x|y|z
Now we are ready to import our recorded data into R

Recap R Basics: Vectors
Construct the vector height from a list of numbers
    > height <- c(1.75, 1.8, 1.65, 1.9, 1.74, 1.91)
Element-wise operations
    > height^2
    [1] 3.0625 3.2400 2.7225 3.6100 3.0276 3.6481
Construct a second vector weight and compute bmi
    > weight <- c(60, 72, 57, 90, 95, 72)
    > bmi <- weight/height^2
    > bmi
    [1] 19.59184 22.22222 20.93664 24.93075 31.37799 19.73630

Recap R Basics: Summary Statistics
Compute mean, standard deviation, variance and median
    > mean(weight)
    [1] 74.33333
    > sd(weight)
    [1] 15.42293
    > var(weight)
    [1] 237.8667
    > median(weight)
    [1] 72

Recap R Basics: Simple Scatter Plot
    > plot(height,weight,pch=2)

Recap R Basics: Random Numbers
Generate 1000 numbers at random from the normal distribution and plot them
    > plot(rnorm(1000))

Recap R Basics: Histogram
    > hist(rnorm(1000))

Recap R Basics: Overlaying Histogram with Normal Density
    > x <- rnorm(1000)
    > hist(x, freq=FALSE)
    > curve(dnorm(x), add=TRUE)

Data Import into R
The function read.table is the most convenient way to read in a rectangular grid of data from a text file
    > help(read.table)
    read.table(file, header = FALSE, sep = "", quote = "\"'",
               dec = ".", row.names, col.names,
               as.is = !stringsAsFactors,
               na.strings = "NA", colClasses = NA, nrows = -1,
               skip = 0, check.names = TRUE, fill = !blank.lines.skip,
               strip.white = FALSE, blank.lines.skip = TRUE,
               comment.char = "#",
               allowEscapes = FALSE, flush = FALSE,
               stringsAsFactors = default.stringsAsFactors(),
               fileEncoding = "", encoding = "unknown", text)

Arguments of read.table: file
Name of the file which the data are to be read from
Each row of the table appears as one line of the file
If it does not contain an absolute path, the file name is relative to the current working directory
Can also be a complete URL

Arguments of read.table: header
Logical value indicating whether the file contains the names of the variables as its first line
If header information is available in the file, it will be used for variable names

Arguments of read.table: sep
Field separator character
Values on each line of the file are separated by this character
Default value sep = "" means that the separator is ‘white space’: one or more spaces, tabs, newlines or carriage returns

Output of read.table
    > my.data <- read.table(...)
The output of read.table is a data frame which contains a representation of the data in the file
A data frame corresponds to what other statistical packages call a “data matrix” or a “data set”

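The text-editor preprocessing and the import can also be scripted in R. The following is a minimal sketch, not the lecture's method: it assumes the raw export uses the layout statusId|sensorName|[x,y,z]|timestamp, and the sensor name "Accelerometer", the sample values and the timestamps are made up for illustration. It uses the text argument of read.table so that no file is needed.

```r
# Hypothetical raw lines in the ASSUMED export layout
# statusId|sensorName|[x,y,z]|timestamp (names and values are made up).
raw <- c(
  "statusId|sensorName|value|timestamp",
  "1|Accelerometer|[0.23,5.29,7.51]|1424540000000",
  "1|Accelerometer|[0.61,5.40,6.17]|1424540000011",
  "2|Accelerometer|[-0.92,6.36,6.44]|1424540049180"
)

# Remove the square brackets, then turn the commas into field separators.
clean <- gsub("[", "", raw, fixed = TRUE)
clean <- gsub("]", "", clean, fixed = TRUE)
clean <- gsub(",", "|", clean, fixed = TRUE)

# Correct the header so that the single value column becomes x|y|z.
clean[1] <- "statusId|sensorName|x|y|z|timestamp"

# Read the cleaned lines as if they were a file.
my.data <- read.table(text = clean, header = TRUE, sep = "|")

# Drop the sensor name column, which is identical in every row.
my.data$sensorName <- NULL

str(my.data)
```

With a real file, the same two arguments header = TRUE and sep = "|" apply; only text = clean is replaced by the file name.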
Import
First, we check the current working directory to determine which path to use when importing the data file
    > getwd()
    [1] "C:/Users/Bert/Documents"
Next, we change the working directory to the path where our data file is located in order to simplify data import

Change working directory

Import
Now, we import the data into the data frame my.data by using the function read.table
    > my.data <- read.table("log2.txt", header=TRUE, sep="|")
    > my.data
         statusId           x          y          z   timestamp
    1           1  0.22984336 5.28639750 7.50821640 1.42454e+12
    2           1  0.61291564 5.40131900 6.16746400 1.42454e+12
    3           1  0.11492168 6.35899970 6.51222850 1.42454e+12
    ...
    2138        2 -0.91937345 6.35899970 6.43561400 1.42454e+12
    2139        2 -0.49799395 5.86100600 7.24006600 1.42454e+12
    2140        2 -0.26815060 5.93762000 8.69574100 1.42454e+12

Data frame my.data
Individual variables can be accessed using the $ notation:
    > my.data$x
    [1] 0.22984336 0.61291564 0.11492168 0.42137950
    [5] 0.61291564 0.38307226 0.30645782 0.34476504
    ...
    > my.data$y
    [1] 5.28639750 5.40131900 6.35899970 6.81868650
    [5] 6.05254170 5.40131900 5.40131900 5.51624060
    ...

Data frame indexing
Notation for columns: [,1] first column, [,2] second column, …
Notation for rows: [1,] first row, [2,] second row, …
Single element: [row,column]

Data frame indexing
Select the third row, second column
    > my.data[3,2]
    [1] 0.1149217
All variables from the third row
    > my.data[3,]
      statusId         x     y        z   timestamp
    3        1 0.1149217 6.359 6.512228 1.42454e+12
The second column
    > my.data[,2]
    [1] 0.22984336 0.61291564 0.11492168 0.42137950
    [5] 0.61291564 0.38307226 0.30645782 0.34476504
    ...

Head and tail
First n cases of a data set (default n=6)
    > head(my.data, n=3)
      statusId         x        y        z   timestamp
    1        1 0.2298434 5.286397 7.508216 1.42454e+12
    2        1 0.6129156 5.401319 6.167464 1.42454e+12
    3        1 0.1149217 6.359000 6.512228 1.42454e+12
Last n cases of a data set (default n=6)
    > tail(my.data, n=3)
         statusId          x        y        z   timestamp
    2138        2 -0.9193735 6.359000 6.435614 1.42454e+12
    2139        2 -0.4979940 5.861006 7.240066 1.42454e+12
    2140        2 -0.2681506 5.937620 8.695741 1.42454e+12

Data frame filtering
The idea is to apply a Boolean condition to every element and to use the results for filtering
For each element, the condition evaluates to TRUE if it is satisfied and to FALSE otherwise
    > head(my.data$x)
    [1] 0.2298434 0.6129156 0.1149217 0.4213795 0.6129156 0.3830723
    > head(my.data$x > 0.5)
    [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

Data frame filtering
First, we create a logical vector sel with the value TRUE for all cases which satisfy the condition
    > sel <- my.data$x > 0.5
    > head(sel)
    [1] FALSE  TRUE FALSE FALSE  TRUE FALSE
Second, we select the rows that satisfy the condition
    > head(my.data[sel,])
       statusId         x        y        z   timestamp
    2         1 0.6129156 5.401319 6.167464 1.42454e+12
    5         1 0.6129156 6.052542 6.090849 1.42454e+12
    16        1 0.5363012 5.669470 7.776367 1.42454e+12
    17        1 0.9193735 5.707777 7.967903 1.42454e+12
    18        1 0.7661445 5.975927 8.810662 1.42454e+12

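As a small illustration of the filtering idea above, the sketch below combines two conditions on a toy data frame. The data frame toy and its values are made up; only the column names follow the lecture data.

```r
# Toy data frame with two of the columns of my.data (values are made up).
toy <- data.frame(
  statusId = c(1, 1, 1, 2, 2),
  x = c(0.23, 0.61, 0.11, -0.92, 0.69)
)

# Logical vectors can be combined element-wise with & (and) and | (or).
sel <- toy$statusId == 1 & toy$x > 0.5
toy[sel, ]

# subset() expresses the same filter without repeating the data frame name.
filtered <- subset(toy, statusId == 1 & x > 0.5)
filtered

# sum() counts the TRUE entries, i.e. the number of matching rows.
n.match <- sum(sel)
n.match
```

Counting the TRUE entries before filtering is a quick way to check that a condition selects a plausible number of rows.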
Data frame filtering
We can perform evaluation and filtering in one line
    > head(my.data[my.data$x > 0.5,])
       statusId         x        y        z   timestamp
    2         1 0.6129156 5.401319 6.167464 1.42454e+12
    5         1 0.6129156 6.052542 6.090849 1.42454e+12
    16        1 0.5363012 5.669470 7.776367 1.42454e+12
    17        1 0.9193735 5.707777 7.967903 1.42454e+12
    18        1 0.7661445 5.975927 8.810662 1.42454e+12
    19        1 0.6129156 5.171476 8.734048 1.42454e+12

Data frame filtering
Select data with statusId equal to 2
    > head(my.data[my.data$statusId==2,])
         statusId          x        y        z   timestamp
    1177        2 -0.3064578 4.596867 7.508216 1.42454e+12
    1178        2 -0.1149217 5.056554 8.350976 1.42454e+12
    1179        2 -0.4979940 4.903325 9.423578 1.42454e+12
    1180        2  0.1149217 5.171476 8.542512 1.42454e+12
    1181        2 -0.5363012 5.171476 8.159439 1.42454e+12
    1182        2  0.6895301 5.439626 7.393295 1.42454e+12

Add a new feature column
We are interested in the feature magnitude of acceleration
Having 3D acceleration data, we compute the magnitude as
$\mathrm{mag} = \sqrt{x^2 + y^2 + z^2}$

Add a new feature column
Compute the acceleration magnitude and save it into a new column
    > my.data$mag = sqrt(my.data$x^2 + my.data$y^2 + my.data$z^2)
    > head(my.data)
      statusId         x        y        z   timestamp      mag
    1        1 0.2298434 5.286397 7.508216 1.42454e+12 9.185431
    2        1 0.6129156 5.401319 6.167464 1.42454e+12 8.221163
    3        1 0.1149217 6.359000 6.512228 1.42454e+12 9.102703
    4        1 0.4213795 6.818687 6.167464 1.42454e+12 9.203785
    5        1 0.6129156 6.052542 6.090849 1.42454e+12 8.608564
    6        1 0.3830723 5.401319 6.627150 1.42454e+12 8.558044

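The magnitude computation can be checked on a few rows. The sketch below rebuilds the first three accelerometer rows shown by head(my.data) in a small data frame (the name acc is chosen here for illustration) and uses with() as an alternative to repeating the data frame name.

```r
# First three accelerometer rows as printed by head(my.data) in the lecture.
acc <- data.frame(
  x = c(0.2298434, 0.6129156, 0.1149217),
  y = c(5.286397, 5.401319, 6.359000),
  z = c(7.508216, 6.167464, 6.512228)
)

# with() evaluates the expression using the columns of acc, so the
# formula reads like the mathematical definition of the magnitude.
acc$mag <- with(acc, sqrt(x^2 + y^2 + z^2))

# The mag column should agree with the values on the slide
# (9.185431, 8.221163, 9.102703) up to rounding.
acc
```

All arithmetic here is element-wise, so the same line works unchanged on the full data frame with thousands of rows.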
Transform timestamp
Our timestamp represents the number of milliseconds that have elapsed since 1st January 1970
For practical reasons we transform the timestamp so that it starts with 0
We subtract the smallest timestamp from each entry
    > mt <- min(my.data$timestamp)
    > mt
    [1] 1.42454e+12
    > my.data$timestamp <- my.data$timestamp - mt

Transform timestamp
    > head(my.data)
      statusId         x        y        z timestamp      mag
    1        1 0.2298434 5.286397 7.508216         0 9.185431
    2        1 0.6129156 5.401319 6.167464        11 8.221163
    3        1 0.1149217 6.359000 6.512228        22 9.102703
    4        1 0.4213795 6.818687 6.167464        41 9.203785
    5        1 0.6129156 6.052542 6.090849        53 8.608564
    6        1 0.3830723 5.401319 6.627150        71 8.558044
timestamp now starts with 0 and represents milliseconds after experiment start

Data summary
    > summary(my.data)
        statusId         x                 y                 z               timestamp          mag
     Min.   :1.00   Min.   :-11.760   Min.   :-20.073   Min.   :-10.4579   Min.   :    0   Min.   : 2.85
     1st Qu.:1.00   1st Qu.:  1.264   1st Qu.: -3.256   1st Qu.: -0.8428   1st Qu.: 8446   1st Qu.:10.03
     Median :1.00   Median : 10.515   Median :  1.073   Median :  0.3448   Median :16886   Median :14.05
     Mean   :1.45   Mean   :  9.047   Mean   :  1.824   Mean   :  1.8152   Mean   :23828   Mean   :15.09
     3rd Qu.:2.00   3rd Qu.: 15.285   3rd Qu.:  6.483   3rd Qu.:  6.0908   3rd Qu.:40745   3rd Qu.:19.64
     Max.   :2.00   Max.   : 19.575   Max.   : 19.115   Max.   : 14.7483   Max.   :49180   Max.   :30.97
1st Qu. and 3rd Qu. refer to the empirical quartiles: the 0.25 and 0.75 quantiles

Plotting data
We plot the magnitude against the timestamp (in seconds) as a line
    > plot(my.data$timestamp/1000, my.data$mag, type="l")
[Figure: line plot of my.data$mag (y-axis) against my.data$timestamp/1000 (x-axis)]

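The timestamp handling described above can be tried on a few values. This is a sketch with made-up epoch timestamps in milliseconds, chosen to have the same order of magnitude as the lecture data; the variable names are illustrative.

```r
# Made-up epoch timestamps in milliseconds (same magnitude as the lecture data).
ts <- c(1424540000000, 1424540000011, 1424540000022, 1424540000041)

# Shift the series so that the recording starts at 0, then convert to seconds.
elapsed.ms <- ts - min(ts)
elapsed.s <- elapsed.ms / 1000

# Interpret the first timestamp as a calendar date.
# as.POSIXct expects seconds, so the milliseconds are divided by 1000.
start <- as.POSIXct(ts[1] / 1000, origin = "1970-01-01", tz = "UTC")
start.day <- format(start, "%Y-%m-%d")
start.day

# The default print shows only 7 significant digits (1.42454e+12);
# asking for more digits reveals the differences between the entries.
print(ts, digits = 13)
```

Converting one value to a date is a cheap sanity check on the unit: a value read as seconds instead of milliseconds would land tens of thousands of years in the future.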
Plotting data
We improve our plot by providing better axis labels
    > plot(my.data$timestamp/1000, my.data$mag, type="l", xlab="Seconds", ylab="Acceleration Magnitude")
[Figure: line plot of Acceleration Magnitude (y-axis) against Seconds (x-axis)]

Plotting data
We can clearly identify the two experimental conditions Walk and Run, separated by a break in between

Plotting data
We observe almost no change in acceleration during the few seconds at the beginning and end of each condition

Plotting data
The Walk condition shows less frequent changes in acceleration and a lower mean acceleration

Some data characteristics
From the plots we have observed differences in the mean values and the variances between both conditions
We now compute mean and variance for both conditions
    > mean(my.data[my.data$statusId==1,]$mag)
    [1] 12.01597
    > mean(my.data[my.data$statusId==2,]$mag)
    [1] 18.84866
    > var(my.data[my.data$statusId==1,]$mag)
    [1] 10.67059
    > var(my.data[my.data$statusId==2,]$mag)
    [1] 43.32184

Working hypotheses from feasibility study
From the feasibility study we might derive the following working hypotheses
Mean acceleration during walking is less than 15 m/s²
Mean acceleration during running is less than 20 m/s²
Mean acceleration during walking is lower than during running
Variation of acceleration during walking is lower than during running

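The per-condition means and variances behind these hypotheses can also be computed for all conditions at once. The sketch below uses tapply on a toy data frame; the values are made up and do not reproduce the lecture results.

```r
# Toy data: statusId 1 stands for Walk, 2 for Run (magnitudes are made up).
toy <- data.frame(
  statusId = c(1, 1, 1, 2, 2, 2),
  mag = c(9.2, 11.8, 12.5, 14.0, 19.5, 24.1)
)

# tapply splits mag by statusId and applies the function to each group.
group.mean <- tapply(toy$mag, toy$statusId, mean)
group.var <- tapply(toy$mag, toy$statusId, var)
group.mean
group.var

# aggregate returns the same means as a data frame, using a formula.
aggregate(mag ~ statusId, data = toy, FUN = mean)

# The two comparative hypotheses can be checked directly on the toy data.
group.mean[["1"]] < group.mean[["2"]]
group.var[["1"]] < group.var[["2"]]
```

This avoids repeating the filter expression for every condition, which becomes more useful once further activities such as climbing stairs are added.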
From the feasibility study to the real experiment
We need to record data from a set of persons
In order to obtain results that can be generalized to the general population, we need heterogeneous samples that reflect the diversity of the general population
Young, middle-aged, elderly
Male, female
Variation in body height and body weight
…
We should add more activities like climbing stairs, etc.

Homework
Perform your own feasibility study
Import data into R
Explore data and define your research hypotheses