Classifier of Gait Pattern for Detecting Abnormal Health Status

Qian Cheng, qcheng4
Department of Computer Science
University of Illinois at Urbana-Champaign
Abstract
Billions of people today carry mobile devices, such as smartphones and music players. Population health would
improve if the mobile devices themselves could detect health status problems. Gait has been used to evaluate diseases in clinical
environments, with sensors fixed on the person. We show that raw sensors in mobile devices suffice to continuously
monitor health status and to automatically detect abnormal situations, after signal processing
extracts the walking motion and filters out noise generated by device movements. We developed a gait model to detect
health abnormality via a neural network, by identifying normal and abnormal patterns in continuous motion using
signal processing of the frequency period and deviation. Experiments were performed with 10 subjects, chosen for
personal variation in age, sex, and body size. The results show that gait patterns differ for each individual, while a
suitably trained model can accurately detect abnormal status.
Introduction
Supporting Population Health requires detecting abnormal health situations and taking appropriate actions to effectively treat the subjects 1 . Since billions of people today carry mobile devices such as cell phones and music players,
we investigate whether the mobile device itself can detect abnormal health status. Can health be continuously tracked
automatically without special devices? Can it be done by continuously recording mobile device sensors without direct
interaction by the user? We show that “Yes, gait analysis is possible with mobile devices.”
Since normal gait is the individual’s walking pattern, any abnormal gait is then a deviation from normal walking.
Watching a patient walk is the most important part of the neurological examination. Normal gait requires that many
systems, including strength, sensation and coordination, function in an integrated fashion. Many common problems in
the nervous system and musculoskeletal system will show in the way a person walks 2 . Thus gait analysis is standardly
used for diagnosing pathological function such as cerebral palsy 3 .
In acute care, measuring gait is done in a clinical setting, looking for fixed patterns indicating particular disease.
The patient is outfitted with measurement devices such as a girdle around the waist or a motion detector affixed to
the leg 4, 5 , which continuously evaluates the health status of the patient for the particular condition. In our work, we
use gait analysis to detect the health status for chronic care, where the person changes over time with lower severity.
If “weaker” gait is due to injury or to tiredness, then this improves as they rehabilitate. If “weaker” gait is due to
“ordinary” sickness, such as influenza rather than palsy, then smaller differences in pattern must be detected.
In chronic care, determining abnormal situations depends on the particular individual and the particular time/place.
The “normal” of health varies over time, decreasing as the person ages. Certain behaviors increase normal, such
as proper diet and exercise, while other behaviors decrease normal, such as unusual stress and strain. Notice that
determining “abnormal” is thus a relative comparison, rather than an absolute. The status of “health” needs to be
continuously measured and deviation from the normal detected as a statistical outlier.
There have been attempts to identify “gait speed” as the “sixth vital sign” 6 , with major longitudinal studies demonstrating strong correlation between gait speed and patient mortality 7 . Gait speed is the simplest measure of personal
mobility. Using mobility measured by gait characteristics to infer health information is a new area of study, e.g.
Studenski 8 proposes to use speed as a direct substitute for status. Gait, which is differentiated from other invariant
biometrics, is emphasized to be specific for individuals 9 . In particular biometric cohorts of gait patterns have been
studied on medical topics to detect emotional problems 10, 11 . However, there is still a lack of work on personalized
gait application, differing for each individual, rather than identifying a particular cohort of persons with anxiety or
depression. In this study, we will show that speed and deviation of individual gait is an effective measure for health
status, in detecting abnormal health status automatically from a personalized model on a continuous basis.
Most research with personal sensors has utilized specialized devices to measure gait, e.g. for biometric cohort
identification 12, 13 or for personal medical diagnosis 14, 15 . Such devices are fixed to the person to directly utilize
the clinical evaluation. We will instead use ordinary smart phones and music players, so must compensate for their
additional motion caused by bouncing during walking. And we will compute a new model customized to the individual
person. Thus we will show that gait abnormality can be automatically detected with ordinary devices, which implies
that health monitors could be immediately deployed to billions of persons for measuring population health.
Pilot Study
We were the first to measure ordinary walking or gait outside of a clinical or laboratory setting. We tracked motion
using the accelerometer in a Motorola Droid 2.0 smartphone, whose sensor updates every 30 ms or at roughly 33 hertz.
For this study, the smartphone was fixed in a holster attached to a belt, in order to mimic the fixed girdle of the usual
medical gait analysis as closely as possible.
The user starts a walking session, the accelerometer data is recorded, the user stops the session then answers a short
quality of life questionnaire. This self-assessment enables correlation between walking patterns and health status.
Next raw acceleration data is converted into measurable statistics for inferring the health information of the person.
Kinesiology study of gait patterns has developed models for joint movements and gait timing 16 . A typical gait goes
through a set cyclical pattern. Beginning when the left foot makes contact on the ground, the subject then rests weight
on both feet and balances. Next, the right foot enters the toe off stage and pushes up with force. The subject then
balances his weight on his left foot during the terminal swing which brings the right foot forward. As the foot comes
forward, the subject’s left heel rises slightly as the right foot then makes contact with the ground. The cycle then repeats
with the opposite leg coming up and moving forward.
Therefore, the following statistics are measured: step speed was measured by the total time between axis crossings
to get the period of the step, the relative time that the velocity was positive, the relative time that the velocity was
negative, and the maximum and minimum velocities. The force of the step was measured by the maximum acceleration
both when the velocity starts to go positive (beginning of a step) and when the velocity reaches its peak negative
(upon contact of the foot with the ground). The subject’s balance was analyzed using the maximum z acceleration
corresponding to the sway of the subject left to right during the entire gait.
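The step-period statistic described above can be sketched as follows. This is a minimal illustration, not the pilot study's parsing algorithm: the function names are hypothetical, the velocity trace is a synthetic sine wave, and counting only upward axis crossings is an assumption made here for simplicity.

```python
import math

def step_periods(velocity, dt):
    """Durations (seconds) between successive upward zero crossings."""
    crossings = [
        i for i in range(1, len(velocity))
        if velocity[i - 1] < 0 <= velocity[i]
    ]
    return [(b - a) * dt for a, b in zip(crossings, crossings[1:])]

def positive_fraction(velocity):
    """Relative time during which the velocity is positive."""
    return sum(1 for v in velocity if v > 0) / len(velocity)

DT = 0.03        # sensor update interval from the text (about 33 hertz)
STEP_HZ = 1.8    # assumed step frequency of the synthetic trace
velocity = [math.sin(2 * math.pi * STEP_HZ * DT * i) for i in range(200)]

periods = step_periods(velocity, DT)
mean_period = sum(periods) / len(periods)
```

Because the sampling interval is 30 ms, each measured period can differ from the true period by up to one sample, which is one reason to average over a binning interval.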
A two week experiment was performed to continuously assess health status hourly, on a graduate student and a
senior professor. Data was collected when the subjects were out in public walking and carrying their phones. In
this experiment, the phones presented a dynamic post-session quality of life questionnaire, simplified from standard
instruments 17 . This allowed the subjects to enter a score either leaning toward the positive, neutral or negative in each
category. The categories were well to sick, happy to sad, healthy to injured, relaxed to stressed and energetic to tired.
The data results were stored and used to analyze the data using the parsing algorithm with a 12 second binning interval.
The most important characteristic correlated was whether the subject
felt sick or injured. The younger graduate student did not record any sick
or injured data on the questionnaire. The older professor recorded over
half the data points with a sick or injured score of lowest level 3. This can
be seen in Figure 1, where blue stands for “professor well”, red stands
for “professor sick” and yellow stands for “student well”. It demonstrates
the walking characteristics of the sick versus well for professor versus
healthy for the student over the course of the experiment. Data for the
professor was divided between sick data points with score of 3 versus the
healthy data points with both sick and injured having a neutral 2 or healthy
1 score. It is interesting that the gait period is similar between the student
and healthy professor. The standard deviation is lowest for the younger
student and highest for the older professor while sick.
Figure 1
There are two conclusions from this. The first is that the most discriminating feature of gait analysis is gait speed,
where speed is measured by period in the figure. When a person has abnormal status, the speed and deviation of their
gait are significantly affected. Intuitively, this could be explained by moving more slowly when injured and resting
more between steps when tired. This result correlates well with the research showing that depressed patients have
a tendency to walk more slowly and more erratically than healthy patients, as measured in a laboratory setting 11, 10 . Thus, in the
main experiments here, we will focus on gait speed.
The second conclusion is that detecting abnormal gait is easier than detecting the type of abnormality. That is, a
binary diagnosis requires less data and less computation than even the 5-way diagnosis from the simple self-assessed
questionnaire. In the pilot study, stress of the student seems correlated with his fatigue level, which dominates the
trends in his walking data. Conversely, the old professor’s answers to the questionnaire did not demonstrate a coupling
between fatigue and stress. Likewise, stress not fatigue seemed to affect the trends in his walking data. Perhaps, this
can be attributed to the difference in ages. While stress was strongly linked with fatigue with the student, the professor
reported always feeling tired regardless of stress level. This demonstrates that the demographics of the subjects may
need to be accounted for when analyzing the data. So in the experiments below, we investigate multiple persons who
vary across age and abnormality, in an attempt to construct a robust model of gait pattern.
Main Study
Methodology
A. Gait Analysis without Devices Fixed on Hip
The first problem in using general mobile devices to analyze gait is that these devices are carried in the ordinary way
rather than fixed in clinical equipment. This part of the study provides a method for pre-processing the data
before building gait models.
Coordinate Adjusting. In this study, data collection is based on mobile devices (iPhone 4 and iPod Touch 4)
equipped with a gravimeter and an accelerometer. The software used to collect the data is an iPod app named Sensor Data,
developed by Wavefront Labs a . All data were collected by simply putting the device in a pants pocket or coat pocket,
either tight or loose, reflecting the movement of the hip during walking, Figure 2(a).
Data collected from the mobile sensors are acceleration vectors
\vec{v}(x, y, z) based on the 3D coordinates of the device itself, Figure
2(b). These coordinates change with the position of the device while the person walks. Thus, the coordinates need to be adjusted to
remove the effect of the device position. The gravity vector measured by the gravimeter can be applied to adjust the acceleration vector
from a device-position-based \vec{v}(x, y, z) to a vertical-horizontal-based
\vec{v}(vertical, horizontal), Figure 2(b), in which the vertical direction
is based on the direction of gravity. A transformation between
Cartesian coordinates and spherical coordinates is applied to realize this adjustment. Finally, for each instance, we have
{\vec{v}(x, y, z) → \vec{v}(vertical, horizontal)}.
a Copyright 2009-2011, Wavefront Labs
Figure 2: (a) Capture of test situation; (b) 3D coordinates of devices and Body-based coordinates
Figure 3: Cross tied of 2 devices
After the transformation, vertical represents the vertical component of acceleration during walking and horizontal
represents the horizontal component.
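The coordinate adjustment can be illustrated with a short sketch. Note that it swaps in a plain vector projection onto the gravity direction instead of the Cartesian-to-spherical transformation named in the text; the two are assumed here to yield the same vertical and horizontal magnitudes. The function name and the sample values are hypothetical.

```python
import math

def split_vertical_horizontal(accel, gravity):
    """Split a device-frame acceleration into (vertical, horizontal).

    vertical   : signed component along the measured gravity direction
    horizontal : magnitude of the remaining component
    """
    gx, gy, gz = gravity
    g_norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if g_norm == 0:
        raise ValueError("gravity vector must be non-zero")
    ux, uy, uz = gx / g_norm, gy / g_norm, gz / g_norm
    ax, ay, az = accel
    vertical = ax * ux + ay * uy + az * uz
    hx, hy, hz = ax - vertical * ux, ay - vertical * uy, az - vertical * uz
    horizontal = math.sqrt(hx * hx + hy * hy + hz * hz)
    return vertical, horizontal

# The same physical acceleration seen by two devices held perpendicular
# to each other (illustrative numbers, in units of g).
device_a = split_vertical_horizontal((0.3, 0.0, -0.5), (0.0, 0.0, -1.0))
device_b = split_vertical_horizontal((-0.5, 0.3, 0.0), (-1.0, 0.0, 0.0))
```

Both calls return the same pair, which is the property the adjustment relies on: the result no longer depends on how the device happens to be oriented in the pocket.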
A simple experiment, a period of walking data collected with two devices tied together in perpendicular directions, was designed to test whether the above process is effective, Figure 3. Figure 4(a) shows that the original accelerations
in device 3D coordinates differ on each axis. After the coordinates are adjusted to body-based ones, Figure 4(b), the acceleration
curves of the two devices are coherent with each other.
Figure 4: Comparison of data generated by 2 perpendicular devices, before and after data processing. (a) 3D acceleration comparison; (b) Body-based acceleration comparison
Data Processing. In this study, gait data was collected from 10 subjects, varying in age, sex and body size. For
each subject, the collection session lasted for one whole week. This includes multiple separate walking sessions,
changing between normal and abnormal gait patterns for 4 subjects and always normal for the other 6 subjects. Generally, there are no strict definitions of “normal” and “abnormal” gait, but it is recognized that “normal” and “abnormal”
standards should be chosen for each individual 16 . In this study, “abnormal” for an individual is defined as walking in
situations of tiredness, physical injury, mental distress or other anomalous situations; “normal”, conversely, is defined
as walking in regular situations. The basic information on the subjects is shown in Table 1. Among the subjects there is
a pair of identical twins, FYS1 and FYS2. Age varies from “young”, defined as a person in his “20s”, to “old”,
defined as a person in his “50s”. The sampling frequency is set to 50 hertz and all instances are unified into the same length
and format, so that each represents a period of walking of the same length.
Table 1: Information of Subjects
Subject       Sex     Age    Body Size  Type of Abnormality
MYS           Male    Young  Small      Vertebra hurts
MYM           Male    Young  Medium     Ankle injury
MYL           Male    Young  Large      Extreme tiredness
MOM           Male    Old    Medium     Back injury
MYM2          Male    Young  Medium     N/A
FYS1 (twin)   Female  Young  Small      N/A
FYS2 (twin)   Female  Young  Small      N/A
FOS           Female  Old    Small      N/A
FOM           Female  Old    Medium     N/A
FOL           Female  Old    Large      N/A
The Discrete Fourier Transform (DFT) is a widely used method to process discretely sampled signals. “The discrete Fourier
transform is an equation for converting time domain data into frequency domain data.” 18 A sequence of N complex
numbers can be transformed into another sequence of N complex numbers by the DFT formula (1). The new sequence is
the representation of the original sequence in the frequency domain.
X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-2 i \pi \frac{k}{N} n}    (1)
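Formula (1) can be written out directly in a few lines. The sketch below is a naive O(N^2) evaluation meant only to make the formula concrete; the function name `dft` and the synthetic 2 hertz test signal are illustrative assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]

SAMPLE_RATE_HZ = 50            # sampling frequency used in the main study
N = 50                         # one second of samples
signal = [math.sin(2 * math.pi * 2 * n / SAMPLE_RATE_HZ) for n in range(N)]

spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]
# Bin k corresponds to k * SAMPLE_RATE_HZ / N hertz; search the lower half.
peak_bin = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
```

For this signal the strongest bin in the lower half of the spectrum is bin 2, that is 2 hertz, with magnitude N/2.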
The fast Fourier Transform (FFT) is an efficient way to implement the DFT, improving quantization noise
and reducing computational cost 18 . Since gait signals are time-sequential and periodic, the FFT is a natural candidate method
for gait analysis 16 . Boulgouris et al. claimed “The Fourier analysis is a very appealing approach as most
discriminative information is expected to be compacted in a few Fourier coefficients, providing a very efficient gait
representation.” 9 . All features extracted from the gait acceleration sequence form a time series, and gait itself reflects a
series of periodic patterns. After the FFT, a time-sequential sample is transformed to a spectrum in the frequency domain.
In gait analysis, gait cycle time, or gait speed, is recognized as the primary feature of a gait pattern 16 . It is reflected as
a change of the spectrum in the frequency domain. The comparison between fast gait speed and slow gait speed in both the time
domain and the frequency domain is shown in Figure 5. The difference is clearly visible.
Figure 5: Comparison of gait speed curves and spectrum in both time and frequency domains. (a) Comparison in time domain; (b) comparison in frequency domain
In the frequency domain, the peaks of the spectrum reflect the components of the original time-series curve. Accelerations generated by shaking and bouncing of the device itself also contribute to the formation of the curve, which
may add noise to the detection result. Effective noise suppression methods based on the FFT have already been applied
in voice processing in mobile telephone systems. In that work, noise can be suppressed if its spectrum is different
from the voice spectrum 19 . Since device movements change asynchronously with body movements during walking, a
low-frequency, high-amplitude pass filter is applied to eliminate them from the major curve. To implement this, the
top 50 peaks in the lower frequency range of 0-15 hertz are selected as the input of supervised machine learning.
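One possible reading of this feature selection is sketched below. It treats every frequency bin between 0 and 15 hertz as a candidate and keeps the 50 bins with the largest amplitude; whether the study used raw bins or local maxima as "peaks" is not stated, so that choice, the function name and the synthetic signal are all assumptions.

```python
import cmath
import math

def spectral_features(signal, sample_rate_hz=50.0, max_freq_hz=15.0, n_peaks=50):
    """Return up to n_peaks (frequency, amplitude) pairs at or below max_freq_hz."""
    n = len(signal)
    candidates = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate_hz / n
        if freq > max_freq_hz:
            break  # low-pass step: ignore everything above the cutoff
        coeff = sum(signal[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i in range(n))
        candidates.append((freq, abs(coeff)))
    # High-amplitude step: keep the strongest bins, then restore frequency
    # order so the feature vector has a stable layout.
    strongest = sorted(candidates, key=lambda fa: fa[1], reverse=True)[:n_peaks]
    return sorted(strongest)

# Synthetic instance: 2 hertz walking component plus 20 hertz device jitter.
signal = [
    math.sin(2 * math.pi * 2 * i / 50) + 0.5 * math.sin(2 * math.pi * 20 * i / 50)
    for i in range(250)
]
features = spectral_features(signal)
dominant_freq = max(features, key=lambda fa: fa[1])[0]
```

The 20 hertz jitter never reaches the feature vector because it lies above the cutoff, while the 2 hertz walking component remains the dominant feature.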
B. Personalized Model for Abnormal Gait Detection
Back-propagation Neural Network. BPNN is a supervised machine learning method. It is widely applied in outlier
detection problems 20 . In this study, the Levenberg-Marquardt algorithm is used to pre-process the training data 21 . The
structure of the neural network is as follows: a basic 3-layer network is applied with the logistic sigmoid function as the
hidden layer transfer function; the learning rate is set to 0.05; the maximum number of training epochs is set to 30 and the
stopping mean square error is set to 0.001 22 .
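A network with these settings might look like the sketch below. It is not the study's implementation: it uses plain stochastic gradient descent rather than Levenberg-Marquardt, and the tanh output unit (chosen so that scores fall in [-1, 1]), the hidden layer size, the weight initialization and the toy data are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_in, n_hidden, rng):
    """Weights for input->hidden and hidden->output; last entry of each row is a bias."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2

def forward(net, x):
    w1, w2 = net
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    out = math.tanh(sum(w * h for w, h in zip(w2, hidden)) + w2[-1])
    return hidden, out

def train(net, samples, labels, lr=0.05, max_epochs=30, target_mse=0.001):
    """Back-propagation with the learning rate, epoch limit and stop error from the text."""
    w1, w2 = net
    mse = float("inf")
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, y in zip(samples, labels):
            hidden, out = forward(net, x)
            err = out - y
            squared_error += err * err
            d_out = err * (1.0 - out * out)          # through the tanh output
            for j, h in enumerate(hidden):
                d_hidden = d_out * w2[j] * h * (1.0 - h)  # through the sigmoid
                w2[j] -= lr * d_out * h
                for i, v in enumerate(x):
                    w1[j][i] -= lr * d_hidden * v
                w1[j][-1] -= lr * d_hidden
            w2[-1] -= lr * d_out
        mse = squared_error / len(samples)
        if mse <= target_mse:
            break
    return mse

rng = random.Random(0)
net = init_network(n_in=2, n_hidden=3, rng=rng)
toy_samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
toy_labels = [1, -1, 1, -1]
final_mse = train(net, toy_samples, toy_labels)
scores = [forward(net, x)[1] for x in toy_samples]
```

With only 30 epochs a gradient-descent network of this kind may stop well short of the 0.001 error target, which is one practical reason to prefer a faster-converging optimizer.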
Model Training. The model randomly selects part of the gait data as the training set and the rest as the testing
set. Since the model is built to detect abnormal status, test on abnormal instances is more important than on normal
ones. To ensure enough abnormal instances in testing set, 30% of abnormal instances and 15% of normal instances
are randomly selected as testing data. All other instances form the training set. Data are normalized by z-score 23 and
labeled: normal instances were labeled as 1 and abnormal ones were labeled as -1.
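The split and normalization can be sketched as follows. The function names are hypothetical, and two details are assumptions because the text does not state them: test-set sizes are rounded to the nearest integer, and the z-score uses the population standard deviation of whatever rows it is given.

```python
import random
import statistics

def split_instances(normal, abnormal, rng,
                    normal_test_frac=0.15, abnormal_test_frac=0.30):
    """Randomly hold out 15% of normal and 30% of abnormal instances for testing."""
    def split(items, frac):
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * frac)
        return shuffled[n_test:], shuffled[:n_test]

    normal_train, normal_test = split(normal, normal_test_frac)
    abnormal_train, abnormal_test = split(abnormal, abnormal_test_frac)
    # Labels follow the text: 1 for normal, -1 for abnormal.
    train = [(x, 1) for x in normal_train] + [(x, -1) for x in abnormal_train]
    test = [(x, 1) for x in normal_test] + [(x, -1) for x in abnormal_test]
    return train, test

def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Placeholder instances with the same counts as one subject in Table 3 (59, 30).
train_set, test_set = split_instances(list(range(59)), list(range(30)),
                                      random.Random(0))
normalized = zscore_columns([[1.0, 10.0], [3.0, 10.0]])
```

For 59 normal and 30 abnormal instances this holds out 9 of each, so the abnormal class is well represented in the test set even though it is the minority class overall.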
The model built by the BP neural network is a prediction model which returns a score within [-1,1] to measure the health
status for each instance. A result closer to 1 means the instance is more likely to be normal, and vice versa. To make it a
classification model, the distance d(p, l) from each prediction value to each label value is used to classify the instance:
the instance belongs to the class for which the distance is smaller, (2).
class(i) = \begin{cases} \text{Normal}, & d(p, l_{norm}) < d(p, l_{abnorm}) \\ \text{Abnormal}, & d(p, l_{norm}) > d(p, l_{abnorm}) \end{cases}    (2)
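Rule (2) amounts to a nearest-label decision, sketched below. The function name is hypothetical, and the handling of an exact tie, which (2) leaves undefined, is an assumption: the sketch reports a tie as abnormal so that a borderline case is not silently ignored.

```python
def classify(prediction, normal_label=1.0, abnormal_label=-1.0):
    """Assign the class whose label value is closest to the prediction score."""
    d_normal = abs(prediction - normal_label)
    d_abnormal = abs(prediction - abnormal_label)
    # Assumption: an exact tie (score 0.0) is treated as abnormal.
    return "normal" if d_normal < d_abnormal else "abnormal"

labels = [classify(p) for p in (0.7, -0.2, 0.0)]
```

With labels 1 and -1 the rule reduces to checking the sign of the score.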
Model Evaluation. Statistical measures are used to evaluate the classification model. In this study, abnormal status
is defined as “positive” and normal status as “negative” as it is in many medical cases. Three statistical values are used
to evaluate the quality of model: Accuracy represents the overall accuracy of classification, Sensitivity represents how
often an abnormal status can be correctly detected and Specificity represents how often a normal status can be correctly
detected.
Accuracy = (TruePositive + TrueNegative)/All    (3)
Sensitivity = TruePositive/Positive    (4)
Specificity = TrueNegative/Negative    (5)
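Formulas (3) to (5) can be computed as in the sketch below, with abnormal as the positive class. The function name and the example test set (9 abnormal and 9 normal instances with one error in each class) are hypothetical; returning None when a class is absent is a design choice made here for samples that contain no abnormal instances.

```python
def evaluate(predicted, actual, positive="abnormal"):
    """Accuracy, sensitivity and specificity with `positive` as the positive class."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if a == positive and p == positive)
    true_neg = sum(1 for p, a in pairs if a != positive and p != positive)
    n_pos = sum(1 for a in actual if a == positive)
    n_neg = len(actual) - n_pos
    return {
        "accuracy": (true_pos + true_neg) / len(actual),
        "sensitivity": true_pos / n_pos if n_pos else None,
        "specificity": true_neg / n_neg if n_neg else None,
    }

actual = ["abnormal"] * 9 + ["normal"] * 9
predicted = (["abnormal"] * 8 + ["normal"]      # one abnormal instance missed
             + ["normal"] * 8 + ["abnormal"])   # one false alarm
metrics = evaluate(predicted, actual)
```

In this example all three values equal 16/18 or 8/9, about 0.889.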
Additionally, for the prediction model we introduce the average Manhattan distance to evaluate the
detection quality in more detail. ManDist(r, l) represents the average Manhattan distance between the result vector and the label
vector, shown as (6). N represents the number of instances in the sample. ManDist ≥ 0, and a smaller ManDist
represents a closer relationship between the result vector and the label vector, which means the model has better quality
of prediction 23 , and vice versa. This method is designed specifically for this study. Due to the limit on paper length,
we will not discuss the effectiveness of this method in depth.
ManDist(r, l) = \frac{1}{N} \sum_{i=1}^{N} \| r_i - l_i \|    (6)
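Formula (6) is a mean absolute difference between scores and labels, as the following sketch shows. The function name and the example values are illustrative only.

```python
def man_dist(results, labels):
    """Average Manhattan distance between prediction scores and label values."""
    if not results or len(results) != len(labels):
        raise ValueError("results and labels must be non-empty and of equal length")
    return sum(abs(r - l) for r, l in zip(results, labels)) / len(results)

example = man_dist([0.9, -0.8, 0.5], [1, -1, 1])   # (0.1 + 0.2 + 0.5) / 3
```

Since scores lie in [-1, 1] and labels are 1 or -1, the value ranges from 0 (every score equals its label) to 2 (every score sits at the opposite label).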
Results and Discussion
A. Comparison between Device Fixed and Unfixed
In this part, sample trials with 2 devices carried simultaneously (one fixed in the holster and the other loosely put in the pocket) are done to compare their difference, to see if the result actually reflects gait after the data processing procedure.
Both data samples are tested on the prediction model. Each contains 20 instances. In statistics, the 2-sample t-test 24
is widely used to evaluate the level of difference between two samples; the result is shown in Table 2. The p value is 0.92, far above the risk
level of α = 0.05, showing there is no significant difference between the detection results of the two samples. It indicates
that with the data processing methods, the detection quality when the device is loosely put in the pants pocket is very
similar to that when the device is fixed on the person with specific equipment.
Table 2: 2-sample t-test of fixed and unfixed results
Sample    Number of Instances  Mean    Standard Deviation
Fixed     20                   0.9743  0.0076
Unfixed   20                   0.9746  0.0142
p value: 0.92
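The t statistic behind Table 2 can be recomputed from the summary values alone, as sketched below. The sketch stops at the statistic and compares it with a tabulated critical value instead of computing the p value; the critical value of about 2.0 (two-sided, α = 0.05, roughly 30 to 40 degrees of freedom) is taken from standard t tables and is an assumption of this example, as is the function name.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for the difference of two sample means (unpooled variances)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Summary statistics from Table 2 (fixed vs. unfixed device).
t_stat = two_sample_t(0.9743, 0.0076, 20, 0.9746, 0.0142, 20)

APPROX_CRITICAL_T = 2.0   # assumed tabulated value, see lead-in
significant = abs(t_stat) > APPROX_CRITICAL_T
```

The statistic is far below the critical value, in line with the large p value reported in the table.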
B. Personalized Detection Model
Individual Test. In this study, abnormal status has been recorded for four subjects, shown in Table 1. A personalized
model is built for each of them. The evaluation of the classification results is shown in Table 3. The overall accuracy,
sensitivity and specificity of the models are 93.8%, 90.1% and 95.4%. Generally, this implies that individual gait detection
models work well for subjects of different ages and different body sizes.
Table 3: Evaluation of classification models
Subject          Instances (normal, abnormal)  Accuracy  Sensitivity  Specificity
MYS              89(59, 30)                    0.889     0.889        0.889
MYM              203(183, 40)                  0.897     0.750        0.963
MYL              101(66, 35)                   1.000     1.000        1.000
MOM              368(185, 183)                 0.964     0.963        0.966
Average (Total)  761(493, 268)                 0.938     0.901        0.954
The evaluation of the prediction models is shown in Table 4. For a specific average Manhattan distance ManDist_{i,j},
row i specifies which model the sample was tested on and column j represents the actual test sample. For example,
ManDist_{MYM,MYL} means the average Manhattan distance when MYL’s sample is tested on MYM’s model. The table clearly
shows that the distances on the diagonal have the lowest values. This means that for each subject, the prediction quality is
highest when testing on his own model.
Table 4: Evaluation of prediction models (average Manhattan distance; rows are models, columns are test samples)
Model   MYS     MYM     MYL     MOM
MYS     0.4003  0.9969  1.0331  0.8467
MYM     0.8069  0.3094  0.8202  0.8802
MYL     0.8283  1.1037  0.0965  1.4017
MOM     0.905   0.5700  1.5324  0.1182
Personal Variation. A series of cross tests were performed to further analyze the feasibility of the models under
different conditions. Samples from one subject were tested on models of other subjects to supplement our experiments.
The evaluation parameters Accuracy, Specificity, Sensitivity and ManDist of the cross test result are compared with those
of the individual test. If the cross test gives much worse values than the individual test (lower Accuracy, Sensitivity and Specificity, or higher ManDist), it indicates that the model
cannot be shared with the test subject. Because our experiments only record abnormal gait for 4 subjects, for the test
samples of the other 6 subjects Sensitivity is unavailable and Specificity is the same as Accuracy. Thus, for those samples we only need to compare
Accuracy and ManDist. That is, we can tell statistically whether the model generated for the 6 subjects with no
abnormal recordings is different from that using the model for 1 of the 4 subjects with abnormal recordings.
Variation of Different Ages. The result of the cross test between the young male subject (MYM) and the old male subject (MOM)
with the same body size is shown in Table 5. The low Sensitivity indicates that the abnormal status is difficult to detect
when the sample is tested on the model of another person in a clearly different age range. With the model of the old
subject, the abnormal points of the young subject cannot be detected at all.
Table 5: Cross test of Young and Old
Sample   Model  Accuracy  Sensitivity  Specificity  ManDist
MYM      MOM    0.553     0            0.963        0.5700
MOM      MYM    0.831     0.741        1.000        0.8802
Average         0.692     0.371        0.982        0.725
Variation of Different Sex. We tested the samples of 3 old females on the model of the old male (MOM). This
results in a low Accuracy and a high average Manhattan distance, shown in Table 6. All instances of the old females’
data are of normal status, but the results show most of them located in the abnormal region. These tests indicate that the
old male’s model does not work well for the old females’ samples, even with the same body size.
Table 6: Cross test of Different Sex
Sample   Model  Accuracy  ManDist
FOS      MOM    0.425     1.162
FOM      MOM    0.143     1.570
FOL      MOM    0.325     1.364
Average         0.298     1.365
Table 7: Cross test of Different Body Sizes. Each cell represents [(Accuracy, Sensitivity, Specificity), ManDist]
Model   MYS                            MYM                            MYL
MYS     (0.889, 0.889, 0.889), 0.400   (0.500, 0, 1), 0.997           (0.500, 0, 1), 1.033
MYM     (0.500, 0, 1), 0.973           (0.897, 0.750, 0.963), 0.309   (0.300, 0, 0.600), 1.296
MYL     (0.575, 0.450, 0.700), 0.828   (0.325, 0.650, 0), 1.104       (1, 1, 1), 0.097
Variation of Different Body Sizes. The young subjects have a good distribution in body size, leading us to cross
test with MYS, MYM and MYL. Results are shown in Table 7 with row i representing the tested model and column
j representing the test sample. This test once again demonstrates that the detection result is of lower quality when testing
on the model of another subject. This further emphasizes that people of different body sizes cannot share gait models.
MYM2 is another young male subject of medium body size. His sample is tested on MYM’s model. The Accuracy
of this test is 0.697 and the ManDist is 0.560. Thus, even though these two individuals are the same age, sex, and body
size, it is still unreliable for them to share gait models.
Variation of Identical Twins. We next collected data from a pair of identical twins (FYS1 and FYS2) without
abnormal status. We test their samples on MYS’s model, a subject of the same age and body size as the twins, but
not the same sex, to see the difference between these two samples when tested on the same model. A 2-sample t-test was
done on the results of these two samples, shown in Table 8. In the t-test, the p value equals 1.70 × 10^-5 , indicating that
there is a significant difference between the two test samples. Thus, even if subjects share not only phenotypic but also
genetic features, their gait patterns can still be different.
The preceding experiments indicate that individuals in different categories (age, sex and body size) are unable to
share gait models. Moreover, even though two individuals share the most common features (same age, sex, body size
and genetic similarity), the prediction results between models are not as reliable as the results on their own models.
Table 8: 2-sample t-test of FYS1 and FYS2, risk level α = 0.05
Sample   Model  Mean   Standard Deviation
FYS1     MYS    0.402  0.280
FYS2     MYS    0.723  0.345
p value: 1.70 × 10^-5
Conclusion
We have shown it feasible to develop a fully automatic system for health monitors using gait analysis with mobile
devices, continuously tracking personal status and automatically detecting when the status differs from “normal”.
This enables point of contact diagnosis to treat whatever health problem has arisen. The monitoring system uses mass
commercial devices such as Android smartphones and Apple iPods, thus is immediately applicable to continuous
measurement for population health. The abnormal detection is based upon a personalized model for gait analysis, a
model for detecting patterns in personal walking, different for and specialized to each individual.
The gait model is a novel method, based on mobile device data collection, combining signal processing and machine
learning. It is constructed using two major experimental results. The first is moving gait analysis from clinical settings
to everyday settings. That is, rather than being fixed upon the person with specialized devices, the sensors are within
ordinary devices carried in an ordinary way, in particular smart phones and music players carried in the pockets of
pants and coats. We have shown that the additional motion from unfixed carrying can be eliminated using motion
vectors from the gravimeter, to focus entirely on forward movement walking patterns.
The second experimental result is to show that the model is personalized, that gait differs from individual to individual, not from cohort to cohort. We have shown for a few individuals that normal gait can be
distinguished from abnormal gait using the frequency of period and deviation of walking steps measured by the accelerometer. We systematically test the
model on a representative sample of demographic persons. Ten persons were tested across different models, while
varying age, sex, and body size. That is, we test young and old, male and female, plus three classes of body size from
small to medium to large. The results clearly show that each person needs their own model to accurately predict gait
abnormality. We even test identical twins, whose gait differs from each other according to the model. The model is
sufficiently detailed to predict gait abnormality for an individual, currently needing a week of walking sessions for
machine learning to train a personalized model.
Design of Real-time monitoring system
To build a complete mobile app that measures gait in real time, an integrated system needs to be established. The pilot
study built a simple phone app, while the main study used phones only to collect the data, which was then analyzed
independently. To realize our goal of a fully automated health monitor, the analysis required to decide between normal
and abnormal gait patterns must be performed on the mobile device in real time. The initial data processing and data
collection is feasibly implemented on standard smart phone architecture. Unfortunately, machine learning methods
required to generate the models require substantial computing resources well outside the scope of current mobile
devices. To solve this, we propose the development of a distributed system which utilizes the mobile devices to record
a learning data sample that can then be compressed and transmitted to a powerful server which will generate the
custom user profile. This profile can then be downloaded to the phone and matched against real time data to determine
health status. As long as the user’s profile remains consistent, learning can be limited and if a user’s profile drastically
changes, the learning process can be repeated. We believe such a system to be easily realizable but leave the prototype
development to future work.
Data transmission
Since the real-time monitoring system needs data transmission between the phone client and the server, data compression is a major concern. Thus, the pre-processing of data is better done at client side. Then we only need to transmit
the compressed data to the server and process machine learning on it.
Mining Stream Data
Real-time gait data is actually a stream of data 23 . When dealing with stream data, we have to pay more attention to
managing the continuous data stream. The first concern is data deletion, which means deleting useless data to release
storage space. Since most of the gait patterns collected should be normal gait, for each prediction period, we can delete the
normal gait data after the classification and keep the abnormal data in storage.
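The deletion policy can be expressed as a small filter over the stream, sketched below under stated assumptions: `prune_stream` and the toy classifier are hypothetical names, and each element of the stream stands for one prediction period that has already been reduced to a score.

```python
def prune_stream(windows, classify_window):
    """Yield only the windows classified as abnormal; normal windows are dropped."""
    for window in windows:
        if classify_window(window) == "abnormal":
            yield window

def toy_classifier(score):
    # Stand-in for the personalized model: negative scores count as abnormal.
    return "normal" if score > 0 else "abnormal"

incoming_scores = [0.9, 0.8, -0.6, 0.95, -0.4]
stored = list(prune_stream(incoming_scores, toy_classifier))
```

Because the function is a generator, it handles one prediction period at a time and never needs to hold the full stream in memory.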
Future Work
Taking the measurement device out of uncomfortable clinical equipment such as a girdle is a vast improvement in
chronic gait analysis. However, people may also carry their cell phones in places other than a pants pocket.
For example, women may put them in a handbag, and others may put them in a shirt pocket or a backpack, or
simply carry them in hand. Since gait patterns generated from different parts of the human body may vary 16 , the
body location at which the pattern is generated needs to be detected before gait is computed. At the same time, data generated when
people are not actually walking should be eliminated before gait analysis as well. The Jigsaw project 25 and other activity
recognition methods using cell phone accelerometers 26 provide promising ways to detect and eliminate motion during
bicycle riding and stair climbing, among others. Such detection would also extend battery life by supporting automatic
switch-off of the device when the person is detected not to be walking.
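As a rough illustration of eliminating non-walking data, the sketch below gates a window of acceleration magnitudes on its dominant frequency. It is not the Jigsaw method or any published activity recognizer. The function names, the 1 to 3 Hz band, and the `min_std` threshold are all assumptions chosen for the example and would need to be calibrated on real data.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove gravity / DC offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate_hz / n


def looks_like_walking(samples, sample_rate_hz, band_hz=(1.0, 3.0), min_std=0.5):
    """Heuristic gate: enough movement, with a dominant frequency in `band_hz`.

    Both `band_hz` and `min_std` are illustrative assumptions.
    """
    n = len(samples)
    if n < 2:
        return False
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    if std < min_std:
        return False  # device is essentially at rest
    low, high = band_hz
    return low <= dominant_frequency(samples, sample_rate_hz) <= high
```

Windows rejected by such a gate could be skipped before gait analysis, and a long run of rejected windows could serve as the trigger for reducing sensing to save battery.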
In this work, we consider a binary classification of normal and abnormal gait; however, the
category of health status should be identified more precisely. To complete a full medical study, data on more subjects
in each category are necessary. For each subject, a longer-term test is also required to monitor the continuous change
of gait, rather than a simple decision of normal or abnormal. Specifically, abnormal gait has different types, including
pathological gait. Studying how to categorize these different abnormal types and implement detection for each of
them is a necessary and challenging next step for this work. A semi-supervised machine learning method 23, 27 can be
applied to build the model. An initial training set is required to build a detection baseline first. The training sets still need
to be determined: how much collected data, or how long a data collection period, is the minimum required to build
an efficient model? Future work with multiple tests is needed to answer these questions. Once a potential abnormal
status is detected, users can be queried to diagnose the situation, either by using a quality-of-life questionnaire as in the
pilot study or by recording a user message via voice recognition and extracting the outcomes using natural language
processing 28 , as in our previous work with personal health messages typed as text by patients 29, 30 .
Acknowledgements
Part of this paper is drawn from Cheng, Q., Li, Y., Jiang, Y., Juen, J. and Schatz, B.R., Personalized Model of Gait
Pattern for Detecting Abnormal Health Status, American Medical Informatics Association (AMIA) Annual Symposium
Proceedings, submitted in March 2012. The project was designed as a CS512 course project and supervised by Prof.
Jiawei Han and Prof. Bruce R. Schatz.
References
1. Schatz B and Berlin R. Healthcare Infrastructure: Health Systems for Individuals and Populations. London: Springer-Verlag,
2011.
2. Longo D, Kasper D, Jameson J, Fauci A, Hauser S and Loscalzo J. Harrison’s Principles of Internal Medicine. New York:
McGraw-Hill, 18th edition, 2011.
3. Perry J and Burnfield J. Gait analysis: normal and pathological function. SLACK Incorporated, 2nd edition, 2010.
4. LeMoyne R, Coroian C, Mastroianni T, Opalinski P, Cozza M and Grundfest W. Chapter 10: The merits of artificial proprioception, with applications in biofeedback gait rehabilitation concepts and movement disorder characterization. InTech,
2009.
5. LeMoyne R, Coroian C, Mastroianni T, Wu W, Grundfest W and Kaiser W. Virtual proprioception with real-time step detection
and processing. In Proc. 30th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society EMBS 2008, pages
4238–4241, 2008.
6. Fritz S and Lusardi M. White paper: Walking speed: the sixth vital sign. J Geriatric Physical Therapy, 32:2–5, 2009.
7. Studenski S, Perera S, Patel K, Rosano C, Faulkner K, Inzitari M et al. Gait speed and survival in older adults. Journal of the
American Medical Association (JAMA), 305:50–58, 2010.
8. Studenski S. Bradypedia: Is gait speed ready for clinical use? The Journal of Nutrition, Health & Aging, 13:878–880, 2009.
ISSN 1279-7707.
9. Boulgouris N, Hatzinakos D and Plataniotis K. Gait recognition: a challenging signal processing technology for biometric
identification. Signal Processing Magazine, IEEE, 22:78–90, 2005.
10. Janssen D, Schollhorn W, Lubienetzki J, Folling K, Kokenge H and Davids K. Recognition of emotions in gait patterns by
means of artificial neural nets. Journal of Nonverbal Behavior, 32:79–92, 2008. ISSN 0191-5886.
11. Lemke M R, Wendorff T, Mieth B, Buhl K and Linnemann M. Spatiotemporal gait patterns during over ground locomotion
in major depression compared with healthy controls. Journal of Psychiatric Research, 34(4–5):277–283, 2000. ISSN 0022-3956.
12. Gafurov D, Helkala K and Sondrol T. Biometric gait authentication using accelerometer sensor. Journal of Computers, 1(7):
51–59, 2006.
13. Mantyjarvi J, Lindholm M, Vildjiounaite E, Makela S and Ailisto H. Identifying users of portable devices from gait pattern
with accelerometers. In Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP ’05), volume 2, 2005.
14. Crossan A, Murray-Smith R, Brewster S, Kelly J and Musizza B. Gait phase effects in mobile interaction. In CHI ’05 extended
abstracts on Human factors in computing systems, CHI EA ’05, pages 1312–1315, New York, NY, USA, 2005. ACM. ISBN
1-59593-002-7.
15. Byrne R, Eslambolchilar P and Crossan A. Health monitoring using gait phase effects. In Proceedings of the 3rd International
Conference on PErvasive Technologies Related to Assistive Environments, PETRA ’10, pages 19:1–19:7, New York, NY, USA,
2010. ACM.
16. Whittle M. Gait Analysis: An Introduction. Butterworth Heinemann Elsevier, 4th edition, 2007.
17. McDowell I. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press, 3rd
edition, 2006.
18. Smith W. Handbook of Real-Time Fast Fourier Transforms (Algorithms to Product Testing). Wiley-IEEE Press, 1995.
19. Yang J. Frequency domain noise suppression approaches in mobile telephone systems. In Proc. IEEE Int. Conf. Acoustics,
Speech, and Signal Processing (ICASSP-93), volume 2, pages 363–366, April 1993.
20. Hodge V and Austin J. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.
21. Lampton M. Damping-undamping strategies for the Levenberg-Marquardt nonlinear least-squares method. Computers in
Physics, 11:110, 1997.
22. Hecht-Nielsen R. Theory of the backpropagation neural network. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), pages
593–605, 1989.
23. Han J, Kamber M and Pei J. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd edition, 2010.
24. Stein C. A two-sample test for a linear hypothesis whose power is independent of the variance. The Annals of Mathematical
Statistics, 16(3):243–258, 1945. ISSN 0003-4851.
25. Lu H, Yang J, Liu Z, Lane N, Choudhury T and Campbell A. The Jigsaw continuous sensing engine for mobile phone
applications. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, Nov 2010. ISBN 978-1-4503-0344-6.
26. Kwapisz J, Weiss G and Moore S. Activity recognition using cell phone accelerometers. ACM SIGKDD Explorations
Newsletter, 12:74–82, Mar 2011. ISSN 1931-0145.
27. Lu Y and Zhai C. Opinion integration through semi-supervised topic modeling. In Proceedings of the 17th international
conference on World Wide Web, WWW ’08, pages 121–130, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-085-2.
28. Demner-Fushman D, Chapman W and McDonald C. Methodological review: What can natural language processing do for
clinical decision support? J. of Biomedical Informatics, 42(5):760–772, Oct 2009.
29. Chee B, Berlin R and Schatz B. Predicting adverse drug events from personal health messages. In AMIA Annual Symposium
Proceedings, pages 217–226, 2011.
30. Jiang Y, Lin CX and Schatz B. Multi-class classification for online personal healthcare messages. In The 2nd International
Workshop on Web Science and Information Exchange in the Medical Web, 2011.