Dynamic Human Shape Description and Characterization

Z. Cheng*†, S. Mosher†, Jeanne Smith†, H. Cheng‡, and K. Robinette‡

† Infoscitex Corporation, Dayton, Ohio, USA
‡ 711th Human Performance Wing, Air Force Research Laboratory, Dayton, Ohio, USA
*Corresponding author. Email: [email protected]

Abstract

Dynamic human shape description and characterization are investigated in this paper. The dynamic shapes of a subject in four activities (jogging, limping, shooting, and walking) were generated via 3-D motion replication. The Paquet Shape Descriptor (PSD) was used to describe the shape of the subject in each frame. The unique features of dynamic human shapes were revealed by examining 3-D plots of the PSDs over time. Principal component analysis (PCA) was performed on the calculated PSDs, and the resulting principal components (PCs) were used to characterize them. The PSD was then reasonably approximated by its first few projections in the eigenspace formed by the PCs and represented by the corresponding projection coefficients. As such, the dynamic human shapes for each activity were described by these projection coefficients. Based on the projection coefficients, data mining technology was employed for activity classification. Case studies were performed to validate the methodology developed.

Keywords: Human Modeling, Dynamic Shape, Shape Descriptor, Principal Component Analysis, Activity Recognition

1. Introduction

While a human is moving or performing an action, the body shape changes dynamically. In other words, shape change and motion are tied together during a human action (activity). However, human shape and motion are often treated separately in activity recognition. Shape dynamics describe the spatial-temporal shape deformation of an object during its movement and thus provide important information about the identity of a subject and the motions performed by the subject (Jin and Mokhtarian, 2006).

A few researchers have utilized shape dynamics for human activity recognition. In (Kilner et al., 2009), the authors addressed the problem of human action matching in outdoor sports broadcast environments by analyzing 3-D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. In (Niebles and Li, 2007), a video sequence was represented as a collection of spatial and spatial-temporal features obtained by extracting static and dynamic interest points; a hierarchical model was then proposed that can be characterized as a constellation of bags-of-features, both spatial and temporal. In (Jin and Mokhtarian, 2006), a system was proposed for recognizing object motions based on their shape dynamics, with the spatial-temporal shape deformation in motions captured by hidden Markov models. In (Blank et al., 2005), human actions in video sequences were seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion, and were regarded as three-dimensional shapes induced by the silhouettes in the space-time volume.

Dynamic human shapes can be described by a dynamic 3-D human shape model which, in turn, can be extracted from 2-D video imagery or 3-D sensor data, or created by 3-D replication/animation. A dynamic 3-D shape model usually contains tens of thousands of graphic elements (vertices or polygons). In order to use the information coded in dynamic shapes for human identification and activity recognition, it is necessary to find an effective method for dynamic shape description and characterization.

2. Dynamic Shape Creation
Since the technologies capable of capturing the 3-D dynamic shapes of a subject during motion are still very limited in maturity and availability, very little dynamic human shape data are available at this time. However, as a motion capture system can be used to capture human motion and a laser scanner can be used to capture human body shape, various techniques have been developed to replicate/animate human motion in 3-D space, thus generating the dynamic shapes of a subject in an action.

In this paper, Blender (http://www.blender.org), an open-source software tool, was used to animate the motion of a human subject in 3-D space during four different activities: walking, jogging, limping, and shooting. The data used as the basis for the animation, including scan data and motion capture (MoCap) data, were acquired in the Human Signatures Laboratory of the US Air Force. The human subject, with markers attached, was scanned using the Cyberware whole-body scanner. Motion capture data were acquired for the same subject with the same markers attached. The markers allowed the joint centers to be determined for both the scan and the MoCap data. The scan was imported into Blender, and the joint centers were used to define the skeleton in the BVH (Bio-vision Hierarchical) file format. Euler angles for the different body segments were computed from the joint centers and other markers and used to set up BVH files for the four different activities. The BVH files were imported into Blender and used to animate the whole-body scan of the subject. Figure 1 shows images captured from the animation within Blender for the four activities. From the Blender animation, a 3-D mesh can be output at each frame of motion, as shown in Fig. 2 for limping, which represents the 3-D dynamic body shape of the subject at that instant of the motion. Thus, the 3-D mesh output at each frame can be used as simulation data of dynamic human shapes for training the algorithms developed for activity recognition.

Figure 1. Replication of a subject in four activities: limping, jogging, shooting, and walking.

Figure 2. Dynamic shapes of a subject during limping.

3. Dynamic Shape Description and Characterization

The dynamic shapes shown in Figs. 1 and 2 are represented by 3-D meshes. Each mesh may contain as many as tens of thousands of graphical elements (vertices or polygons). It is not feasible to use the vertices or polygons directly for the analysis of human shape dynamics. One way to effectively describe dynamic shapes and to enable further analysis is to use a shape descriptor (Cohen and Li, 2003; Chu and Cohen, 2005). In this paper, the Paquet Shape Descriptor (PSD) (Paquet et al., 2000; Robinette, 2003), with certain modifications, is used to describe dynamic shapes and to analyze shape dynamics. As illustrated in Fig. 3, the PSD uses 120 bins (discrete parameters) to characterize shape variation: 40 bins are related to the radius r, 40 to the first angle (cos θ), and 40 to the second angle (cos δ). The details of the PSD calculation are omitted here.

Figure 3. Paquet shape descriptor and its coordinate system.
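Although the full PSD computation is beyond the scope of this paper, the following minimal sketch conveys the idea of a cord-based, 120-bin histogram descriptor. It is an illustration only: the use of mesh vertices rather than triangle centroids, the normalization, the bin ranges, and the choice of principal directions as the angular reference are assumptions of this sketch and may differ from the exact definition of Paquet et al. (2000).

```python
import numpy as np

def psd_descriptor(vertices, n_bins=40):
    """A 120-bin, cord-based shape histogram in the spirit of the Paquet
    Shape Descriptor: 40 bins for the normalized cord length r and 40 each
    for the cosines of two cord angles (cos theta, cos delta).

    vertices: (N, 3) array of mesh vertex coordinates. Using vertices in
    place of triangle centroids is a simplification of this sketch.
    """
    center = vertices.mean(axis=0)           # center of mass, uniform weights
    cords = vertices - center                # cords from the center to the surface
    length = np.linalg.norm(cords, axis=1)
    r = length / length.max()                # radius normalized to [0, 1]

    # Reference axes from an SVD of the cord cloud; taking the first two
    # principal directions as the angular reference is an assumption here.
    _, _, vt = np.linalg.svd(cords, full_matrices=False)
    unit = cords / length[:, None]
    cos_theta = unit @ vt[0]                 # cosine of the first cord angle
    cos_delta = unit @ vt[1]                 # cosine of the second cord angle

    h_r, _ = np.histogram(r, bins=n_bins, range=(0.0, 1.0), density=True)
    h_t, _ = np.histogram(cos_theta, bins=n_bins, range=(-1.0, 1.0), density=True)
    h_d, _ = np.histogram(cos_delta, bins=n_bins, range=(-1.0, 1.0), density=True)
    return np.concatenate([h_r, h_t, h_d])   # one 120-bin vector per frame
```

Applied to the mesh exported at every animation frame, such a function yields one 120-bin vector per frame; stacking these vectors column-wise gives the descriptor matrix analyzed below.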
The 3-D plots of the time histories of the PSD for the four activities are illustrated in Fig. 4, where the first 40 bins corresponding to the radius are shown at the top, the second 40 bins of cos θ in the middle, and the last 40 bins of cos δ at the bottom. By visually inspecting these plots, one can find that:

- The variation of each bin over time is different: the variations of some bins over time are large and significant, while others are not.
- Periodic features are exhibited by the plots for the activities of jogging, limping, and walking.
- The 3-D plot for each activity is unique; there are visible and significant differences among the plots for the four activities.

These observations reveal some unique features of shape dynamics. However, directly using the PSD to analyze shape dynamics is still not feasible, since its 120 bins (variables) form a 120-dimensional space. Further treatment is necessary to characterize the shape descriptor and to reduce the dimension of the problem space.

Figure 4. The time histories of the 120 bins of the PSD for four activities.

Therefore, principal component analysis (PCA) is used to characterize the high-dimensional space defined by the PSD. Denote

$\mathbf{p}_{ijk} = \{p_1 \; p_2 \; \cdots \; p_{120}\}^{T}_{ijk}$,    (1)

as the PSD shape descriptor for the i-th subject in the j-th activity at the k-th frame. For the data collected, denote

$\mathbf{P} = \{\mathbf{p}_{ijk}\}, \quad i = 1,\ldots,I;\ j = 1,\ldots,J;\ k = 1,\ldots,K$,    (2)

where I represents the number of subjects, J is the number of actions, and K is the number of frames for each action. Note, however, that the number of activities each subject performs can differ, and the number of frames for each activity can differ as well. By performing PCA on P, one can find the principal components that characterize the space defined by the shape descriptor.

In this paper, dynamic shapes were created for the four activities at a frame interval of 0.02 s, with 85 frames for jogging, 352 frames for limping, 554 frames for shooting, and 227 frames for walking. The percentage of variance of each principal component (PC) is shown in Fig. 5, and the first four PCs are shown in Fig. 6. The original PSD vector can be projected onto the space (eigenspace) formed by the PCs, that is, it can be expanded in terms of the PCs. As shown in Fig. 5, among all 120 PCs, only the first 10–20 are significant. This means that the original PSD can be reasonably approximated by its first few projections in the eigenspace and represented by the projection coefficients corresponding to these significant PCs. Figure 7 illustrates the time histories of the first and second projection coefficients for the four activities.

Figure 5. Percentage of variance of each principal component.

Figure 6. First four principal components of PSD.
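As a concrete illustration of this PCA step, the sketch below computes the per-component variance percentages of Fig. 5 from a column-wise stack of PSD vectors. The random matrix is only a placeholder for the real 120 x 1218 descriptor data, and centering each bin before the decomposition is an assumption of this sketch.

```python
import numpy as np

# P: 120 x N matrix of PSD vectors, one column per frame (N = 1218 here).
# The random matrix below is a placeholder for the real descriptor data.
rng = np.random.default_rng(0)
P = rng.random((120, 1218))

# Center each bin (row) and obtain the principal components via SVD;
# the columns of U are the eigenvectors v_m of the bin covariance.
Pc = P - P.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(Pc, full_matrices=False)

# Percentage of variance carried by each principal component (cf. Fig. 5).
var_pct = 100.0 * s**2 / np.sum(s**2)
print("variance captured by the first 10 PCs (%):", np.round(var_pct[:10], 2))
```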
Figure 7. Time histories of the first and second projection coefficients for four activities.

Denote

$\mathbf{W}_M = \{\mathbf{v}_1 \; \mathbf{v}_2 \; \cdots \; \mathbf{v}_M\}$,    (3)

where $\mathbf{v}_m$ is the m-th principal component (eigenvector) of $\mathbf{P}$. The original observations (data) can be projected onto the space defined by $\mathbf{W}_M$, that is,

$\mathbf{Y}_M^T = \mathbf{P}^T \mathbf{W}_M$,    (4)

where $\mathbf{Y}_M: Y[M, N]$ is the matrix of projection coefficients, each column of which corresponds to one original record, M is the dimension of the shape descriptor (M = 120 for the PSD), and N is the total number of shapes observed (N = 1218 for the case in this paper). From Fig. 5 we can see that, among the total of 120 principal components, fewer than 20 are significant. This means that instead of using the full space of dimension M, one can construct a new space with only the significant principal components, that is,

$\mathbf{W}_L = \{\mathbf{v}_1 \; \mathbf{v}_2 \; \cdots \; \mathbf{v}_L\}, \quad L \ll M$,    (5)

which substantially reduces the dimension of the space. For the case investigated in this paper, L = 20, which is much less than M = 120. The projection onto this reduced space is then given by

$\mathbf{Y}_L^T = \mathbf{P}^T \mathbf{W}_L$,    (6)

where $\mathbf{Y}_L: Y[L, N]$. Each original record can be either fully reconstructed by Eq. (4) or partially reconstructed (approximated) by Eq. (6). Usually an original record can be well approximated by its partial reconstruction with the significant principal components. This means that the original data of dimension M can be represented by its projection coefficients of dimension L (L ≪ M). In the space of reduced dimension, the problem becomes tractable, as the number of variables becomes much smaller. In fact, for the case in this paper, the two projection coefficients corresponding to the first two most significant principal components are sufficient to represent the shape dynamics for action recognition. The sequence of a projection coefficient over the frames for a particular subject in a particular action constitutes a time series, as shown in Fig. 7. The time histories of the first and second coefficients are unique with respect to each action and can therefore be used as discriminators for activity recognition.

4. Activity Recognition Based on Shape Dynamics

The shape dynamics of a subject during motion, as described in Section 3, can be used for activity recognition. In this paper, a data mining tool was employed to classify the four activities (jog, limp, shoot, walk) based on 85 frames from each activity. Note that in the classification, each frame was treated independently rather than being placed in sequence as a time series.
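The per-frame coefficients that serve as classification features below come directly from the projection of Eqs. (4)-(6). A standalone numpy sketch of that step follows; the random matrix again stands in for the real 120 x 1218 PSD data, and the row-centering is an added assumption (the equations above project P directly).

```python
import numpy as np

# Placeholder for the real 120 x 1218 PSD matrix (one column per frame).
rng = np.random.default_rng(0)
P = rng.random((120, 1218))
Pc = P - P.mean(axis=1, keepdims=True)   # centering is an assumption here

U, s, _ = np.linalg.svd(Pc, full_matrices=False)

L = 20                                   # number of significant PCs retained
W_L = U[:, :L]                           # Eq. (5): W_L = {v_1 ... v_L}
Y_L = W_L.T @ Pc                         # Eq. (6): Y_L is L x N

# Partial reconstruction of every record from only L coefficients.
P_hat = W_L @ Y_L
rel_err = np.linalg.norm(Pc - P_hat) / np.linalg.norm(Pc)
print(f"relative reconstruction error with L = {L}: {rel_err:.3f}")

# The first two rows of Y_L are the PC1/PC2 time series of Fig. 7 and
# supply the per-frame attributes used for classification in Section 4.
pc1_series, pc2_series = Y_L[0], Y_L[1]
```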
Five attributes were used in the classification: (a) Pelvis_Velocity, the resultant velocity at the mid-pelvis location; (b) PC1, the first projection coefficient; (c) PC2, the second projection coefficient; (d) PC1_Velocity, the time derivative of PC1; and (e) PC2_Velocity, the time derivative of PC2. The significance of each attribute can be assessed in terms of gain ratio, as given in Table 1. While Pelvis_Velocity is the most significant, all five attributes were selected for classification. Various classification methods are available, such as those provided by Weka (http://www.cs.waikato.ac.nz/ml/weka/). Among them, the five conventional methods listed in Table 2 were chosen for the case study. All of them achieved classification accuracy greater than 95%, as shown in Table 2.

Table 1. Attribute ranking results.

Table 2. Classification accuracy.
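To make this classification setup concrete, the sketch below mirrors it with scikit-learn in place of Weka, using an entropy-based decision tree (roughly analogous to Weka's J48/C4.5; the five methods actually used are those listed in Table 2, which this sketch does not reproduce). All data are random placeholders, and the helper name frame_features, the 0.02 s frame interval used for the derivatives, and the 10-fold cross-validation are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def frame_features(pelvis_speed, pc1, pc2, dt=0.02):
    """Per-frame attribute vector: [Pelvis_Velocity, PC1, PC2,
    PC1_Velocity, PC2_Velocity], with the velocities taken as finite
    differences of the coefficient series (dt = 0.02 s frame interval)."""
    pc1_vel = np.gradient(pc1, dt)
    pc2_vel = np.gradient(pc2, dt)
    return np.column_stack([pelvis_speed, pc1, pc2, pc1_vel, pc2_vel])

# Placeholder data: 85 frames for each of the four activities.
rng = np.random.default_rng(0)
X = np.vstack([frame_features(rng.random(85), rng.random(85), rng.random(85))
               for _ in range(4)])
y = np.repeat(["jog", "limp", "shoot", "walk"], 85)

# Entropy-based decision tree; each frame is an independent sample,
# matching the paper's setup of not using the frames as a time series.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())
```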
5. Conclusion

Based on the study in this paper, the following conclusions are in order:

- Shape dynamics contain information about both body motion and shape changes and have great potential for human identification and activity recognition.
- Shape dynamics can be well captured by a shape descriptor and further characterized by principal components.
- Human motion/action in 3-D space can be replicated or animated with high biofidelity, which can be used to generate data for training a model or for evaluating the performance of a tool.
- Using a dynamic 3-D human shape model for human activity recognition is plausible. This approach is unique in that it differs from conventional techniques based on 2-D imagery or models, and it is effective in that it can overcome the shortcomings inherent in 2-D methods.
- As a shape descriptor, the PSD is not reversible. This means that while it can be used for analysis, as in this paper, it cannot be used for shape reconstruction. Also, spatial information may not be uniquely represented in the original definition of the PSD, which can be remedied by certain treatments or modifications.

It should be pointed out that the dynamic shape models used in this study were created from 3-D surface scan data and motion capture data using OpenSim and Blender. While these models provide a highly biofidelic description of body shape during motion, the body surface deformation may not be fully or accurately represented by them. However, since the body shape variation induced by the articulated motion is much larger than the surface deformation, most observations and results of this paper can reasonably be postulated to hold even if the surface deformation were represented more precisely. Further investigation is needed to validate this assumption.

Acknowledgement

This study was carried out with the support of SBIR Phase I funding (FA8650-10-M-6092) provided by the US Air Force.

References

Blank M, Gorelick L, Shechtman E, Irani M, and Basri R, 2005. Actions as Space-Time Shapes. In: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05).

Chu C-W and Cohen I, 2005. Posture and Gesture Recognition using 3-D Body Shapes Decomposition. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

Cohen I and Li H, 2003. Inference of Human Posture by Classification of 3-D Human Body Shape. In: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV 2003.

Jin N and Mokhtarian F, 2006. A Non-Parametric HMM Learning Method for Shape Dynamics with Application to Human Motion Recognition. In: The 18th International Conference on Pattern Recognition (ICPR'06).

Kilner J, Guillemaut J-Y, and Hilton A, 2009. 3-D Action Matching with Key-Pose Detection. In: 2009 IEEE 12th International Conference on Computer Vision Workshops.

Niebles J-C and Li F-F, 2007. A Hierarchical Model of Shape and Appearance for Human Action Classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007).

Paquet E, Rioux M, Murching A, Naveen T, and Tabatabai A, 2000. Description of shape information for 2-D and 3-D objects. Signal Processing: Image Communication 16, pp. 103-122.

Robinette K, 2003. An Investigation of 3-D Anthropometric Shape Descriptors for Database Mining. Ph.D. Thesis, University of Cincinnati.