RECOGNISING INTERACTION ACTIVITY
WEI-SHI ZHENG 郑伟诗
http://sist.sysu.edu.cn/~zhwshi/
SUN YAT-SEN University

Introduction to My Research

OUTLINE
  1 Introduction
  2 Human-Object Interaction
  3 Collective Activity
  4 Summary

1. Introduction

Why Learn Interaction Activity?
  A woman drinks using a cup
  Human + Object + Interactions
  A woman spikes a volleyball

Can you guess what is happening to them?

Are you right?

Why Learn Interaction Activity?
  We interact with others every day

The Challenges
  Can you detect the ball accurately?

The Challenges
  Different poses

The Challenges
  Queuing?? Talking??
  Local does not mean global: Individual Action ≠ Collective Activity
  Talking!! Queuing!!

Our work in this presentation
  Human-Object Interaction (ICCV 2013; IEEE TCSVT 2015; CVPR 2015)
  Collective Activities: Fewer People (Less Occluded), More Interactive (IEEE TIP 2015)

OUTLINE
  1 Introduction
  2 Human-Object Interaction
      Jian-Fang Hu, Wei-Shi Zheng*, Jian-huang Lai, Shaogang Gong, and Tao Xiang. Exemplar-based Recognition of Human-Object Interactions. IEEE Transactions on Circuits and Systems for Video Technology (DOI: 10.1109/TCSVT.2015.2397200), 2015. (ICCV 2013)
  3 Collective Activity
  4 Summary

2. Human-Object Interaction: Exemplar

Related work: Human-Object
  Mining object patterns from many images
  Detecting humans and calculating their relationship.
  [Prest and Schmid, TPAMI 2012]

Related work: Mutual Context
  Using a mutual context model to jointly model interactions between objects and human poses.
  [Yao & Fei-Fei, TPAMI 2012]

Related work: Visual Phrases
  Using complex visual composites and object models to make independent predictions.
  [Sadeghi, CVPR 2011]
Human-Object Interaction: Exemplar

Summary: Two popular models
  Geometric relationship (distance and angle)
  Binary patterns
  Heavily reliant on accurate and robust object detection and pose estimation.

Our proposed model
  Pose-Object: a probabilistic framework
  Includes semantic information
  Overcomes inaccurate detection and pose estimation

Our Observations
  Motivation: for a specific pose, the manipulated object tends to appear at similar relative positions.
  Modelling the exemplar as a Gaussian function

Learned Exemplars
  Observations:
    An atomic pose can interact with two or more objects.
    An object can interact with multiple atomic poses.
    Each pose-object pair corresponds to one exemplar.

Spatial Pose-Object Interaction Using Exemplars
  [Figure labels: object, atomic pose]

Exemplar-based Representation for Video

Visualisation

Exemplar-based HOI Representation
  The appearance interaction response
  Object Detection Score (O)
  Pose Appearance (P)
  Spatial Pose-Object Interaction (I)
  Contextual Information (C)
  Action-specific Ranking Matching Model
    Assigns a higher score to the ground-truth label

Action-specific Ranking Matching Model
  [Equations not recoverable from the source]

Evaluation
  Sports: 6 classes, 300 images, 30 images for training and 20 for testing per class
  PPMI: 24 classes, 100 images for training and 100 for testing
  Gupta: 6 classes, 60 videos
  SYSU Action dataset (new collection, http://sist.sysu.edu.cn/~zhwshi/students/jianfang/HomePage.htm ): 6 classes, 119 videos, sitting and standing

Samples of SYSU Set
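The exemplar idea in this part of the talk (for a specific pose, the manipulated object appears at similar relative positions, and the exemplar is modelled as a Gaussian function) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: offsets are taken to be 2-D object positions relative to the person and normalised by body size, the response is left unnormalised so that it peaks at 1, and the names `fit_exemplar` and `exemplar_response` are hypothetical.

```python
import numpy as np

def fit_exemplar(rel_offsets, ridge=1e-6):
    """Fit one Gaussian exemplar (mean, covariance) for a pose-object pair.

    rel_offsets: (n, 2) object positions relative to the person
    (assumed normalised by body size). A small ridge keeps the
    covariance invertible when few samples are available.
    """
    rel_offsets = np.asarray(rel_offsets, dtype=float)
    mean = rel_offsets.mean(axis=0)
    centred = rel_offsets - mean
    cov = centred.T @ centred / len(rel_offsets)
    cov += ridge * np.eye(rel_offsets.shape[1])
    return mean, cov

def exemplar_response(rel_offset, mean, cov):
    """Unnormalised Gaussian response: 1 at the mean, decaying with distance."""
    diff = np.asarray(rel_offset, dtype=float) - mean
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)))

# Toy training offsets for one hypothetical pose-object pair.
offsets = np.array([[0.40, -0.50], [0.50, -0.60], [0.45, -0.40], [0.55, -0.55]])
mean, cov = fit_exemplar(offsets)

near = exemplar_response(mean, mean, cov)         # candidate at the typical position
far = exemplar_response([-1.0, 1.0], mean, cov)   # candidate far from it
print(near, far)
```

A soft response of this kind degrades gradually when the detected object location is slightly off, which is one way to read the claim that the model tolerates inaccurate detection and pose estimation.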
Human-Object Interaction: Exemplar

Evaluation of HOI descriptor
  Observations:
    Our spatial exemplar-based HOI feature has an important impact, especially on the SYSU Action dataset.
    The scene context is important in the two still-image datasets (Sports and PPMI).
    The object context is important in the Gupta set.

[Result tables: Sports; PPMI; Gupta & SYSU Action]

OUTLINE
  1 Introduction
  2 Human-Object Interaction
      Jian-Fang Hu, Wei-Shi Zheng*, Jian-huang Lai, Jianguo Zhang, "Jointly Learning Heterogeneous Features for RGB-D Activity Recognition," IEEE Conf. on Computer Vision and Pattern Recognition, June 2015.
  3 Collective Activity
  4 Summary

2. Human-Object Interaction: RGBD

Depth-based Modelling
  Insensitive to illumination variations
  Invariant to color and texture changes
  Reliable for estimating body silhouette and skeleton (human posture)
  Kinect device
  Example sequence captured by Kinect: RGB, Depth, Skeleton

Related work: Depth-based Descriptor
  Discriminative depth descriptors:
    HON4D [Oreifej, CVPR 2013]
    Super normal [Yang, CVPR 2014]
    Actionlet [Wang, TPAMI 2014]
    Range Sample [Lu, CVPR 2014]
    ……
  Comment: these do not use color information to facilitate recognition

Related work: RGB-D based
  Fusion of RGB features and Depth features
  Deep Learning [Shao, IJCAI 2013], Random Forests [Zhu, CVPRW 2013], Structured sparsity model [Shahroudy, ISCCSP 2014] ……
  Comment: these ignore the connections between the RGB and Depth channels

Our Observations
  The RGB and Depth channels do share some structures.
  Heterogeneous features extracted from the Depth and RGB channels should be jointly learned, with both shared and feature-specific structures considered.

Our Joint Learning Model
  Fig. A graphic illustration of our joint learning framework
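The JOULE formulation slide names three kinds of terms (a prediction loss, a reconstruction loss and regularization) under an orthogonality constraint. The sketch below evaluates an objective of that shape for S feature types. It is a reading of the slide under stated assumptions, not the authors' code: each feature matrix X_i is taken as (dimension x samples), Θ_i as an orthonormal basis for a k-dimensional subspace, W0 as the shared and W_i as the feature-specific predictor, and the trade-off weights `alpha`, `beta`, `gamma` are hypothetical because no weights survive in the source.

```python
import numpy as np

def joule_objective(Xs, Y, W0, Ws, Thetas, alpha=1.0, beta=1.0, gamma=1.0):
    """Evaluate a JOULE-style objective (weights alpha/beta/gamma are assumed)."""
    total = gamma * np.sum(W0 ** 2)                    # regulariser on the shared part
    for X, W, Theta in zip(Xs, Ws, Thetas):
        Z = Theta.T @ X                                # (k, n) subspace projection
        pred = (W0 + W).T @ Z                          # (c, n) class scores
        total += np.sum((pred - Y) ** 2)               # prediction loss
        total += alpha * np.sum((X - Theta @ Z) ** 2)  # reconstruction loss
        total += beta * np.sum(W ** 2)                 # regulariser on the specific part
    return float(total)

def random_orthonormal(d, k, rng):
    """Return a (d, k) matrix with orthonormal columns via reduced QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

rng = np.random.default_rng(0)
n, c, k = 20, 3, 4          # samples, classes, subspace dimension (toy values)
dims = [10, 8]              # two hypothetical feature types, e.g. RGB and depth
Xs = [rng.standard_normal((d, n)) for d in dims]
Y = np.eye(c)[:, rng.integers(0, c, size=n)]           # one-hot labels, (c, n)
Thetas = [random_orthonormal(d, k, rng) for d in dims]
W0 = 0.1 * rng.standard_normal((k, c))
Ws = [0.1 * rng.standard_normal((k, c)) for _ in dims]

value = joule_objective(Xs, Y, W0, Ws, Thetas)
print(value)
```

An iterative solver would alternate updates of W0, the W_i and the Θ_i while keeping each Θ_i orthonormal; this sketch only evaluates the objective and does not attempt that optimisation.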
Human-Object Interaction: RGBD

Model Formulation (JOULE): joint heterogeneous features learning

  $$\min_{W_0,\{W_i\},\{\Theta_i\}} \; \sum_{i=1}^{S} \Big( \big\| (W_0 + W_i)^T \Theta_i^T X_i - Y \big\|_F^2 + \big\| X_i - \Theta_i \Theta_i^T X_i \big\|_F^2 + \| W_i \|_F^2 \Big) + \| W_0 \|_F^2$$
  $$\text{s.t. } \Theta_i^T \Theta_i = I, \quad i = 1, 2, \ldots, S$$

  Terms, in order: prediction loss; reconstruction loss; regularization terms. The constraint is the orthogonality constraint.
  [Trade-off weights on the individual terms are not legible in the source.]
  Jointly learning the shared and specific structure among different features.
  Jointly learning shared structure across different activity classes.
  Learning structures guided by recognition

Model Optimization
  We propose a three-step iterative optimization algorithm, repeated until convergence.

Experiments
  MSR Daily: 16 activities, 320 video clips, 10 participants
  CAD 60: 14 activities, 68 video clips, 4 participants
  SYSU 3D HOI (new collection, will release soon): 12 HOI activities, 480 video clips, 40 participants

Samples of SYSU Set
  More objects to manipulate; more participants; more similar motions

Results on MSRD Set

Results on CAD60

Results on SYSU 3D HOI Set
  Setting-1: half of the samples of each activity class were selected for training and the rest for testing.
  Setting-2: the cross-subject setting.

Parameter Evaluation
  Fig. Effect of the subspace dimension parameter on system performance.

OUTLINE
  1 Introduction
  2 Human-Object Interaction
  3 Collective Activity
      Xiaobin Chang, Wei-Shi Zheng*, and Jianguo Zhang. Learning Person-Person Interaction in Collective Activity Recognition. IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1905-1918, 2015.
  4 Summary

3. Learning Person–Person Interaction

Related Work: Spatial Temporal Model
  Choi et al., ICCVW 2009
  1. Capturing the spatial distribution of collective activity.
  2. Capturing the temporal variation of the spatial distribution.
Learning Person–Person Interaction

Related Work: Spatial Temporal Model
  Choi et al., CVPR 2011
  Capturing the spatial-temporal information and also selecting the most discriminative features for collective activity recognition.

Related Work: Hierarchical Model
  Lan et al., TPAMI 2012
  1. Collective activity is based on the action of each person.
  2. The connections among people can be inferred as latent variables.

Related Work: Hierarchical Model
  Amer et al., ICCV 2013
  1. Three layers are used, from bottom to top: object level, individual action level and collective activity level.
  2. An And-Or graph is used for modelling these three layers.

Related Work: Interactive Phrase
  Kong et al., TPAMI 2013
  1. Describing person-person interaction by exploiting motion relationships between body parts.
  2. The interaction is captured only implicitly by the model.

Related Work: Interactive Descriptor
  Tran et al., Pattern Recognition Letters 2014
  A descriptor called LGA is used to capture the interactions among people for collective activity recognition.

Related Work: Combined Model
  Choi et al., TPAMI 2014
  1. This model combines different tasks: collective activity recognition, interaction recognition, individual action recognition, and multi-person tracking.
  2. It assumes that the different tasks can benefit from each other during learning.
  3. It is hard to optimise and requires many manual labels.

A Complete Learning Approach
  Spatial-temporal feature of each person's action
  Focused on modelling person-person interaction
  Two connected atomic activities in one collective activity are either:
    1) quite similar and spatially close to each other, forming a meaningful collective activity (e.g. two people walking together); or
    2) not quite similar but strongly interacting with each other (e.g. facing each other when two people are talking or fighting).
  Short video clips (~15 frames)

Inference
  Person-person interaction under the collective activity m
  Interaction Response: summarising all the person-person interactions
  Inference: the Interaction Response should be maximised under the ground-truth collective activity

Learning
  Matrix factorisation. Advantages:
    1. More effective
    2. Low-rank representation
  -log det regularisation term: L_m is of full rank (avoids the redundancy problem)

Multi-task Extension
  Different collective activities are different but related.
  Class-specific: global interaction is different
  Shared aspects: local interaction is sometimes similar; person actions (standing, walking), spatial distribution, etc.

Multi-task Extension
  α controls the balance between the shared parameter and the class-specific parameter in Ω_m
  [Equation annotations: one term for modelling class-specific information, one for modelling shared information]

Multi-task Extension
  Gradient Descent & Iterative Optimization

Two Benchmark Datasets
  1. Collective Activity Dataset (CAD)
     44 video sequences; 5 activities (crossing, waiting, queuing, walking, talking)
     Exp. setting: a random 1/4 of the dataset is held out for testing and the rest is used for training.
  2. Choi’s Dataset
     32 video sequences; 6 activities (gathering, talking, dismissal, walking together, chasing, and queuing)
     Exp. setting: the standard 3-fold cross-validation protocol.

Visualization of the Learned Ω
  [Figures: CAD dataset; Choi’s dataset]

Results on Two Benchmark Datasets
  [Tables: CAD; Choi’s Dataset]
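The inference and learning slides describe an interaction response that summarises all person-person interactions under a collective activity m, a matrix factorisation giving a low-rank representation, and a -log det term that keeps L_m full rank. The formulas themselves are not in the source, so the sketch below rests on loudly stated assumptions: the pairwise interaction is taken as the bilinear form x_i^T Ω_m x_j with Ω_m = L_m^T L_m, the response as its sum over unordered pairs, and the regulariser as -β log det(L_m L_m^T). All function names are hypothetical.

```python
import numpy as np

def interaction_response(features, L):
    """Sum of assumed pairwise interactions x_i^T (L^T L) x_j over pairs i < j.

    features: (n_people, D) per-person descriptors; L: (d, D) with d <= D.
    """
    Z = features @ L.T                  # (n_people, d) low-rank projections
    G = Z @ Z.T                         # G[i, j] = x_i^T L^T L x_j
    return float((G.sum() - np.trace(G)) / 2.0)

def logdet_penalty(L, beta):
    """-beta * log det(L L^T); infinite when L is rank deficient."""
    sign, logdet = np.linalg.slogdet(L @ L.T)
    return float(-beta * logdet) if sign > 0 else float("inf")

def predict_activity(features, Ls):
    """Pick the activity whose factor gives the largest interaction response."""
    return int(np.argmax([interaction_response(features, L) for L in Ls]))

# Toy example: D = 6 feature dimensions, d = 2, two hypothetical activities.
L0 = np.zeros((2, 6)); L0[0, 0] = L0[1, 1] = 1.0   # attends to dimensions 0 and 1
L1 = np.zeros((2, 6)); L1[0, 2] = L1[1, 3] = 1.0   # attends to dimensions 2 and 3
people = np.array([[1.0, 1.0, 0.0, 0.0, 0.0, 0.0]] * 3)

label = predict_activity(people, [L0, L1])
response = interaction_response(people, L0)

L_deficient = np.zeros((2, 6)); L_deficient[:, 0] = 1.0  # two identical rows
penalty_full = logdet_penalty(L0, beta=0.3)
penalty_deficient = logdet_penalty(L_deficient, beta=0.3)
print(label, response, penalty_full, penalty_deficient)
```

Under these assumptions the penalty grows without bound as the rows of L_m become linearly dependent, which matches the stated purpose of the -log det term: it discourages redundant directions in the learned factor.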
Learning Person–Person Interaction

Parameter Evaluations
  The impacts of α and d on CAD
  The impacts of α and d on Choi’s Dataset
  α varies from 0.1 to 0.9; d ∈ {32, 64, 96, 128, 256, 384}

Effect of logdet regularization
  The impact of β on both datasets
  Performance drops markedly without the -logdet regularization (β = 0).
  Performance becomes stable when β = 0.3.

OUTLINE
  1 Introduction
  2 Human-Object Interaction
  3 Collective Activity
  4 Summary

Summary
  Two types of interaction analysis are discussed:
    Human-Object
    Collective Activity
  Three learning models are presented:
    Action-Specific Ranking
    Heterogeneous Fusion
    Person-Person Interaction Learning
  Two new databases are contributed

Summary: References
  Xiaobin Chang (student), Wei-Shi Zheng*, and Jianguo Zhang. Learning Person-Person Interaction in Collective Activity Recognition. IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1905-1918, 2015.
  Jian-Fang Hu (student), Wei-Shi Zheng*, Jian-huang Lai, Shaogang Gong, and Tao Xiang. Exemplar-based Recognition of Human-Object Interactions. IEEE Transactions on Circuits and Systems for Video Technology (DOI: 10.1109/TCSVT.2015.2397200), to appear, 2015.
  Jian-Fang Hu (student), Wei-Shi Zheng*, Jian-huang Lai, Jianguo Zhang, "Jointly Learning Heterogeneous Features for RGB-D Activity Recognition," IEEE Conf. on Computer Vision and Pattern Recognition, June 2015.

Acknowledgements
  Thanks to my students for their active participation in this research: Jian-Fang Hu (胡建芳) and Xiaobin Chang (常晓斌).
  Special thanks to teacher Dong Le (董乐), and happy birthday!
  Thanks to VALSE ONLINE for the opportunity!
  Thanks to everyone for attending this talk during the Easter holiday!

Q&A