Pedestrian Detection and Tracking
Lihi Zelnik-Manor
Video Analysis Course, Summer 2008

Credits
• Some slides were adapted from Payam Sabzmeydani and Greg Mori
• Some slides were adapted from Deva Ramanan

Problem
• Classify a window as pedestrian or non-pedestrian
• Exhaustively search the scale-space image

Viola, Jones & Snow
• www.merl.com/projects/pedestrian/

Recall: Viola & Jones features

Motion features
• Shift operators
• Differences between shifted images: ∆, U, L, R, D

Features type 1
• Sum differences across filters
• Captures: likelihood of a region moving in the U, L, R, or D direction
• f_i = r_i(∆) − r_i(U, L, R, D)

Features type 2
• Differences within a filter
• Captures: motion shear
• f_j = φ_j(U, L, R, D)

Features type 3
• Sums within a filter
• Captures: motion magnitude
• f_j = φ_j(U, L, R, D)

Features type 4 (appearance)
• Differences on the input frame
• Captures: appearance
• f_j = φ_j(U, L, R, D)

Features & Classifiers
• Features and weak classifiers: parameters are learned via AdaBoost

Details
• Cascade classifier for speed-up

Dataset and training
• 8 video sequences, ~2000 frames each
• 6 for training, 2 for testing
• 2250 positive examples
• 2250 negative examples
• Pedestrian = 20x15 box

Top filters
• The first 5 filters learned for the dynamic pedestrian detector.
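The motion features above can be illustrated with a short sketch. It builds the five difference images from two consecutive frames and evaluates three of the feature types on a rectangular box. The definitions ∆ = |I_t − I_t+1| and U = |I_t − shifted I_t+1| (likewise L, R, D) follow the Viola, Jones & Snow paper and are not spelled out on the slides. The one-pixel shift, the edge padding, the left/right split used for the shear feature, and all function names are assumptions made for illustration. Each feature is computed against one direction image at a time.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift img by one pixel: out[y, x] = img[y - dy, x - dx], edges replicated."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return padded[1 - dy:1 - dy + h, 1 - dx:1 - dx + w]

def motion_images(frame_t, frame_t1):
    """Return the difference images delta, U, L, R, D for two consecutive frames."""
    a = frame_t.astype(np.float64)
    b = frame_t1.astype(np.float64)
    return {
        "delta": np.abs(a - b),
        "U": np.abs(a - shift(b, -1, 0)),  # next frame shifted up
        "L": np.abs(a - shift(b, 0, -1)),  # next frame shifted left
        "R": np.abs(a - shift(b, 0, 1)),   # next frame shifted right
        "D": np.abs(a - shift(b, 1, 0)),   # next frame shifted down
    }

def box_sum(img, top, left, height, width):
    """Sum of pixel values inside a rectangle (the r_i of the slides)."""
    return img[top:top + height, left:left + width].sum()

def direction_feature(m, s, box):
    """Type 1: r(delta) - r(S). Large when shifting the next frame in
    direction s aligns it with the current frame inside the box."""
    return box_sum(m["delta"], *box) - box_sum(m[s], *box)

def shear_feature(m, s, box):
    """Type 2: difference within one image, here left half minus right half
    (the split is an assumption; any two-rectangle filter would do)."""
    top, left, height, width = box
    half = width // 2
    return (box_sum(m[s], top, left, height, half)
            - box_sum(m[s], top, left + half, height, half))

def magnitude_feature(m, s, box):
    """Type 3: plain sum within one direction image."""
    return box_sum(m[s], *box)
```

In practice the box sums would be read from integral images so that each feature costs a handful of lookups; plain slicing is used here only to keep the sketch short.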
Top appearance filters
• The first 5 filters learned using appearance only

Results
• Comparing the dynamic classifier and the appearance classifier
• [Figure: Test sequence 1, Test sequence 2]

Appearance based detection

Motion + appearance based detection

Using a single frame

Different cues
• Wavelet coefficients (Mohan et al., PAMI 2001)
• Oriented gradients (Dalal and Triggs, CVPR 2005)
• SIFT features (Leibe et al., CVPR 2005)
• Edgelet features (Wu and Nevatia, ICCV 2005)
• “Shapelet features” (Sabzmeydani and Mori, CVPR 2007)

Datasets
• MIT: Standing pose, simple background, no occlusion
• INRIA: Standing pose, complex background, partial occlusions

Dalal & Triggs, CVPR’05
• Concatenated histograms of local gradients
• SVM
• Dense sampling
• Claim: “none of the keypoint detectors that we are aware of detect human body structures reliably”
• [Figure panels: average gradient image; maximum positive SVM weight; maximum negative SVM weight; HOG descriptor; HOG descriptor weighted by positive weights; HOG descriptor weighted by negative weights]

Wu & Nevatia, ICCV’05
• Edgelet features: short line and curve segments
• AdaBoost

Sabzmeydani and Mori, CVPR’07
• Shapelet features: combinations of short line and curve segments
• AdaBoost
• Start from smoothed gradient responses in different directions

Shapelet features

Shapelet final classifier

Results on INRIA dataset

People detection in video
• Tracking:
  • Background Subtraction
  • Condensation
  • Explicit Motion Models

Ramanan & Forsyth
Tracking People by Learning their Appearance

Look for candidate torsos
• Using a template rectangle detector

Cluster torsos

Final torso detector

Find arms and legs
• Using a template rectangle detector near detected torsos
• + Clustering (as for torso)

Results

Weaknesses
• The clustering step only works for sequences where:
  • limbs are reliably found by low-level detectors
  • limbs look different from the background.
• If the algorithm produces bad clusters, the resulting appearance models will produce poor tracks.
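To make the torso clustering step concrete, here is a minimal sketch that clusters appearance features of candidate torsos and keeps the dominant cluster as the appearance model. The slides only say "Cluster torsos"; using mean shift with a flat kernel, describing each candidate by a small feature vector (for example its mean colour), and taking the largest cluster as the model are all assumptions made for illustration, as are the function names and the bandwidth parameter.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift: move each mode to the mean of the original
    points within `bandwidth` of it, until the modes stop moving."""
    points = np.asarray(points, dtype=np.float64)
    modes = points.copy()
    for _ in range(n_iter):
        # Pairwise distances between current modes and the original points.
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        weights = (dist <= bandwidth).astype(np.float64)
        new_modes = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.abs(new_modes - modes).max() < tol
        modes = new_modes
        if converged:
            break
    return modes

def label_modes(modes, merge_dist):
    """Give the same label to modes that ended up closer than merge_dist."""
    labels = np.zeros(len(modes), dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < merge_dist:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

def dominant_appearance(features, bandwidth):
    """Return the centre of the largest cluster and a mask of its members."""
    modes = mean_shift(features, bandwidth)
    labels, centers = label_modes(modes, merge_dist=bandwidth / 2)
    biggest = np.bincount(labels).argmax()
    return centers[biggest], labels == biggest
```

The sketch also shows why the weaknesses listed above matter: if the candidate detections are dominated by background rectangles, or if the torso features are not separated from the background by more than the bandwidth, the largest cluster no longer corresponds to the person.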
Detect whole person
• Stylized pose person detector

Person model

Build Model & Detect

Detection
• Detect people by sampling from a one-leg, one-arm pictorial structure

Results

Ramanan et al. summary
• Strengths
  • Works in spite of camera motion
  • Robust to drift
  • Auto-initializing
• Weaknesses
  • Not applicable to real-time applications
  • Makes use of many heuristics (too many?)
  • May have problems dealing with lighting changes