Research Statement in Computer Vision

Ying Xiong
School of Engineering and Applied Sciences, Harvard University
[email protected]
April 17th, 2015

Low-level Vision by Consensus in a Spatial Hierarchy of Regions [1, 2]

Figure 1: Consensus framework for low-level vision, using binocular stereo as an example. (a) Window-based stereo matching with a slanted-plane model reduces ambiguity, but it requires guessing the correct window shapes and sizes throughout the scene. Consensus addresses this by explicitly considering all regions at all locations (depicted as a 2D cartoon (b) and in 1D organized by scale (c)). It reasons simultaneously about which regions are inliers to the slanted-plane model and the correct slanted plane for each inlying region. The regional slanted planes must agree where they overlap, and in the objective this implies high-order factors that link the variables of the thousands of regions that overlap each pixel (blue in (b) and (c)). When regions are organized hierarchically (red/pink in (b)), optimization becomes parallel and efficient. (d) The result is a distributed architecture, with computational units that iteratively perform the same set of computations and share information sparsely between parents and children. The framework can be applied to a variety of low-level tasks using a variety of regional models.

We introduce a multi-scale framework for low-level vision, where the goal is to estimate physical scene values from image data, such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a "local model," such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field.
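The slanted-plane local model above can be made concrete with a toy sketch (an illustration only, not the paper's implementation; the function names are hypothetical): each region hypothesizes a plane in disparity space, d(x, y) = a·x + b·y + c, and counts as an inlier when that plane explains its observed disparities within a tolerance.

```python
import numpy as np

def fit_slanted_plane(xs, ys, disparities):
    """Least-squares fit of the slanted-plane model d(x, y) = a*x + b*y + c."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, disparities, rcond=None)
    return coeffs  # (a, b, c)

def is_inlier_region(xs, ys, disparities, tol=1.0):
    """A region is an inlier when its best-fit plane leaves small residuals."""
    a, b, c = fit_slanted_plane(xs, ys, disparities)
    residuals = disparities - (a * xs + b * ys + c)
    return bool(np.max(np.abs(residuals)) < tol)
```

In the actual framework this binary decision is not made per region in isolation: consensus constraints force the planes of all overlapping inlier regions to agree.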
Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct coordinates in the local-model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can occur in an efficient and parallel architecture, where distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.

From Shading to Local Shape [9, 10]

Figure 2: We infer from a Lambertian image patch a concise representation of the distribution of quadratic surfaces that are likely to have produced it. These distributions naturally encode different amounts of shape information based on what is locally available in the patch, and can be unimodal (rows 2 & 4), multi-modal (row 3), or near-uniform (row 1). This inference is done across multiple scales.

We develop a framework for extracting a concise representation of the shape information available from diffuse shading in a small image patch. This produces a mid-level scene descriptor, composed of local shape distributions that are inferred separately at every image patch across multiple scales. The framework is based on a quadratic representation of local shape that, in the absence of noise, has guarantees on recovering accurate local shape and lighting. When noise is present, the inferred local shape distributions provide useful shape information without over-committing to any particular image explanation.
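To make the setting concrete, the forward model underlying this inference, a quadratic height field rendered under Lambertian shading, can be sketched as follows (an illustrative sketch only, not the paper's inference method; the names are hypothetical):

```python
import numpy as np

def lambertian_patch(coeffs, light, size=8):
    """Render a quadratic surface z = a*x^2 + b*y^2 + c*x*y + d*x + e*y
    under Lambertian shading from a distant light (a unit 3-vector)."""
    a, b, c, d, e = coeffs
    x, y = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    zx = 2 * a * x + c * y + d          # partial derivative dz/dx
    zy = 2 * b * y + c * x + e          # partial derivative dz/dy
    normals = np.dstack([-zx, -zy, np.ones_like(zx)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # clamp negative dot products: surface points facing away from the light
    return np.clip(normals @ np.asarray(light, dtype=float), 0.0, None)
```

Inverting this map is ambiguous: very different coefficient vectors can render to nearly identical patches, which is why the approach outputs distributions over quadratic surfaces rather than a single shape estimate.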
These local shape distributions naturally encode the fact that some smooth diffuse regions are more informative than others, and they enable efficient and robust reconstruction of object-scale shape. Experimental results show that this approach to surface reconstruction compares well against the state of the art on both synthetic images and captured photographs.

Consumer Cameras as Radiometric Devices [14, 3, 4]

Figure 3: Clusters of RAW measurements that each map to a single JPEG color value (indicated in parentheses) in a digital SLR camera (Canon EOS 40D). Close-ups of the clusters emphasize the variations in cluster size and orientation. When inverting the tone-mapping process, this is structured uncertainty that cannot be avoided.

To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range. This introduces non-linear distortion that must be undone, through a radiometric calibration process, before computer vision systems can analyze such photographs radiometrically. This work considers the inherent uncertainty of undoing the effects of tone-mapping. We observe that this uncertainty varies substantially across color space, making some pixels more reliable than others. We introduce a model for this uncertainty and a method for fitting it to a given camera or imaging pipeline. Once fit, the model provides, for each pixel in a tone-mapped digital photograph, a probability distribution over the linear scene colors that could have induced it. We demonstrate how these distributions can be useful for visual inference by incorporating them into estimation algorithms for a representative set of vision tasks.
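A toy example shows why this uncertainty is unavoidable and non-uniform (a sketch with a made-up gamma pipeline, not a model of any real camera): once a tone-map compresses and quantizes RAW values, each 8-bit output code corresponds to a whole interval of linear scene values, and the interval width varies across the range.

```python
import numpy as np

GAMMA = 2.2  # toy tone curve; real camera pipelines are more complex

def tonemap(raw):
    """Toy pipeline: gamma compression then 8-bit quantization."""
    return int(np.round(255 * raw ** (1 / GAMMA)))

def invert_tonemap(code):
    """Interval of linear RAW values in [0, 1] that map to an 8-bit code.
    Its width is the irreducible uncertainty of de-rendering that pixel."""
    lo = ((code - 0.5) / 255) ** GAMMA if code > 0 else 0.0
    hi = min(((code + 0.5) / 255) ** GAMMA, 1.0)
    return lo, hi
```

For this curve, bright codes cover far wider RAW intervals than dark ones, mirroring the varying cluster sizes in Figure 3; the actual model replaces such intervals with full per-pixel distributions fit to a real camera or pipeline.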
Blind Estimation and Low-rate Sampling of Sparse MIMO Systems [11]

Figure 4: A MIMO system where I sources and L sensors are linked through a collection of LI channels {h_{i,ℓ}(t)}.

We present a blind estimation algorithm for multi-input multi-output (MIMO) systems with sparse common support. Key to the proposed algorithm is a matrix generalization of the classical annihilating filter technique, which allows us to estimate the nonlinear parameters of the channels through an efficient and non-iterative procedure. An attractive property of the proposed algorithm is that it only needs the sensor measurements in a narrow frequency band. By exploiting this feature, we can derive efficient sub-Nyquist sampling schemes that significantly reduce the number of samples that need to be retained at each sensor. Numerical simulations verify the accuracy of the proposed estimation algorithm and its robustness in the presence of noise.

Fabricating BRDFs at High Spatial Resolution Using Wave Optics [5]

Figure 5: First column: a wafer fabricated using photolithography, displaying spatially varying BRDFs at 220 dpi. Second column: designed pattern, color coded according to the 5 reflectance functions used. Dithering is exaggerated for better visualization. Rightmost columns: fabricated pattern as imaged under two different illumination directions. The designed pattern includes anisotropic reflectance functions at two opposite orientations.
Hence, the image is inverted when the light moves from the horizontal to the vertical illumination direction.

Recent attempts to fabricate surfaces with custom reflectance functions boast impressive angular resolution, yet their spatial resolution is limited. In this paper we present a method to construct spatially varying reflectance at a high resolution of up to 220 dpi, orders of magnitude greater than previous attempts, albeit with a lower angular resolution. The resolution of previous approaches is limited by the machining, but more fundamentally by the geometric-optics model on which they are built. Beyond a certain scale, geometric-optics models break down and wave effects must be taken into account. We present an analysis of incoherent reflectance based on wave optics and gain important insights into reflectance design. We further suggest and demonstrate a practical method, which takes into account the limitations of existing micro-fabrication techniques such as photolithography, to design and fabricate a range of reflection effects based on wave interference.

Custom reflectance design has many applications, including printing, product design, luminaire design, security markers visible under certain illumination conditions, and many others. The topic has been gaining much research interest in computer graphics [Weyrich et al. 2009; Finckh et al. 2010; Papas et al. 2011; Kiser et al. 2012; Dong et al. 2010; Hašan et al. 2010; Matusik et al. 2009; Patow and Pueyo 2005; Patow et al. 2007; Weyrich et al. 2007; Malzbender et al. 2012]. In computer vision, photometric stereo algorithms can be improved if the surface reflectance properties can be precisely controlled [Johnson et al. 2011]. Custom-designed BRDFs can also help in appearance acquisition tasks such as a BRDF chart [Ren et al.
2011] and a planar light probe [Alldrin and Kriegman 2006].

Sparse Representation for Face Recognition [8, 7]

Sparse Representation based Classification (SRC) has emerged as a new paradigm for solving recognition problems. This work presents a constraint-sampling feature extraction method that combines texture and shape features to significantly improve the SRC recognition rate. Tests show that combining constraint sampling with facial alignment achieves very high recognition accuracy on both the AR face database (99.52%) and the CAS-PEAL face database (99.54%).

SIFT for 3D Object Recognition [13, 12]

Figure 6: Clustering results of a plane sampled on the Gaussian sphere.
Because of its invariance to image scaling, rotation, noise and illumination changes, the SIFT algorithm is nowadays widely applied in many areas of image analysis. The feature used by this algorithm (the SIFT feature) is a vector based on local gradients, which can resist many kinds of image variation, such as extension, compression and rotation, meeting the practical requirements of 3D object recognition. Applying the SIFT feature to view-space partitioning, cutting the object from its background, and pattern matching can effectively enhance the robustness of the system and improve its speed and efficiency. With theoretical analysis and experimental verification, we demonstrate the feasibility and advantages of applying the SIFT feature in a 3D object recognition system.

Learning Object Color Models from Multi-view Constraints [6]
Figure 7: Top row: an object in three different environments with distinct and unknown illuminants. Bottom rows: five local regions from the same object, extracted from five uncontrolled images like those above, demonstrating the extent of the variation in observed color. Our goal is to jointly infer the object's true colors and the unknown illuminant in each image.

Color is known to be highly discriminative for many object recognition tasks, but it is difficult to infer from uncontrolled images in which the illuminant is not known. Traditional methods for color constancy can improve surface reflectance estimates from such uncalibrated images, but their output depends significantly on the background scene. In many recognition and retrieval applications, we have access to image sets that contain multiple views of the same object in different environments; we show in this paper that correspondences between these images provide important constraints that can improve color constancy.
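The joint-estimation idea can be illustrated with a toy diagonal (von Kries) illuminant model, in which each observed color is the channel-wise product of a per-view illuminant and a per-surface reflectance. This is only a sketch of the multi-view intuition under that stated assumption, not the paper's algorithm, and all names are hypothetical:

```python
import numpy as np

def multiview_color_constancy(obs, n_iter=100):
    """Alternating least squares for obs[k, s, :] = L[k, :] * r[s, :]:
    jointly estimate view illuminants L and surface reflectances r under a
    diagonal (von Kries) illuminant model. The per-channel global scale is
    ambiguous, so only the products L[k] * r[s] are identifiable."""
    K, S, _ = obs.shape
    L = np.ones((K, 3))
    r = obs.mean(axis=0)  # initialize reflectances from the mean over views
    for _ in range(n_iter):
        # update each view's illuminant given the reflectances (per channel)
        L = (obs * r).sum(axis=1) / (r ** 2).sum(axis=0)
        # update each surface's reflectance given the illuminants (per channel)
        r = (obs * L[:, None, :]).sum(axis=0) / (L ** 2).sum(axis=0)
    return L, r
```

Because each surface is seen under several illuminants, the factorization is far better constrained than in any single view; since only the products are identifiable, a sensible check is the reconstruction of the observations rather than the individual factors.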
We introduce the multi-view color constancy problem, and present a method to recover estimates of underlying surface reflectance based on joint estimation of these surface properties and the illuminants present in multiple images. The method can exploit image correspondences obtained by various alignment techniques, and we show examples based on matching local region features. Our results show that multi-view constraints can significantly improve estimates of both scene illuminants and object colors (surface reflectances) when compared to a baseline single-view method.

A color model that is learned for a particular object can be heavily influenced by the background scenes that happen to appear during training, which limits its utility. This work exploits the simple observation that, when learning object appearance models from uncontrolled imagery, one often has more than one training view available. Figure 7 shows an example where images of an object are acquired in distinct environments with unknown lighting. When images like these are available, we can establish region-level correspondences between images and use multi-view color constraints to simultaneously improve our object color model and our estimates of the illuminant colors.

References

[1] Ayan Chakrabarti, Ying Xiong, Steven J. Gortler, and Todd Zickler. Low-level vision by consensus in a spatial hierarchy of regions. arXiv preprint arXiv:1411.4894, 2014.

[2] Ayan Chakrabarti, Ying Xiong, Steven J. Gortler, and Todd Zickler. Low-level vision by consensus in a spatial hierarchy of regions. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.
[3] Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, and Kate Saenko. Modeling radiometric uncertainty for vision with tone-mapped color images. arXiv preprint arXiv:1311.6887, 2013.

[4] Ayan Chakrabarti, Ying Xiong, Baochen Sun, Trevor Darrell, Daniel Scharstein, Todd Zickler, and Kate Saenko. Modeling radiometric uncertainty for vision with tone-mapped color images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 36(11):2185–2198, November 2014.

[5] Anat Levin, Daniel Glasner, Ying Xiong, Frédo Durand, William Freeman, Wojciech Matusik, and Todd Zickler. Fabricating BRDFs at high spatial resolution using wave optics. ACM Transactions on Graphics (TOG), 32(4):144, 2013.

[6] Trevor Owens, Kate Saenko, Ayan Chakrabarti, Ying Xiong, Todd Zickler, and Trevor Darrell. Learning object color models from multi-view constraints. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 169–176. IEEE, 2011.

[7] Jing Wang, Guangda Su, Ying Xiong, Jiansheng Chen, Yan Shang, Jiongxin Liu, and Xiaolong Ren. Sparse representation for face recognition based on constraint sampling and face alignment. Tsinghua Science and Technology, 1:011, 2013.

[8] Ying Xiong. Face recognition algorithm based on Lasso. Undergraduate thesis, Tsinghua University, 2010.

[9] Ying Xiong, Ayan Chakrabarti, Ronen Basri, Steven J. Gortler, David W. Jacobs, and Todd Zickler. From shading to local shape. arXiv preprint arXiv:1310.2916, 2013.

[10] Ying Xiong, Ayan Chakrabarti, Ronen Basri, Steven J. Gortler, David W. Jacobs, and Todd Zickler. From shading to local shape. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2014 (to appear).

[11] Ying Xiong and Yue M. Lu. Blind estimation and low-rate sampling of sparse MIMO systems with common support. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 3893–3896. IEEE, 2012.

[12] Ying Xiong and Huimin Ma.
Extraction and application of 3D object SIFT feature. Journal of Image and Graphics, 5:018, 2010.

[13] Ying Xiong and Huimin Ma. Application of SIFT feature in 3D object recognition. In Conference on Image and Graphics Technology and Applications (IGTA), 2009.

[14] Ying Xiong, Kate Saenko, Trevor Darrell, and Todd Zickler. From pixels to physics: Probabilistic color derendering. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 358–365. IEEE, 2012.