A Coarse-to-Fine Estimation Method for Spatial Layout of Indoor Scenes
LIU Tianliang^1, GU Yanqiu^1, CAO Dandan^1, DAI Xiubin^1, LUO Jiebo^2
1. Jiangsu Provincial Key Lab of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China;
2. Department of Computer Science, University of Rochester, Rochester 14627, USA
Abstract: A coarse-to-fine estimation method for spatial layout is presented to effectively label the layout of indoor scenes. First, an adaptive threshold detection method tolerant of local discontinuities is exploited to extract long straight lines from the given scene, which are split into vertical and horizontal lines according to their directions. The vertical and horizontal vanishing points are estimated by a voting mechanism under the orthogonality principle, and pairs of rays emitted from the two vanishing points at equal angular intervals are used to generate layout candidates for the scene. Next, the informative edge map and geometric context of the scene are estimated with a VGG-16 fully convolutional network; a softmax classifier applied to the fc7 features determines the layout category, and global features combining the informative edge map with the layout category are used to coarsely prune the layout candidates. Then, the normal vectors and depth map of the scene are estimated with a VGG-based spatial multi-scale convolutional network to extract normal-vector and geometric-depth features. The 3D box layout model is parameterized by the angles between the rays from the vanishing points, while the line membership, geometric context, normal-vector and depth features are accumulated via geometric integral images to extract regional features of the layout candidates, and the structural model parameters are learned with the cutting-plane method. Finally, the layout candidate with the highest structural prediction score is selected as the final spatial layout. Experimental results on the Hedau and LSUN datasets demonstrate that the presented method obtains a more accurate number of divided polygons and more precise boundary positions for the spatial layout.
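To illustrate the feature-accumulation step, per-pixel cues (line membership, geometric context, normal-vector and depth maps) can be summed over a candidate layout region in constant time using an integral image (summed-area table). The following NumPy sketch shows the axis-aligned rectangular case only; it is a hypothetical illustration, not the authors' implementation, whose geometric integral image extends the idea to the slanted polygonal faces of the 3D box model.

```python
import numpy as np

def integral_image(feature_map):
    """Build a summed-area table S with a zero border:
    S[i, j] = sum of feature_map[:i, :j]."""
    h, w = feature_map.shape
    S = np.zeros((h + 1, w + 1), dtype=np.float64)
    S[1:, 1:] = feature_map.cumsum(axis=0).cumsum(axis=1)
    return S

def region_sum(S, top, left, bottom, right):
    """Sum of the feature over rows [top, bottom) and
    cols [left, right), in O(1) via four table lookups."""
    return S[bottom, right] - S[top, right] - S[bottom, left] + S[top, left]

# Example: accumulate a per-pixel cue over one candidate region.
cue = np.arange(16, dtype=np.float64).reshape(4, 4)
S = integral_image(cue)
total = region_sum(S, 0, 0, 4, 4)      # whole image: 120.0
inner = region_sum(S, 1, 1, 3, 3)      # 2x2 interior block: 30.0
```

Each candidate face then contributes a fixed-length regional feature vector (one entry per accumulated cue), so scoring many layout hypotheses stays cheap regardless of region size.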