A Fast Approach for Multi-Modality Surgical Trajectory Segmentation with Unsupervised Deep Learning
XIE Jiexin1,2, ZHAO Hongfa1,2, SHAO Zhenzhou1,3, SHI Zhiping1,2, GUAN Yong1,3
1. Information Engineering College, Capital Normal University, Beijing 100048, China;
2. Beijing Advanced Innovation Center for Imaging Technology, Beijing 100048, China;
3. Beijing Key Laboratory of Light Industrial Robot and Safety Verification, Capital Normal University, Beijing 100048, China
Abstract: Traditional trajectory segmentation approaches for surgical robots are time-consuming, inaccurate, and prone to over-segmentation. To address these problems, a multi-modality surgical trajectory segmentation approach is proposed based on the DCED-Net (densely-concatenated convolutional encoder-decoder network) feature extraction network. DCED-Net is trained in an unsupervised manner, so time-consuming manual annotation is not required, and its densely-concatenated structure allows image information to be passed more effectively between convolutional layers, which improves the quality of the extracted features. The kinematic data and the video features obtained after feature extraction are fed into a transition state clustering (TSC) model to obtain pre-segmentation results. To further improve segmentation accuracy, a post-merge processing algorithm based on the similarity between trajectory segments is proposed: four similarity indicators between segments, namely principal component analysis, mutual information, data-center distance, and dynamic time warping, are measured, and segments with high similarity are merged iteratively to reduce the impact of over-segmentation. Extensive experiments on the public JIGSAWS dataset show that, compared with classical trajectory segmentation and clustering methods, the proposed approach improves segmentation accuracy by up to 48.4% and accelerates segmentation by more than 6 times.
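As a rough illustration of the post-merge step described above, the following Python sketch compares adjacent trajectory segments with the four named indicators (PCA subspace similarity, mutual information, data-center distance, and dynamic time warping) and iteratively merges neighbours that most indicators judge similar. The voting rule, the thresholds, and the crude histogram-based MI estimator are illustrative assumptions, not the settings used in the paper.

```python
# Minimal sketch of similarity-based post-merging of over-segmented trajectory pieces.
# All thresholds and the vote rule below are assumptions for illustration only.
import numpy as np

def pca_similarity(a, b, k=2):
    """Similarity between the top-k principal subspaces of two segments (in [0, 1])."""
    def top_components(x, k):
        x = x - x.mean(axis=0)
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        return vt[:k]
    la, lb = top_components(a, k), top_components(b, k)
    # Sum of squared cosines between the two subspaces, normalised by k.
    return float(np.sum((la @ lb.T) ** 2) / k)

def mutual_information(a, b, bins=16):
    """Crude histogram estimate of MI between the first dimensions of two segments."""
    n = min(len(a), len(b))
    hist, _, _ = np.histogram2d(a[:n, 0], b[:n, 0], bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def center_distance(a, b):
    """Euclidean distance between segment means (smaller means more similar)."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two segments."""
    d = np.full((len(a) + 1, len(b) + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return float(d[len(a), len(b)])

def should_merge(a, b, pca_th=0.8, mi_th=0.5, dist_th=1.0, dtw_th=50.0):
    """Merge two neighbouring segments when at least 3 of the 4 indicators agree."""
    votes = [
        pca_similarity(a, b) > pca_th,
        mutual_information(a, b) > mi_th,
        center_distance(a, b) < dist_th,
        dtw_distance(a, b) < dtw_th,
    ]
    return sum(votes) >= 3

def post_merge(segments):
    """Iteratively merge adjacent over-segmented pieces until no pair qualifies."""
    segments = list(segments)
    merged = True
    while merged and len(segments) > 1:
        merged = False
        for i in range(len(segments) - 1):
            if should_merge(segments[i], segments[i + 1]):
                segments[i:i + 2] = [np.vstack([segments[i], segments[i + 1]])]
                merged = True
                break
    return segments

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy kinematic data: two pieces drawn from the same regime plus one distinct regime.
    s1 = rng.normal(0.0, 0.1, size=(40, 6))
    s2 = rng.normal(0.0, 0.1, size=(35, 6))
    s3 = rng.normal(3.0, 0.1, size=(50, 6))
    print([len(s) for s in post_merge([s1, s2, s3])])
```

In this sketch the merging loop always restarts after a merge, so chains of over-segmented pieces collapse one pair at a time; a production implementation would also cache pairwise similarities instead of recomputing them on every pass.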