Computer science
Discriminative model
Artificial intelligence
Interpretability
Autoencoding
Margin (machine learning)
Construct (Python library)
Feature learning
Feature (linguistics)
Ranking (information retrieval)
Machine learning
Deep learning
Linguistics
Philosophy
Programming language
Authors
Ziyi Wang, Bo Lu, Xiaojie Gao, Yueming Jin, Zerui Wang, Tak Hong Cheung, Pheng‐Ann Heng, Qi Dou, Yunhui Liu
Identifier
DOI:10.1016/j.media.2021.102296
Abstract
In this paper, we propose a novel method of Unsupervised Disentanglement of Scene and Motion (UDSM) representations for minimally invasive surgery video retrieval within large databases, which has the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two designed encoders with a triplet ranking loss and an adversarial learning mechanism are established to respectively capture the spatial and temporal information, achieving disentangled features from each frame with promising interpretability. In addition, the long-range temporal dependencies are improved at the integrated video level using a temporal aggregation module, and a set of compact binary codes that carries representative features is then yielded to realize fast retrieval. The entire framework is trained in an unsupervised scheme, i.e., purely learning from raw surgical videos without using any annotation. We construct two large-scale minimally invasive surgery video datasets based on the public dataset Cholec80 and our in-house dataset of laparoscopic hysterectomy, to establish the learning process and validate the effectiveness of our proposed method qualitatively and quantitatively on the surgical video retrieval task. Extensive experiments show that our approach significantly outperforms the state-of-the-art video retrieval methods on both datasets, revealing a promising future for injecting intelligence in the next generation of surgical teaching systems.
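Two of the building blocks the abstract names — a triplet ranking loss for discriminative features and compact binary codes for fast retrieval — can be illustrated in a minimal sketch. This is not the authors' implementation; the function names, the sign-thresholding binarization, and the Hamming-distance ranking are illustrative assumptions, shown here in plain NumPy:

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    # Pull the anchor toward the positive sample and push it away from
    # the negative one, up to the given margin (hinge formulation).
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())

def binarize(features):
    # One common way to obtain compact binary codes: threshold each
    # feature dimension at zero (an assumption, not the paper's scheme).
    return (features > 0).astype(np.uint8)

def hamming_retrieve(query_code, db_codes, k=5):
    # Rank database entries by Hamming distance to the query code;
    # counting mismatched bits is cheap, which enables fast retrieval.
    dists = np.sum(query_code != db_codes, axis=1)
    return np.argsort(dists)[:k]

# Toy example: one query against three stored video codes.
query = binarize(np.array([0.9, -0.3, 1.2, -0.7]))
db = binarize(np.array([
    [0.8, -0.1, 0.5, -0.2],   # identical bit pattern to the query
    [-0.4, 0.6, -0.9, 0.3],   # fully inverted pattern
    [0.7, -0.2, 0.4, 0.1],    # differs in one bit
]))
nearest = hamming_retrieve(query, db, k=1)
```

In the toy example the first database entry shares all four bits with the query, so it is ranked first; in the full method such codes would summarize an entire video after temporal aggregation rather than a single feature vector.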