Keywords
Inpainting, Computer science, Computer vision, Transformer, Artificial intelligence, Pixel, Tracking, Optical flow, Inference, Focus (optics), Frame (networking), Key (lock), Image (mathematics), Telecommunications, Computer security, Physics, Quantum mechanics, Operating system, Voltage, Optics
Authors
Yongsheng Yu, Heng Fan, Libo Zhang
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Cited by: 4
Identifiers
DOI: 10.48550/arxiv.2307.08629
Abstract
Recent video inpainting methods have made remarkable progress by utilizing explicit guidance, such as optical flow, to propagate cross-frame pixels. However, there are cases where cross-frame recurrence of the masked content is not available, resulting in a deficiency. In such situations, instead of borrowing pixels from other frames, the focus of the model shifts towards solving the inverse problem. In this paper, we introduce a dual-modality-compatible inpainting framework called Deficiency-aware Masked Transformer (DMT), which offers three key advantages. Firstly, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases. Secondly, the self-attention module selectively incorporates spatiotemporal tokens to accelerate inference and remove noise signals. Thirdly, a simple yet effective Receptive Field Contextualizer is integrated into DMT, further improving performance. Extensive experiments conducted on the YouTube-VOS and DAVIS datasets demonstrate that DMT_vid significantly outperforms previous solutions. The code and video demonstrations can be found at github.com/yeates/DMT.
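The abstract's second point, running self-attention only over a selected subset of spatiotemporal tokens rather than every token from every frame, can be illustrated with a minimal sketch. This is not the authors' DMT implementation: the module name, the mask-derived selection rule, and all tensor shapes below are assumptions made purely for illustration.

# Hypothetical sketch of selective spatiotemporal token attention.
# Tokens from all frames are flattened into one sequence, a boolean mask
# picks the subset to attend over (e.g. tokens inside or near the hole),
# and only that subset passes through multi-head self-attention.
import torch
import torch.nn as nn


class SelectiveSpatioTemporalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
        # tokens:    (B, T*H*W, C) flattened spatiotemporal tokens
        # keep_mask: (B, T*H*W) boolean; True marks tokens to keep
        # Unselected tokens pass through unchanged.
        out = tokens.clone()
        for b in range(tokens.size(0)):  # per-sample loop keeps the sketch simple
            selected = tokens[b, keep_mask[b]].unsqueeze(0)      # (1, N_sel, C)
            attended, _ = self.attn(selected, selected, selected)
            out[b, keep_mask[b]] = attended.squeeze(0)
        return out


if __name__ == "__main__":
    B, T, H, W, C = 1, 4, 8, 8, 64
    tokens = torch.randn(B, T * H * W, C)
    keep = torch.rand(B, T * H * W) > 0.5   # stand-in for a mask-derived selection rule
    layer = SelectiveSpatioTemporalAttention(C)
    print(layer(tokens, keep).shape)        # torch.Size([1, 256, 64])

Because attention cost grows quadratically with sequence length, restricting the token set in this way is one plausible reason such a design can accelerate inference while filtering out irrelevant signals; the actual selection criterion used in DMT should be taken from the paper and code repository.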