Keywords
Computer science, Segmentation, Artificial intelligence, Robustness, Computer vision, Image segmentation, Key, Convolution (computer science), Video tracking, Scale-space segmentation, Segmentation-based object categorization, Pattern recognition, Video processing, Video post-processing, Image processing, Aggregate, Computational complexity theory, Kernel, Video denoising, Pixel, Sparse matrix, Tracking
Authors
Jisheng Dang,Huicheng Zheng,Hao Chen,Ang Su,Yulan Guo,Tat-Seng Chua
Identifier
DOI:10.1109/tip.2025.3649365
Abstract
Recent advances in "track-anything" models have significantly improved fine-grained video understanding by simultaneously handling multiple video segmentation and tracking tasks. However, existing models often struggle with robust and efficient temporal propagation. To address these challenges, we propose the Sparse Spatio-Temporal Propagation (SSTP) method, which achieves robust and efficient unified video segmentation by selectively leveraging key spatio-temporal features in videos. Specifically, we design a dynamic 3D spatio-temporal convolution to aggregate global multi-frame spatio-temporal information into memory frames during memory construction. Additionally, we introduce a spatio-temporal aggregation reading strategy to efficiently aggregate the relevant spatio-temporal features from multiple memory frames during memory retrieval. By combining SSTP with an image segmentation foundation model, such as the segment anything model, our method effectively addresses multiple data-scarce video segmentation tasks. Our experimental results demonstrate state-of-the-art performance on five video segmentation tasks across eleven datasets, outperforming both task-specific and unified methods. Notably, SSTP exhibits strong robustness in handling sparse, low-frame-rate videos, making it well-suited for real-world applications.
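The abstract's core idea is selective aggregation: during memory retrieval, only the most relevant spatio-temporal features from the memory frames contribute to the readout. The following is a minimal, hypothetical sketch of such a sparse readout (the function name, top-k selection, and cosine similarity are illustrative assumptions, not the paper's actual implementation, which operates on dense feature maps with learned attention):

```python
import math

def sparse_memory_readout(query, memory, top_k=2):
    """Hypothetical sketch of sparse spatio-temporal readout:
    keep only the top-k memory features most similar to the query,
    then aggregate them with softmax weights (illustrating the
    'selective' aggregation idea; NOT the paper's implementation)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return math.sqrt(dot(a, a)) or 1.0

    # cosine similarity between the query and each memory feature
    sims = [dot(query, m) / (norm(query) * norm(m)) for m in memory]
    # sparsify: retain indices of only the top-k memory entries
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
    # softmax over the retained similarities only
    exps = [math.exp(sims[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    # weighted aggregation of the selected memory features
    return [sum(w * memory[i][d] for w, i in zip(weights, top))
            for d in range(len(query))]
```

Because irrelevant memory entries are dropped before aggregation, the readout cost scales with `top_k` rather than with the full memory size, which is the efficiency argument the abstract makes for sparse propagation.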