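Autoencoder
Residual
MNIST database
Computer science
Artificial intelligence
Representation (politics)
Pattern recognition (psychology)
Task (project management)
Machine learning
Algorithm
Deep learning
Engineering
Politics
Systems engineering
Law
Political science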
Authors
Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi
Source
Journal: Cornell University - arXiv
Date: 2023-07-14
Identifier
DOI: 10.48550/arxiv.2307.06701
Abstract
We address the video prediction task by putting forth a novel model that combines (i) a novel hierarchical residual learning vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive spatiotemporal predictive model (AST-PM). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE for modeling still images with a parsimonious representation, combined with the AST-PM's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with major challenges in video prediction. These include learning spatiotemporal information, handling high-dimensional data, combating blurry predictions, and implicit modeling of physical characteristics. Extensive experimental results on four challenging tasks, namely KTH Human Action, TrafficBJ, Human3.6M, and KITTI, demonstrate that our model compares favorably against state-of-the-art video prediction techniques in both quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and AST-PM parameters.
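The abstract does not spell out how the residual quantization works, so the following is only a generic illustration of residual vector quantization, not the authors' HR-VQVAE. It assumes plain nearest-neighbour lookup in fixed codebooks, where each layer encodes whatever the previous layers failed to capture. The function names and the toy codebooks are hypothetical and chosen for this sketch only.

```python
def nearest(codebook, vec):
    """Return (index, codeword) of the codeword closest to vec (squared L2 distance)."""
    best = min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )
    return best, codebook[best]


def residual_quantize(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the residual left by earlier layers."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx, code = nearest(codebook, residual)
        indices.append(idx)
        # Accumulate the reconstruction and shrink the residual for the next layer.
        recon = [r + c for r, c in zip(recon, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, recon


def sq_error(a, b):
    """Squared L2 error between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy codebooks: a coarse first layer and a finer second layer.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0], [0.0, -0.25]],
]

x = [1.2, 0.9]
indices, recon = residual_quantize(x, codebooks)
_, recon_one_layer = residual_quantize(x, codebooks[:1])
```

In this toy setup the second layer refines the first layer's coarse choice, so the two-layer reconstruction is closer to `x` than the one-layer reconstruction. In a trained model the codebooks would be learned rather than fixed by hand, and the paper's hierarchical scheme may link layers differently than this sketch does.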