Subjects: Computer Science · Artificial Intelligence · Computer Vision · Speech Recognition · Sound · Acoustics · Physics
Authors
Qiutang Qi,Haonan Cheng,Yang Wang,Long Ye,Shaobin Li
Identifier
DOI:10.1145/3581783.3613765
Abstract
Existing methods struggle to synthesize fine-grained footstep sounds from video frames alone, owing to the complicated nonlinear mappings between motion states, spatial locations, and different footstep sounds. To address this issue, we propose a Rule-Data guided Fine-Grained Footstep Sound (RD-FGFS) synthesis method. To the best of our knowledge, our work is the first to integrate data-driven and rule-based modeling for visually aligned footstep sound synthesis. First, we design a learning-based footstep sound generation network (FSGN) driven by pose and flow features; the FSGN generates an initial target sound that captures timing cues. Second, we design a rule-based fine-grained footstep sound adjustment (FGFSA) method guided by three visual cues: ground material, movement type, and displacement distance. The proposed FGFSA effectively constructs a mapping between these visual cues and footstep sounds, enabling fine-grained variation of footstep sounds. Experimental results show that our method improves the audio-visual synchronization of footsteps and achieves impressive performance in fine-grained footstep sound control.
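The abstract describes a two-stage pipeline: a learned FSGN that fixes footstep timing from visual features, followed by rule-based FGFSA adjustment keyed on ground material, movement type, and displacement distance. The paper listing includes no code, so the Python sketch below only illustrates how such a two-stage design could be wired together; every name here (fsgn_initial_sound, fgfsa_adjust, the gain tables, and the constants) is a hypothetical stand-in, not the authors' implementation. The real FSGN is a trained network and the real adjustment rules are far more detailed.

```python
import numpy as np

SR = 16_000  # assumed audio sample rate (Hz)

def fsgn_initial_sound(pose_feats, flow_feats, n_samples):
    """Stand-in for the learned FSGN stage.

    The real FSGN is a trained network consuming pose and flow
    features; this proxy uses only flow-magnitude peaks as footstep
    impact times and places short decaying noise bursts there, so
    that only the *timing* role of the stage is illustrated.
    """
    audio = np.zeros(n_samples)
    flow_mag = np.linalg.norm(flow_feats, axis=-1)
    for t in range(1, len(flow_mag) - 1):
        # treat local maxima of flow magnitude as impact frames
        if flow_mag[t] > flow_mag[t - 1] and flow_mag[t] > flow_mag[t + 1]:
            start = int(t / len(flow_mag) * n_samples)
            burst = np.random.randn(min(400, n_samples - start))
            burst *= np.exp(-np.linspace(0.0, 8.0, burst.size))  # decaying impact
            audio[start:start + burst.size] += burst
    return audio

# Hypothetical rule tables keyed on the three visual cues named in
# the abstract; the paper's actual rules are not public here.
MATERIAL_GAIN = {"wood": 1.0, "gravel": 1.3, "carpet": 0.5}
MOVEMENT_GAIN = {"walk": 1.0, "run": 1.6, "tiptoe": 0.4}

def fgfsa_adjust(audio, material, movement, displacement_m):
    """Stand-in for the rule-based FGFSA stage.

    Scales footstep energy by ground material, movement type, and
    per-step displacement. The actual FGFSA likely also reshapes the
    spectrum per material; this only shows the cue-to-sound mapping
    structure.
    """
    gain = MATERIAL_GAIN.get(material, 1.0) * MOVEMENT_GAIN.get(movement, 1.0)
    gain *= 1.0 + 0.1 * displacement_m  # louder steps for larger strides
    return np.clip(audio * gain, -1.0, 1.0)

if __name__ == "__main__":
    frames = 120                          # e.g. 4 s of video at 30 fps
    pose = np.random.randn(frames, 17, 2) # dummy 17-keypoint poses
    flow = np.random.randn(frames, 2)     # dummy per-frame mean flow
    rough = fsgn_initial_sound(pose, flow, n_samples=4 * SR)
    final = fgfsa_adjust(rough, material="gravel",
                         movement="run", displacement_m=0.8)
    print(final.shape, float(np.abs(final).max()))
```

The design point the sketch preserves is the separation of concerns the abstract claims: the data-driven stage decides when footsteps occur, while the rule stage controls how each one sounds under the given visual cues.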