Computer science
Sound (geography)
Animation
Visualization
Nonverbal communication
Human-computer interaction
Speech recognition
Multimedia
Artificial intelligence
Acoustics
Computer graphics (images)
Communication
Physics
Sociology
Authors
Fangzhou Wang, Hidehisa Nagano, Kunio Kashino, Takeo Igarashi
Identifier
DOI: 10.1109/TMM.2016.2613641
Abstract
Sound information in videos plays an important role in shaping the user experience. When the sound in a video is not accessible, text captions can convey sound information. However, conventional text captions are not very expressive for nonverbal sounds because they are designed to visualize speech. Here, we present a framework that automatically transforms nonverbal video sounds into animated sound words and positions them near the sound source objects in the video for visualization. This provides a natural visual representation of nonverbal sounds with rich information about the sound category and dynamics. To evaluate how the animated sound words generated by our framework affect the user experience, we implemented an experimental system and conducted a user study involving over 300 participants from an online crowdsourcing service. The results show that the animated sound words can effectively and naturally visualize the dynamics of sound while clarifying the position of the sound source, and that they make video watching more enjoyable and increase the visual impact of videos.
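The abstract describes a pipeline that detects nonverbal sound events, maps each event to a "sound word", and animates that word near the estimated sound source. The Python sketch below illustrates one plausible shape for the final rendering step under stated assumptions: the SoundEvent fields, the SOUND_WORDS vocabulary, and the loudness-driven font scaling rule are all illustrative and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical mapping from detected sound categories to onomatopoeic
# "sound words"; the paper's actual vocabulary is not specified here.
SOUND_WORDS = {
    "dog_bark": "Woof!",
    "door_knock": "Knock knock",
    "explosion": "BOOM",
}

@dataclass
class SoundEvent:
    category: str               # detected nonverbal sound category
    onset: float                # start time in seconds
    offset: float               # end time in seconds
    loudness: list[float]       # loudness envelope, values in [0, 1]
    source_xy: tuple[int, int]  # estimated sound-source position (pixels)

@dataclass
class CaptionFrame:
    text: str
    x: int
    y: int
    scale: float  # font scale driven by instantaneous loudness

def render_captions(event: SoundEvent, fps: float = 30.0) -> list[CaptionFrame]:
    """Turn one sound event into per-frame animated caption states.

    The font scale tracks the loudness envelope so the word visually
    "pulses" with the sound's dynamics, and the caption is anchored
    slightly above the estimated source object.
    """
    word = SOUND_WORDS.get(event.category, event.category)
    n_frames = max(1, int((event.offset - event.onset) * fps))
    frames = []
    for i in range(n_frames):
        # Sample the loudness envelope at this frame (nearest index).
        env_idx = min(int(i / n_frames * len(event.loudness)),
                      len(event.loudness) - 1)
        loud = event.loudness[env_idx]
        x, y = event.source_xy
        frames.append(CaptionFrame(
            text=word,
            x=x,
            y=y - 20,                # offset above the source object
            scale=0.5 + 1.5 * loud,  # louder sound -> larger word
        ))
    return frames

if __name__ == "__main__":
    bark = SoundEvent("dog_bark", onset=1.0, offset=1.5,
                      loudness=[0.2, 0.9, 0.6, 0.3],
                      source_xy=(320, 240))
    for frame in render_captions(bark)[:4]:
        print(frame)
```

Decoupling event detection from caption rendering in this way would let any sound classifier and source localizer feed the same animation stage; only the per-frame CaptionFrame states need to reach the video compositor.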