Computer science
Image (mathematics)
Artificial intelligence
Information retrieval
Natural language processing
Authors
Ayse D. Lokmanoglu,Dror Walter
Identifier
DOI:10.1080/19312458.2025.2549707
Abstract
Understanding visual narratives is essential for analyzing media representation. This paper introduces VisTopics, a computational framework for analyzing large-scale visual datasets through frame extraction, deduplication, image captioning, and topic modeling. In Study 1, we apply VisTopics to 452 NBC News videos, reducing 11,070 frames to 6,928 unique images and identifying 35 latent topics – ranging from political unrest to environmental disasters – using Latent Dirichlet Allocation (LDA) on AI-generated captions. Study 2 extends this approach to 7,290 socially mediated news images, demonstrating the method's scalability beyond video. We validate the reliability of VisTopics through human-coded accuracy, internal coherence metrics, and comparative replication of traditional image clustering methods. Our findings show that caption-based topic modeling captures semantically coherent groupings while enabling interpretable, human-readable outputs. By bridging the gap between visual representation and semantic meaning, VisTopics provides a transformative tool for advancing the methodological toolkit in computational media studies. Future research may leverage VisTopics for comparative analyses across media outlets or geographic regions, offering insights into the shifting landscapes of media narratives and their societal implications.