The mainstream launch of generative AI video platforms represents a major change to the socio-technical system of digital media, raising critical questions about public perception and societal impact. While prior research has explored isolated technical or ethical facets, a holistic understanding of the user experience of AI-generated videos (as an interrelated set of perceptions, emotions, and behaviors) remains underdeveloped. This study addresses that gap by conceptualizing public discourse as a complex system of interconnected themes. We apply a mixed-methods approach that combines quantitative Latent Dirichlet Allocation (LDA) topic modeling with qualitative interpretation to analyze 11,418 YouTube comments reacting to AI-generated videos. The study’s primary contribution is a novel, three-tiered framework that models user experience. The framework organizes 15 empirically derived topics into three interdependent layers: (1) Socio-Technical Systems and Platforms (the enabling infrastructure), (2) AI-Generated Content and Esthetics (the direct user-artifact interaction), and (3) Societal and Ethical Implications (the emergent macro-level consequences). Interpreting this systemic structure through the ABC (Affective, Behavioral, Cognitive) model of attitudes, our analysis identifies the distinct Affective (e.g., the “uncanny valley”), Behavioral (e.g., memetic participation), and Cognitive (e.g., epistemic anxiety) dimensions that constitute the major elements of user experience. This empirically grounded model provides a holistic map of public discourse and offers actionable insights for managing the interplay between technological innovation and societal adaptation within this evolving digital system.
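For readers unfamiliar with the quantitative step, the following is a minimal sketch of LDA topic modeling on short comments, fitted by collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the study's pipeline: the function names (`tokenize`, `fit_lda`, `top_words`), the hyperparameters `alpha` and `beta`, the iteration count, and the four toy comments are all hypothetical, and the abstract does not say which LDA implementation or preprocessing was used. The study reports 15 topics; the toy example uses 2.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if len(w) > 2]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hyperparameters are assumptions).

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] counts tokens of document d assigned to topic k and
    topic_word[k][w] counts assignments of word w to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """Return the n most frequently assigned words for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


if __name__ == "__main__":
    # Toy comments invented for illustration; not drawn from the study's data.
    comments = [
        "The faces look creepy and unnatural in this video",
        "Creepy hands and unnatural faces again",
        "How will we know which news footage is real anymore",
        "Fake footage will make real news impossible to trust",
    ]
    docs = [tokenize(c) for c in comments]
    doc_topic, topic_word = fit_lda(docs, n_topics=2)
    for k, words in enumerate(top_words(topic_word)):
        print(f"Topic {k}: {', '.join(words)}")
```

In a mixed-methods workflow of the kind the abstract describes, the top words per topic would then be read alongside representative comments so that each topic can be labeled and interpreted qualitatively.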