ABSTRACT As social media platforms increasingly prioritize visual content, the effective interplay between image and text is critical for capturing consumer attention. This research investigates the effects of two novel visual cues, image entropy (disorder) and text direction, and introduces the concept of an image–text fit effect. Through three eye‐tracking experiments, we demonstrate that high‐entropy (vs. low‐entropy) images and vertical (vs. horizontal) text direction significantly increase consumer attention. We identify a “feeling right” sense as the underlying psychological mechanism, which can be explained via time perception association. We also examine the moderating role of emoji intensity in social media communications in capturing consumer attention. These findings advance the theoretical understanding of visual marketing and provide actionable strategies for practitioners.