Abstract

Infrared–visible image fusion (IVIF) integrates complementary thermal and photometric cues for surveillance, remote sensing, and autonomous perception. Existing surveys, while comprehensive, offer limited guidance on the path from design to deployment and seldom relate fusion quality to downstream task outcomes or device constraints. This work provides a unified perspective that organizes IVIF methods along an interface–attention–alignment coordinate system covering classical spatial/transform pipelines and contemporary deep paradigms (generative, discriminative, multi-task, hybrid/Transformer, dynamic). Building on literature through 2025, we synthesize fidelity–robustness–efficiency trade-offs and introduce a comparison-to-deployment protocol that couples fusion metrics with task accuracy (AP/mIoU), latency, memory footprint, and condition–performance characterization (misregistration, noise, illumination/weather). We consolidate Transformer/hybrid coverage with practical recipes and focused guidance on temporal consistency, robustness auditing, and physics-grounded interpretability. Compared with previous reviews, our survey concurrently addresses four under-covered dimensions (video temporal consistency, robustness auditing, task-aware evaluation, and deployment reporting) and distills a practical checklist linking architectural choices to operating conditions and hardware budgets, enabling reproducible, task-relevant IVIF practice.
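For concreteness, the sketch below illustrates one way the comparison-to-deployment report described above could be assembled. It is a minimal illustration, not a reference implementation from the surveyed works: the callables fuse, task_head, and fusion_quality are hypothetical placeholders standing in for a fusion model, a downstream task model (reporting AP or mIoU), and a no-reference fusion metric.

# Minimal sketch of the comparison-to-deployment protocol.
# fuse, task_head, and fusion_quality are assumed callables, not APIs
# from any specific library; substitute real models and metrics.
import time
import tracemalloc
from dataclasses import dataclass

@dataclass
class DeploymentReport:
    fusion_metric: float   # e.g., a no-reference fusion quality score
    task_metric: float     # e.g., detection AP or segmentation mIoU
    latency_ms: float      # wall-clock fusion latency per frame
    peak_mem_mb: float     # peak Python-level allocation during fusion

def evaluate(fuse, task_head, fusion_quality, ir, vis, labels):
    """Couple fusion quality with task accuracy and device-side costs."""
    tracemalloc.start()
    t0 = time.perf_counter()
    fused = fuse(ir, vis)                      # run the fusion model
    latency_ms = (time.perf_counter() - t0) * 1e3
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) bytes
    tracemalloc.stop()
    return DeploymentReport(
        fusion_metric=fusion_quality(fused, ir, vis),
        task_metric=task_head(fused, labels),
        latency_ms=latency_ms,
        peak_mem_mb=peak / 2**20,
    )

Running evaluate over a sweep of degraded inputs (injected misregistration, noise, or low illumination) would yield the condition–performance characterization the protocol calls for, one DeploymentReport per operating condition.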