A single-modality infrared or visible image provides only a limited scene representation under lighting degradation or extreme weather. We propose a multi-modal fusion framework, named SDSFusion, for all-day and all-weather infrared and visible image fusion. SDSFusion exploits the commonality among image processing tasks to enable interaction between enhancement, fusion, and the semantic task within a unified framework, guided by semantic awareness together with multi-scale features and losses. To address the disparity between infrared and visible images in degraded scenes, we differentiate modality-specific features within a unified fusion model. Unlike existing joint fusion methods, we introduce a generative adversarial network that embeds fused features to refine the reconstruction of low-light images, providing feature-level brightness supplementation and image-level reconstruction that improve brightness and contrast. Extensive experiments on degraded scenes confirm that our approach surpasses state-of-the-art methods in visual quality and quantitative performance, demonstrating the effectiveness of the proposed task interaction. The code will be available at: https://github.com/Liling-yang/SDSFusion.