Multimodal sarcasm analysis is one of the most challenging research branches of sentiment analysis, owing to the presence of cross-modality incongruity. However, existing works mainly attend to coarse-grained incongruity analysis and ignore the sentiment semantic coupling issue, which limits the discriminative capability and robustness of sarcasm analysis models. To address this issue, we propose a novel fine-grained semantic disentanglement network (FSDN). Specifically, intra-modality semantic disentanglement is performed to investigate the more intrinsic semantic cues within each modality. Additionally, inter-modality semantic disentanglement is leveraged to simultaneously facilitate the extraction of common and intrinsic semantic cues across modalities. Furthermore, a dual-spatial semantic interaction block is presented to explore the long-range cross-spatial semantic context between the obtained verbal and non-verbal semantic spaces from a global view. Together, these semantic disentanglement processes, with both local and global views, markedly improve robustness even for sarcasm cases that consist of multiple semantic messages. Experiments indicate that FSDN achieves state-of-the-art or competitive performance.