As deepfake technology advances, forgery detection techniques have evolved beyond simple classification to include fine-grained localization. However, existing deepfake localization methods struggle with real-world deepfake videos, which often contain multiple faces of which only some are manipulated. To address this problem, we propose a Multi-View Inconsistency Measurement (MVIM) network that simultaneously measures inconsistencies from the noise and temporal views to detect and locate tampered regions. Specifically, because fake faces in multi-face scenarios exhibit noise patterns that are inconsistent with those of real faces and backgrounds, we design a Noise Inconsistency Measurement (Noise-IM) module that uses a masked attention mechanism to measure noise similarity among faces and between faces and backgrounds, identifying suspected tampered regions in the noise domain. Since facial jitter in the tampered regions of deepfake videos is observed to be more intense than in real regions, we design a Temporal Inconsistency Measurement (Temporal-IM) module that adopts a self-attention mechanism and fine-grained bi-directional convolutions to capture tampering traces between frames in the temporal domain. The inconsistency features obtained by the two modules are fused to detect and locate tampered regions. Extensive experiments comparing our MVIM network with many state-of-the-art methods on different benchmark datasets verify its superiority.
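The idea of measuring noise similarity across regions with masked attention can be illustrated with a minimal sketch. This is not the Noise-IM module itself: the function names (`cosine`, `masked_softmax`, `noise_consistency`), the use of cosine similarity, the `temperature` parameter, and the toy feature vectors are all assumptions made for illustration. Each token attends only to tokens from other regions (other faces or the background), and a low attention-weighted similarity marks a region whose noise pattern matches nothing else in the frame.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def masked_softmax(scores, mask):
    """Softmax over the positions where mask is True; masked positions get weight 0."""
    allowed = [s for s, m in zip(scores, mask) if m]
    if not allowed:
        return [0.0] * len(scores)
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def noise_consistency(features, region_ids, temperature=0.1):
    """Per-token consistency score: attention-weighted similarity to OTHER regions.

    features   : list of noise feature vectors, one per token (hypothetical input)
    region_ids : region label per token (0 = background, 1.. = individual faces)
    """
    n = len(features)
    result = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n)]
        # Mask out tokens from the same region so only cross-region similarity counts.
        mask = [region_ids[j] != region_ids[i] for j in range(n)]
        weights = masked_softmax([s / temperature for s in sims], mask)
        result.append(sum(w * s for w, s in zip(weights, sims)))
    return result

# Toy example: two background tokens, one face that matches the background
# noise, and one face whose noise pattern differs from everything else.
features = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
region_ids = [0, 0, 1, 2]
scores = noise_consistency(features, region_ids)
suspect = scores.index(min(scores))  # index of the least consistent token
```

In this toy input the token from region 2 receives the lowest score, since no other region shares its noise pattern. A learned module would replace the fixed cosine similarity with trained projections and operate on feature maps rather than hand-written vectors.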