ABSTRACT X‐ray detection of contraband is crucial for public safety, but it is hampered by cluttered backgrounds and overlapping objects in security inspection images. This study proposes a detection framework based on You Only Look Once version 8 (YOLOv8) that incorporates three key innovations: multi‐scale cross‐axis attention (MCA), which captures global dependencies through collaborative horizontal and vertical attention and thereby suppresses irrelevant features in complex X‐ray scenes; a lightweight bottleneck architecture using partial convolution (PConv), which significantly reduces floating point operations (FLOPs) while preserving positional sensitivity; and the focal‐enhanced intersection over union (Focaler‐IoU) loss function, which dynamically weights difficult samples to improve regression accuracy. Experiments on the prohibited item detection in the X‐ray dataset show that our model achieves a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 97.3%, outperforming YOLOv8s by 1.2 percentage points and surpassing YOLOv10‐S (96.5%) and YOLOv12‐S (96.8%), while maintaining real‐time performance at 121 frames per second. Ablation studies quantify the contribution of each module: MCA raises mAP by 0.7%, PConv reduces FLOPs by 31%, and Focaler‐IoU increases precision by 0.9% and recall by 2.4%. The proposed method shows substantial potential for real‐time security inspection.
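To make the Focaler‐IoU idea concrete, the following is a minimal sketch of the piecewise‐linear IoU remapping usually associated with that loss. The abstract does not give the formulation used in this work, so this is an illustration, not the authors' implementation. The interval bounds d and u, their default values, and all function names are assumptions introduced here. Whether the remapped IoU is combined with another box‐regression term is not stated and is left out.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly rescale IoU from [d, u] onto [0, 1], clamping outside.

    d and u are hypothetical defaults chosen for illustration only.
    """
    if u <= d:
        raise ValueError("u must be greater than d")
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    """Loss = 1 - remapped IoU (assumed form, no extra penalty terms)."""
    return 1.0 - focaler_iou(box_iou(pred, target), d, u)
```

Under this sketch, the choice of [d, u] decides which samples carry gradient. Boxes whose IoU falls below d all receive the maximum loss, boxes above u receive zero loss, and only the band in between is regressed with an amplified slope of 1/(u - d).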