ABSTRACT This study addresses pedestrian detection in low‐light conditions, where traditional detection models often suffer performance degradation due to insufficient illumination and low contrast. We propose a novel detection model, YOLO‐LFormer, which integrates low‐light image enhancement with a lightweight vision transformer. A lightweight YUV transformer‐based network for low‐light image enhancement (LYT‐Net) is employed to enhance image brightness and detail, while a MobileViTv3 backbone network combines CNN and transformer structures to extract local and global features. A temporal–spatial attention (TSA) mechanism and reparameterized convolution with channel shuffle (RCS) are introduced to enhance feature representation, and the Wise‐IOUv3 loss function optimizes bounding box regression. Experiments on the BDD100K low‐light dataset demonstrate that YOLO‐LFormer achieves 78.42% mAP@0.5 and 44.35% mAP@0.5:0.95, outperforming various mainstream detection models. This approach offers high accuracy, real‐time performance, and suitability for resource‐constrained practical scenarios.
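For readers unfamiliar with the two reported metrics, the following minimal sketch illustrates how the IoU thresholds behind mAP@0.5 and mAP@0.5:0.95 differ. It assumes the common COCO-style convention of ten thresholds from 0.50 to 0.95 in steps of 0.05 and boxes given as (x1, y1, x2, y2). The function names are hypothetical and are not taken from the YOLO‐LFormer implementation, and the full precision-recall averaging that produces mAP is omitted.

```python
def box_iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
# mAP@0.5 uses only the first; mAP@0.5:0.95 averages AP over all ten.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which a prediction would count as a true positive."""
    iou = box_iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (2, 0, 12, 10)  # shifted horizontally by 2 pixels
    print(box_iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

In this sketch a slightly shifted box still counts as a match at the 0.5 threshold but fails the stricter ones, which is why mAP@0.5:0.95 is typically much lower than mAP@0.5 for the same detector.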