Abstract Vehicle load identification (VLI) is pivotal for bridge health monitoring, safety assessment, and intelligent maintenance. However, computer vision‐based VLI faces two critical challenges: degraded identification accuracy in dynamic scenes and the computational constraints of edge monitoring devices. To address these challenges, a low‐complexity real‐time detection Transformer (LC‐RTDETR) is developed as the basis of a framework for bridge VLI. The proposed LC‐RTDETR provides the foundational perception for VLI and offers three advantages: (1) lightweight feature extraction via the star network backbone, (2) robust feature representation enabled by the dynamic‐range histogram self‐attention module for single‐scale fusion, and (3) more efficient multi‐scale processing through the proposed context‐guided spatial feature reconstruction pyramid network. This architecture improves accuracy in complex scenes while reducing computational demands. For continuous trajectory acquisition, LC‐RTDETR detections are passed to BoT‐SORT tracking, which incorporates bridge‐specific camera motion estimation and two‐stage identity association. Precise vehicle positioning is achieved through dual‐bounding‐box localization, which implements body‐suspension error minimization and orientation vector updating. Experimentally, LC‐RTDETR outperforms RTDETR, with 9.8% higher frames per second, 48.2% fewer parameters, and 65.4% fewer floating‐point operations. Practical validation confirms robustness to illumination changes, occlusion, motion blur, and adverse weather, and shows that stable trajectories are captured accurately during lane‐changing maneuvers and speed fluctuations, enabling vehicle localization. Finally, effective weight‐position matching is integrated into the framework.