Abstract
Multiple object tracking (MOT) is the task of localizing multiple moving objects over time in a video. MOT has many applications, including augmented reality, traffic management, medical imaging, surveillance and security, video editing, and video transmission and compression. Generally, MOT is a two-step process consisting of object detection and data association. First, a distinct identifier is assigned to each object detected in the first frame, and the motion trajectories of the detected objects are extracted. In every subsequent frame of the image stream, the objects are detected again and their tracks are maintained: the trajectory of each object in the current frame is determined from its position in the previous frame. MOT therefore aims to find associations that maximize the affinity between objects in consecutive frames. However, accurate multiple object tracking is extremely difficult. The challenges arise either from object-level factors such as deformation, pose variation, occlusion, and background clutter, or from dynamic environmental variations such as fog, snow, rain, and dust particles. To cope with these challenges, many approaches based on deep learning (DL) have been proposed. In this chapter, we review various DL-based MOT algorithms used for object detection and tracking, discuss their salient features, and analyze their performance. In addition, recent metrics for evaluating MOT algorithms are analyzed in detail with respect to their applicability to real-world scenarios.
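The detection-then-association loop described above can be illustrated with a minimal sketch. It assumes that detections are already available for each frame as axis-aligned boxes `(x1, y1, x2, y2)` (the DL detector itself is out of scope), uses intersection-over-union (IoU) as the affinity between a track and a detection, and matches them greedily. The names `iou` and `IouTracker` and the 0.3 threshold are hypothetical choices for illustration, not an algorithm taken from the reviewed works.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IouTracker:
    """Hypothetical tracker: greedy IoU association between consecutive frames."""
    iou_threshold: float = 0.3  # assumed value, not from the chapter
    next_id: int = 0
    tracks: Dict[int, Box] = field(default_factory=dict)  # track id -> last box

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        # Score every (track, detection) pair by IoU, highest affinity first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        matched_tracks, matched_dets = set(), set()
        new_tracks: Dict[int, Box] = {}
        # Greedily accept pairs above the threshold; each side is used at most once.
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break
            if tid in matched_tracks or j in matched_dets:
                continue
            matched_tracks.add(tid)
            matched_dets.add(j)
            new_tracks[tid] = detections[j]
        # Unmatched detections start new tracks with fresh identifiers.
        for j, det in enumerate(detections):
            if j not in matched_dets:
                new_tracks[self.next_id] = det
                self.next_id += 1
        # Unmatched tracks are simply dropped (no re-identification here).
        self.tracks = new_tracks
        return dict(new_tracks)


if __name__ == "__main__":
    tracker = IouTracker()
    print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))
    # Both objects move slightly and keep their identifiers 0 and 1.
    print(tracker.update([(52, 51, 62, 61), (1, 1, 11, 11)]))
    # → {0: (1, 1, 11, 11), 1: (52, 51, 62, 61)}
```

Greedy matching keeps the sketch short; many trackers instead solve the assignment optimally (for example with the Hungarian algorithm) and combine IoU with motion or appearance cues, which helps with the occlusion and clutter problems mentioned above.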