Abstract
Crowd counting in smart surveillance systems plays a crucial role in the Internet of Things (IoT) and smart cities, affecting public safety, crowd management, and urban planning. Centrally training a crowd counting model on surveillance data raises significant privacy concerns. Traditional methods try to alleviate these concerns by reducing the focus on individuals, but they do not fully resolve them. In this work, we develop a horizontal federated learning (HFL) framework that trains crowd counting models while preserving privacy. The framework enables the smart surveillance system to learn through model aggregation without accessing the private data stored on local devices. It therefore eliminates the need to transmit video data, reduces communication costs, and avoids leakage of raw data. Because no federated learning (FL) crowd counting datasets exist, we design four non-IID (non-independent and identically distributed) partitioning strategies, namely feature-skew, quantity-skew, scene-skew, and time-skew, to simulate real-world FL scenarios. In addition, we present an efficient fully convolutional network (e-FCN) for each client to demonstrate the practical applicability of the proposed framework. The e-FCN adopts a compact encoder-decoder architecture with relatively few parameters, which makes it communication-friendly and easier to train. This design achieves performance competitive with more complex surveillance crowd counting models in the literature. Finally, we evaluate the proposed HFL framework with e-FCN under our skew strategies on multiple real-world datasets, including Crowd Surveillance, ShanghaiTech PartB, WorldExpo’10, FDST, CityUHK-X, UCSD, and MALL. Based on extensive experiments, we present the resulting Federated Crowd Counting benchmark as a reference for future research and as guidance for FL algorithm selection when deploying smart surveillance systems.
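To make two of the ideas above concrete, the following minimal sketch illustrates a quantity-skew partition and one round of server-side model aggregation. It rests on assumptions that the abstract does not state: the Dirichlet-based split, the sample-size-weighted (FedAvg-style) averaging rule, and the names `quantity_skew_partition`, `fedavg`, and `alpha` are all hypothetical choices for illustration, and the framework itself may use different partitioning details and aggregation algorithms.

```python
import random


def quantity_skew_partition(num_samples, num_clients, alpha=0.5, seed=0):
    """Split sample indices into unequal client shards (quantity skew).

    Assumption for illustration: shard proportions are drawn from a
    symmetric Dirichlet(alpha); a smaller alpha gives a more unbalanced split.
    """
    rng = random.Random(seed)
    # A Dirichlet draw is a vector of Gamma variates normalized by their sum.
    weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(weights)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    shards, start, cumulative = [], 0, 0.0
    for k, w in enumerate(weights):
        cumulative += w
        # The last client takes the remainder so every sample is assigned once.
        if k == num_clients - 1:
            end = num_samples
        else:
            end = round(num_samples * cumulative / total)
        shards.append(indices[start:end])
        start = end
    return shards


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: a FedAvg-style rule; only parameters reach the server,
    never the raw surveillance data.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    shards = quantity_skew_partition(num_samples=100, num_clients=4)
    print([len(s) for s in shards])  # unequal shard sizes that sum to 100
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch the server only ever sees the parameter lists passed to `fedavg`, which mirrors the property that no video data leaves the local devices.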