Visual inspection of civil infrastructure has traditionally relied on manual operations, which are labor-intensive, inefficient, and difficult to scale, limiting their effectiveness in modern maintenance and management. Although deep learning has shown considerable potential for automating inspection and improving its precision, its practical deployment in real-world engineering remains hindered by the scarcity of large-scale, high-quality annotated datasets. To address this challenge, this study constructs a UAV-based dataset of building surface defects comprising 14,471 high-resolution images, covering six structural types and five representative defect categories (cracks, abscission, leakage, corrosion, and bulging) in both urban and rural environments and recorded under diverse illumination and environmental conditions. Each image is annotated with standardized bounding boxes, and the dataset is divided into training, validation, and testing subsets. The dataset provides a diverse, publicly accessible benchmark for advancing multi-task research in defect detection, segmentation, and automated visual assessment of building surfaces.