Abstract

Because industrial production processes are complex and technically demanding, surface defects inevitably appear and seriously degrade product quality. Although existing lightweight detection networks are highly efficient, they are prone to false or missed detections of non-salient defects because they lack semantic information. In contrast, diffusion models can generate higher-order semantic representations during the denoising process. This paper therefore incorporates the higher-order modeling capability of diffusion models into the detection framework to better support the classification and localization of challenging targets. First, a denoising diffusion probabilistic model (DDPM) is pre-trained, and features from its denoising process are extracted to construct a feature repository. To avoid the memory bottleneck caused by the dataloader loading high-dimensional features, a residual convolutional variational auto-encoder is designed to compress the feature repository. Each image is fed to the image backbone for feature extraction and is also used to query the feature repository. The queried latent features are reconstructed and filtered to obtain high-dimensional DDPM features. A dynamic cross-fusion method is proposed to refine the contextual features of the DDPM and thereby optimize the detection model. Finally, knowledge distillation is employed to transfer the higher-order modeling capability back into the lightweight baseline model without additional efficiency cost. Experimental results demonstrate that our method achieves competitive results on several industrial datasets.
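The final distillation step can be illustrated with a minimal sketch. The abstract does not specify the loss, so this assumes a common feature-mimicking form in which the lightweight student's features are pulled toward the DDPM-enhanced teacher's features with a mean-squared-error term added to the detection loss. The names `feature_mse`, `distillation_loss`, and `weight` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def feature_mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two flattened feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def distillation_loss(
    detection_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    weight: float = 1.0,
) -> float:
    """Detection loss plus a weighted feature-mimicking term.

    Assumption: the teacher is the DDPM-enhanced detector and the student is
    the lightweight baseline; the paper's actual loss may differ.
    """
    return detection_loss + weight * feature_mse(student_feat, teacher_feat)


if __name__ == "__main__":
    # Toy values only: two-element "feature maps" for student and teacher.
    student = [1.0, 2.0]
    teacher = [1.0, 4.0]
    print(distillation_loss(0.5, student, teacher, weight=0.25))
```

Under this assumption the teacher is needed only during training, so the student's architecture and inference cost are unchanged, which is consistent with the claim of no additional efficiency cost.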