Radiology report generation, which automatically produces diagnostic textual reports from medical images, plays a crucial role in improving clinical efficiency and diagnostic accuracy. However, existing radiology report generation models face numerous challenges, including limited interpretability and inaccurate descriptions. To address these issues, we propose VTAG, an integrated framework that enhances radiology report generation by combining target detection with contextual alignment of region-level descriptions. Target detection focuses the model on clinically significant areas within medical images, while contextual alignment ensures that the generated text is directly grounded in visual findings. Additionally, we introduce a full-spectrum feature fusion method that combines high- and low-frequency image features, capturing both fine details and global structures and giving the model a more comprehensive, hierarchical understanding of the images. We validated our method on the public MIMIC-CXR dataset, where it outperforms previous approaches on multiple evaluation metrics. Notably, averaged over the six traditional metrics, VTAG achieves a 14.3% improvement over the state-of-the-art model MLRG.
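To make the full-spectrum fusion idea concrete, the following is a minimal sketch of one way to split a feature map into low- and high-frequency bands with a 2D FFT and fuse them. This is an illustration under our own assumptions, not the paper's implementation: the module name `FullSpectrumFusion`, the circular low-pass mask, and the `cutoff_ratio` parameter are all hypothetical.

```python
# Minimal sketch (assumed PyTorch setting): split features into low/high
# frequency bands via FFT, process each, and fuse. Names are illustrative.
import torch
import torch.nn as nn


class FullSpectrumFusion(nn.Module):
    """Splits a feature map into low- and high-frequency components
    with a 2D FFT, transforms each band, and fuses the results."""

    def __init__(self, channels: int, cutoff_ratio: float = 0.25):
        super().__init__()
        self.cutoff_ratio = cutoff_ratio  # fraction of spectrum kept as "low"
        self.low_branch = nn.Conv2d(channels, channels, kernel_size=1)
        self.high_branch = nn.Conv2d(channels, channels, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Centered 2D spectrum of the feature map.
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"))
        # Circular low-pass mask around the spectrum center.
        yy, xx = torch.meshgrid(
            torch.arange(h, device=x.device),
            torch.arange(w, device=x.device),
            indexing="ij",
        )
        dist = torch.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        radius = self.cutoff_ratio * min(h, w) / 2
        low_mask = (dist <= radius).to(spec.dtype)
        # Invert each band back to the spatial domain.
        low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask), norm="ortho").real
        high = torch.fft.ifft2(torch.fft.ifftshift(spec * (1 - low_mask)), norm="ortho").real
        # Low frequencies carry global structure; high frequencies carry
        # fine detail. Concatenate the two branches and fuse.
        return self.fuse(
            torch.cat([self.low_branch(low), self.high_branch(high)], dim=1)
        )


# Usage: fuse the frequency bands of a 512-channel visual feature map.
feats = torch.randn(2, 512, 14, 14)
out = FullSpectrumFusion(channels=512)(feats)
print(out.shape)  # torch.Size([2, 512, 14, 14])
```

The key design point this sketch captures is that the fused representation sees both bands jointly, so downstream report generation can attend to coarse anatomy and fine abnormalities at once; the actual fusion operator in the paper may differ.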