Abstract
Background: Molecular subtyping guides bladder cancer (BCa) care but typically requires RNA profiling. This study aimed to develop pathology-based subtypes of BCa using deep learning features derived from routinely obtained hematoxylin and eosin (H&E)-stained whole-slide images (WSIs).

Methods: We developed a pathology-based subtype classification of BCa from deep learning features extracted from H&E-stained WSIs. A modified ResNet50 model was trained to distinguish between tumor and normal regions and to extract patch-level deep features. These features were aggregated at the WSI level, followed by weighted gene co-expression network analysis (WGCNA), Cox regression, and unsupervised K-means clustering to define pathology-based subtypes. External validation was performed using WSIs from four independent centers and transcriptomic data from the IMvigor210 and GSE32894 cohorts. Grad-CAM was applied to tumor patches for interpretability.

Results:

WSI and patient distribution across centers

A total of 457 WSIs from 379 patients in the TCGA dataset were used for clustering. The other four cohorts comprised 244 WSIs from 179 patients in GD2H, 178 WSIs from 164 patients in STPH, 257 WSIs from 251 patients in ZSSY, and 291 WSIs from 112 patients in PN. Detailed clinical and pathological characteristics of the patients are shown in Table 1.

Performance of the trained ResNet50 model in distinguishing between tumor and normal patches

The ResNet50 model, following transfer learning, showed strong performance in distinguishing tumor and normal patches across the TCGA training set (Figure 2A-B), internal validation set (Figure 2A,C), and external validation sets (Figure 2D-G). For the TCGA training set, the model achieved an accuracy of 98%, with a sensitivity of 99% and a specificity of 97%. Performance was slightly lower in the TCGA validation set, where accuracy, sensitivity, and specificity were all 97%.
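For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from patch-level confusion-matrix counts, with tumor patches treated as the positive class. The function name `patch_metrics` and the counts are hypothetical: the counts assume a balanced set of 1,000 tumor and 1,000 normal patches and were chosen only so that the output matches the percentages quoted for the TCGA training set. They are not the study's data.

```python
def patch_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    A "positive" is a tumor patch and a "negative" is a normal patch.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # all correctly classified patches
        "sensitivity": tp / (tp + fn),    # tumor patches correctly flagged
        "specificity": tn / (tn + fp),    # normal patches correctly flagged
    }


# Hypothetical counts (illustration only, not taken from the study).
metrics = patch_metrics(tp=990, fn=10, tn=970, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Note that accuracy depends on the tumor-to-normal ratio of the evaluated patches, whereas sensitivity and specificity do not.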
In the external validation datasets, the model achieved an accuracy of 95% to 98%, a sensitivity of 92% to 96%, and a specificity of 95% to 99%. These results highlight the model’s robust performance across diverse datasets (Table S1).

Identification and prognostic significance of pathology-based consensus molecular subtypes

Through WGCNA in the TCGA dataset, the red, green, and yellow modules (comprising a total of 771 pathology-based deep learning features) were found to be strongly correlated with the luminal and basal/squamous scores in BCa (Figure 3A). These features were then subjected to univariate Cox regression analysis, which selected 163 prognostically significant features. The selected features were used in K-means clustering to establish the pathology-based consensus molecular classification. In the plot of the sum of squared errors (SSE) against K, the elbow point was observed at K = 4 (Figure 3B). K = 4 was therefore selected, yielding four distinct subtypes: cluster 0, cluster 1, cluster 2, and cluster 3. A PCA plot demonstrated clear separation among these subtypes (Figure 3C). Kaplan-Meier survival analysis of the TCGA (Figure 3D), IMvigor210 (Figure 3E), and GSE32894 (Figure 3F) datasets revealed significant prognostic differences between the subtypes. Cluster 1 patients exhibited the poorest prognosis, while cluster 2 patients had the best outcomes. Cluster 0 and cluster 3 patients showed intermediate survival, between that of cluster 1 and cluster 2 patients. Similar prognostic trends were observed in the external validation datasets from the four centers included in this study, reinforcing the robustness and predictive value of the pathology-based consensus molecular classification (Figure 4).
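The elbow criterion used to choose K can be illustrated with a minimal, dependency-free sketch. It runs Lloyd's K-means for several values of K and reports the SSE for each. Everything here is an assumption made for illustration: the function names, the toy two-dimensional points (standing in for the 163 selected features), and the deterministic farthest-point seeding, which was chosen only to make the example reproducible and is not necessarily what the study used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def init_centers(points, k):
    """Farthest-point seeding: deterministic, chosen here for reproducibility."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's algorithm and return (labels, SSE)."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centers[j]
            )
        if updated == centers:  # converged
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


# Toy data: four compact groups of five points each (hypothetical).
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

sse_by_k = {k: kmeans_sse(points, k)[1] for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"K = {k}: SSE = {sse:.2f}")
```

On this toy data the SSE falls from 1008 at K = 1 to 8 at K = 4, so any further gain from a larger K is necessarily small. That flattening is the elbow. Real feature matrices rarely give such a sharp bend, which is why the elbow is usually read from the plot rather than computed automatically.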
Association of consensus molecular subtypes with clinical and pathological features

To explore the relationship between the pathology-based consensus molecular subtypes and clinical and pathological features, we performed a detailed analysis using data from TCGA. The molecular subtypes were significantly associated with several clinical and pathological characteristics, including age, overall tumor stage, grade, T stage, and N stage (Figure 5A). Patients in cluster 1, who had the worst prognosis, were more likely to present with advanced age, high T stage, high N stage, high tumor grade, and advanced overall tumor stage than patients in other clusters (Figure 5B). These findings suggest that the pathology-based consensus molecular subtypes not only predict prognosis but also correlate with key clinical features, providing insight into the underlying biology of BCa.

Relationship between pathology-based consensus molecular subtypes and existing BCa molecular subtypes

To investigate the relationship between the pathology-based consensus molecular subtypes and existing BCa molecular subtypes, we compared our classification system with the well-established molecular subtypes from TCGA, Lund, UNC, MDA, CIT, EUA, and Baylor. Patients in cluster 0 and cluster 1 showed a mixed association with both basal and luminal subtypes: approximately half of the patients in these clusters were related to the basal-like subtypes, while the other half were associated with the luminal-like subtypes. In contrast, patients in cluster 2 and cluster 3 were predominantly associated with the luminal-like subtypes, consistent with their more favorable prognosis and lower tumor stages (Figure 6A). Heatmap analysis of molecular subtype-related genes further confirmed these findings.
Cluster 0 and cluster 1 patients exhibited high expression of basal-associated genes in about half of the cases, while cluster 2 and cluster 3 patients primarily showed high expression of luminal-associated genes. Furthermore, the expression of basal- and luminal-related genes differed significantly between the subtypes, indicating that the pathology-based consensus molecular subtypes reflect distinct molecular profiles (Figure 6B).

Gene mutation differences across pathology-based consensus molecular subtypes

To investigate gene mutation differences across the identified molecular subtypes, the 10 most frequently mutated genes in each subtype were visualized using a waterfall plot. TP53 was the most frequently mutated gene in patients from cluster 0, cluster 1, and cluster 2, while FGFR3 was the most frequently mutated gene in cluster 3 patients (Figure 7A). This difference in mutation profiles highlights the molecular diversity between the subtypes and may contribute to their distinct prognoses. Mutations in cancer-related pathways and tumor mutational burden (TMB) were further analyzed using heatmap visualization (Figure 7B). Mutation frequencies differed significantly in several key pathways and signatures, including mismatch repair, TP53 signaling pathway, apoptosis, WNT signaling pathway, TGF-β signaling pathway, focal adhesion, ECM receptor interaction, cell cycle activity, smooth muscle, myofibroblasts, interferon response, late cell cycle, FGFR3 coexpressed genes, and immune checkpoint genes. TMB also differed across the four subtypes, with cluster 3 showing the lowest mutational burden. These findings suggest that specific oncogenic pathways and TMB may contribute to the differences in clinical outcomes and prognosis across the subtypes, providing deeper insight into the molecular mechanisms underlying BCa.
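The per-subtype ranking behind a waterfall plot of this kind can be sketched as follows: for each cluster, count the patients carrying at least one mutation in each gene and keep the most frequent genes. The source does not say which tool produced Figure 7A, so the function name `top_mutated_genes`, the input layout (patient, gene pairs plus a patient-to-cluster map), and the toy records below are all hypothetical.

```python
from collections import Counter, defaultdict


def top_mutated_genes(mutations, cluster_of, top_n=10):
    """Rank genes by the fraction of patients mutated within each cluster.

    mutations  : iterable of (patient_id, gene) pairs, one per mutation call
    cluster_of : dict mapping patient_id -> cluster label
    Returns {cluster: [(gene, fraction_of_patients_mutated), ...]}.
    """
    patients_per_cluster = Counter(cluster_of.values())
    carriers = defaultdict(set)  # (cluster, gene) -> patients with >= 1 mutation
    for patient, gene in mutations:
        if patient in cluster_of:  # skip patients without a cluster label
            carriers[(cluster_of[patient], gene)].add(patient)

    ranked = defaultdict(list)
    for (cluster, gene), patients in carriers.items():
        ranked[cluster].append((gene, len(patients) / patients_per_cluster[cluster]))

    # Sort by descending frequency, then gene name, and keep the top entries.
    return {
        cluster: sorted(genes, key=lambda item: (-item[1], item[0]))[:top_n]
        for cluster, genes in ranked.items()
    }


# Hypothetical toy records (not study data).
cluster_of = {"P1": 1, "P2": 1, "P3": 1, "P4": 3, "P5": 3}
mutations = [
    ("P1", "TP53"), ("P1", "TP53"),  # two calls in one patient count once
    ("P2", "TP53"), ("P3", "KMT2D"),
    ("P4", "FGFR3"), ("P5", "FGFR3"), ("P5", "TP53"),
]

top = top_mutated_genes(mutations, cluster_of, top_n=10)
for cluster in sorted(top):
    print(cluster, top[cluster])
```

Counting patients rather than individual mutation calls keeps a heavily mutated sample from inflating a gene's frequency within its cluster.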
Immune infiltration differences across pathology-based consensus molecular subtypes

To investigate immune infiltration differences across the pathology-based consensus molecular subtypes, we first examined the ESTIMATE, immune, and stromal scores for each subtype. Overall, cluster 1 exhibited the highest ESTIMATE, immune, and stromal scores, indicating a greater presence of immune and stromal components in these patients. In contrast, cluster 2 and cluster 3 showed relatively lower scores, suggesting differences in the tumor microenvironment across the subtypes (Figure 8A-C). Further analysis of 22 tumor-associated immune cell types revealed significant variation in immune cell infiltration across the subtypes. Specifically, M0 and M1 macrophages were highly infiltrated in cluster 1, while dendritic cells and memory B cells exhibited lower infiltration levels in this subtype. Conversely, M0 and M1 macrophages showed low infiltration in cluster 3, while dendritic cells and memory B cells were highly infiltrated (Figure 8D-E). These findings suggest that the immune microenvironment, particularly the abundance of specific immune cells, varies significantly across the subtypes, which may contribute to differences in prognosis and response to immunotherapy.

Differences in regulon activity and clinically relevant signatures across pathology-based consensus molecular subtypes

To investigate the differences in regulon activity and clinically relevant signatures across the four identified molecular subtypes, we performed a detailed analysis using heatmap visualization. In cluster 1, high activity was observed in the following key oncogenic regulons: EGFR, FOXM1, STAT3, FGFR1, and GATA6. Conversely, low activity was detected in FGFR3, FOXA1, ERBB3, RARG, RXRA, GATA3, ERBB2, PPARG, and AR.
In contrast, cluster 3 showed the opposite profile, with high activity in FGFR3, FOXA1, ERBB3, RARG, RXRA, GATA3, ERBB2, PPARG, and AR, and low activity in EGFR, FOXM1, STAT3, FGFR1, and GATA6 (Figure 9A). These