ABSTRACT Region embedding maps geographic regions into a unified latent vector space, facilitating various geographic intelligence tasks. However, most existing methods treat each region as a single, homogeneous unit, thereby overlooking the spatial heterogeneity within it. To address this limitation, we propose Structure‐Aware Region Embedding (SARE), a novel framework that explicitly models the local spatial heterogeneity within regions. SARE first partitions each region into regular subregions, allowing data to be collected and modeled at a finer granularity. It then integrates point‐of‐interest (POI) data and remote sensing (RS) images to learn the semantic features of subregions. By jointly capturing spatial structural semantics and multiview feature semantics through the proposed hex‐level fusion (HLF) and region‐level fusion (RLF) modules, SARE generates rich and expressive region embeddings. Experiments on four tasks demonstrate that SARE outperforms state‐of‐the‐art baselines. Visualizations of spatial heterogeneity and similar‐region searches further illustrate that SARE is superior to and more explainable than previous methods. This paper not only offers new insights into region embedding methods but also provides a methodological reference for various geographic intelligence applications.