To navigate complex physical environments, animals keep track of the spatial relations among objects using various reference frames, both body-based (e.g., left/right) and environment-based (e.g., east/west), but how these spatial representations interact remains unresolved. Whereas neuroscientific findings show habitual integration across reference frames, psycholinguistic accounts suggest humans use one reference frame at a time, as in speech. This article examines whether people spontaneously use two reference frames in the same action. When placing a single object in a two-dimensional array, adult participants (N = 110) routinely used an environment-based frame to determine the object’s left–right position while using a body-based frame to determine its front–back position at the same time. Such hybrid responses were prevalent among both Indigenous Tsimane’ and educated U.S. participants, suggesting that people across cultures habitually construct compound cognitive maps to represent the multidimensional spatial relations that compose natural settings.