Context drives our interpretations of music as surprising, frightening, or awe-inspiring. However, it remains unclear how formal musical training affects our ability to hierarchically integrate complex tonal information to efficiently predict, remember, and segment music. We scrambled naturalistic music at multiple timescales to manipulate coherent tonal context while controlling for multiple acoustic cues. Memory (Experiment 1; n = 108, age range = 19–41 years) and prediction (Experiment 2; n = 108, age range = 20–41 years) improved with more intact context for both musicians and nonmusicians. Listeners’ event boundaries were influenced by the amount of tonal context but also reflected nested phrase structure, and musicians were more sensitive to longer-timescale “hyperphrase” structure (Experiment 3; n = 95, age range = 20–42 years) and could better identify the amount of scrambling (Experiment 4; n = 108, age range = 19–41 years). These results indicate that listeners integrate tonal context across complex phrases to efficiently encode, predict, and segment naturalistic music and that, in general, training has surprisingly little impact on this integration.