Abstract Controlling multiple degrees of freedom of light in a small footprint, with high efficiency, and on a foundry‐manufacturable platform is foundational for a range of classical and quantum technologies. Achieving this requires photonic design strategies that go beyond traditional methods based on physical intuition. Reinforcement learning (RL), a subset of machine learning, has been successful in achieving optimal outcomes in dynamically evolving environments, e.g., in strategy games or self‐driving cars. Here, a novel paradigm based on RL is presented for photonic device design, and multifunctional metasurface optics and integrated photonic devices operating in the visible are realized. RL‐designed metasurface optics operating at free‐space wavelengths of 461 and 689 nm are fabricated and experimentally characterized; they simultaneously deflect input light at large angles and maintain or change its polarization. Further, the RL approach is used to design in‐plane integrated photonic devices such as bends, mode converters, wavelength demultiplexers, and beamsplitters, as well as waveguide‐coupled grating out‐couplers that control both the angle and the polarization of the out‐of‐plane beam emission at visible wavelengths on a silicon nitride platform. The results, targeting a two‐color strontium magneto‐optical trap for the realization of a portable, alignment‐free optical lattice clock, elucidate the potential of reinforcement learning for the design of high‐performance optics.