Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuel synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,040 density functional theory (DFT) relaxations (∼264,890,000 single-point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with predefined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, and DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources as well as a public leader board to encourage community contributions to solve these important tasks.