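Training (meteorology)
Process (computing)
Manifold (fluid mechanics)
Computer science
Artificial intelligence
Geography
Engineering
Mechanical engineering
Meteorology
Operating system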
Authors
Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari
Identifier
DOI:10.1073/pnas.2310002121
Abstract
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.