Normalization (sociology)
Computer science
Deep neural network
Smoothness
Artificial intelligence
Machine learning
Stability (learning theory)
Artificial neural network
Process (computing)
Mathematical optimization
Mathematics
Anthropology
Operating system
Mathematical analysis
Sociology
Authors
Shibani Santurkar,Dimitris Tsipras,Andrew Ilyas,Aleksander Mądry
Abstract
Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called internal covariate shift. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
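For reference, the BatchNorm transform discussed in the abstract normalizes each feature over the mini-batch and then applies a learned scale and shift. The following is a minimal NumPy sketch of that forward pass only; the function name `batchnorm_forward` and the toy shapes are illustrative and not taken from the paper.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Minimal BatchNorm transform over a mini-batch (training-time sketch).

    x:     activations of shape (batch_size, features)
    gamma: learned per-feature scale, shape (features,)
    beta:  learned per-feature shift, shape (features,)
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize each feature
    return gamma * x_hat + beta             # re-scale and shift

# Usage: a batch of skewed pre-activations comes out roughly zero-mean, unit-variance per feature.
x = np.random.randn(32, 64) * 3.0 + 5.0
y = batchnorm_forward(x, gamma=np.ones(64), beta=np.zeros(64))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])
```

The paper's argument is about the effect of this transform on the loss landscape (smoother gradients, hence faster training), not about the arithmetic itself; the sketch is only meant to fix what "BatchNorm" computes.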