Lipschitz continuity
Affine transformation
Robustness (evolution)
Norm (philosophy)
Mathematics
Mathematical optimization
Computer science
Applied mathematics
Pure mathematics
Political science
Law
Gene
Biochemistry
Chemistry
Authors
Cem Anil, James M. Lucas, Roger Grosse
Source
Journal: Cornell University - arXiv
Date: 2018-01-01
Citations: 1
Identifier
DOI: 10.48550/arxiv.1811.05381
Abstract
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial robustness, generalization bounds, interpretable gradients, and Wasserstein distance estimation. By the composition property of Lipschitz functions, it suffices to ensure that each individual affine transformation or nonlinear activation is 1-Lipschitz. The challenge is to do this while maintaining the expressive power. We identify a necessary property for such an architecture: each of the layers must preserve the gradient norm during backpropagation. Based on this, we propose to combine a gradient norm preserving activation function, GroupSort, with norm-constrained weight matrices. We show that norm-constrained GroupSort architectures are universal Lipschitz function approximators. Empirically, we show that norm-constrained GroupSort networks achieve tighter estimates of Wasserstein distance than their ReLU counterparts and can achieve provable adversarial robustness guarantees with little cost to accuracy.
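The GroupSort activation mentioned in the abstract can be illustrated with a minimal NumPy sketch (the function name and `group_size` argument here are illustrative, not the authors' reference implementation): the feature vector is split into fixed-size groups and each group is sorted. Since sorting only permutes its inputs, the activation preserves the gradient norm during backpropagation, which is the necessary property the paper identifies.

```python
import numpy as np

def groupsort(x, group_size=2):
    """Sort entries within contiguous groups of size `group_size`.

    Sorting is a permutation of the inputs, so this activation is
    1-Lipschitz and gradient-norm preserving. With group_size=2 it
    reduces to the MaxMin activation discussed in the paper.
    """
    assert x.shape[-1] % group_size == 0, "feature dim must divide evenly"
    grouped = x.reshape(x.shape[:-1] + (x.shape[-1] // group_size, group_size))
    return np.sort(grouped, axis=-1).reshape(x.shape)

x = np.array([3.0, -1.0, 2.0, 5.0])
print(groupsort(x))  # groups [3, -1] and [2, 5] -> [-1.  3.  2.  5.]
```

Combined with norm-constrained weight matrices (each affine layer 1-Lipschitz), the composition property then bounds the Lipschitz constant of the whole network by 1.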