Keywords
Curse of dimensionality, Dimensionality reduction, Design matrix, Feature (machine learning), Computer science, Feature vector, Rank (linear algebra), Artificial neural network, Estimator, Artificial intelligence, Matrix (mathematics), Algorithm, Linear regression, Machine learning, Pattern recognition, Mathematics, Statistics, Combinatorics
Authors
Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy
Identifier
DOI: 10.1073/pnas.2411325122
Abstract
A fundamental problem in machine learning is to understand how neural networks make accurate predictions while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction, a process called feature learning. Recent work [A. Radhakrishnan, D. Beaglehole, P. Pandit, M. Belkin, Science 383, 1461–1467 (2024)] posited that the effects of feature learning can be elicited from a classical statistical estimator called the average gradient outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as an algorithm that explicitly performs feature learning by alternating between 1) reweighting the feature vectors by the AGOP and 2) learning the prediction function in the transformed space. In this work, we develop theoretical guarantees for how RFM performs dimensionality reduction by focusing on the class of overparameterized problems arising in sparse linear regression and low-rank matrix recovery. Specifically, we show that RFM restricted to linear models (lin-RFM) reduces to a variant of the well-studied Iteratively Reweighted Least Squares (IRLS) algorithm. Furthermore, our results connect feature learning in neural networks to classical sparse recovery algorithms and shed light on how neural networks recover low-rank structure from data. In addition, we provide an implementation of lin-RFM that scales to matrices with millions of missing entries. Our implementation is faster than standard IRLS algorithms since it avoids forming singular value decompositions, and it outperforms deep linear networks for sparse linear regression and low-rank matrix completion.
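To make the alternation concrete, the following is a minimal sketch of the lin-RFM loop for sparse linear regression, assuming a diagonal feature matrix and ridge regression as the inner solver; the function `lin_rfm` and the parameters `alpha`, `ridge`, and `n_iters` are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def lin_rfm(X, y, n_iters=20, ridge=1e-6, alpha=0.5):
    """Illustrative lin-RFM sketch: alternate between
    1) ridge regression on AGOP-reweighted features, and
    2) updating the feature matrix M from the fitted predictor's
       average gradient outer product (AGOP).
    A linear predictor has a constant gradient, so its AGOP is rank
    one; starting from a diagonal M, only the diagonal matters here.
    """
    n, d = X.shape
    m = np.ones(d)                       # diagonal of the feature matrix M
    for _ in range(n_iters):
        s = m ** alpha                   # feature reweighting M**alpha
        Xt = X * s                       # transformed design matrix
        # inner step: ridge regression in the transformed space
        w = np.linalg.solve(Xt.T @ Xt + ridge * np.eye(d), Xt.T @ y)
        # gradient of f(x) = w @ (s * x) w.r.t. x is g = s * w, so the
        # AGOP diagonal is g**2; this becomes the new M
        m = (s * w) ** 2
    return s * w                         # coefficients in the original coordinates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 50, 200, 5                 # overparameterized: d > n, k-sparse truth
    X = rng.standard_normal((n, d))
    w_true = np.zeros(d)
    w_true[:k] = rng.standard_normal(k)
    w_hat = lin_rfm(X, X @ w_true)
    print(np.linalg.norm(w_hat - w_true))
```

With `alpha = 0.5`, the new weights are proportional to the magnitudes of the current coefficients, which is the same coefficient-magnitude reweighting used by classical IRLS schemes for sparse recovery; this is one way to see the reduction the abstract describes.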