Şimşek
Linear networks
Deep Linear Networks Dynamics: Low-Rank Biases Induced by Initialization Scale and L2 Regularization
For deep linear networks (DLN), various hyperparameters alter the dynamics of training dramatically. We investigate how the rank of the linear map found by gradient descent is affected by (1) the initialization norm and (2) the addition of $L_2$ …
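As a rough illustration of the objects the abstract refers to, the sketch below builds a deep linear network as a product of weight matrices, evaluates a squared-error loss with an $L_2$ penalty on the weights, and measures the numerical rank of the end-to-end linear map. It is not taken from the paper: the function names, the Gaussian initialization, the loss normalization, and the rank tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def init_layers(widths, scale, rng):
    """Gaussian weights W_l of shape (widths[l+1], widths[l]) with std `scale`.

    `scale` plays the role of the initialization norm (an assumed parametrization).
    """
    return [scale * rng.standard_normal((widths[i + 1], widths[i]))
            for i in range(len(widths) - 1)]

def end_to_end(layers):
    """Linear map computed by the network: W_L @ ... @ W_1."""
    A = layers[0]
    for W in layers[1:]:
        A = W @ A
    return A

def regularized_loss(layers, X, Y, lam):
    """Half mean squared error plus lam times the sum of squared Frobenius norms."""
    residual = end_to_end(layers) @ X - Y
    fit = 0.5 * np.sum(residual ** 2) / X.shape[1]
    penalty = lam * sum(np.sum(W ** 2) for W in layers)
    return fit + penalty

def numerical_rank(A, tol=1e-6):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
layers = init_layers([5, 8, 8, 3], scale=0.1, rng=rng)  # hypothetical widths
A = end_to_end(layers)  # a 3 x 5 matrix
```

Tracking `numerical_rank(end_to_end(layers))` during training, for different values of `scale` and `lam`, is one way to observe the kind of rank behavior the abstract describes.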