Particle Physics Seminar: Charting the Topography of the Neural Network Landscape with Thermal-Like Noise
Noam Levi, TAU
The training of neural networks is a challenging optimization problem, and understanding the landscape that guides the optimization process remains an open question in computer science. In our research, we used methods from statistical mechanics, including phase-space exploration with Langevin dynamics, to study this landscape for networks whose number of parameters far exceeds the number of data points, performing a classification task on both random and real data. By analyzing the fluctuation statistics, we were able to infer a clear geometric description of the convergence region, much like that of a thermal system at constant temperature. We found that the convergence region is a low-dimensional manifold whose dimension can be read off directly from the fluctuations, and that this dimension is controlled by the number of data points near the classification decision boundary. We also found that a quadratic approximation of the loss near the minimum is inadequate, owing to the exponential nature of the decision boundary and the flatness of the low-loss region. A simplified loss model we propose explains this behavior and reproduces the observed fluctuation statistics. I will discuss the implications of our findings for the theoretical understanding of deep learning optimization, especially for less understood phenomena such as grokking and double descent.
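The phase-space exploration described above can be illustrated with a minimal sketch of overdamped Langevin dynamics: gradient descent plus thermal-like Gaussian noise, whose stationary fluctuations reflect the local geometry of the loss. This is not the speakers' code; the quadratic toy loss, step size, and temperature below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_loss(theta):
    # Toy quadratic loss L(theta) = 0.5 * ||theta||^2, so grad L = theta.
    # (A stand-in for a real network loss; chosen so the equilibrium
    # fluctuations are known exactly.)
    return theta

def langevin_step(theta, lr=1e-2, temperature=1e-3):
    # Discretized overdamped Langevin update:
    #   theta <- theta - lr * grad L(theta) + sqrt(2 * lr * T) * xi,
    # where xi is standard Gaussian noise and T plays the role of temperature.
    noise = rng.standard_normal(theta.shape)
    return theta - lr * grad_loss(theta) + np.sqrt(2 * lr * temperature) * noise

theta = rng.standard_normal(10)
samples = []
for step in range(20000):
    theta = langevin_step(theta)
    if step > 5000:          # discard burn-in, keep equilibrium samples
        samples.append(theta.copy())

samples = np.stack(samples)
# At equilibrium the samples follow the Gibbs measure ~ exp(-L/T); for this
# quadratic loss each coordinate fluctuates with variance close to T, so the
# fluctuation statistics directly probe the curvature of the landscape.
print(samples.var())
```

For a flat or non-quadratic low-loss region, as the abstract argues occurs in overparameterized networks, these fluctuation statistics deviate from the Gaussian prediction, which is what makes them a useful diagnostic of the landscape's effective dimension.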
Seminar Organizer: Dr. Adi Ashkanzi