Computer Sciences Colloquium - Exponentially vanishing sub-optimal local minima in multilayer neural networks

Daniel Soudry​

26 March 2017, 11:00 
Schreiber Building, Room 006 
Computer Sciences Colloquium

Background:

 

Multilayer neural networks, trained with simple variants of stochastic gradient descent (SGD), have achieved state-of-the-art performances in many areas of machine learning. It has long been a mystery why does SGD work so well – rather than converging to sub-optimal local minima with high training error (and therefore, high test error).

 

 

Results:

 

We examine a neural network with a single hidden layer, quadratic loss, and piecewise linear units, trained in a binary classification task on a standard normal input. We prove that the volume of differentiable regions of the empiric loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume of global minima, given "mild" (polylogarithmicly) over-parameterization. This suggests why SGD tends to converge to global minima in such networks.

 
Tel Aviv University makes every effort to respect copyright. If you own copyright to the content contained
here and / or the use of such content is in your opinion infringing Contact us as soon as possible >>