Special Colloquium
John Wilmes
Georgia Institute of Technology
The Complexity of Learning Neural Networks
Abstract: The empirical successes of "deep learning" currently lack rigorous theoretical explanation. As a first step, we ask whether data generated by
neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We give a surprisingly
general polynomial-time analysis of gradient descent when the hidden layer uses unbiased sigmoid gates, exploiting new connections we make with tools from
spherical harmonics. However, when the generating network uses arbitrary biases, the problem appears intractable. We construct a family of simple neural
networks that is hard to learn in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent with any
loss function, for any model architecture) needs an exponential number of queries on data labeled by such a network, even when using tolerance inversely
proportional to the input dimensionality. Joint work with Le Song, Santosh Vempala, and Bo Xie.
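As a rough illustration of the learning setup described in the abstract (not the analysis presented in the talk), the sketch below generates data labeled by an unknown one-hidden-layer network with unbiased sigmoid gates on a benign input distribution, and trains a student network of the same form by gradient descent on squared loss. The dimensions, sample size, input distribution (uniform on the sphere), and step size are illustrative assumptions.

```python
# Minimal sketch of the setting: a "teacher" one-hidden-layer sigmoid network
# without biases labels data drawn from a benign distribution, and a "student"
# of the same architecture is fit by plain gradient descent on squared loss.
# All hyperparameters below are illustrative, not taken from the talk.
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 20, 10, 5000                 # input dimension, hidden units, samples
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Teacher network: f(x) = sum_j a_j * sigmoid(<w_j, x>), with no bias terms.
W_true = rng.normal(size=(k, d))
W_true /= np.linalg.norm(W_true, axis=1, keepdims=True)
a_true = rng.normal(size=k)

# Benign input distribution: uniform on the unit sphere in R^d.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = sigmoid(X @ W_true.T) @ a_true

# Student network of the same form, trained by gradient descent.
W = rng.normal(size=(k, d)) * 0.1
a = rng.normal(size=k) * 0.1
lr = 0.5

for step in range(2000):
    H = sigmoid(X @ W.T)               # hidden activations, shape (n, k)
    pred = H @ a
    err = pred - y                     # residuals, shape (n,)
    loss = 0.5 * np.mean(err ** 2)
    # Gradients of the mean squared loss with respect to a and W
    # (using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))).
    grad_a = H.T @ err / n
    grad_W = ((err[:, None] * a) * H * (1 - H)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.6f}")
```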
Friday January 19, 2018 at 3:00 PM in SEO 636