Abstract

The recipe for success behind the deep learning phenomenon has been the combination of neural networks and gradient-based optimisation. Understanding the fundamental behaviour of optimisation in this context has lagged behind the empirical success of deep learning. We aim to add to a growing set of results on understanding the behaviour of gradient descent. We find a continuous-time flow, called \textit{the principal flow}, that helps describe gradient descent dynamics. Unlike existing flows, the principal flow better captures the dynamics of gradient descent and can be used to explain its unstable or oscillatory behaviour. The principal flow depends on the eigendecomposition of the Hessian, which allows us to shed light on recently observed behaviours in deep learning, such as edge of stability results. Using our new understanding of instability, we propose an adaptive learning rate that leads to stable training and can strike the right balance in the stability-performance trade-off.
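As a minimal, hedged illustration of the gap the abstract alludes to (not the principal flow itself, which is defined in the paper body), the sketch below compares gradient descent with the standard gradient flow on a one-dimensional quadratic loss. The quadratic form, the curvature value `lam`, and the step size `h` are illustrative assumptions: once `h * lam > 2`, gradient descent oscillates and diverges, while the gradient flow decays monotonically and therefore cannot capture that instability.

```python
# Illustrative sketch (assumed toy setting, not the paper's principal flow):
# loss L(x) = 0.5 * lam * x**2, so the Hessian is the scalar lam.
import numpy as np

lam = 10.0   # curvature (the single Hessian eigenvalue in 1-D)
h = 0.25     # learning rate; h * lam = 2.5 > 2, the unstable regime for gradient descent

x_gd = 1.0   # gradient descent iterate, starting at x0 = 1
for t in range(1, 6):
    # Discrete update: x_{t+1} = x_t - h * L'(x_t) = (1 - h * lam) * x_t
    x_gd = x_gd - h * lam * x_gd
    # Exact gradient flow dx/dt = -lam * x evaluated at time t * h: x(t*h) = x0 * exp(-lam * t * h)
    x_flow = np.exp(-lam * t * h)
    print(f"step {t}: gradient descent = {x_gd:+.3f}, gradient flow = {x_flow:.5f}")

# Gradient descent alternates sign and grows in magnitude (|1 - h * lam| = 1.5 > 1),
# whereas the gradient flow shrinks monotonically towards the minimum at 0.
```

This contrast is the kind of discrete-versus-continuous mismatch that motivates a flow built from the Hessian eigendecomposition rather than the plain gradient flow.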