[video] Sepp Hochreiter - 2018-05-04

Johannes Kepler University Linz


Deep Learning in Linz (Sepp Hochreiter, Johannes Kepler University Linz)


Deep Learning has emerged as one of the most successful fields of machine learning and artificial intelligence, with overwhelming success in industrial speech, language and vision benchmarks. Consequently, it has evolved into the central field of research for IT giants like Google, Facebook, Microsoft, Baidu, and Amazon. Deep Learning is founded on novel neural network techniques, the recent availability of very fast computers, and massive data sets.

The main obstacle to learning deep neural networks is the vanishing gradient problem, which impedes credit assignment to the first layers of a deep network or to early elements of a sequence. Most major advances in Deep Learning can be related to avoiding the vanishing gradient. These advances include unsupervised stacking, ReLUs, residual networks, highway networks, and LSTM networks.
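The effect can be illustrated with a minimal sketch, assuming a chain of scalar sigmoid units (a toy setup chosen here for illustration, not one taken from the talk). The gradient of the output with respect to the input is a product of one factor per layer, and each sigmoid derivative is at most 0.25, so the product shrinks geometrically with depth. The function name `input_gradient` and its parameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, weight=1.0, x=0.5):
    """Return d(output)/d(input) for a chain of `depth` scalar sigmoid units.

    Toy assumption: every layer is a single unit with the same weight.
    """
    a = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: each layer contributes weight * sigma'(z),
        # where sigma'(z) = a * (1 - a) <= 0.25.
        grad *= weight * a * (1.0 - a)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 10, 30):
        print(depth, input_gradient(depth))
```

With unit weights the gradient is bounded by 0.25 raised to the depth, which is why the first layers of a deep sigmoid network receive almost no learning signal.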

Our current research is on self-normalizing neural networks (SNNs), which automatically avoid the vanishing gradient. Using the Banach fixed-point theorem, we proved that neuron activations in SNNs converge to zero mean and unit variance across samples, even in the presence of noise and perturbations.
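The fixed-point behaviour can be checked numerically with a rough Monte Carlo sketch. It assumes the SELU activation used in SNNs with its standard constants, and it models each layer's pre-activation as a zero-mean Gaussian whose variance equals the variance of the previous layer's activations (the idealized case of weights that sum to zero with unit sum of squares). The function `iterate_moments` and its parameters are hypothetical names introduced for this illustration.

```python
import math
import random

# Standard SELU constants.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(z):
    if z > 0:
        return SCALE * z
    return SCALE * ALPHA * (math.exp(z) - 1.0)


def iterate_moments(var, layers=20, samples=20000, seed=0):
    """Push activation statistics through `layers` idealized SELU layers.

    Assumption: pre-activations are Gaussian with mean 0 and the variance
    of the previous layer's activations. Returns the (mean, variance) of
    the activations after the last layer.
    """
    rng = random.Random(seed)
    mean = 0.0
    for _ in range(layers):
        std = math.sqrt(var)
        acts = [selu(rng.gauss(0.0, std)) for _ in range(samples)]
        mean = sum(acts) / samples
        var = sum((a - mean) ** 2 for a in acts) / samples
    return mean, var


if __name__ == "__main__":
    # Start away from the fixed point: activation variance 2.0.
    print(iterate_moments(2.0))
```

Starting from a variance of 2.0, the statistics drift toward mean zero and variance one, up to sampling noise. This only illustrates the attracting fixed point; it is not a substitute for the proof.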

Generative adversarial networks (GANs) excel in generating images with complex generative models for which maximum likelihood is infeasible. Using the theory of stochastic approximation, we proved that a two time-scale update rule for training GANs converges under mild assumptions to a local Nash equilibrium. Using an analogy to electric fields, we derived Coulomb GANs, for which it can be shown that there exists only one local Nash equilibrium, which is the global one. With Coulomb GANs the learning problem is formulated as a potential field, where generated samples are attracted to training set samples but repel each other.
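The idea of two time scales can be sketched on a toy problem. The example below is an assumption-laden illustration, not a GAN: it uses a hypothetical scalar game f(g, d) = g*d - d^2/2, which the "discriminator" d maximizes and the "generator" g minimizes, with exact gradients and fixed step sizes. The only point it shares with the two time-scale update rule is that the two players use separate learning rates, with the discriminator on the faster scale.

```python
def two_time_scale_descent(g=1.0, d=1.0, lr_d=0.1, lr_g=0.01, steps=2000):
    """Simultaneous gradient updates on f(g, d) = g*d - 0.5*d**2.

    d ascends f with the larger step size lr_d, g descends f with the
    smaller step size lr_g. The unique equilibrium of this toy game
    is (g, d) = (0, 0).
    """
    for _ in range(steps):
        grad_d = g - d  # df/dd
        grad_g = d      # df/dg
        # Both players update from the same (old) state.
        d, g = d + lr_d * grad_d, g - lr_g * grad_g
    return g, d


if __name__ == "__main__":
    print(two_time_scale_descent())
```

In this toy game both players settle at the equilibrium (0, 0). Real GAN training uses stochastic gradients of neural network losses, which is the setting the convergence result addresses.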

Recently we have focused on reinforcement learning and improved credit assignment for delayed rewards. For delayed rewards, Monte Carlo (MC) estimates have high variance, while temporal difference (TD) methods such as Q-learning or SARSA correct their bias only very slowly. We show that our LSTM-based approach to credit assignment learns exponentially faster than MC and TD for delayed rewards.
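The slow bias correction of TD can be seen in a small sketch, assuming a deterministic chain of states with a single reward of 1 at the end (a toy environment chosen for illustration). One-step TD only moves reward information back by one state per episode, so the start state learns nothing until as many episodes have passed as the reward is delayed. The function name and parameters are hypothetical, and the LSTM-based approach from the talk is not implemented here.

```python
def td0_episodes_until_start_learns(chain_length, lr=1.0, gamma=1.0):
    """Count episodes until TD(0) gives the start state a nonzero value.

    States 0..chain_length-1 are visited in order; the only reward (1.0)
    arrives on the final transition into a terminal state of value 0.
    """
    values = [0.0] * (chain_length + 1)  # last entry is the terminal state
    episodes = 0
    while values[0] == 0.0:
        for s in range(chain_length):
            reward = 1.0 if s == chain_length - 1 else 0.0
            target = reward + gamma * values[s + 1]
            values[s] += lr * (target - values[s])
        episodes += 1
    return episodes


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, td0_episodes_until_start_learns(n))
```

In this chain the number of episodes equals the delay. A Monte Carlo return would reach the start state after a single episode here, but in stochastic environments that return is a high-variance estimate, which is the trade-off described above.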