Maximilian Karl joined the Volkswagen Group ML Research Lab as a research scientist in 2017. He holds a Master's degree in Robotics, Cognition, Intelligence from the Technical University of Munich. In 2020 he received his PhD from the Technical University of Munich, under the supervision of Patrick van der Smagt. During his PhD he focused on developing new methods for efficiently computing empowerment, a form of intrinsic motivation, making its application on real robotic systems possible. This led to the development of a new method for system identification that learns latent state space models from raw sensor data, which has also found application outside the area of intrinsic motivation research. The combination of learned model dynamics and a variational approximation of empowerment made it possible to fly a real quadrocopter through intrinsic motivation.
His PhD thesis, "Unsupervised Control", explores the use of intrinsic motivation for controlling real robotic systems. How should robots behave when no external goal or reward is provided? Biological systems seem to be capable of learning and generating complex movements and intelligent behaviour without the need for an external teacher. Replicating such behaviour in robots previously required complicated design and tuning of cost functions. Instead, this thesis proposes and demonstrates the use of empowerment. This universal formulation tries to mimic the internal drive of biological systems. The thesis also explores ties between empowerment and Erwin Schrödinger's earlier work on the definition of life, clarifying the connection between the information-theoretic definition of intrinsic motivation and entropy production in physics. Formally, empowerment is defined as the maximum possible information transfer through the agent and its environment. States that make a higher transfer of information possible are those which also maximise the influence an agent has over future states. Examples of situations with higher influence include upright walking or balancing, avoidance of obstacles, and the use of tools, but also the general organisation or restructuring of the surroundings. However, the computation of empowerment was previously too expensive in environments with continuous-valued state and action spaces, preventing its use on real robotic systems. The thesis presents several improvements in dynamics model learning, efficient channel capacity computation and optimal control that make the application to real robots possible. Results on a selection of simulated robots can be found below, followed by the demonstration on a real quadrocopter.
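To make the formal definition more concrete, the following is a minimal sketch of empowerment as a channel capacity in a small discrete setting. It is not the method developed in the thesis, which relies on a variational approximation for continuous state and action spaces. Instead it uses the classical Blahut-Arimoto iteration, assumes a single action step, and assumes the transition table `p(next state | action)` is known. The function names and the two example states are hypothetical and chosen only for illustration.

```python
import numpy as np


def _kl_per_action(p_next_given_action, p_action):
    """KL divergence (in nats) between each action's outcome row and the marginal."""
    p_next = p_action @ p_next_given_action  # marginal over next states
    with np.errstate(divide="ignore", invalid="ignore"):
        # Entries with zero transition probability contribute nothing.
        log_ratio = np.where(
            p_next_given_action > 0,
            np.log(p_next_given_action / p_next),
            0.0,
        )
    return np.sum(p_next_given_action * log_ratio, axis=1)


def empowerment_bits(p_next_given_action, iters=500, tol=1e-12):
    """One-step empowerment of a single state, in bits.

    Rows of the table are actions, columns are next states, and each row
    sums to 1. The value is the channel capacity from actions to next
    states, found with the Blahut-Arimoto iteration.
    """
    table = np.asarray(p_next_given_action, dtype=float)
    n_actions = table.shape[0]
    p_action = np.full(n_actions, 1.0 / n_actions)  # start from a uniform policy
    for _ in range(iters):
        d = _kl_per_action(table, p_action)
        updated = p_action * np.exp(d)  # favour actions with distinctive outcomes
        updated /= updated.sum()
        converged = np.max(np.abs(updated - p_action)) < tol
        p_action = updated
        if converged:
            break
    # Mutual information between action and next state under the final policy.
    return float(p_action @ _kl_per_action(table, p_action) / np.log(2))


# Hypothetical "open" state: four actions lead to four distinct next states.
open_state = np.eye(4)

# Hypothetical "cornered" state: every action leads to the same next state.
cornered_state = np.zeros((4, 4))
cornered_state[:, 0] = 1.0

print("open state:    ", empowerment_bits(open_state), "bits")
print("cornered state:", empowerment_bits(cornered_state), "bits")
```

In this toy setting the state in which actions lead to distinguishable outcomes receives a higher empowerment value than the state in which the choice of action makes no difference, which mirrors the notion of influence over future states described above.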
A bipedal robot controlled with empowerment as its cost function tries to stay upright and balance on its legs. In the following video you can see a bipedal robot in its initial learning phase. When the robot loses its balance, a clear reduction in influence can be observed in the form of reduced empowerment values.
When applied to multiple agents in a restricted space, a swarm-like behaviour emerges. The individual agents try to avoid collisions with walls but also with each other. Interaction with other agents increases the overall influence on the environment, making the agents gather in the centre of the space.
Finally, we applied the algorithm to a real robotic system: a quadrocopter with local sensing and limited compute capabilities. The video shows how the intrinsically motivated agent initiates a takeoff and then tries to keep hovering and avoid collisions with walls.
- intrinsic motivation
- unsupervised learning of state space models
- approximate inference for stochastic optimal control
- embedded systems and robotics