Q-Learning for a Bipedal Walker

Frederik Tollund Juutilainen, Younes Haroun Bakhti & Sara Almeida Santos Daugbjerg

Student thesis: Semester project

Abstract

This project studies how a bipedal body can learn forward movement within a simulated 2D physics environment through the reinforcement learning method Q-Learning. Q-Learning is a model-free form of reinforcement learning that can be used to find the optimal policy for a Markov Decision Process. Through experimentation and parametrisation, it was possible to create several types of test cases for further testing the extent of the learning agent's abilities, in order to analyse the results and optimise the designed software. From the findings, we conclude that the learning algorithm was unable to find a stable pattern of state-actions that resulted in forward movement. This could be due to the high complexity and large number of states, which restricted the performance of the designed software. However, in the less complex cases, the agent demonstrated that, even in a short amount of time, it could find a stable position that would yield a high enough reward to be regarded as successful, and it was thereby successful in learning from experience, albeit at a smaller scale.
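For readers unfamiliar with the method, the tabular Q-Learning update mentioned above can be sketched as follows. This is a minimal illustration, not the project's software: the function names, the five-state chain environment, and the values of alpha, gamma, and epsilon are all assumptions chosen for brevity, and the walker's actual state and action encoding is not described on this page.

```python
import random


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Unseen state-action pairs are treated as having value 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def train_chain(n_states=5, episodes=500, epsilon=0.1, seed=0):
    """Train on a hypothetical toy chain (NOT the bipedal walker).

    Action 1 moves one state to the right, action 0 one state to the left.
    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                values = [q.get((state, a), 0.0) for a in range(2)]
                best = max(values)
                # Exploit, breaking ties between equal values at random.
                action = rng.choice(
                    [a for a, v in enumerate(values) if v == best])
            if action == 1:
                next_state = min(state + 1, n_states - 1)
            else:
                next_state = max(state - 1, 0)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            q_update(q, state, action, reward, next_state, n_actions=2)
            state = next_state
            if state == n_states - 1:
                break
    return q


if __name__ == "__main__":
    table = train_chain()
    for s in range(4):
        print(s, table.get((s, 0), 0.0), table.get((s, 1), 0.0))
```

The table here is a plain dictionary keyed by (state, action), which only works when the number of states is small. The abstract's remark about the large number of states points at exactly this limitation of the tabular form.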

Programmes: Computer Science, (Bachelor/Master's Programme) Bachelor or Master's
Language: English
Publication date: 29 May 2015
Supervisors: Ole Torp Lassen

Keywords

  • Bipedal
  • Markov Decision Processes
  • Artificial Intelligence
  • Reinforcement learning
  • Q-Learning
  • Walker
  • Machine Learning