Sep 27

First, the video below gives a brief overview of RoboCup, the subject of this article:



The final 2009 2D simulation game

Abstract:
Because robots are needed in many industries and in daily human life, researchers around the world have studied robotics extensively. Although robots have been deployed in many fields, they are still not clever enough to carry out complicated tasks. That is why RoboCup was introduced in 1993: it gives researchers a chance to investigate robotic tasks in a sophisticated game, football.

RoboCup is an annual competition in which researchers from all over the world work on robot football, which demands both complicated individual skills and multi-agent cooperation. Although a large amount of research has been done on individual skills such as dribbling, passing and tackling, further improvement is still needed for the robots to play better football.

In this thesis, three problems in the RoboCup 2D simulation league are investigated. The first is path planning, where the agent must find a path to a target position using four actions: dashing forward, dashing backward, dashing to the left and dashing to the right. The second is improving the dribbling skill so that the agent can move to a better position while keeping the ball from being intercepted by opponents. The third is helping the goalie position itself better in emergency situations where an opponent is trying to shoot the ball. All experiments were carried out in the RoboCup 2D simulation, version 13.2.0. The approaches proved successful or partially successful, which represents a considerable contribution to the field.

Click on the link to download the report.

Even god made mistakes, please let me know what mistakes I have made.

Jun 05

Q-learning [Watkins, 1989] is one of the most popular reinforcement learning methods. One of the advantages of Q-learning is its ability to compare the expected utility of the available actions without requiring a model of the environment.

The core of Q-learning is the update rule below:

Q_{t+1}(a,s)=(1-\alpha_{t})Q_{t}(a,s)+\alpha_{t}[r_{t}(s)+\gamma\max_{a'}{Q_{t}(a',s')}]

Where:

  • Q_{t}(a,s) is the Q-value at time t for taking action a in state s.
  • r_{t}(s) is the reward received at time t.
  • s' is the next state, and a' ranges over the actions available in s'.
  • \alpha is the learning rate. It determines how strongly new information overrides what has already been learned. If \alpha is 0, the agent does not learn anything. If \alpha is 1, only the new information is considered and all old information is discarded.
  • \gamma is the discount factor. It lies in the range [0, 1] and is used to weight near-term reinforcement more heavily than distant future reinforcement. The closer \gamma is to 1, the greater the weight of future reinforcement.
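As a minimal sketch of the update rule, the Python function below applies one Q-learning step to a table stored as a dictionary. The table layout, the function name q_update, and the state and action labels are assumptions made for illustration only; they do not come from the thesis or from any RoboCup library.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning update to the table q in place and return the new value.

    q maps (action, state) pairs to Q-values, mirroring the Q(a, s) notation.
    """
    # max over a' of Q_t(a', s')
    best_next = max(q[(a, next_state)] for a in actions)
    # Blend the old estimate with the new target, weighted by the learning rate.
    q[(action, state)] = (1 - alpha) * q[(action, state)] + alpha * (reward + gamma * best_next)
    return q[(action, state)]


# Hypothetical two-action example: unseen pairs default to a Q-value of 0.
q = defaultdict(float)
actions = ["left", "right"]
q[("right", "s1")] = 2.0
new_value = q_update(q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
print(new_value)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and since the old value is 0 and \alpha is 0.5, the updated Q-value is about 1.4.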

So what does the equation mean? Assume \alpha=1 and \gamma=1; the equation then becomes:

Q_{t+1}(a,s)=r_{t}(s)+\max_{a'}{Q_{t}(a',s')}

It is now easy to see that the Q-value of the state-action pair (a,s) equals the reward for taking action a plus the maximum Q-value of the next state (over all next actions). The update is therefore a dynamic programming recursion that drives the Q-values of the state-action pairs toward their optimal values.
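To make the \alpha=1, \gamma=1 case concrete, here is a small sketch on a made-up corridor world: states 0 to 3, a goal at state 3, two actions, and a reward of -1 per step. The world, the names, and the choice to sweep over every state-action pair (rather than let an agent explore) are all assumptions for illustration.

```python
GOAL = 3
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Deterministic corridor: move one cell, clamped to [0, GOAL]; each step costs 1."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    return next_state, -1.0


# Q-table keyed by (action, state); the goal is terminal, so its values stay 0.
q = {(a, s): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

for _ in range(10):  # a few sweeps are enough for this tiny world
    for s in range(GOAL):
        for a in ACTIONS:
            s_next, r = step(s, a)
            # Update with alpha = 1 and gamma = 1: Q(a, s) = r + max over a' of Q(a', s')
            q[(a, s)] = r + max(q[(b, s_next)] for b in ACTIONS)

# Greedy action in each non-terminal state.
greedy = {s: max(ACTIONS, key=lambda a: q[(a, s)]) for s in range(GOAL)}
print(q)
print(greedy)
```

After the sweeps, each Q-value is the negative of the number of steps needed to reach the goal when that action is taken first, so the greedy action in every state is "right".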

When discounting is enabled (\gamma<1), a reward is worth less the later it arrives, and hence the total reward from time t is given by:

R_{t}=r_{t}+\gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^n r_{t+n} + \dots
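For a finite sequence of rewards, the sum above can be computed as in the sketch below. The function name discounted_return and the sample rewards are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so that each earlier step adds one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))  # no discounting: 3.0
```

Working backwards avoids computing powers of \gamma explicitly and matches the recursive form R_{t}=r_{t}+\gamma R_{t+1}.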

The Java applet below is a very good illustration of Q-learning (thanks to Vander B. Frank):

For details of how the applet works, please see Vander B. Frank's document through this PDF.

Coming soon: how Q-learning is implemented to improve the dribbling skill in RoboCup 2D Simulation (my MSc project).



Bibliography

1. Wikipedia: Q-learning [http://en.wikipedia.org/wiki/Q-learning].

2. Vander B. Frank: Q-learning. IRIDIA, Université Libre de Bruxelles. 7, 2003. [PDF]

3. Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.


Jun 05

I love these 3 most !

Jun 05

This is where I would like to share my knowledge of web technology, artificial intelligence, and mobile communication and devices!
