Bengaluru, India
Mon - Fri : 09:00 - 17:00

Beginner to Advanced: Having Fun with Reinforcement Learning #1

This series aims to make reinforcement learning topics, from beginner to advanced, easy for everyone. We are starting with the basics; if you need to jump to a later part of the series, please go ahead.

What is Reinforcement Learning?

Reinforcement learning is learning what to do – how to map situations to actions so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. – Richard S. Sutton

It is pretty simple: an agent (which you will create) takes an action in an environment. As feedback, the agent receives a reward and the next state of the environment. The agent learns by trial and error while interacting with the environment.
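To make the loop concrete, here is a minimal sketch in Python. The `LineWorld` environment, its `reset`/`step` methods, and the `run_episode` helper are all made-up names for illustration, not part of any library. The agent here does no learning yet: it only picks random actions, so you can see the action, reward, and next state going back and forth.

```python
import random


class LineWorld:
    """Hypothetical toy environment: the agent walks on positions 0..4; the goal is position 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the line.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the position is kept within 0..4.
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def run_episode(env, rng, max_steps=1000):
    """Run one episode with an agent that chooses its actions at random."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])            # the agent takes an action
        state, reward, done = env.step(action)  # feedback: next state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps, state


total_reward, steps, final_state = run_episode(LineWorld(), random.Random(0))
print("total reward:", total_reward, "steps:", steps, "final state:", final_state)
```

A learning agent would replace the random `rng.choice` line with a rule that improves as rewards come in.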

Understand with an Example:

Let's understand Reinforcement Learning from the point of view of Conor McGregor.

Conor McGregor vs Khabib Nurmagomedov UFC 229 🙂

Agent: Conor McGregor.

Environment: The Octagon (the UFC fighting ring). I am treating Khabib as part of the environment.

Reinforcement Learning: Conor McGregor vs Khabib Nurmagomedov

Action: What the agent does at a given point in time, such as trash-talking Khabib: “#@$*#%”

Reward: The agent receives a punch, which represents a negative reward here.

Next State: The match continues, and the agent takes more actions in the environment.

The learner is not told which actions to take but instead must discover which actions yield the most reward by trying them. Two distinguishing features of reinforcement learning are trial-and-error search and delayed reward. – Richard S. Sutton

A Couple of Pointers

  1. Reinforcement learning is different from unsupervised learning: it tries to maximize a reward signal instead of trying to find hidden structure.
  2. The agent interacts with its environment over time to achieve a goal.
  3. Exploration and exploitation trade-off: To maximize the reward, an agent should use (exploit) the actions that have worked in the past. However, to discover which actions yield the most reward, the agent should also try (explore) new actions it has not taken before. These new actions may lead to a big, previously unexplored reward (picture a mouse finding the big cheese), or they may lead the mouse to the hungry cat 😐
  4. Delayed reward: While playing chess, a grandmaster may sacrifice a queen to gain an advantage that leads to victory 10 moves later, so the reward for the sacrifice is delayed (Andras Adorjan vs Istvan Polgar, 1972).

Queen sacrifice
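Going back to the exploration and exploitation pointer, one common way to balance the two is an epsilon-greedy rule: most of the time pick the action that currently looks best, and with a small probability pick a random one. This post does not prescribe that rule, so treat the sketch below as one possible illustration. The function name, the Gaussian reward noise, and the parameter values are all assumptions made up for this example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, noise=1.0, seed=0):
    """Estimate each action's average reward while mixing exploration and exploitation.

    true_means holds the hidden average reward of each action (an assumption for this
    toy setup); the agent never reads it directly, it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # the agent's current guess of each action's reward
    counts = [0] * n_actions       # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, even one that looks bad so far.
            action = rng.randrange(n_actions)
        else:
            # Exploit: use the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Reward drawn around the hidden mean (Gaussian noise is an assumption).
        reward = rng.gauss(true_means[action], noise)

        # Update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 2.0])
print("estimates:", estimates)
print("counts:", counts)
```

With `epsilon=0` the agent never explores and can get stuck on the first action it tries; with `epsilon=1` it never uses what it has learned. Values in between trade one off against the other.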

Let's continue in part 2, where we will learn about policies and value functions.

References

  1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition.

If you have any comments or questions, please write them in the comments.

To see similar posts, follow me on Medium and LinkedIn.

If you enjoyed this post, clap for it, share it, and follow me!
