Policy Gradient & REINFORCE

Watch a policy learn to navigate a grid through trial and error

Click "Next Step" to run training episodes

Episode History

Training Stats

Episodes: 0
Avg Return: --
Best Return: --

Policy at Agent Cell

↑: 25% | ↓: 25% | ←: 25% | →: 25%
Legend: Agent | Goal (+10) | Wall | Path Taken
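
The readout above starts with every action at 25%, which is what a softmax policy gives when all of its preferences are zero. Below is a minimal sketch of how REINFORCE could move those probabilities after each episode. It is not the demo's actual code. The grid size, wall position, start cell, zero step reward, discount factor, learning rate, and every function name are assumptions made for illustration. Only the four actions and the +10 goal reward come from the page.

```python
import math
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIZE = 4                          # assumed grid size
START, GOAL = (0, 0), (3, 3)      # assumed start and goal cells
WALLS = {(1, 1)}                  # assumed wall layout
GAMMA, LR, MAX_STEPS = 0.95, 0.1, 50  # assumed hyperparameters


def make_policy():
    """One preference per (cell, action); all zeros gives a uniform policy."""
    return {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in WALLS:
        r, c = state  # blocked moves leave the agent in place
    done = (r, c) == GOAL
    return (r, c), (10.0 if done else 0.0), done


def run_episode(theta, rng):
    """Sample one trajectory of (state, action, reward) from the policy."""
    state, trajectory = START, []
    for _ in range(MAX_STEPS):
        action = rng.choices(range(4), weights=softmax(theta[state]))[0]
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def reinforce_update(theta, trajectory):
    """Apply one REINFORCE step; returns the discounted return from the start.

    The gradient of log softmax w.r.t. preference a is 1[a == taken] - pi(a).
    This sketch omits the extra gamma**t weighting on each step's gradient.
    """
    g, updates = 0.0, []
    for state, action, reward in reversed(trajectory):
        g = reward + GAMMA * g  # return-to-go from this step
        probs = softmax(theta[state])
        for a in range(4):
            grad = (1.0 if a == action else 0.0) - probs[a]
            updates.append((state, a, LR * g * grad))
    for state, a, delta in updates:  # apply after all gradients are computed
        theta[state][a] += delta
    return g


if __name__ == "__main__":
    rng = random.Random(0)
    theta = make_policy()
    for _ in range(500):
        reinforce_update(theta, run_episode(theta, rng))
    print([round(p, 2) for p in softmax(theta[START])])
```

Each call to run_episode followed by reinforce_update plays the role of one training episode. Actions on trajectories that reached the goal get their preferences raised in proportion to the return that followed them, so the start cell's probabilities drift away from 25% over time.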