Policy Gradient & REINFORCE

Watch a policy learn to navigate a grid through trial and error

Click "Next Step" to run training episodes

Episodes: 0
Avg Return: --
Best Return: --

↑: 25% | ↓: 25% | ←: 25% | →: 25%

Agent

Goal (+10)

Wall

Path Taken
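
The demo does not show its own code, so the following is only a minimal sketch of how a REINFORCE update might look for a grid like this one. It assumes a tabular softmax policy with one preference value per state and action; the names `softmax`, `discounted_returns`, `reinforce_update`, `alpha`, and `gamma` are illustrative choices, not taken from the demo. With all preferences at zero, the softmax gives the uniform 25% per action shown before training starts.

```python
import math

# Hypothetical action set matching the four arrows in the demo.
ACTIONS = ["up", "down", "left", "right"]


def softmax(prefs):
    """Turn a list of action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_update(theta, episode, alpha=0.1, gamma=0.99):
    """Apply one REINFORCE update from a finished episode.

    theta:   dict mapping state -> list of 4 action preferences (assumed layout)
    episode: list of (state, action_index, reward) tuples
    """
    rewards = [r for _, _, r in episode]
    returns = discounted_returns(rewards, gamma)
    for (state, action, _), g in zip(episode, returns):
        probs = softmax(theta[state])
        for a in range(len(ACTIONS)):
            # Gradient of log pi(action | state) w.r.t. preference a
            # for a softmax policy: indicator(a == action) - pi(a | state).
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha * g * grad_log_pi
    return theta
```

As a usage example under the same assumptions: starting from `theta = {(0, 0): [0.0, 0.0, 0.0, 0.0]}` and calling `reinforce_update(theta, [((0, 0), 3, 10.0)])` for a single step that moves right and reaches the +10 goal raises the preference for "right" and lowers the other three, so the probability of "right" in that cell rises above 25%. This sketch omits common refinements such as a baseline or the per-step discount factor that some presentations of REINFORCE include.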