PPO Training Loop Simulation
Watch each phase of PPO: rollout → GAE advantages → advantage normalization → multi-epoch clipped optimization → KL monitoring
Step 1: Rollout
Step 2: GAE
Step 3: Normalize
Step 4: Optimize
Step 5: KL Check
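The numerical core of steps 2 through 5 can be sketched in a few NumPy functions. This is a minimal illustration, not the simulation's own code: the function names, the defaults `gamma=0.99`, `lam=0.95`, and `clip_eps=0.2`, and the choice of `mean(logp_old - logp_new)` as the KL estimate are all assumptions, and the rollout is assumed to be a single trajectory with no early termination.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Step 2: Generalized Advantage Estimation over one trajectory.

    Assumes no episode termination inside the trajectory; last_value is the
    critic's estimate for the state that follows the final step.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running              # discounted sum of residuals
        adv[t] = running
        next_value = values[t]
    return adv

def normalize(adv, eps=1e-8):
    """Step 3: shift and scale advantages to zero mean and unit variance."""
    return (adv - adv.mean()) / (adv.std() + eps)

def clipped_surrogate_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """Step 4: PPO clipped objective, returned as a loss to minimize."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))

def approx_kl(logp_new, logp_old):
    """Step 5: a simple sample-based estimate of KL(old || new) (assumed estimator)."""
    return float(np.mean(logp_old - logp_new))

# Toy usage with made-up numbers (hypothetical data, for illustration only).
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6])
adv = normalize(gae(rewards, values, last_value=0.0))
logp_old = np.array([-1.0, -1.2, -0.8])
logp_new = np.array([-0.9, -1.3, -0.7])
loss = clipped_surrogate_loss(logp_new, logp_old, adv)
kl = approx_kl(logp_new, logp_old)
```

In a full training loop, step 4 would be repeated for several epochs over the same rollout, and the step 5 estimate would typically be compared against a threshold to decide whether to stop those epochs early.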
[Interactive panels: Policy & Trajectories; Training Curves; Advantage Distribution]