PPO Clipped Surrogate Objective

Explore how clipping constrains policy updates: visualize all four cases and their gradients.

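[Interactive plot: objective L(θ) and gradient ∂L/∂r as functions of the ratio r_t(θ), shown separately or together. Curves: unclipped r_t·Â_t, clipped clip(r_t, 1−ε, 1+ε)·Â_t, and L^CLIP = min of the two, with the clip region [1−ε, 1+ε] marked.]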
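$$L^{CLIP}(\theta) = \mathbb{E}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]$$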
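
As a minimal sketch of the formula above, the term inside the expectation can be evaluated for a single sample in plain Python. The function names `clip` and `ppo_clip_objective` are illustrative and not taken from any library; the default `eps=0.2` simply mirrors the clip setting shown on this page.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


# Settings shown on this page: A = +1.00, eps = 0.20, r = 1.00
print(ppo_clip_objective(1.0, 1.0))  # 1.0
# Positive advantage with r above 1 + eps: the value is capped at (1 + eps) * A
print(ppo_clip_objective(1.5, 1.0))  # 1.2
```

In a training loop this quantity would be averaged over a batch of timesteps and maximized; the sketch deliberately leaves that out.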

Parameters

Advantage Â_t: +1.00
Clip ε: 0.20
Current r_t(θ): 1.00

Current Point

Region: inside clip
L^CLIP value: 1.00
L^unclip value: 1.00
Gradient ∂L/∂r: +1.00

Four Cases

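Â_t    r_t(θ)           Effect     Gradient ∂L/∂r
+      ↑ above 1+ε      Clip       zero
+      ↓ below 1−ε      Correct    full
−      ↓ below 1−ε      Clip       zero
−      ↑ above 1+ε      Correct    full
±      in bounds        Normal     Â_t

"Clip" rows: the ratio has already moved far enough in the direction favored by Â_t, so the min selects the constant clipped term and the gradient vanishes. "Correct" rows: the ratio has moved against the sign of Â_t, so the min selects the unclipped term and the full gradient Â_t passes through.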
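
The cases in the table above can be checked with a short sketch of the derivative of the per-sample objective with respect to r. The name `ppo_clip_grad` is hypothetical, and the treatment of the exact boundaries r = 1−ε and r = 1+ε (where the objective has a kink) is an arbitrary convention chosen here, not something stated on this page.

```python
def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Derivative of min(r * A, clip(r, 1 - eps, 1 + eps) * A) with respect to r.

    Returns 0.0 when the constant clipped term is the active one, else A.
    At the clip boundaries the derivative is undefined; this sketch returns A there.
    """
    lo, hi = 1.0 - eps, 1.0 + eps
    if advantage > 0 and ratio > hi:
        return 0.0  # A > 0, r above 1 + eps: clipped, zero gradient
    if advantage < 0 and ratio < lo:
        return 0.0  # A < 0, r below 1 - eps: clipped, zero gradient
    return advantage  # in bounds, or a "correct" case: full gradient


# One (advantage, ratio) pair per table row, in the same order as the table
cases = [(+1.0, 1.5), (+1.0, 0.5), (-1.0, 0.5), (-1.0, 1.5), (+1.0, 1.0)]
for adv, r in cases:
    print(f"A={adv:+.2f}  r={r:.2f}  grad={ppo_clip_grad(r, adv):+.2f}")
```

Only the two rows where the ratio has overshot in the direction favored by the advantage produce a zero gradient; every other row returns the advantage itself.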
