```
function AStar(start, goal, h)
    OPEN ← priority queue containing start, with g(start) = 0
    CLOSED ← empty set            // g(m) = ∞ for nodes not yet reached
    while OPEN is not empty do
        n ← node in OPEN with lowest f(n) = g(n) + h(n)
        if n = goal then
            return reconstruct_path(n)
        move n to CLOSED
        for each successor m of n do
            tentative_g ← g(n) + cost(n, m)
            if tentative_g < g(m) then
                g(m) ← tentative_g
                parent(m) ← n
                add or update m in OPEN
    return failure                // no path exists
```

| | Classical Search | Reinforcement Learning |
|---|---|---|
| Model of world | Known exactly | Unknown; learned from interaction |
| Outcomes | Deterministic (typically) | Stochastic |
| Objective | Reach a goal state | Maximize cumulative reward |
| Approach | Compute optimal plan | Learn optimal behavior from experience |
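The pseudocode above can be turned into a short, self-contained Python implementation. This is one reasonable sketch, not the only one: the `neighbors` and `cost` callables and the lazy handling of stale heap entries are implementation choices the pseudocode leaves open.

```python
import heapq

def a_star(start, goal, neighbors, cost, h):
    """A* search. `neighbors(n)` yields successors of n, `cost(n, m)` is the
    edge cost, and `h(n)` is an admissible heuristic estimate to `goal`."""
    g = {start: 0}                        # best known path cost to each node
    parent = {start: None}
    open_heap = [(h(start), start)]       # entries are (f = g + h, node)
    closed = set()

    while open_heap:
        f, n = heapq.heappop(open_heap)
        if n in closed:
            continue                      # stale entry: a better path was found
        if n == goal:
            # Reconstruct the path by walking parent pointers back to start.
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        closed.add(n)
        for m in neighbors(n):
            tentative_g = g[n] + cost(n, m)
            if tentative_g < g.get(m, float("inf")):
                g[m] = tentative_g
                parent[m] = n
                heapq.heappush(open_heap, (tentative_g + h(m), m))
    return None                           # no path exists
```

Rather than updating priorities in place, this version pushes duplicate heap entries and skips the stale ones on pop, which keeps the code simple at the cost of a slightly larger heap.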

| MDP Concept | LLM Equivalent |
|---|---|
| State $s$ | Prompt + tokens generated so far |
| Action $a$ | Next token from vocabulary ($\vert V \vert \approx$ 100K) |
| Policy $\pi_\theta(a \vert s)$ | Transformer's softmax output distribution |
| Transition $P(s' \vert s, a)$ | Deterministic: append chosen token to context |
| Reward $R$ | Score at end of generation (human, reward model, or verifier) |
| Episode | One complete generated response |
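The mapping above can be made concrete with a toy episode loop. Everything here is illustrative: the vocabulary, `policy`, and `reward` are stand-ins (a real LLM has a learned transformer policy over roughly 100K tokens, and the reward typically comes from a human, reward model, or verifier), but the MDP skeleton is the same.

```python
import random

# Toy vocabulary; a real model's vocabulary is ~100K tokens.
VOCAB = ["the", "cat", "sat", "<eos>"]

def policy(state):
    """Stand-in for pi_theta(a | s): a distribution over next tokens.
    Here it ignores the state and is uniform; a transformer's softmax
    over the context would go here."""
    return {tok: 1 / len(VOCAB) for tok in VOCAB}

def reward(state):
    """Terminal reward, e.g. from a reward model; here: +1 if 'cat' appears."""
    return 1.0 if "cat" in state else 0.0

def run_episode(prompt, max_tokens=10, seed=0):
    rng = random.Random(seed)
    state = list(prompt)                  # state = prompt + tokens so far
    for _ in range(max_tokens):
        dist = policy(state)
        action = rng.choices(list(dist), weights=list(dist.values()))[0]
        state = state + [action]          # deterministic transition: append
        if action == "<eos>":             # episode ends at end-of-sequence
            break
    return state, reward(state)           # reward arrives only at episode end
```

Note that the reward is observed only after the full response is generated, which is exactly the sparse, episode-level feedback the table describes.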