RLHF vs DPO Pipeline

Compare the two approaches to aligning LLMs with human preferences


RLHF (PPO)

Models in memory: 4
Training complexity: High
Online generation: Yes
Exploration: Yes
Used by: OpenAI, Anthropic
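RLHF with PPO keeps four models in memory: the policy being trained, a frozen reference copy, the reward model, and a value critic. The per-token update uses PPO's clipped surrogate objective plus a KL penalty toward the reference. A minimal sketch in plain Python (function and variable names are illustrative, not from any specific library):

```python
import math

def ppo_token_loss(logp_new, logp_old, logp_ref, advantage,
                   clip_eps=0.2, kl_coef=0.1):
    """Clipped PPO objective for a single token, with a KL penalty
    that keeps the policy close to the frozen reference model."""
    ratio = math.exp(logp_new - logp_old)            # importance ratio
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    policy_loss = -min(unclipped, clipped)           # maximize clipped surrogate
    kl_penalty = kl_coef * (logp_new - logp_ref)     # per-token KL estimate
    return policy_loss + kl_penalty

# With identical log-probs the ratio is 1 and the loss reduces to -advantage.
print(ppo_token_loss(-1.0, -1.0, -1.0, 0.5))
```

The clipping keeps any single update from moving the policy too far from the behavior that generated the samples; the KL term is what prevents reward hacking against the learned reward model.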

DPO

Models in memory: 2
Training complexity: Low
Online generation: No
Exploration: No (offline)
Used by: Meta, open-source
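DPO skips the reward model and the RL loop entirely: it trains the policy directly on offline (chosen, rejected) preference pairs, so only the policy and a frozen reference model need to be in memory. A sketch of the DPO loss for one pair (names are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: increase the policy's margin
    over the reference on the chosen response relative to the rejected one."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# When the policy equals the reference, both margins cancel:
# loss = -log(0.5) ~= 0.693, and training pushes it downward.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Because the loss is a simple classification objective over a fixed preference dataset, there is no online generation and no exploration, which is exactly the trade-off the table above summarizes.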