# Reference Chapter on Deriving the Policy Gradient

- Source: Nathan Lambert (@natolambert)
- Published: 2026-06-13 01:28
- AIHOT score: 46
- AIHOT link: https://aihot.virxact.com/items/cmqb7j2w7009sslruksp1w73k
- Original link: https://x.com/natolambert/status/2065486388641018241

## AI Summary

Derivation of the policy gradient:
https://rlhfbook.com/c/06-policy-gradients#deriving-the-policy-gradient

## Body

Derivation of the policy gradient:
https://rlhfbook.com/c/06-policy-gradients#deriving-the-policy-gradient

### Quoted Tweet

> Harsh Bhatt: derivation of Policy Gradient.