Juhyeon's Blog

Value

April 13, 2026 (1 min read)

Summary

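$V_{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid S_{t} = s\right]$
The expected discounted cumulative reward when starting in state $s$ and following policy $\pi$, with discount factor $\gamma$.
It is a "valuation" of a state based on a prediction of the future.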

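Value is the expected value of the return: writing the return as $G_{t} = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}$, we have $V_{\pi}(s) = \mathbb{E}_{\pi}[G_{t} \mid S_{t} = s]$.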
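
Since value is an expectation over returns, one simple way to approximate it is to average sampled returns (a Monte Carlo estimate). The sketch below is illustrative only: the toy process in `sample_episode` (reward 1 per step, continuing with probability `p_continue`) and all function names are assumptions made for this example, not something taken from the note. For this toy process the exact value is $1 / (1 - p\gamma)$, which makes the estimate easy to check.

```python
import random


def discounted_return(rewards, gamma):
    """Return G = sum_k gamma^k * rewards[k] for a finite reward sequence."""
    g = 0.0
    # Accumulate backwards so each step applies G = r + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sample_episode(rng, p_continue=0.5):
    """Hypothetical toy process: reward 1.0 every step; after each step the
    episode continues with probability p_continue, otherwise it ends."""
    rewards = [1.0]
    while rng.random() < p_continue:
        rewards.append(1.0)
    return rewards


def mc_value(n_episodes, gamma, p_continue=0.5, seed=0):
    """Monte Carlo estimate of the start-state value: the mean sampled return."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += discounted_return(sample_episode(rng, p_continue), gamma)
    return total / n_episodes


if __name__ == "__main__":
    gamma, p = 0.9, 0.5
    print("Monte Carlo estimate:", mc_value(20_000, gamma, p))
    print("Exact value         :", 1.0 / (1.0 - p * gamma))
```

With more episodes the sample average tends toward the exact expectation, which is the sense in which value is "the expected value of the return".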
