RMSProp?

“Leaky AdaGrad”.
Fixes AdaGrad's vanishing learning-rate problem by replacing the accumulated sum of squared gradients with an EMA (exponential moving average). Proposed by Hinton.
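
Side-by-side update rules (a sketch; here $\rho$ stands for decay_rate, $\alpha$ for the learning rate, and $g_t$ for the gradient at step $t$):

  • AdaGrad: $v_t = v_{t-1} + g_t^2$
  • RMSProp: $v_t = \rho\, v_{t-1} + (1 - \rho)\, g_t^2$
  • Both: $x_{t+1} = x_t - \alpha\, g_t / (\sqrt{v_t} + \epsilon)$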

Code

import numpy as np

learning_rate = 1e-3  # hyperparameter values are illustrative
decay_rate = 0.9
epsilon = 1e-7

grad_squared = 0
while True:
	dx = compute_gradient(x)
	grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx  # EMA of squared gradients -- this is what differs from AdaGrad!
	x -= learning_rate * dx / (np.sqrt(grad_squared) + epsilon)
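
A minimal runnable sketch of the same loop on a toy ill-conditioned quadratic (the objective, compute_gradient, iteration count, and hyperparameter values below are illustrative assumptions, not from the original notes):

import numpy as np

def compute_gradient(x):
	# gradient of f(x) = 0.5 * (100 * x[0]**2 + x[1]**2), a quadratic with a 100x scale gap
	return np.array([100.0, 1.0]) * x

x = np.array([1.0, 1.0])
learning_rate, decay_rate, epsilon = 1e-2, 0.9, 1e-7
grad_squared = np.zeros_like(x)

for step in range(500):
	dx = compute_gradient(x)
	grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
	x -= learning_rate * dx / (np.sqrt(grad_squared) + epsilon)

print(x)  # both coordinates end up near 0, moving at comparable per-step rates despite the 100x gradient scale gap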

Check

Motivation

  • Gradients are uneven across dimensions, so a conservative learning rate is needed.
  • Recall the imbalanced per-dimension gradient scales from the SGD example.
  • Idea: "divide the learning rate by a moving average of squared gradients" (see the small sketch below).
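
A tiny sketch of that idea, assuming decay_rate = 0.9 and a made-up uneven gradient: dividing by the square root of the moving average of squared gradients roughly equalizes the per-dimension step scales.

import numpy as np

dx = np.array([50.0, 0.5])              # one gradient with a 100x scale gap (illustrative)
grad_squared = 0.1 * dx * dx            # EMA after a single step with decay_rate = 0.9
step = dx / (np.sqrt(grad_squared) + 1e-7)
print(step)                             # ~[3.16, 3.16]: both dimensions now move on the same scale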

RMSProp

  • Division by a running average of squared gradients adjusts the step size per weight.
    • → The division is large in directions with large gradients (shrinking the step) and small in directions with small gradients (enlarging it).
  • This allows using a larger learning rate than vanilla SGD.
  • However: in the first iterations, the moving average is biased towards 0, so early steps come out too large (see the sketch below).
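
A small sketch of that bias, assuming a constant gradient magnitude of 1 and decay_rate = 0.9 (values are illustrative): the denominator starts far below the true gradient magnitude and only ramps up over the first iterations, which is exactly what Adam's bias-correction term later compensates for.

import numpy as np

g, decay_rate, v = 1.0, 0.9, 0.0
for t in range(1, 6):
	v = decay_rate * v + (1 - decay_rate) * g * g
	print(t, np.sqrt(v))  # 0.316, 0.436, 0.521, 0.586, 0.640 -- well below |g| = 1, so early steps are inflated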