RMSProp?
“Leaky AdaGrad”.
Solves AdaGrad's vanishing-learning-rate problem by applying an EMA to the squared gradients. (by Hinton)
Code
```python
import numpy as np

epsilon = 1e-7
decay_rate = 0.9      # typical value
grad_squared = 0
while True:
    dx = compute_gradient(x)
    # EMA of squared gradients -- this line is what differs from AdaGrad!
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
    x -= learning_rate * dx / (np.sqrt(grad_squared) + epsilon)
```
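To see why the EMA fixes AdaGrad's vanishing step size, here is a minimal sketch (assuming a constant gradient of 1.0 and illustrative hyperparameter values, not values from the note) comparing AdaGrad's accumulated sum with RMSProp's leaky average:

```python
import numpy as np

# Illustration only: constant gradient and hypothetical hyperparameters.
dx, learning_rate, decay_rate, epsilon = 1.0, 1e-2, 0.9, 1e-7
adagrad_sum, rmsprop_ema = 0.0, 0.0

for t in range(1, 1001):
    adagrad_sum += dx * dx                                               # grows without bound
    rmsprop_ema = decay_rate * rmsprop_ema + (1 - decay_rate) * dx * dx  # stays bounded ("leaky")
    if t in (1, 10, 100, 1000):
        adagrad_step = learning_rate * dx / (np.sqrt(adagrad_sum) + epsilon)
        rmsprop_step = learning_rate * dx / (np.sqrt(rmsprop_ema) + epsilon)
        print(t, adagrad_step, rmsprop_step)
# AdaGrad's step shrinks like learning_rate / sqrt(t); RMSProp's settles near learning_rate.
```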
Check
Motivation
- Gradients are uneven across dimensions, so a conservative learning rate is needed.
- Recall the imbalanced per-dimension gradient scales seen in the SGD example.
- Idea: "divide the lr by a moving average of squared gradients."
RMSProp
- Dividing by a running average of squared gradients adjusts the step size per weight.
- → The divisor is large in directions with large gradients and small in directions with small gradients, evening out the effective per-dimension step sizes.
- This allows using a larger learning rate than vanilla SGD.
- However: in the first iterations the moving average is biased toward 0 (grad_squared starts at 0), so early steps are larger than intended (see the sketch below).
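A quick numeric check of that bias, under the same constant-gradient assumption as above: because grad_squared starts at 0, the EMA underestimates dx² at first, so the effective step is inflated for the first few iterations.

```python
import numpy as np

# Same illustrative values as above; only the first few iterations are shown.
dx, learning_rate, decay_rate, epsilon = 1.0, 1e-2, 0.9, 1e-7
grad_squared = 0.0
for t in range(1, 6):
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
    step = learning_rate * dx / (np.sqrt(grad_squared) + epsilon)
    print(t, round(grad_squared, 3), round(step, 4))
# t=1: grad_squared ≈ 0.1, step ≈ 0.0316 -- roughly 3x the steady-state step of ~0.01.
```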

