Softmax classifier (= multinomial logistic regression)

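In classification, when we want to interpret the scores output by a model as probabilities, we use the softmax function. It applies the exp function to the raw scores and normalizes the results to turn them into probability values. Attaching it to the last layer of an existing classifier therefore yields a separate probability for each class.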

→ In other words, softmax returns a probability mass function over all classes.
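
Written out, with $s_{j}$ denoting the score of class $j$ and $k$ an index introduced here only for illustration, the softmax probability assigned to class $k$ is:

$$p_{k} = \frac{e^{s_{k}}}{\sum_{j} e^{s_{j}}}, \qquad p_{k} > 0, \qquad \sum_{k} p_{k} = 1$$

Every $p_{k}$ is positive and the values sum to 1, which is what makes the output a valid probability mass function.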

cross-entropy loss = softmax loss

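In a classification task, training should push the model's predicted output (a p.m.f.) toward the label probability distribution. In terms of the KL divergence, which measures how much two probability distributions differ, this can be visualized as the predicted distribution moving toward the label distribution (the KL divergence decreases).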
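
For a one-hot label, the KL divergence equals the cross-entropy, because the label's own entropy is 0. The sketch below illustrates this, assuming hypothetical prediction vectors and helper names (`cross_entropy`, `kl_divergence`) that are not from the original note:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k log q_k, skipping terms where p_k = 0."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k log(p_k / q_k), skipping terms where p_k = 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

label = np.array([1.0, 0.0, 0.0])        # one-hot label distribution
pred_far = np.array([0.3, 0.4, 0.3])     # hypothetical prediction far from the label
pred_near = np.array([0.9, 0.05, 0.05])  # hypothetical prediction close to the label

for name, pred in (("far", pred_far), ("near", pred_near)):
    print(name, "KL:", kl_divergence(label, pred), "CE:", cross_entropy(label, pred))
```

With these inputs the two quantities should coincide for each prediction, and both should be smaller for the prediction that is closer to the label.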

Detail Process

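In more detail, the scores of every class for each data point are first made positive with the exp function.
They are then normalized so that they sum to 1.
The value placed in the exponent of exp, which we have been calling the score, is called the logit.
This procedure follows Maximum Likelihood Estimation: minimizing the negative log of the correct-class probability maximizes the likelihood of the labels.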
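
The steps above can be sketched as follows. The logits and the true class index are made-up values for illustration only:

```python
import numpy as np

# Hypothetical logits for one data point with three classes.
logits = np.array([2.0, 1.0, 0.1])

# Step 1: exponentiate so every value becomes positive.
exp_scores = np.exp(logits)

# Step 2: normalize so the values sum to 1.
probs = exp_scores / exp_scores.sum()

# Negative log-likelihood of the true class (assumed here to be class 0).
true_class = 0
nll = -np.log(probs[true_class])

print("probabilities:", probs)
print("sum:", probs.sum())
print("negative log-likelihood:", nll)
```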

Problem

Q1. What is the min/max possible value of softmax loss $L_{i}$ ?
→ Since $L_{i}$ is the negative log of a probability, its range is $[0, \infty)$. The loss approaches 0 as the correct-class probability approaches 1, and grows without bound as that probability approaches 0.
Q2. At initialization all $s_{j}$ will be approximately equal; what is the softmax loss $L_{i}$, assuming $C$ classes?
→ Since all classes are assigned the same probability, each class has probability $\frac{1}{C}$; therefore $L_{i} = -\log \frac{1}{C} = \log C$.
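
The answer to Q2 can be checked numerically. This is a minimal sketch, assuming all-zero scores as a stand-in for "approximately equal" and a hypothetical class count of 10:

```python
import numpy as np

C = 10                 # hypothetical number of classes
scores = np.zeros(C)   # equal scores, as at initialization

probs = np.exp(scores) / np.exp(scores).sum()  # every entry is 1 / C
loss = -np.log(probs[0])                       # any class index gives the same loss

print(loss, np.log(C))  # the two values should match
```

Comparing the first training loss against $\log C$ is a common sanity check that the loss is wired up correctly.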

Softmax vs SVM loss


Softmax loss: $L_{i} = - \log \left( \frac{e^{s_{y_{i}}}}{\sum_{j} e^{s_{j}}} \right)$
SVM loss: $L_{i} = \sum_{j \neq y_{i}} \max (0, s_{j} - s_{y_{i}} + 1)$

For each of the score vectors below, calculate both losses, with $y_{i} = 0$:
$[10, - 2, 3]$
$[10, 9, 9]$
$[10, - 100, - 100]$

What happens to each loss if the 10 is changed to 20?

Answer?

Softmax loss:
SVM loss:

The softmax loss is sensitive to these changes, whereas the SVM loss is not.

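→ The point is that once the correct-class score beats the others by the margin of 1, the SVM loss is zero and provides no further learning signal, whereas the softmax loss keeps decreasing as the gap grows, so learning can continue.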
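
The scenario above can be checked numerically with a sketch like the following. The function names `softmax_loss` and `svm_loss` are illustrative, and the max-shift inside `softmax_loss` is a standard stability trick rather than something stated in the note:

```python
import numpy as np

def softmax_loss(scores, y):
    """-log of the softmax probability of class y (max-shifted for stability)."""
    shifted = scores - np.max(scores)
    return np.log(np.sum(np.exp(shifted))) - shifted[y]

def svm_loss(scores, y, margin=1.0):
    """Multiclass hinge loss: sum over j != y of max(0, s_j - s_y + margin)."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the correct class does not contribute
    return np.sum(margins)

# Score of the correct class (index 0) is 10, then 20.
for top in (10.0, 20.0):
    for rest in ([-2.0, 3.0], [9.0, 9.0], [-100.0, -100.0]):
        s = np.array([top] + rest)
        print(s, "softmax:", softmax_loss(s, 0), "SVM:", svm_loss(s, 0))
```

In this sketch the SVM loss should be 0 for all three vectors, before and after the change, because every margin is already satisfied. The softmax loss is small but positive for the first two vectors and shrinks further when 10 becomes 20; for the third vector it is already below floating-point resolution.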

Softmax

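import numpy as np

def softmax(x):
    # Subtract the max logit for numerical stability; the result is unchanged.
    exp_x = np.exp(x - np.max(x))
    # Normalize so the outputs sum to 1.
    return exp_x / np.sum(exp_x)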
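
Since the scores are computed per data point, a row-wise variant for a batch of logits can be useful. The helper below (`softmax_rows`, a hypothetical name) is a sketch that assumes an array of shape (N, C), with one row per data point:

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax for an (N, C) array of logits (hypothetical helper)."""
    # Subtract each row's max so that exp never overflows.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

scores = np.array([[10.0, -2.0, 3.0],
                   [1000.0, 999.0, 999.0]])  # large logits would overflow a naive exp
probs = softmax_rows(scores)

print(probs)
print(probs.sum(axis=1))  # each row should sum to 1
```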

Juhyeon's Blog
