Summary
“A model that combines the global corpus statistics of count-based methods with the training efficiency of Word2Vec”
Count-based vs Direct prediction
Count-based: LSA, HAL (Lund & Burgess), COALS, Hellinger-PCA (Rohde et al.; Lebret & Collobert)
- Fast training
- Efficient usage of statistics
- Primarily used to capture word similarity
- Disproportionate importance given to large counts

Direct prediction: Skip-gram/CBOW (Mikolov et al.), NNLM, HLBL, RNN (Bengio et al.; Collobert & Weston; Huang et al.; Mnih & Hinton)
- Scales with corpus size
- Inefficient usage of statistics
- Can generate improved performance on other tasks
- Can capture complex patterns beyond word similarity
Important
“Crucial insight: Ratios of co-occurrence probabilities can encode meaning components”
- For example, the ratio P(solid | ice) / P(solid | steam) is large, meaning “solid” is more related to ice than to steam. Modeling these co-occurrence probability ratios in the vector space is the core idea of GloVe.
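A small NumPy sketch of this insight, using hypothetical co-occurrence counts (the probe words and numbers are illustrative, not taken from a real corpus): the ratio P(k | ice) / P(k | steam) is large for probe words tied to ice, small for words tied to steam, and near 1 for words related to both or neither.

```python
import numpy as np

# Hypothetical co-occurrence counts (illustrative numbers only).
# Rows: target words; columns: probe words k.
targets = ["ice", "steam"]
probes = ["solid", "gas", "water", "fashion"]
counts = np.array([
    [19.0,  2.0, 80.0, 1.0],   # counts of "ice" with each probe
    [ 3.0, 30.0, 78.0, 1.0],   # counts of "steam" with each probe
])

# P(k | w) = count(w, k) / sum_k count(w, k)
probs = counts / counts.sum(axis=1, keepdims=True)

# Ratio P(k | ice) / P(k | steam): the meaning component that GloVe
# wants to encode linearly in the word vector space.
ratios = probs[0] / probs[1]
for k, r in zip(probes, ratios):
    print(f"P({k} | ice) / P({k} | steam) = {r:.2f}")
```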
How do we design a model that captures this?
Q: How can we capture ratios of co-occurrence probabilities as linear meaning components in a word vector space (i.e., embedding ratios of co-occurrence probabilities)?
A: Log-bilinear model
with vector differences
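Written out, the log-bilinear model ties dot products to log co-occurrence probabilities, so that vector differences capture the log ratios:

$$w_i \cdot \tilde{w}_j = \log P(i \mid j)$$

$$w_x \cdot (w_a - w_b) = \log \frac{P(x \mid a)}{P(x \mid b)}$$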
GloVe
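GloVe trains this model by minimizing a weighted least-squares objective over the co-occurrence matrix $X$, where the weighting function $f$ caps the influence of very large counts (addressing the “disproportionate importance given to large counts” problem above):

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$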
Summary
Combining the count-based and direct-prediction approaches in this way gives (see the training sketch below):
- fast training
- scalable to huge corpora
- good performance even with a small corpus and small vectors
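A minimal NumPy sketch of training with that objective, assuming a tiny random co-occurrence matrix and plain SGD (the paper uses AdaGrad); dimensions, counts, and hyperparameters here are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny co-occurrence matrix X (V x V); in practice it is
# built once from windowed counts over the whole corpus.
V, d = 5, 8
X = rng.integers(0, 50, size=(V, V)).astype(float)

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function: caps the influence of very large counts."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# Parameters: word vectors, context vectors, and two bias terms.
W   = rng.normal(scale=0.1, size=(V, d))
W_t = rng.normal(scale=0.1, size=(V, d))
b   = np.zeros(V)
b_t = np.zeros(V)

lr = 0.05
for epoch in range(200):
    loss = 0.0
    for i in range(V):
        for j in range(V):
            if X[i, j] == 0:          # only nonzero counts contribute
                continue
            diff = W[i] @ W_t[j] + b[i] + b_t[j] - np.log(X[i, j])
            weight = f(X[i, j])
            loss += weight * diff ** 2
            grad = 2.0 * weight * diff
            # Plain SGD updates, computed from the pre-update values.
            dW, dWt = grad * W_t[j], grad * W[i]
            W[i]   -= lr * dW
            W_t[j] -= lr * dWt
            b[i]   -= lr * grad
            b_t[j] -= lr * grad

print(f"final weighted least-squares loss: {loss:.4f}")
# Final word vectors are typically taken as W + W_t.
```

The cap in `f` is what keeps very frequent co-occurrences from dominating the loss, which is exactly the weakness of plain count-based methods listed in the comparison above.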
