Survey Report: Models That Perform Emotion Classification and VA Regression Simultaneously

A strategy for securing model diversity in response to reviewers of a CHI 2026 Poster paper

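1. Background and Requirements

Core reviewer criticism

Only two models were used, both from the same training pipeline (EmotiEffLib/Savchenko), so AI model diversity is insufficient.
Models based on different training regimes, architectures, and training data should be added to strengthen the generalizability of the AI ratings.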

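Model selection criteria

Performs emotional category classification (7 to 8 basic emotions) and Valence-Arousal regression simultaneously
Open checkpoint required
Model size must not be excessive
Priority given to training pipelines that differ from the existing EmotiEffLib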

2. Models Currently Used in the Project

Model	Framework	Backbone	Pre-training	MTL	Pipeline
enet_b0_8_va_mtl	EmotiEffLib	EfficientNet-B0	VGGFace2 → AffectNet	✓	A
mobilevit_va_mtl	EmotiEffLib	MobileViT	VGGFace2 → AffectNet	✓	A
mbf_va_mtl	EmotiEffLib	MobileFaceNet	VGGFace2 → AffectNet	✓	A
emonet	face-analysis	Stacked Hourglass	AffectNet direct	✓	B

Problem: the top three models all share the same VGGFace2 → AffectNet pipeline (Pipeline A). Only EmoNet is independent (Pipeline B).

3. Survey Results: Usable Models

3-1. EmoNet ✅ Already implemented in the project

Item	Details
Paper	Toisoul et al., Nature Machine Intelligence 2021
Institution	Samsung AI Center Cambridge + Imperial College London
GitHub	face-analysis/emonet
Tasks	8 Emotion Classes + Valence + Arousal (simultaneous)
Backbone	Stacked Hourglass Networks (~100M params)
Training data	Trained directly on AffectNet (no VGGFace2 pre-training)
Checkpoint	emonet_5.pth, emonet_8.pth (CC BY-NC-ND)
Pipeline	B: AffectNet direct training
Status	Already implemented in `face_emotion_analysis/models/emonet.py`
Action	Can be included in experiments immediately

Differentiators vs EmotiEffLib: AffectNet direct training, no VGGFace2 pre-training, and a completely different architecture (Hourglass vs EfficientNet)

3-2. wtomin Multitask-CNN ✅ Checkpoint release confirmed, integration code written

Item	Details
Paper	Deng et al., FG-2020 ABAW Competition Solution (Winner)
GitHub	wtomin/Multitask-Emotion-Recognition
Tasks	7 Expression + Valence-Arousal + 8 AU (three tasks simultaneously)
Backbone	ResNet-50 (~25M params)
Training data	Aff-Wild2 (video) + AffectNet + DISFA + ExpW + AFEW-VA
Training strategy	Teacher-student distillation + Incomplete label multi-task learning
Checkpoint	5 CNN student models (0.pth ~ 4.pth)
Pipeline	C: Aff-Wild2 + Multi-DB + Incomplete Labels
Status	Integrated in `face_emotion_analysis/models/wtomin_mtl.py`
Caution	The original download link (HKUST SharePoint) may have expired → the authors may need to be contacted

Differentiators vs EmotiEffLib:

Training data: Multi-database (5+ datasets) vs Single-dataset (AffectNet only)
Training strategy: Semi-supervised + Incomplete label fusion vs Standard supervised
Pre-training: FER+ → Multi-DB fine-tune vs VGGFace2 → AffectNet
VA encoding: 20-bin discretization + softmax → weighted sum vs Direct regression
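
The VA encoding item above (20-bin discretization, softmax, then a weighted sum) can be sketched as follows. This is a minimal illustration, not the wtomin repository's code: the function name `decode_binned_value` is hypothetical, and the evenly spaced bin centers over [-1, 1] are an assumption that should be checked against the actual checkpoint before use.

```python
import numpy as np

def decode_binned_value(logits: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Convert per-bin logits of shape (batch, n_bins) into continuous scores."""
    n_bins = logits.shape[-1]
    # Assumption: bin centers are evenly spaced over [low, high].
    centers = np.linspace(low, high, n_bins)
    # Numerically stable softmax over the bin axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Expected value: probability-weighted sum of the bin centers.
    return probs @ centers

# Uniform logits over 20 bins decode to the midpoint of the range.
uniform = decode_binned_value(np.zeros((1, 20)))

# A strong peak on the last bin decodes to roughly the upper bound.
peaked_logits = np.zeros((1, 20))
peaked_logits[0, -1] = 50.0
peaked = decode_binned_value(peaked_logits)
```

Because the output is an expectation over bins, it stays inside the assumed range by construction, which is one practical difference from an unconstrained regression head.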

3-3. Behaviour4All ⚠️ Code still needs to be obtained

Item	Details
Paper	Kollias et al., arXiv 2024.09 (2409.17717)
Institution	iBUG, Queen Mary University of London
Tasks	Face Localization + VA + Expression (7) + 17 AU (four tasks simultaneously)
Backbone	FacebehaviourNet (23.1M params, 3.8 GFLOPs)
Training data	12 large-scale in-the-wild datasets, 5M+ images
Performance	VA CCC: AffectNet 62.0% (original) / 78.1% (new protocol)
Pipeline	D: 12 datasets, Distribution Matching
Status	The paper states “open-source”, but no GitHub repository was found
Action	Request code/checkpoint directly from the author (d.kollias@qmul.ac.uk)

Differentiator: trained on demographically diverse data from a fairness perspective → directly tied to the manuscript's topic

4. Survey Results: Rejected Candidates

Model	Reason for rejection
LibreFace (WACV 2024)	No VA regression support (AU + Expression only)
OpenFace 3.0	Continuous VA output uncertain
MA-Net, POSTER V2, DAN	Classification only, no VA head
MT-EmotiDDAMFN	Public release unconfirmed + same VGGFace2 → AffectNet pipeline

5. VLLM Reference Information (not included in experiments)

Model	Scale	Notes
Emotion-LLaMA (NeurIPS 2024)	~7B+	Multimodal, +8.52% over GPT-4V
InternVL2 (CVPR 2024 Oral)	1B~241B	Zero-shot prompting possible
GPT-4V/4o	Closed	V r=0.87, A r=0.72 (zero-shot)

→ Excessive for static face image analysis, and excluded from the experiments because of checkpoint availability or reproducibility issues.

6. Pipeline Diversity Analysis

Pipeline A: EmotiEffLib (Savchenko)  → VGGFace2 → AffectNet MTL
  Models: enet_b0_8_va_mtl, mobilevit_va_mtl, mbf_va_mtl

Pipeline B: EmoNet (Toisoul/Pantic) → AffectNet direct training
  Models: emonet_8

Pipeline C: wtomin MTL (Deng/Shi)   → FER+ → Aff-Wild2 + Multi-DB + Incomplete Labels
  Models: Multitask-CNN (5 student ensemble)

Pipeline D: Behaviour4All (Kollias) → 12 datasets, 5M+ images, Distribution Matching
  Models: FacebehaviourNet  [attempting to obtain]

At least three independent pipelines (A, B, C) are enough to address the reviewer's “same pipeline” criticism.
If Behaviour4All is also obtained, four pipelines would make for a very strong response.
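
The pipeline count above can be verified mechanically with a small sketch like the one below. The model names and pipeline labels are copied from the listing in this section; the dictionary layout and the helper `group_by_pipeline` are illustrative assumptions, not part of the project code.

```python
from collections import defaultdict

# Pipeline labels copied from the listing above.
MODEL_PIPELINES = {
    "enet_b0_8_va_mtl": "A",
    "mobilevit_va_mtl": "A",
    "mbf_va_mtl": "A",
    "emonet_8": "B",
    "Multitask-CNN": "C",
    "FacebehaviourNet": "D",
}

# Models whose code or checkpoint has not been obtained yet.
UNAVAILABLE = frozenset({"FacebehaviourNet"})

def group_by_pipeline(models, unavailable=frozenset()):
    """Group available model names by their training pipeline label."""
    groups = defaultdict(list)
    for name, pipeline in models.items():
        if name not in unavailable:
            groups[pipeline].append(name)
    return dict(groups)

available = group_by_pipeline(MODEL_PIPELINES, UNAVAILABLE)
everything = group_by_pipeline(MODEL_PIPELINES)
print(len(available), sorted(available))    # independent pipelines available now
print(len(everything), sorted(everything))  # if Behaviour4All is obtained
```

Counting pipelines rather than models makes the diversity claim explicit: adding a fourth EmotiEffLib model would raise the model count but not the pipeline count.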

7. Final Recommended Model Combination

Tier 1: Confirmed usable

#	Model	Pipeline	Backbone	Params	Status
1	enet_b0_8_va_mtl	A (EmotiEffLib)	EfficientNet-B0	~5M	Already in use
2	mobilevit_va_mtl	A (EmotiEffLib)	MobileViT	~6M	Already in use
3	emonet	B (Samsung AI)	Stacked Hourglass	~100M	Already implemented
4	wtomin_mtl	C (FG-2020 ABAW)	ResNet-50	~25M	Integration code complete

Tier 2: Attempting to obtain

#	Model	Required action
5	Behaviour4All	Request code/checkpoint from the authors

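8. Action Items

Priority	Item	Status
🔴 Immediate	Run EmoNet (emonet_8) experiments	Already implemented, ready to run
🔴 Immediate	Obtain wtomin MTL checkpoint	Download link may have expired → contact the authors
🟡 Short term	Inference test once the wtomin checkpoint is obtained	Integration code ready
🟡 Short term	Email the Behaviour4All authors	d.kollias@qmul.ac.uk
🟢 After experiments	Measure agreement with human ratings for all models using CCC/ICC metrics	-
🟢 Paper revision	State “diverse training pipelines” in the model selection rationale	-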
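
For the agreement measurement item above, a minimal sketch of the concordance correlation coefficient (CCC, Lin's formulation) is shown below. The function name `concordance_cc` and the toy inputs are illustrative assumptions; the sketch uses population variance and covers only CCC, so ICC would need a separate implementation or an established statistics package.

```python
import numpy as np

def concordance_cc(pred, target) -> float:
    """Lin's concordance correlation coefficient between two 1-D score arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mean_p, mean_t = pred.mean(), target.mean()
    # Population (biased) variance and covariance.
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mean_p) * (target - mean_t)).mean()
    # CCC penalizes both low correlation and mean/scale shifts.
    return float(2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2))

# Toy valence scores (hypothetical numbers, not experimental results).
human = [0.0, 1.0, 2.0]
model_same = [0.0, 1.0, 2.0]
model_shifted = [1.0, 2.0, 3.0]  # perfectly correlated but offset by 1

ccc_same = concordance_cc(model_same, human)
ccc_shifted = concordance_cc(model_shifted, human)
```

The shifted case has a Pearson correlation of 1 but a CCC of 4/7, which is why CCC is a stricter agreement measure than correlation when comparing model outputs with human ratings.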

9. Model Comparison Table for the Paper (draft)

Model	Architecture	Pre-training	Training Data	Training Strategy	EXPR Classes	VA Output
EmotiEffLib (enet_b0)	EfficientNet-B0	VGGFace2	AffectNet	Supervised MTL	8	Regression
EmotiEffLib (mobilevit)	MobileViT	VGGFace2	AffectNet	Supervised MTL	8	Regression
EmoNet	Stacked Hourglass	None	AffectNet	End-to-end MTL	8	Regression
wtomin MTL	ResNet-50	FER+	Aff-Wild2 + Multi-DB	Teacher-Student Distill.	7	20-bin Discretized
Behaviour4All*	FacebehaviourNet	None	12 datasets (5M+)	Distribution Matching	7	Regression

*Attempting to obtain

10. References

Confirmed models

EmoNet GitHub: Nature MI 2021
wtomin MTL GitHub: FG-2020
EmotiEffLib GitHub: Savchenko's HSEmotion

Attempting to obtain

Behaviour4All arXiv: Kollias et al. 2024

Reference VLLMs

Emotion-LLaMA GitHub: NeurIPS 2024
InternVL GitHub: CVPR 2024 Oral

Rejected candidates

LibreFace: VA not supported
OpenFace 3.0: VA unconfirmed
awesome-SOTA-FER: comprehensive list of FER models

ABAW Competition

Report generated: 2026-02-14
For CHI 2026 Poster revision
