CLI Reference

A reference for every command and option in main.py.

uv run python main.py <command> [OPTIONS]

Commands Overview

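Command	Description	Primary use
`infer`	Run VLM inference	Inference over all 1,440 images with every enabled model or one specific model
`analyze`	Run statistical analysis	VLM-human agreement, bias tests
`visualize`	Generate plots	scatter, Bland-Altman, confusion matrix
`sample`	Small-sample inference + Excel export	Quick sanity check (5-10 images)
`test-retest`	Test-retest reliability	Verify greedy determinism / stochastic stability
`pipeline`	infer → analyze → visualize	Run the whole pipeline in one go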

infer

Runs VLM inference to predict emotion, valence, and arousal for each image.

uv run python main.py infer [OPTIONS]

Options

Option	Short	Type	Default	Description
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML
`--model`	`-m`	TEXT	(all enabled)	Run only the given model (name or backend)
`--output`	`-o`	PATH	`outputs/`	Override the output directory
`--seed`	`-s`	INT	(none)	Random seed for reproducibility
`--no-wandb`		FLAG	`false`	Disable wandb logging

Examples

# Run every model with enabled=true in the config file
uv run python main.py infer

# Run PaliGemma2 only
uv run python main.py infer -m paligemma2

# Run LLaVA only, with wandb disabled
uv run python main.py infer -m llava --no-wandb

# Change the output directory
uv run python main.py infer -o outputs/run_01

# Use a custom config file
uv run python main.py infer -c configs/small_models.yaml

Output Files

outputs/
├── {model}_predictions.json          # Per-image predictions
├── {model}_attention_{image_id}.npz  # Cross-modal attention (when capture=true)
└── {model}_logits_{image_id}.npz     # Dark knowledge (when save_attention=true)

Inference Flow

Each image goes through 3-step sequential prompting, in which each later prompt embeds the earlier answers:

Step 1: Emotion → "What is the facial expression?"
Step 2: Valence → "You identified this face as {emotion}. How pleasant?"
Step 3: Arousal → "You identified this face as {emotion} with pleasantness {valence}. How intense?"
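The chaining in the three steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: `run_three_step` and the `ask` callable (one model call on a fixed image) are hypothetical names, and the real templates live in the `prompt` section of the config rather than in module constants.

```python
from typing import Callable, Dict

# Prompt templates copied from the three steps above.
EMOTION_PROMPT = "What is the facial expression?"
VALENCE_PROMPT = "You identified this face as {emotion}. How pleasant?"
AROUSAL_PROMPT = (
    "You identified this face as {emotion} with pleasantness {valence}. "
    "How intense?"
)


def run_three_step(ask: Callable[[str], str]) -> Dict[str, str]:
    """Run the three prompts in order, feeding earlier answers forward.

    `ask` is a hypothetical stand-in for one model call on a fixed image.
    """
    emotion = ask(EMOTION_PROMPT).strip()
    valence = ask(VALENCE_PROMPT.format(emotion=emotion)).strip()
    arousal = ask(
        AROUSAL_PROMPT.format(emotion=emotion, valence=valence)
    ).strip()
    return {"emotion": emotion, "valence": valence, "arousal": arousal}


if __name__ == "__main__":
    # A canned fake model so the sketch runs without any VLM.
    canned = iter(["happy", "7", "5"])

    def fake_ask(prompt: str) -> str:
        print("PROMPT:", prompt)
        return next(canned)

    print(run_three_step(fake_ask))
```

Because steps 2 and 3 embed earlier answers, an error in the emotion step propagates into the valence and arousal prompts, which is worth keeping in mind when reading per-emotion bias results.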

Memory Management

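Models run sequentially, and memory is released explicitly between models (unload_model() → gc.collect() → torch.mps.empty_cache()).
The MPS cache is flushed every 50 images to prevent memory fragmentation.
When attention.capture is false (the default in default.yaml), no ViT hooks are registered and no attention tensors are created.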
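The release sequence and the 50-image flush can be sketched roughly as below. The helper names (`flush_mps_cache`, `should_flush`, `release_model`) and the `holder` dict are hypothetical, and the project's own `unload_model()` may do more. The sketch assumes PyTorch 2.0 or newer when PyTorch is installed, and degrades to plain garbage collection when it is not.

```python
import gc

try:
    import torch  # optional here so the sketch still runs without PyTorch
except ImportError:
    torch = None

FLUSH_EVERY = 50  # images between MPS cache flushes, per the note above


def flush_mps_cache() -> bool:
    """Release cached MPS memory if PyTorch with an MPS device is present."""
    if torch is not None and torch.backends.mps.is_available():
        torch.mps.empty_cache()
        return True
    return False


def should_flush(images_done: int, every: int = FLUSH_EVERY) -> bool:
    """True after every `every`-th processed image (50, 100, ...)."""
    return images_done > 0 and images_done % every == 0


def release_model(holder: dict) -> None:
    """Drop the held model reference, then collect garbage and flush.

    `holder` is a hypothetical container used only for this illustration.
    """
    holder.pop("model", None)
    gc.collect()
    flush_mps_cache()
```

Dropping the reference before `gc.collect()` matters: the cache flush can only return memory whose tensors are no longer reachable from Python.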

analyze

Runs statistical analysis comparing VLM predictions against human ratings.

uv run python main.py analyze [OPTIONS]

Options

Option	Short	Type	Default	Description
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML
`--predictions`	`-p`	PATH	`outputs/`	Directory containing the prediction JSON files

Examples

# Default analysis (uses the prediction files in outputs/)
uv run python main.py analyze

# Point at a specific prediction directory
uv run python main.py analyze -p outputs/run_01

Analysis Contents

Human Inter-rater Reliability (baseline ceiling)
- Valence/Arousal ICC(2,k) via pingouin
VLM-Human Agreement (per model)
- Emotion Cohen’s kappa
- Valence/Arousal ICC(2,k)
- Bland-Altman bias (mean diff)
Systematic Bias Detection (per emotion)
- Paired t-test / Wilcoxon
- Cohen’s d effect size
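As a rough illustration of two of the quantities listed above, the sketch below computes the Bland-Altman bias (mean difference) and a paired-samples effect size using only the standard library. The function name `bland_altman` is hypothetical. The 95% limits of agreement are included for context even though the list mentions only the bias, and the effect size shown is the paired variant (d_z, mean difference divided by the SD of the differences), which may differ from the Cohen's d variant the project actually reports.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(vlm: Sequence[float], human: Sequence[float]) -> Dict[str, float]:
    """Bias (mean difference) and 95% limits of agreement for paired ratings."""
    if len(vlm) != len(human) or len(vlm) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [v - h for v, h in zip(vlm, human)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        # Paired-samples effect size (d_z): mean difference / SD of differences.
        "cohens_dz": bias / sd if sd > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Made-up ratings, purely to show the call shape.
    print(bland_altman(vlm=[5, 6, 7, 8], human=[4, 6, 5, 7]))
```

A positive bias here means the VLM rates higher than the human mean on that scale; the sign convention (VLM minus human) is an assumption of this sketch.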

visualize

Generates visualization plots from the analysis results.

uv run python main.py visualize [OPTIONS]

Options

Option	Short	Type	Default	Description
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML
`--predictions`	`-p`	PATH	`outputs/`	Directory containing the prediction JSON files

Examples

uv run python main.py visualize
uv run python main.py visualize -p outputs/run_01

Generated Figures

outputs/figures/
├── {model}_scatter_valence.png    # Human vs VLM valence scatter
├── {model}_scatter_arousal.png    # Human vs VLM arousal scatter
├── {model}_ba_valence.png         # Bland-Altman plot (valence)
├── {model}_ba_arousal.png         # Bland-Altman plot (arousal)
├── {model}_boxplot_valence.png    # Per-emotion valence box plot
├── {model}_boxplot_arousal.png    # Per-emotion arousal box plot
└── {model}_confusion.png          # Emotion confusion matrix

sample

Runs quick inference on a handful of images and exports the results to Excel. Intended for sanity checks and for inspecting attention heatmaps.

uv run python main.py sample [OPTIONS]

Options

Option	Short	Type	Default	Description
`--n`	`-n`	INT	`5`	Number of sample images
`--model`	`-m`	TEXT	`paligemma2`	Model to use
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML
`--output`	`-o`	PATH	`outputs/`	Override the output directory
`--seed`		INT	`42`	Random seed for sampling
`--no-wandb`		FLAG	`false`	Disable wandb logging

Examples

# Default 5-image sample (PaliGemma2)
uv run python main.py sample

# 10-image LLaVA sample with a different seed
uv run python main.py sample -n 10 -m llava --seed 123

# Quick check without wandb
uv run python main.py sample -n 3 -m paligemma2 --no-wandb

Sampling Strategy

Phase 1: Pick one image per emotion (covering all emotions takes at least 7 images)
Phase 2: Fill the remaining slots, preferring demographic groups that have not appeared yet
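One way to read the two phases is sketched below. It is an interpretation under stated assumptions, not the project's sampler: `two_phase_sample` is a hypothetical name, each image is assumed to be a dict with `emotion` and `group` keys, and when `n` is smaller than the number of emotions Phase 1 simply stops early.

```python
import random
from typing import Dict, List


def two_phase_sample(
    images: List[Dict[str, str]], n: int, seed: int = 42
) -> List[Dict[str, str]]:
    """Pick up to n images: emotions first, then unseen demographic groups."""
    rng = random.Random(seed)
    order = list(range(len(images)))
    rng.shuffle(order)

    picked: List[int] = []
    seen_emotions = set()
    seen_groups = set()

    # Phase 1: one image per emotion, stopping early if n is reached.
    for i in order:
        if len(picked) >= n:
            break
        if images[i]["emotion"] not in seen_emotions:
            picked.append(i)
            seen_emotions.add(images[i]["emotion"])
            seen_groups.add(images[i]["group"])

    # Phase 2: fill the remaining slots, unseen groups first.
    remaining = [i for i in order if i not in picked]
    while len(picked) < n and remaining:
        nxt = next(
            (i for i in remaining if images[i]["group"] not in seen_groups),
            remaining[0],  # every group already seen: take the next image
        )
        remaining.remove(nxt)
        picked.append(nxt)
        seen_groups.add(images[nxt]["group"])

    return [images[i] for i in picked]
```

With the default `-n 5`, a sampler of this shape would cover only five distinct emotions; full emotion coverage needs `-n 7` or more.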

Output Files

outputs/
├── {model}_predictions.json
├── inference_results.xlsx          # All results as an Excel workbook
└── figures/
    └── {model}_attention_{id}.png  # Per-task attention heatmap overlay

test-retest

A test-retest reliability pilot that checks how reliable the VLM outputs are.

uv run python main.py test-retest [OPTIONS]

Options

Option	Short	Type	Default	Description
`--n-images`	`-n`	INT	`50`	Number of pilot images
`--n-repeats`	`-r`	INT	`3`	Number of repeated inferences per image
`--model`	`-m`	TEXT	`paligemma2`	Model to test
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML
`--temperature`	`-t`	FLOAT	(none=greedy)	Sampling temperature (greedy when omitted)
`--seed`		INT	`42`	Seed for image sampling
`--output`	`-o`	PATH	`outputs/test_retest_{model}.json`	Path of the result JSON

Two Modes

Mode	Condition	What it verifies
Greedy	`--temperature` omitted	Same input → same output (determinism)
Stochastic	`--temperature 0.3`	Rating variability under temperature sampling

Examples

# Verify greedy determinism (required before the main experiment)
uv run python main.py test-retest -n 50 -r 3 -m paligemma2

# Verify stochastic stability
uv run python main.py test-retest -n 50 -r 5 -m paligemma2 -t 0.3

# Test the LLaVA model
uv run python main.py test-retest -n 30 -r 3 -m llava

Metrics

Greedy: proportion of images with identical emotion/valence/arousal across repeats (target: 99%+)
Emotion: percent agreement, Fleiss’ kappa
Valence/Arousal: ICC(2,k), within-image SD
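Two of the simpler metrics above, the identical-output rate and the within-image SD, can be sketched with the standard library as follows. The function names are hypothetical, and whether the project uses the sample or population SD is not stated in this document; the sketch uses the sample SD. Fleiss' kappa and ICC(2,k) are left out because they need more than a few lines.

```python
from statistics import mean, stdev
from typing import Hashable, Sequence


def identical_rate(repeats: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of images whose repeated outputs are all identical."""
    if not repeats:
        raise ValueError("no images")
    same = sum(1 for runs in repeats if len(set(runs)) == 1)
    return same / len(repeats)


def mean_within_image_sd(repeats: Sequence[Sequence[float]]) -> float:
    """Average, over images, of the sample SD across that image's repeats.

    Each image needs at least two repeats for the sample SD to be defined.
    """
    return mean(stdev(runs) for runs in repeats)


if __name__ == "__main__":
    # Made-up outputs for two images with three repeats each.
    emotions = [["happy", "happy", "happy"], ["sad", "sad", "angry"]]
    valence = [[5, 5, 5], [4, 5, 6]]
    print(identical_rate(emotions))
    print(mean_within_image_sd(valence))
```

In greedy mode the identical rate is the number to compare against the 99%+ target; in stochastic mode the within-image SD describes how far ratings drift between repeats.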

Output

// outputs/test_retest_paligemma2.json
{
  "summary": {
    "greedy_determinism": { "emotion_identical": 1.0, ... },
    "emotion_consistency": { "fleiss_kappa": 0.95, ... },
    "icc": { "valence": 0.98, "arousal": 0.97, ... }
  },
  "per_image": [ ... ]
}
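A small check against the 99%+ greedy target could look like the sketch below. It assumes only the `summary.greedy_determinism` nesting shown above; the elided keys are not known from this document, so the `valence_identical` key in the demo is a made-up placeholder, and `greedy_failures` is a hypothetical helper name.

```python
import json
from typing import Dict

GREEDY_TARGET = 0.99  # the "99%+" target from the Metrics section


def greedy_failures(report: Dict, target: float = GREEDY_TARGET) -> Dict[str, float]:
    """Return the greedy_determinism entries that fall below the target."""
    rates = report["summary"]["greedy_determinism"]
    return {
        key: value
        for key, value in rates.items()
        if isinstance(value, (int, float)) and value < target
    }


if __name__ == "__main__":
    # Inline stand-in for the result file; "valence_identical" is hypothetical.
    text = """
    {"summary": {"greedy_determinism":
        {"emotion_identical": 1.0, "valence_identical": 0.96}}}
    """
    print(greedy_failures(json.loads(text)))
```

An empty result means every recorded rate met the target; any entry returned names a dimension that was not deterministic enough under greedy decoding.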

pipeline

Runs infer → analyze → visualize sequentially in a single invocation.

uv run python main.py pipeline [OPTIONS]

Options

Option	Short	Type	Default	Description
`--config`	`-c`	PATH	`configs/default.yaml`	Path to the config YAML

Example

uv run python main.py pipeline
uv run python main.py pipeline -c configs/small_models.yaml

Configuration

Every command uses configs/default.yaml as its default configuration file.

Key Config Sections

Section	Key fields	Description
`image`	`root_dir`, `extensions`, `emotion_code_map`	Image paths and filename parsing
`models`	`name`, `hf_id`, `enabled`, `dtype`, `device`	VLM backend settings (list)
`prompt`	`emotion_prompt`, `valence_prompt`, `arousal_prompt`	3-step prompt templates
`attention`	`capture`	Extracts attention when `true` (~80MB/image)
`output`	`save_predictions`, `save_attention`, `figure_format`	Output format control

Enabling/Disabling Models

Control which models run with enabled: true/false in default.yaml, or run a single model from the CLI with the --model option:

models:
  - name: "paligemma2"
    hf_id: "google/paligemma2-3b-mix-224"
    enabled: true     # runs on infer
  - name: "llava"
    hf_id: "llava-hf/llava-1.5-7b-hf"
    enabled: false    # can be overridden with --model llava
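The selection rule described above (run everything with `enabled: true`, unless `--model` names one model, in which case that model runs even if disabled) can be sketched as follows. `select_models` is a hypothetical helper, the model list is written as plain dicts rather than loaded from YAML, and matching by backend is omitted because no backend field appears in the config shown here.

```python
from typing import Dict, List, Optional


def select_models(models: List[Dict], requested: Optional[str] = None) -> List[Dict]:
    """Return the models to run for an optional --model value."""
    if requested is None:
        # No override: run everything marked enabled in the config.
        return [m for m in models if m.get("enabled", False)]
    # Override: match by name and ignore the enabled flag.
    matches = [m for m in models if m.get("name") == requested]
    if not matches:
        raise ValueError(f"unknown model: {requested!r}")
    return matches


if __name__ == "__main__":
    # Same entries as the YAML above, as Python dicts.
    models = [
        {"name": "paligemma2", "hf_id": "google/paligemma2-3b-mix-224", "enabled": True},
        {"name": "llava", "hf_id": "llava-hf/llava-1.5-7b-hf", "enabled": False},
    ]
    print([m["name"] for m in select_models(models)])
    print([m["name"] for m in select_models(models, "llava")])
```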

Quick Start

# 1. Install dependencies
uv sync

# 2. Run the tests
uv run pytest tests/ -v

# 3. Confirm things work on a small sample
uv run python main.py sample -n 3 --no-wandb

# 4. Verify determinism (required before the main experiment)
uv run python main.py test-retest -n 50 -r 3

# 5. Full inference
uv run python main.py infer --no-wandb

# 6. Analysis + visualization
uv run python main.py analyze
uv run python main.py visualize
