April 13, 2026, 1 min read

Summary

The idea: train a classifier on pairs of (the model's activations, labels assigned separately by human raters), so that interpreting the activations reduces to an interpretable classification problem.

Tip

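At the time, the paper applied this to Inception v3 and ResNet-50; more recently it seems to be used as an interpretability method for LLMs.
To apply it to an LLM or another model, freeze the original model and train a separate classifier attached to it.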
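
As a rough illustration of the freeze-then-probe recipe above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `FROZEN_W` is a small random layer standing in for a real pretrained model, the "rater" label is synthetic (1 when the first input feature is positive), and the probe is a logistic regression trained with SGD. None of this comes from the paper's code.

```python
import math
import random

random.seed(0)

INPUT_DIM = 4
HIDDEN = 8  # width of the hypothetical frozen layer

# Stand-in for a frozen pretrained layer: fixed random weights, never updated.
FROZEN_W = [[random.gauss(0, 0.3) for _ in range(INPUT_DIM)] for _ in range(HIDDEN)]


def frozen_activation(x):
    """Hidden activation of the frozen layer for input x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def make_dataset(n):
    """Pairs of (activation, label); the label is a synthetic 'rater' judgment."""
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(INPUT_DIM)]
        label = 1 if x[0] > 0 else 0  # hypothetical concept the raters annotate
        data.append((frozen_activation(x), label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, h):
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_probe(data, epochs=50, lr=0.5):
    """Train only the linear probe (w, b); FROZEN_W is never touched."""
    w = [0.0] * HIDDEN
    b = 0.0
    for _ in range(epochs):
        for h, y in data:
            grad = predict(w, b, h) - y  # d(log loss)/d(logit)
            w = [wi - lr * grad * hi for wi, hi in zip(w, h)]
            b -= lr * grad
    return w, b


def accuracy(w, b, data):
    hits = sum((predict(w, b, h) > 0.5) == (y == 1) for h, y in data)
    return hits / len(data)


train_set = make_dataset(300)
test_set = make_dataset(200)
probe_w, probe_b = train_probe(train_set)
test_acc = accuracy(probe_w, probe_b, test_set)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

High held-out accuracy would suggest the labeled concept is linearly decodable from that layer's activations. With a real model, `frozen_activation` would be replaced by a forward pass that extracts a chosen layer's hidden state with gradients disabled.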

ArXiv ID: 1610.01644
Author: Guillaume Alain, Yoshua Bengio
Category: Explainability
Comment: Linear probing. What I want to do by applying it to LLMs: if you run linear classification on the model's activations paired with separately collected rater labels, you can interpret the activations.
DOI: N/A
IsTargetPaper: true
Journal/Conference: ICLR 2017 (Workshop Track)
Linked Bases: [[LLMs.base]], [[XAI.base]]
Published Year: 2016
Reading Status: ▶️ In progress
Review Date: 2026-01-15
Topic: Linear Probing
URL: https://arxiv.org/abs/1610.01644
