Juhyeon's Blog

Factual Self-Awareness in Language Models - Representation, Robustness, and Scaling

February 11, 2026 · 2 min read

Introduction

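Factual incorrectness in LLMs is a major concern for deployment.
Prior work has found that LLMs can fact-check their own output after generation.
This study demonstrates that an internal compass exists that dictates the correctness of factual recall at the time of generation.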

Related Papers

Research on LLM hallucination and factual recall
Probing and linear representation analysis
Work on self-knowledge (e.g., Do I Know This Entity)

Methods

Identifies a linear feature encoded in the Transformer residual stream that, given a subject entity and a relation, dictates whether the model can recall the correct attribute
Tests the robustness of the self-awareness signal to minor formatting variations
Analyzes the effect of context perturbations (various example selection strategies)
Runs scaling experiments across model sizes and training dynamics
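
The notes above do not say how the linear feature is extracted, so the following is only a minimal sketch under stated assumptions. It fits a difference-of-means direction, which is one common way to find a linear feature and not necessarily the paper's method. The activations are synthetic Gaussian stand-ins for residual-stream vectors at a single layer, and all function names are hypothetical.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_direction(known, unknown):
    """Difference-of-means direction plus a midpoint threshold.

    `known` / `unknown` are activations for prompts where the model does /
    does not recall the correct attribute (labels are assumed to be given).
    """
    mu_known = mean_vector(known)
    mu_unknown = mean_vector(unknown)
    direction = [a - b for a, b in zip(mu_known, mu_unknown)]
    midpoint = [(a + b) / 2 for a, b in zip(mu_known, mu_unknown)]
    return direction, dot(direction, midpoint)


def predicts_known(activation, direction, threshold):
    """True if the projection onto the direction falls on the 'known' side."""
    return dot(activation, direction) > threshold


def synthetic_activations(n, dim, shift, rng):
    """Assumption: Gaussian noise, with the two classes separated on axis 0."""
    return [
        [rng.gauss(0.0, 1.0) + (shift if j == 0 else 0.0) for j in range(dim)]
        for _ in range(n)
    ]


def probe_accuracy(seed=0, dim=16, n=200, shift=3.0):
    """Fit on one synthetic sample, evaluate on a fresh held-out sample."""
    rng = random.Random(seed)
    train_known = synthetic_activations(n, dim, +shift, rng)
    train_unknown = synthetic_activations(n, dim, -shift, rng)
    test_known = synthetic_activations(n, dim, +shift, rng)
    test_unknown = synthetic_activations(n, dim, -shift, rng)
    direction, threshold = fit_direction(train_known, train_unknown)
    hits = sum(predicts_known(x, direction, threshold) for x in test_known)
    hits += sum(not predicts_known(x, direction, threshold) for x in test_unknown)
    return hits / (2 * n)
```

With real data, `synthetic_activations` would be replaced by vectors read from the model at the chosen layer and token position. Held-out accuracy then indicates how linearly separable the "can recall" and "cannot recall" cases are at that layer.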

Results

The self-awareness signal is robust to formatting variations.
Self-awareness emerges quickly during training and peaks in the intermediate layers.
Scaling patterns across model sizes are observed.
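
One way to make "robust to formatting variations" concrete is to fit the direction separately under two prompt formats and compare the results by cosine similarity. This is a hedged sketch: the metric is an assumption and not necessarily what the paper reports. The data is synthetic, the formatting change is modeled as a constant offset added to every activation, and all names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def awareness_direction(known, unknown):
    """Difference of class means (an assumed way to define the direction)."""
    mu_known = mean_vector(known)
    mu_unknown = mean_vector(unknown)
    return [a - b for a, b in zip(mu_known, mu_unknown)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synthetic_activations(n, dim, shift, offset, rng):
    """Assumption: a formatting change adds the same `offset` to every unit."""
    return [
        [rng.gauss(0.0, 1.0) + offset + (shift if j == 0 else 0.0) for j in range(dim)]
        for _ in range(n)
    ]


def direction_for_variant(offset, rng, dim=16, n=200, shift=3.0):
    known = synthetic_activations(n, dim, +shift, offset, rng)
    unknown = synthetic_activations(n, dim, -shift, offset, rng)
    return awareness_direction(known, unknown)


def robustness_score(seed=0):
    """Cosine similarity between directions fitted under two formats."""
    rng = random.Random(seed)
    plain = direction_for_variant(0.0, rng)
    reformatted = direction_for_variant(0.5, rng)
    return cosine_similarity(plain, reformatted)
```

A score near 1 means the two formats yield nearly the same direction. In this toy setup the shared offset cancels in the difference of means, so the score is high by construction. Real activations would have to be measured to say anything about an actual model.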

Discussion

LLMs have an intrinsic self-monitoring capability.
This finding contributes to interpretability and reliability.
Factual self-awareness is encoded linearly at the representation level.

Properties

Author: Hovhannes Tamoyan et al.
Comment: Discovers a self-awareness direction through which LLMs internally encode correctness at the time of factual recall
IsTargetPaper: true
Journal/Conference: ICLR 2025
Published Year: 2025
Reading Status: Not Started
Review Date: 2026-02-02
Topic: LLM Self-Awareness, Factual Recall, Representation, Interpretability
URL: https://www.semanticscholar.org/paper/5778bd480af49144fb0a8bc177e13409e95dac47
