Juhyeon's Blog

Evidence for Limited Metacognition in LLMs

February 11, 2026 · 2 min read

Introduction

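Investigates whether LLMs exhibit metacognitive abilities without relying on self-report
Points out that prior work was limited by its dependence on models' self-reports
Proposes a methodology that indirectly measures whether models can strategically use their internal signals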

Related Papers

Research on human metacognition (psychology / cognitive science)
Research on LLM calibration and confidence estimation
Prior work on self-evaluation in LLMs

Methods

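Introduces two non-verbal experimental paradigms
Evaluates whether models make strategic decisions based on internal confidence signals
Treats outputs as indirect measures of internal state (self-reports are not taken at face value)
Verifies the presence of internal signals through token probability analysis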
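
The token probability idea above can be illustrated with a minimal sketch. It assumes a multiple-choice setting where the log probability of each answer-option token is available. The function names, the defer action, and the 0.6 threshold are hypothetical and chosen only for illustration; these notes do not record the paper's actual paradigms or analysis code.

```python
import math


def option_confidence(logprobs):
    """Return (top option, its probability, entropy in bits).

    `logprobs` maps each answer option (e.g. "A".."D") to the log
    probability of its token. Hypothetical input format.
    """
    # Renormalize over the listed options using a max-shift for stability.
    m = max(logprobs.values())
    total = sum(math.exp(lp - m) for lp in logprobs.values())
    probs = {opt: math.exp(lp - m) / total for opt, lp in logprobs.items()}

    top = max(probs, key=probs.get)
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return top, probs[top], entropy


def answer_or_defer(logprobs, threshold=0.6):
    """Toy decision rule: answer only when the top option is probable enough.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    top, p_top, _ = option_confidence(logprobs)
    return ("answer", top) if p_top >= threshold else ("defer", None)


if __name__ == "__main__":
    peaked = {"A": math.log(0.7), "B": math.log(0.1),
              "C": math.log(0.1), "D": math.log(0.1)}
    flat = {opt: math.log(0.25) for opt in "ABCD"}
    print(answer_or_defer(peaked))
    print(answer_or_defer(flat))
```

If a model's observed behavior (answering versus deferring) tracks a signal like the top-option probability or the entropy, that is indirect evidence that it uses an internal confidence signal, with no self-report involved.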

Results

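Frontier LLMs show emerging metacognitive abilities, such as assessing their own confidence and predicting their own responses
However, the resolution of these abilities is limited
They emerge in a context-dependent manner and lack consistency
They differ qualitatively from human metacognition
Differences even among similar models suggest that post-training may influence the development of metacognition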
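
To make "resolution" concrete: in the metacognition literature it usually refers to how well confidence separates correct from incorrect answers, and one common way to quantify it is the area under the ROC curve (AUROC). The sketch below assumes that metric and a hypothetical per-trial format of confidence scores plus correctness flags. These notes do not record which measure the paper itself reports.

```python
def auroc(confidences, correct):
    """Probability that a randomly chosen correct trial has higher
    confidence than a randomly chosen incorrect trial (ties count 0.5).

    0.5 means confidence carries no information about correctness;
    1.0 means it separates correct from incorrect trials perfectly.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect trial")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative, made-up trials: confidence score and whether the answer was right.
    confidences = [0.9, 0.2, 0.8, 0.3]
    correct = [True, True, False, False]
    print(auroc(confidences, correct))
```

Under this reading, "limited resolution" would correspond to values above chance (0.5) but well short of 1.0.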

Discussion

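LLM metacognition exists but may be fundamentally different from the human kind
Further research is needed on how post-training (e.g., RLHF) affects metacognitive abilities
Emphasizes the importance of non-verbal evaluation methodologies
For AI safety, models' self-awareness capabilities need to be monitored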

Properties

Author: Christopher Ackerman
Comment: Evaluates LLM metacognition with non-verbal paradigms and finds limited metacognition
IsTargetPaper: true
Journal/Conference: arXiv
Published Year: 2025
Reading Status: ☑️ Not Started
Review Date: 2026-01-30
Topic: LLM Metacognition, Self-Awareness
URL: https://arxiv.org/abs/2509.21545
