Juhyeon's Blog

Can LLMs Predict Their Own Failures - Self-Awareness via Internal Circuits

February 11, 2026 · 2 min read

Introduction

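Investigates whether a frozen LLM can predict its own correctness from its internal circuits
Proposes Gnosis, a lightweight self-awareness mechanism that self-verifies by observing internal signals
Decodes correctness signals from hidden states and attention patterns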

Related Papers

Research on LLM uncertainty estimation
Research on self-verification and self-consistency
Research on probing-based analysis of internal representations

Methods

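Gnosis passively observes the hidden states and attention patterns of a frozen LLM
Internal traces are compressed into fixed-budget descriptors
Correctness is predicted at negligible inference cost (5M added parameters)
Evaluated on models ranging from 1.7B to 20B parameters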
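
To make "fixed-budget descriptor" concrete, here is a minimal sketch, not the Gnosis architecture itself. It assumes a trace is a list of per-token feature vectors and compresses it by average-pooling over a fixed number of position bins, so the descriptor size does not depend on sequence length. The function names, the bin count, and the logistic head with placeholder weights are all hypothetical choices made for illustration.

```python
import math


def fixed_budget_descriptor(hidden_states, num_bins=4):
    """Average-pool a [T, D] trace into num_bins * D floats for any T >= 1."""
    seq_len = len(hidden_states)
    dim = len(hidden_states[0])
    descriptor = []
    for b in range(num_bins):
        # Contiguous segment of token positions assigned to bin b.
        start = (b * seq_len) // num_bins
        end = max(start + 1, ((b + 1) * seq_len) // num_bins)
        segment = hidden_states[start:end]
        for d in range(dim):
            descriptor.append(sum(row[d] for row in segment) / len(segment))
    return descriptor


def predict_correctness(descriptor, weights, bias=0.0):
    """Hypothetical logistic head: maps a descriptor to P(answer is correct)."""
    z = bias + sum(w * x for w, x in zip(weights, descriptor))
    return 1.0 / (1.0 + math.exp(-z))


# Stand-in traces of different lengths (T=3 and T=50), each with D=2 features.
short_trace = [[0.1 * t, 1.0] for t in range(3)]
long_trace = [[0.1 * t, 1.0] for t in range(50)]

short_desc = fixed_budget_descriptor(short_trace)
long_desc = fixed_budget_descriptor(long_trace)

weights = [0.05] * len(long_desc)  # untrained placeholder weights
score = predict_correctness(long_desc, weights)
```

Both traces yield a descriptor of the same length (`num_bins * D`), so the small head on top sees a constant-size input whatever the generation length. In a real setup the traces would come from the frozen model's forward pass and the head would be trained on labeled correct and incorrect generations.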

Results

Consistent performance across math reasoning, open-domain QA, and academic knowledge benchmarks
Outperforms strong internal baselines and large external judges in both accuracy and calibration
Operates independently of sequence length
Supports early detection of failing generations
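
The results above mention calibration without naming a metric. One common way to measure it for a correctness predictor is expected calibration error (ECE), sketched below. Whether the paper reports exactly this metric is an assumption, and the example inputs are made up for illustration.

```python
def expected_calibration_error(probs, labels, num_bins=10):
    """ECE: bin-size-weighted gap between mean predicted probability
    and the observed fraction of correct answers in each bin."""
    n = len(probs)
    # Per bin: [sum of predicted probabilities, number correct, count].
    bins = [[0.0, 0.0, 0] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * num_bins), num_bins - 1)  # p == 1.0 goes in last bin
        bins[idx][0] += p
        bins[idx][1] += y
        bins[idx][2] += 1
    ece = 0.0
    for prob_sum, correct_sum, count in bins:
        if count == 0:
            continue
        ece += (count / n) * abs(correct_sum / count - prob_sum / count)
    return ece


# Well calibrated: predicts 0.9 and 9 of 10 answers are actually correct.
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])

# Badly calibrated: predicts 1.0 but every answer is wrong.
overconfident = expected_calibration_error([1.0] * 4, [0] * 4)
```

A lower ECE means the predicted probabilities can be read more literally, for example as thresholds for deciding when to abstain or escalate.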

Discussion

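Strongly suggests that LLMs carry internal signals about their own correctness
A lightweight mechanism can make self-awareness practical
Potential use in safety-critical applications
Further validation on more diverse tasks and models is needed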

Properties

Author: Amirhosein Ghasemabadi et al.
Comment: Gnosis, a lightweight mechanism that predicts the model's own correctness from hidden states and attention patterns
IsTargetPaper: true
Journal/Conference: arXiv
Published Year: 2025
Reading Status: ☑️ Not Started
Review Date: 2026-01-30
Topic: LLM Self-Awareness, Failure Prediction
URL: https://arxiv.org/abs/2512.20578
