Large Language Models Do NOT Really Know What They Don't Know

February 11, 2026, 1 min read

Introduction

A rebuttal to the claim that an LLM's hidden states, attention weights, and token probabilities encode a factuality signal
Factual errors caused by shortcuts or spurious associations also arise from the same training objective
Investigates whether internal computation can reliably distinguish factual outputs from hallucinated ones

Related Papers

Prior work on the claim that an LLM knows what it knows
Hallucination detection via internal states

Methods

Comparison of two hallucination types, distinguished by how much they depend on subject information
Hidden-state geometry analysis
Mechanistic analysis of factual query processing
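
One way to picture the hidden-state geometry analysis is a separation score between two groups of hidden-state vectors. The sketch below is only an illustration, not the paper's procedure: the vectors are synthetic Gaussian samples rather than real LLM activations, the score (centroid distance divided by average within-group spread) is an assumed metric, and every name in it (`separation_score`, `synthetic_states`, `subject_tied`, `subject_free`) is hypothetical.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mean_distance_to(vectors, point):
    """Average Euclidean distance from each vector to a reference point."""
    return sum(math.dist(v, point) for v in vectors) / len(vectors)


def separation_score(group_a, group_b):
    """Distance between centroids divided by the average within-group spread."""
    ca, cb = centroid(group_a), centroid(group_b)
    spread = 0.5 * (mean_distance_to(group_a, ca) + mean_distance_to(group_b, cb))
    return math.dist(ca, cb) / spread


def synthetic_states(rng, n, dim, shift):
    """Stand-in for hidden states: Gaussian vectors centered at `shift`.

    ASSUMPTION: purely synthetic data, chosen to mimic the two cases
    described in these notes. It is not model output.
    """
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
correct = synthetic_states(rng, 300, 16, shift=0.0)
# Hallucinations tied to subject knowledge: drawn from the same distribution
# as correct responses, by construction.
subject_tied = synthetic_states(rng, 300, 16, shift=0.0)
# Hallucinations unrelated to subject knowledge: a shifted, separate cluster.
subject_free = synthetic_states(rng, 300, 16, shift=3.0)

score_tied = separation_score(correct, subject_tied)
score_free = separation_score(correct, subject_free)
print(f"correct vs subject-tied: {score_tied:.2f}")
print(f"correct vs subject-free: {score_free:.2f}")
```

A score near zero means the two groups occupy the same region of the space; a score well above one means their centroids are farther apart than the typical spread inside each group.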

Results

Hallucinations tied to subject knowledge: they follow the same internal recall process as correct responses, so the two cannot be told apart
Hallucinations unrelated to subject knowledge: they form a separate, clustered representation, so they can be detected
LLMs encode only patterns of knowledge recall, not truthfulness
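
The contrast in these results can be illustrated with a toy detector. The sketch below assumes synthetic Gaussian vectors in place of real hidden states and uses a nearest-centroid classifier as a stand-in for whatever detector one might train; it does not reproduce the paper's experiments, and all names (`detector_accuracy`, `sample`, `subject_tied`, `subject_free`) are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def detector_accuracy(correct, hallucinated, n_train):
    """Fit a nearest-centroid detector on the first n_train vectors of each
    class and return its accuracy on the remaining held-out vectors."""
    c_ok = centroid(correct[:n_train])
    c_bad = centroid(hallucinated[:n_train])
    hits = total = 0
    for label, group in ((0, correct[n_train:]), (1, hallucinated[n_train:])):
        for v in group:
            # Predict "hallucinated" (1) when the vector is closer to that centroid.
            pred = 1 if math.dist(v, c_bad) < math.dist(v, c_ok) else 0
            hits += int(pred == label)
            total += 1
    return hits / total


def sample(rng, n, dim, shift):
    """ASSUMPTION: synthetic stand-in for hidden states, not model output."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(1)
correct = sample(rng, 400, 16, shift=0.0)
subject_tied = sample(rng, 400, 16, shift=0.0)  # same distribution as correct
subject_free = sample(rng, 400, 16, shift=3.0)  # separate cluster

acc_tied = detector_accuracy(correct, subject_tied, n_train=200)
acc_free = detector_accuracy(correct, subject_free, n_train=200)
print(f"detector accuracy, subject-tied hallucinations: {acc_tied:.2f}")
print(f"detector accuracy, subject-free hallucinations: {acc_free:.2f}")
```

Because the subject-tied group is generated from the same distribution as the correct group, the detector can do no better than chance on it, while the shifted subject-free cluster is easy to flag. The toy setup builds that outcome in by construction; it only shows what "indistinguishable" and "detectable" mean operationally.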

Discussion

Presents a fundamental limitation: “LLMs don’t really know what they don’t know”
Suggests that distinguishing between hallucination types matters in self-knowledge research

Properties

Author: Chi Seng Cheang et al.
Comment: Uses mechanistic analysis to show that truthfulness is not encoded in LLM internal states
IsTargetPaper: true
Journal/Conference: arXiv
Published Year: 2025
Reading Status: Not Started
Review Date: 2026-02-01
Topic: LLM self-knowledge, hallucination, internal representations
URL: https://arxiv.org/abs/2510.09033
