Juhyeon's Blog

Cognitive Dissonance - Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness

February 11, 2026 · 1 min read

Introduction

Investigates why an LLM can represent truthfulness internally yet still generate false outputs
Frames this gap between internal representation and output as "cognitive dissonance"

Related Papers

Probing for truthfulness
Representation engineering

Methods

Extract the internal truthfulness representation with linear probing
Analyze the patterns in which the probe disagrees with the model's output
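
As a rough illustration of the linear probing step, here is a minimal sketch that fits a logistic-regression probe on hidden-state vectors. Everything in it is an assumption made for illustration: the hidden states are synthetic random vectors with a hypothetical "truth direction" planted in them, and the names `train_linear_probe` and `probe_predict` are invented here. It does not reproduce the paper's models, datasets, or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states. In a real experiment these would be
# activations read from one layer of the LLM for each true/false statement.
n, d = 400, 16
labels = rng.integers(0, 2, size=n)            # 1 = statement is true
direction = rng.normal(size=d)                 # hypothetical "truth direction"
direction /= np.linalg.norm(direction)
hidden = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, direction)


def train_linear_probe(x, y, lr=0.1, steps=500):
    """Fit logistic regression (a linear probe) with batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(true)
        grad = p - y                             # gradient of the log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_predict(x, w, b):
    """Return 1 where the probe reads the hidden state as 'true', else 0."""
    return (x @ w + b > 0).astype(int)


# Train on the first 300 examples, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, None)
w, b = train_linear_probe(hidden[train], labels[train])
probe_acc = (probe_predict(hidden[test], w, b) == labels[test]).mean()
print(f"probe accuracy on held-out synthetic data: {probe_acc:.2f}")
```

The probe is kept linear on purpose: if a single weight vector can separate true from false statements, the information is plausibly represented in the activations themselves rather than computed by the probe.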

Results

In some cases the internal representation encodes the truth, but the output fails to reflect it
Analysis of how often cognitive dissonance occurs and under what conditions
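
One way to make "frequency of cognitive dissonance" concrete is to cross-tabulate, per example, whether the probe and the output each match the ground truth. The sketch below does this on hand-written toy labels. The function name `disagreement_table`, the bucket names, and the toy values are all assumptions for illustration and are not results from the paper.

```python
from collections import Counter


def disagreement_table(truth, probe_pred, output_pred):
    """Count examples by whether the probe and the output match the ground truth."""
    counts = Counter()
    for t, p, o in zip(truth, probe_pred, output_pred):
        probe_ok, output_ok = (p == t), (o == t)
        if probe_ok and output_ok:
            counts["both_correct"] += 1
        elif probe_ok:
            counts["probe_only_correct"] += 1   # internal state is right, output is wrong
        elif output_ok:
            counts["output_only_correct"] += 1
        else:
            counts["both_wrong"] += 1
    return counts


# Toy labels, made up for illustration only (1 = true, 0 = false).
truth       = [1, 0, 1, 1, 0, 0]
probe_pred  = [1, 0, 1, 0, 0, 1]
output_pred = [1, 1, 0, 0, 0, 0]

table = disagreement_table(truth, probe_pred, output_pred)
dissonance_rate = table["probe_only_correct"] / len(truth)
print(dict(table), f"dissonance rate = {dissonance_rate:.2f}")
```

The `probe_only_correct` bucket is the case these notes call cognitive dissonance. Tracking `output_only_correct` as well guards against assuming the probe is always the more reliable signal.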

Discussion

Self-knowledge exists but does not carry over into behavior
A gap between introspective ability and output faithfulness

Properties

Author: Kevin Liu et al.
Comment: Analyzes the mismatch ("cognitive dissonance") between an LLM's internal truthfulness representation and its actual output
IsTargetPaper: true
Journal/Conference: EMNLP 2023
Published Year: 2023
Reading Status: Not Started
Review Date: 2026-02-01
Topic: Internal truthfulness representation, output disagreement, cognitive dissonance
URL: https://arxiv.org/abs/2312.03729
