
Juhyeon's Blog


Learning to Trust Your Feelings - Leveraging Self-awareness in LLMs for Hallucination Mitigation

February 11, 2026 · 1 min read

Introduction

Evaluates whether an LLM can recognize and express its own internal knowledge state
Knowledge state probing reaches over 85% accuracy, which indicates that robust self-awareness exists
During generation, however, the model fails to faithfully express that internal knowledge, and hallucinations result
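As a rough illustration of what knowledge state probing can look like, the sketch below fits a linear (logistic-regression) probe that predicts a binary "the model knows this" label from a hidden-state vector. Everything concrete here is an assumption made for illustration: the vectors are synthetic stand-ins for real LLM activations, and the names `train_probe`, `probe_accuracy`, and `make_synthetic_states` are hypothetical. It is not the paper's setup, only the general shape of a probe.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(states[0])
    w, b = [0.0] * dim, 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, states, labels):
    """Share of examples where the probe's thresholded prediction is right."""
    hits = 0
    for x, y in zip(states, labels):
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        hits += int(predicted == y)
    return hits / len(labels)


def make_synthetic_states(n, dim=8, seed=0):
    """ASSUMPTION: synthetic stand-ins for hidden states, not real activations.

    Label 1 means "known", label 0 means "unknown"; the label shifts the
    first coordinate so that the two classes are linearly separable.
    """
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 3.0 if y == 1 else -3.0
        states.append(x)
        labels.append(y)
    return states, labels


train_x, train_y = make_synthetic_states(400, seed=0)
test_x, test_y = make_synthetic_states(200, seed=1)
w, b = train_probe(train_x, train_y)
print(f"held-out probe accuracy: {probe_accuracy(w, b, test_x, test_y):.2f}")
```

With a real model, the synthetic vectors would be replaced by activations collected while the model reads each question, and the labels by whether the model actually answers that question correctly.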

Related Papers

Hallucination detection and mitigation
RLHF

Methods

DreamCatcher: an automatic hallucination annotation tool that combines knowledge probing with consistency checking
Knowledge preference data ranking
RLKF (Reinforcement Learning from Knowledge Feedback): uses knowledge preference as the reward signal
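The steps above could be sketched roughly as follows: measure how consistent several sampled answers are, combine that with the probe's verdict to label a question, then turn the label into a preference pair that a reward model could be trained on. The threshold, the three labels, and the rule "prefer a direct answer when known, prefer an abstention when unknown" are illustrative assumptions, not details taken from these notes or confirmed against the paper. The function names are hypothetical too.

```python
from collections import Counter


def consistency_score(answers):
    """Fraction of sampled answers that match the most common answer."""
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def annotate(probe_says_known, answers, threshold=0.6):
    """Combine the probe verdict with sampling consistency.

    ASSUMPTION: the 0.6 threshold and the "known" / "unknown" / "mixed"
    labels are hypothetical choices made for this sketch.
    """
    consistent = consistency_score(answers) >= threshold
    if probe_says_known and consistent:
        return "known"
    if not probe_says_known and not consistent:
        return "unknown"
    return "mixed"


def preference_pair(label, direct_answer, abstention):
    """Return (chosen, rejected), or None when the label gives no clear signal.

    ASSUMPTION: this ranking rule is a simplified stand-in for knowledge
    preference data ranking.
    """
    if label == "known":
        return (direct_answer, abstention)
    if label == "unknown":
        return (abstention, direct_answer)
    return None


samples = ["Paris", "paris ", "Lyon"]
label = annotate(probe_says_known=True, answers=samples)
print(label, preference_pair(label, "Paris", "I am not sure."))
```

Pairs built this way are what a reward model would consume before the reinforcement learning stage; questions labeled "mixed" are simply skipped in this sketch.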

Results

RLKF effectively improves the model's ability to make use of its internal knowledge state
Performance improves on knowledge-based and honesty-related tasks
53 citations

Discussion

Identifies a gap: self-awareness exists but is not used during generation
Aligning the internal state with the output is the key challenge


Properties

Author: Yuxin Liang et al.
Comment: Finds that LLMs reach over 85% knowledge state probing accuracy but fail to faithfully express that knowledge during generation
IsTargetPaper: true
Journal/Conference: KnowledgeNLP Workshop
Published Year: 2024
Reading Status: Not Started
Review Date: 2026-02-01
Topic: LLM self-awareness, hallucination, RLKF
URL: https://arxiv.org/abs/2401.15449

