Juhyeon's Blog

Exploration Through Introspection - A Self-Aware Reward Model

February 11, 2026 · 1 min read

Introduction

An AI agent's ability to model internal mental states is central to advancing Theory of Mind
Hypothesizes a unified system for self-awareness and other-awareness
Introduces an introspective exploration component inspired by biological pain

Related Papers

Theory of Mind in AI
Intrinsic motivation in RL

Methods

Infers a “pain-belief” from online observations with a Hidden Markov Model
Integrates this signal into a subjective reward function
Compares normal and chronic pain perception models in a Gridworld environment
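
To make the steps above concrete, here is a minimal sketch of how a pain-belief could be filtered from observations and folded into a reward. It is not the paper's implementation: the two-state HMM, the binary observations, the transition and emission values, and the names `update_belief`, `subjective_reward`, `NORMAL`, `CHRONIC`, and `run` are all assumptions made for illustration. The "chronic" model is assumed to differ only in having a sticky pain state.

```python
# Hypothetical two-state HMM: state 0 = "no pain", state 1 = "pain".
# All numbers below are illustrative assumptions, not values from the paper.

def update_belief(belief, obs, transition, emission):
    """One step of HMM forward filtering.

    belief:     [P(no pain), P(pain)] before seeing obs
    obs:        0 (benign observation) or 1 (noxious observation)
    transition: transition[i][j] = P(next state j | current state i)
    emission:   emission[i][o]   = P(observation o | state i)
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the observation likelihood, then normalize.
    unnorm = [predicted[j] * emission[j][obs] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def subjective_reward(extrinsic, belief, pain_weight=1.0):
    """Extrinsic reward minus a penalty proportional to the pain belief."""
    return extrinsic - pain_weight * belief[1]


EMISSION = [[0.9, 0.1],   # no pain: mostly benign observations
            [0.2, 0.8]]   # pain: mostly noxious observations

# "Normal" perception (assumed): pain fades quickly once the stimulus is gone.
NORMAL = [[0.95, 0.05],
          [0.60, 0.40]]
# "Chronic" perception (assumed): the pain state is sticky.
CHRONIC = [[0.80, 0.20],
           [0.05, 0.95]]


def run(transition, observations, extrinsic=0.0):
    """Filter a sequence of observations and collect subjective rewards."""
    belief = [0.5, 0.5]  # uninformed prior
    rewards = []
    for obs in observations:
        belief = update_belief(belief, obs, transition, EMISSION)
        rewards.append(subjective_reward(extrinsic, belief))
    return belief, rewards


if __name__ == "__main__":
    obs_seq = [1, 1, 0, 0, 0]  # two noxious observations, then three benign
    for name, model in (("normal", NORMAL), ("chronic", CHRONIC)):
        belief, rewards = run(model, obs_seq)
        print(name, [round(r, 3) for r in rewards])
```

With these assumed parameters, the chronic model keeps a higher pain-belief after the noxious observations stop, so its subjective reward stays lower for longer. In a Gridworld experiment, the per-step reward from `subjective_reward` would replace the raw environment reward in whatever RL update the agent uses.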

Results

The introspective agent significantly outperforms the standard baseline agent
It can replicate human-like behavior
Self-awareness directly affects learning ability

Discussion

Foundational work on a computational implementation of self-awareness and its effect on learning
Set in RL rather than LLMs, but a useful reference for quantifying self-awareness

Properties

Author: Michael Petrowski et al.
Comment: Studies how self-awareness affects learning by introducing an introspective exploration component into an RL agent
IsTargetPaper: true
Journal/Conference: arXiv
Published Year: 2026
Reading Status: Not Started
Review Date: 2026-02-01
Topic: Introspection, self-awareness, reinforcement learning, reward model
URL: https://arxiv.org/abs/2601.03389
