Juhyeon's Blog

Reflection-Bench - Evaluating Epistemic Agency in Large Language Models

February 11, 2026 · 2 min read

Introduction

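LLMs are increasingly used as the cognitive engines of AI agents.
An agent's reliability and effectiveness depend heavily on its intrinsic epistemic agency, yet this capacity remains understudied.
Epistemic agency: the ability to flexibly construct, adapt, and monitor beliefs about a dynamic environment.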

Related Papers

LLM agent benchmarks
AI evaluation grounded in cognitive psychology
Research on metacognition and meta-reflection

Methods

The full process of epistemic agency is characterized along seven interrelated dimensions:
1. Prediction
2. Decision-making
3. Perception
4. Memory
5. Counterfactual thinking
6. Belief updating
7. Meta-reflection
Proposes Reflection-Bench, a benchmark of seven tasks inspired by cognitive psychology.
Designed for long-term relevance and to minimize data leakage.
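
These notes do not describe the individual tasks, so the following is only an illustration of what a cognitive-psychology-style probe of belief updating could look like: a toy two-armed bandit in which the better arm flips halfway through. The class `ReversalBandit`, the policy `win_stay_lose_shift`, and all parameter values are hypothetical and are not taken from the Reflection-Bench code.

```python
import random


class ReversalBandit:
    """Toy two-armed bandit whose better arm flips once, halfway through."""

    def __init__(self, n_trials=40, p_good=0.9, seed=0):
        self.n_trials = n_trials
        self.p_good = p_good  # reward probability of the better arm (assumed value)
        self.trial = 0
        self.rng = random.Random(seed)

    def better_arm(self):
        # Arm 0 is better in the first half, arm 1 in the second half.
        return 0 if self.trial < self.n_trials // 2 else 1

    def step(self, arm):
        """Play one arm and return a reward of 0 or 1."""
        p = self.p_good if arm == self.better_arm() else 1.0 - self.p_good
        reward = 1 if self.rng.random() < p else 0
        self.trial += 1
        return reward


def win_stay_lose_shift(env):
    """Baseline policy: repeat a rewarded choice, switch after a loss."""
    arm, correct = 0, []
    for _ in range(env.n_trials):
        # Record whether the current choice matches the better arm.
        correct.append(arm == env.better_arm())
        if env.step(arm) == 0:
            arm = 1 - arm
    return correct


if __name__ == "__main__":
    correct = win_stay_lose_shift(ReversalBandit())
    half = len(correct) // 2
    print("accuracy before reversal:", sum(correct[:half]) / half)
    print("accuracy after reversal:", sum(correct[half:]) / (len(correct) - half))
```

In a sketch like this, an LLM would take the place of the baseline policy and receive each reward as text feedback. How quickly its choices follow the hidden reversal is one possible way to observe belief updating.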

Results

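Evaluates 16 models under three prompting strategies.
Finds a clear three-tier performance hierarchy.
Finds especially large limitations in the meta-reflection ability of current LLMs.
Even SOTA LLMs show only rudimentary signs of epistemic agency.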

Discussion

Future work needs to strengthen core cognitive functions, improve cross-functional coordination, and develop adaptive processing mechanisms.
Code and data: https://github.com/AI45Lab/ReflectionBench

Properties

Author: Lingyu Li et al.
Comment: A benchmark that evaluates the epistemic agency of LLMs with seven tasks grounded in cognitive psychology
IsTargetPaper: true
Journal/Conference: ICML 2025
Published Year: 2024
Reading Status: Not Started
Review Date: 2026-02-02
Topic: LLM Benchmark, Epistemic Agency, Metacognition, Cognitive Psychology
URL: https://www.semanticscholar.org/paper/6875e36ecb05f73d9e5a98729af0b927bb4f94d6
