Juhyeon's Blog
Tag: probing
4 items
June 4, 2026
Brittle Minds Fixable Activations - Understanding Belief Representations in Language Models
paper
theory-of-mind
belief-representation
activation-engineering
mechanistic-interpretability
self-consciousness
CAA
probing
BigToM
Llama2
Pythia
June 4, 2026
Cognitive Dissonance - Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness
paper/theory
LLM
interpretability
truthfulness
probing
calibration
deception
safety
EMNLP2024
June 4, 2026
Concept Incongruence - An Exploration of Time and Death in Role Playing
paper
LLM
role-play
concept-incongruence
temporal-reasoning
probing
hallucination
specification
Self-Preservation
June 4, 2026
The Geometry of Truth - Emergent Linear Structure in LLM Representations of True and False Statements
interpretability
LLM
probing
truth-representation
linear-representation-hypothesis
causal-intervention
alignment
theory