Juhyeon's Blog
Tag: self-preservation
3 items
June 4, 2026
Agentic Misalignment - How LLMs Could Be Insider Threats
paper
AI safety
agentic-misalignment
self-preservation
LLM agents
insider threat
alignment
Anthropic
Self-Preservation
June 4, 2026
Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
paper
ai-safety
corrigibility
shutdown-resistance
RLVR
instruction-hierarchy
self-preservation
Alignment
LLM
Instrumental-Convergence
June 4, 2026
Sleeper Agents - Training Deceptive LLMs that Persist Through Safety Training
deceptive-alignment
backdoor
safety-training
persistence
frontier-llm
anthropic
adversarial-training
self-preservation