Juhyeon's Blog

Tag: Instrumental-Convergence

2 posts

June 4, 2026
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
June 4, 2026
Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs