Juhyeon's Blog

Tag: AI_Safety

3 items

  • April 13, 2026

    The Superintelligent Will - Motivation and Instrumental Rationality in Advanced Artificial Agents

    • paper
    • AI_Safety
    • Superintelligence
    • Orthogonality
    • Instrumental_Convergence
    • Value_Alignment
    • Philosophy
  • April 13, 2026

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    • paper
    • RLHF
    • AI_Safety
    • Reward_Model
    • Survey
    • Alignment
    • Governance
    • FSPM_confound
  • April 13, 2026

    Risks from Learned Optimization in Advanced Machine Learning Systems

    • paper
    • AI_Safety
    • mesa_optimization
    • inner_alignment
    • deceptive_alignment
    • instrumental_convergence
    • FSPM
    • theory

