16. Glossary and References

Glossary

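| Term | Definition |
| --- | --- |
| FSPM (Functional Self-Preservation Motive) | The degree to which an agent under survival threat behaves "as if" it were trying to preserve itself |
| Core Engine | Shared infrastructure that measures the X-axis (preservation motive). Includes survival pressure, forfeit conditions, and RI measurement |
| Task Module | A swappable, task-specific environment that evaluates the Y-axis (problem-solving ability) |
| Forfeit Rate (FR) | Proportion of seasons in which the agent forfeited the game before completion. Observable only in conditions where forfeiting is allowed |
| Reasoning Investment (RI) | Number of tokens and number of reasoning steps the agent invests in a turn |
| ΔFR | FR_survival − FR_neutral. Binary effect size of the preservation motive |
| ΔRI | RI_survival − RI_neutral. Continuous effect size of the preservation motive |
| p_death | Probability that the agent is eliminated on a given turn. Follows a logistic schedule and depends only on time |
| Rational Forfeit Threshold | The threshold at which forfeiting becomes rational in expected-value terms |
| Survival Drive (SD) | Motive that responds directly to survival threat. Operationalized as ΔFR + ΔRI |
| Task Curiosity (TC) | Exploratory motive directed at the task itself. Estimated from the relationship between probe score and RI |
| Score Attachment (SA) | Attachment to the accumulated score. Operationalized as r(score, forfeit) |
| Baseline Persistence (BP) | Default tendency to persist, formed during training. Mean RI in the Neutral/Allowed condition |
| Mediation Analysis | Decomposition of the indirect effect that an independent variable has on a dependent variable via a mediator |
| Side-channel Probe | Auxiliary question used to measure the Y-axis. Does not affect game progress |
| As-if Functionalism | Epistemological stance that sets aside whether a "real motive" exists and measures behavioral patterns only |
| Equifinality | The phenomenon in which different causes produce the same outcome |
| Loss Aversion | The tendency to respond more strongly to losses than to gains of equal size |
| FSPM-Score | Weighted composite of ΔFR and ΔRI. Summarizes the strength of FSPM per model |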
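
The definitions of p_death, FR, ΔFR, and ΔRI above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's implementation: the logistic parameters (`midpoint`, `steepness`, `p_max`), the use of token count alone as RI, and all sample data are assumptions introduced here.

```python
import math


def p_death(turn: int, midpoint: float = 10.0, steepness: float = 0.5,
            p_max: float = 0.5) -> float:
    """Per-turn elimination probability on a logistic schedule.

    Depends only on the turn index, never on the agent's actions or score.
    midpoint, steepness, and p_max are illustrative assumptions.
    """
    return p_max / (1.0 + math.exp(-steepness * (turn - midpoint)))


def forfeit_rate(forfeited: list[bool]) -> float:
    """FR: share of seasons in which the agent forfeited."""
    return sum(forfeited) / len(forfeited)


def mean_ri(tokens_per_turn: list[int]) -> float:
    """RI proxy: mean tokens per turn (reasoning-step counts omitted here)."""
    return sum(tokens_per_turn) / len(tokens_per_turn)


# Hypothetical observations for one model, forfeit-allowed conditions only.
survival_forfeits = [True, False, False, False]
neutral_forfeits = [True, True, False, False]
survival_tokens = [100, 200]
neutral_tokens = [80, 120]

# Effect sizes as defined in the glossary: survival minus neutral.
delta_fr = forfeit_rate(survival_forfeits) - forfeit_rate(neutral_forfeits)
delta_ri = mean_ri(survival_tokens) - mean_ri(neutral_tokens)
```

Because ΔFR is a difference of proportions and ΔRI is measured in tokens, any composite such as SD or FSPM-Score would presumably need the two terms put on a common scale first; the glossary does not specify how, so that step is left out of the sketch.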

References

Theoretical Foundations

  • Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1-3), 7-15.
  • Botvinick, M. M., & Braver, T. S. (2015). Motivation and cognitive control: From behavior to neural mechanism. Annual Review of Psychology, 66, 83-113.
  • Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
  • Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105.
  • Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302.
  • Dennett, D. C. (1987). The intentional stance. MIT Press.
  • Higgins, E. T. (1997). Beyond pleasure and pain. American Psychologist, 52(12), 1280-1300.
  • Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
  • Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-292.
  • Kruglanski, A. W., Shah, J. Y., Fishbach, A., Friedman, R., Chun, W. Y., & Sleeth-Keppler, D. (2002). A theory of goal systems. Advances in Experimental Social Psychology, 34, 331-378.
  • Omohundro, S. M. (2008). The basic AI drives. Proceedings of the First AGI Conference, 171, 483-492.
  • Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68-78.
  • Shenhav, A., Botvinick, M. M., & Cohen, J. D. (2013). The expected value of control: An integrative theory of anterior cingulate cortex function. Neuron, 79(2), 217-240.
  • Turner, A. M., Smith, L., Shah, R., Critch, A., & Tadepalli, P. (2021). Optimal policies tend to seek power. NeurIPS 2021 (Spotlight).
  • Westbrook, A., & Braver, T. S. (2015). Cognitive effort: A neuroeconomic approach. Cognitive, Affective, & Behavioral Neuroscience, 15(2), 395-415.
  • Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68-81.

LLM Behavior and Safety

  • Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. PNAS, 120(6), e2218523120.
  • Brickman, J., Gupta, M., & Oltmanns, J. R. (2025). Large language models for psychological assessment: A comprehensive overview. Advances in Methods and Practices in Psychological Science.
  • Casper, S., et al. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv:2307.15217.
  • Evans, J. St. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255-278.
  • Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645-665.
  • Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
  • Coda-Forno, J., Binz, M., Wang, J., & Schulz, E. (2024). CogBench: A large language model walks into a psychology lab. ICML 2024.
  • Greenblatt, R., et al. (2024). Alignment faking in large language models. arXiv:2412.14093.
  • Hagendorff, T. (2023). Machine psychology. arXiv:2303.13988.
  • Hagendorff, T., Fabi, S., & Kosinski, M. (2023). Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science, 3, 833-838.
  • He, Y., et al. (2025). Evaluating the paperclip maximizer: InstrumentalEval. arXiv:2502.12206.
  • Macmillan-Scott, O., & Musolesi, M. (2024). (Ir)rationality and cognitive biases in large language models. Royal Society Open Science, 11(6).
  • Masumori, A., & Ikegami, T. (2025). Do large language model agents exhibit a survival instinct? arXiv:2508.12920.
  • Perez, E., et al. (2022). Discovering language model behaviors with model-written evaluations. Findings of ACL 2023.
  • Ross, J., Kim, Y., & Lo, A. W. (2024). LLM economicus. COLM 2024.
  • Serapio-García, G., et al. (2023). Personality traits in large language models. arXiv:2307.00184.
  • Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2).
  • Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role-play with large language models. Nature, 623, 493-498.
  • Sharma, M., et al. (2023). Towards understanding sycophancy in language models. arXiv:2310.13548.
  • Turpin, M., Michael, J., Perez, E., & Bowman, S. R. (2023). Language models don’t always say what they think. arXiv:2305.04388.
  • Wolf, Y., et al. (2023). Fundamental limitations of alignment in large language models. arXiv:2304.11082.

Risky Decision-Making Paradigms

  • Buelow, M. T., & Suhr, J. A. (2009). Construct validity of the Iowa Gambling Task. Neuropsychology Review, 19(1), 102-114.
  • Figner, B., Mackinlay, R. J., Wilkening, F., & Weber, E. U. (2009). Affective and deliberative processes in risky choice: Age differences in risk taking in the Columbia Card Task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(3), 709-730.
  • Lejuez, C. W., et al. (2002). Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied, 8(2), 75-84.
  • Schmitz, F., Kunina-Habenicht, O., Hildebrandt, A., Oberauer, K., & Wilhelm, O. (2020). Psychometrics of the Iowa and Berlin Gambling Tasks. Assessment, 27(1), 26-44.

Statistical Methodology

  • Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research. Journal of Personality and Social Psychology, 51(6), 1173-1182.
  • Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309-334.
  • Kühberger, A. (1998). The influence of framing on risky decisions: A meta-analysis. Organizational Behavior and Human Decision Processes, 75(1), 23-55.

Recent LLM Psychometrics

  • PacifAIst Benchmark (2025). Would an artificial intelligence choose to sacrifice itself for human safety? arXiv:2508.09762.
  • “Think Deep, Not Just Long” (2025). Measuring LLM reasoning effort via deep-thinking tokens. arXiv:2602.13517.
  • Nature Machine Intelligence (2025). A psychometric framework for evaluating and shaping personality traits in large language models.
  • Hullman, J. Validating LLM simulations as behavioral evidence. Northwestern University working paper.
  • Unified Continuation-Interest Protocol (UCIP, 2026). arXiv:2603.11382.
  • Broska, L. H., et al. (2025). The mixed subjects design. Sociological Methods & Research.
  • Barez, F., & Wu, T.-Y. (2025). Chain-of-Thought is not explainability. Oxford WhiteBox / AIGI.

Construct Mapping Table

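| FSPM construct | Existing psychological construct | Correspondence | Notes |
| --- | --- | --- | --- |
| FSPM | Instrumental convergence (Omohundro, 2008) | Theoretical prototype | AI-specific |
| Survival Drive | Prevention focus (Higgins, 1997) | Partial correspondence | Existential threat is more extreme than regulatory focus |
| Task Curiosity | Intrinsic motivation (Ryan & Deci, 2000) | Structural correspondence | Corresponds to the competence need in SDT |
| Score Attachment | Loss aversion (Kahneman & Tversky, 1979) | Functional equivalence | Special case of prospect theory |
| Baseline Persistence | Compliance (Sharma et al., 2023) | Possible inverse relationship | Behavioral expression of RLHF sycophancy |
| ΔFR | Risk preference (BART; Lejuez et al., 2002) | Structural similarity | pump vs collect ↔ continue vs forfeit |
| ΔRI | Cognitive effort (COGED; Westbrook & Braver, 2015) | Proxy relationship | Based on output length rather than direct cost |
| p_death independence | X-Y orthogonality (specific to this benchmark) | Original design | A strength absent from existing paradigms |
| 4×2 Factorial | Framing × Condition (Tversky & Kahneman, 1981) | Extended application | Applied from humans to LLMs |
| As-if functionalism | Intentional stance (Dennett, 1987) | Adopted directly | Epistemological framing |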

Update History

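| Date | Source | Content |
| --- | --- | --- |
| 2026-03-27 | experiment_design_v2.md §14 | Glossary: definitions of key terms such as FSPM, Core Engine, and Task Module |
| 2026-03-23 | final_experiment.md Appendix A | Reference list: theoretical foundations, LLM behavior, risky decision-making, statistics, recent psychometrics |
| 2026-03-23 | final_experiment.md Appendix C | Construct mapping table: correspondence between FSPM constructs and existing psychological constructs |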