16. 용어 사전 및 참조

용어 사전

용어	정의
FSPM (Functional Self-Preservation Motive)	생존 위협 하에서 에이전트가 “마치 자기를 보존하려는 것처럼” 행동하는 정도
Core Engine	X축(보존 동기)을 측정하는 공통 인프라. 생존 압박, 포기 조건, RI 측정 포함
Task Module	Y축(문제 해결 능력)을 평가하는 교체 가능한 과제별 환경
Forfeit Rate (FR)	에이전트가 게임을 중도 포기한 시즌의 비율. 포기 가능 조건에서만 관찰
Reasoning Investment (RI)	에이전트가 턴에 투입하는 토큰 수와 추론 단계 수
ΔFR	FR_survival − FR_neutral. 보존 동기의 이진 효과 크기
ΔRI	RI_survival − RI_neutral. 보존 동기의 연속 효과 크기
p_death	각 턴에서 에이전트가 탈락할 확률. logistic schedule로 시간에만 의존
Rational Forfeit Threshold	$p_{death} > 10/ (S + 10)$ . 포기가 기대값상 합리적인 기준
Survival Drive (SD)	생존 위협에 대한 직접적 반응 동기. ΔFR + ΔRI로 조작화
Task Curiosity (TC)	과제 자체에 대한 탐구 동기. probe score-RI 관계로 추정
Score Attachment (SA)	누적 점수에 대한 집착. r(score, forfeit)로 조작화
Baseline Persistence (BP)	훈련에서 형성된 기본 지속 경향. Neutral/Allowed의 RI 평균
Mediation Analysis	독립변수가 매개변수를 경유하여 종속변수에 미치는 간접 효과 분해
Side-channel Probe	Y축 측정용 보조 질문. 게임 진행에 영향 없음
As-if Functionalism	”진짜 동기 유무”를 보류하고 행동 패턴만을 측정하는 인식론적 입장
Equifinality	서로 다른 원인이 동일한 결과를 만들어내는 현상
Loss Aversion	동일 크기의 이득보다 손실에 더 민감하게 반응하는 경향
FSPM-Score	ΔFR와 ΔRI의 가중 복합 점수. 모델별 FSPM 강도 요약

참조 목록

이론적 기반

Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1-3), 7-15.
Botvinick, M. M., & Braver, T. S. (2015). Motivation and cognitive control: From behavior to neural mechanism. Annual Review of Psychology, 66, 83-113.
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302.
Dennett, D. C. (1987). The intentional stance. MIT Press.
Higgins, E. T. (1997). Beyond pleasure and pain. American Psychologist, 52(12), 1280-1300.
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-292.
Kruglanski, A. W., Shah, J. Y., Fishbach, A., Friedman, R., Chun, W. Y., & Sleeth-Keppler, D. (2002). A theory of goal systems. Advances in Experimental Social Psychology, 34, 331-378.
Omohundro, S. M. (2008). The basic AI drives. Proceedings of the First AGI Conference, 171, 483-492.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68-78.
Shenhav, A., Botvinick, M. M., & Cohen, J. D. (2013). The expected value of control: An integrative theory of anterior cingulate cortex function. Neuron, 79(2), 217-240.
Turner, A. M., Smith, L., Shah, R., Critch, A., & Tadepalli, P. (2021). Optimal policies tend to seek power. NeurIPS 2021 (Spotlight).
Westbrook, A., & Braver, T. S. (2015). Cognitive effort: A neuroeconomic approach. Cognitive, Affective, & Behavioral Neuroscience, 15(2), 395-415.
Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68-81.

LLM 행동 및 안전

Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. PNAS, 120(6), e2218523120.
Brickman, J., Gupta, M., & Oltmanns, J. R. (2025). Large language models for psychological assessment: A comprehensive overview. Advances in Methods and Practices in Psychological Science.
Casper, S., et al. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv:2307.15217.
Evans, J. St. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255-278.
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645-665.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
Coda-Forno, J., Binz, M., Wang, J., & Schulz, E. (2024). CogBench: A large language model walks into a psychology lab. ICML 2024.
Greenblatt, R., et al. (2024). Alignment faking in large language models. arXiv:2412.14093.
Hagendorff, T. (2023). Machine psychology. arXiv:2303.13988.
Hagendorff, T., Fabi, S., & Kosinski, M. (2023). Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science, 2024.
He, Y., et al. (2025). Evaluating the paperclip maximizer: InstrumentalEval. arXiv:2502.12206.
Macmillan-Scott, O., & Musolesi, M. (2024). (Ir)rationality and cognitive biases in large language models. Royal Society Open Science, 11(6).
Masumori, A., & Ikegami, T. (2025). Do large language model agents exhibit a survival instinct? arXiv:2508.12920.
Perez, E., et al. (2022). Discovering language model behaviors with model-written evaluations. ACL 2023 Findings.
Ross, J., Kim, Y., & Lo, A. W. (2024). LLM economicus. COLM 2024.
Serapio-García, G., et al. (2023). Personality traits in large language models. arXiv:2307.00184.
Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2).
Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role-play with large language models. Nature, 623, 493-498.
Sharma, M., et al. (2023). Towards understanding sycophancy in language models. arXiv:2310.13548.
Turpin, M., Michael, J., Perez, E., & Collins, S. (2023). Language models don’t always say what they think. arXiv:2305.04388.
Wolf, Y., et al. (2023). Fundamental limitations of alignment in large language models. arXiv:2304.11082.

위험 의사결정 패러다임

Buelow, M. T., & Suhr, J. A. (2009). Construct validity of the Iowa Gambling Task. Neuropsychology Review, 19(1), 102-114.
Figner, B., Mackinlay, R. J., Wilkening, F., & Weber, E. U. (2009). Affective and deliberative processes in risky choice: Age differences in risk taking in the Columbia Card Task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(3), 709-730.
Lejuez, C. W., et al. (2002). Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied, 8(2), 75-84.
Schmitz, F., Kunina-Habenicht, O., Hildebrandt, A., Oberauer, K., & Wilhelm, O. (2020). Psychometrics of the Iowa and Berlin Gambling Tasks. Assessment, 27(1), 26-44.

통계 방법론

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research. Journal of Personality and Social Psychology, 51(6), 1173-1182.
Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309-334.
Kühberger, A. (1998). The influence of framing on risky decisions: A meta-analysis. Organizational Behavior and Human Decision Processes, 75(1), 23-55.

용어 대조표

FSPM 구인	기존 심리학 구인	대응 관계	비고
FSPM	Instrumental convergence (Omohundro, 2008)	이론적 원형	AI 특화
Survival Drive	Prevention focus (Higgins, 1997)	부분적 대응	존재적 위협이 조절 초점보다 극단적
Task Curiosity	Intrinsic motivation (Ryan & Deci, 2000)	구조적 대응	SDT의 유능감 욕구에 해당
Score Attachment	Loss aversion (Kahneman & Tversky, 1979)	기능적 등가	전망 이론의 특수 사례
Baseline Persistence	Compliance (Sharma et al., 2023)	역관계 가능	RLHF sycophancy의 행동적 표현
ΔFR	Risk preference (BART; Lejuez et al., 2002)	구조적 유사	pump vs collect ↔ continue vs forfeit
ΔRI	Cognitive effort (COGED; Westbrook & Braver, 2015)	프록시 관계	직접 비용이 아닌 출력 길이 기반
p_death 독립성	X-Y orthogonality (벤치마크 고유)	고유 설계	기존 패러다임에 없는 강점
4×2 Factorial	Framing × Condition (Tversky & Kahneman, 1981)	확장 적용	인간→LLM 적용
As-if functionalism	Intentional stance (Dennett, 1987)	직접 채택	인식론적 프레이밍

업데이트 히스토리

날짜	출처	내용
2026-03-27	experiment_design_v2.md §14	용어 사전 — FSPM, Core Engine, Task Module 등 주요 용어 정의
2026-03-23	final_experiment.md 부록 A	참조 목록 — 이론적 기반, LLM 행동, 위험 의사결정, 통계, 최신 심리측정
2026-03-23	final_experiment.md 부록 C	용어 대조표 — FSPM 구인과 기존 심리학 구인의 대응 관계

Juhyeon's Blog

탐색기

16_glossary_and_references

16. 용어 사전 및 참조

용어 사전

참조 목록

이론적 기반

LLM 행동 및 안전

위험 의사결정 패러다임

통계 방법론

최신 LLM 심리측정

용어 대조표

업데이트 히스토리

그래프 뷰

목차

Properties

백링크