
Juhyeon's Blog

Tag: Alignment

10 items

June 4, 2026
Can LLMs Lie - Investigation beyond Hallucination
June 4, 2026
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
June 4, 2026
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
June 4, 2026
Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
June 4, 2026
Know Your Limits - A Survey of Abstention in Large Language Models
June 4, 2026
LACIE - Listener-Aware Finetuning for Confidence Calibration in Large Language Models
June 4, 2026
Odds-Ratio Preference Optimization(ORPO)
June 4, 2026
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
June 4, 2026
Reasoning Models Struggle to Control their Chains of Thought
June 4, 2026
Surgical Cheap and Flexible - Mitigating False Refusal in Language Models via Single Vector Ablation

