Juhyeon's Blog
Tag: Safety
5 items
June 4, 2026
Annotation-Efficient Universal Honesty Alignment for LLMs
Paper
LLM
HonestyAlignment
Calibration
SelfConsistency
AnnotationEfficiency
Training
ICLR2026
Safety
Hallucination
June 4, 2026
Can LLMs Lie - Investigation beyond Hallucination
LLM
Deception
Hallucination
Safety
Interpretability
Steering
Alignment
Theory
June 4, 2026
Know Your Limits - A Survey of Abstention in Large Language Models
Survey
LLM
Abstention
SelectivePrediction
Uncertainty
Calibration
Safety
Alignment
RLHF
Hallucination
June 4, 2026
Reasoning Models Struggle to Control their Chains of Thought
paper
Safety
CoT
Monitoring
Controllability
Alignment
ReasoningModels
LLM
June 4, 2026
Surgical Cheap and Flexible - Mitigating False Refusal in Language Models via Single Vector Ablation
LLM
Safety
Alignment
FalseRefusal
ActivationEngineering
Interpretability
VectorAblation
ICLR2025