Juhyeon's Blog

Tag: Safety

5 items

  • June 4, 2026

    Annotation-Efficient Universal Honesty Alignment for LLMs

    • Paper
    • LLM
    • HonestyAlignment
    • Calibration
    • SelfConsistency
    • AnnotationEfficiency
    • Training
    • ICLR2026
    • Safety
    • Hallucination
  • June 4, 2026

    Can LLMs Lie - Investigation beyond Hallucination

    • LLM
    • Deception
    • Hallucination
    • Safety
    • Interpretability
    • Steering
    • Alignment
    • Theory
  • June 4, 2026

    Know Your Limits - A Survey of Abstention in Large Language Models

    • Survey
    • LLM
    • Abstention
    • SelectivePrediction
    • Uncertainty
    • Calibration
    • Safety
    • Alignment
    • RLHF
    • Hallucination
  • June 4, 2026

    Reasoning Models Struggle to Control their Chains of Thought

    • paper
    • Safety
    • CoT
    • Monitoring
    • Controllability
    • Alignment
    • ReasoningModels
    • LLM
  • June 4, 2026

    Surgical Cheap and Flexible - Mitigating False Refusal in Language Models via Single Vector Ablation

    • LLM
    • Safety
    • Alignment
    • FalseRefusal
    • ActivationEngineering
    • Interpretability
    • VectorAblation
    • ICLR2025
