Juhyeon's Blog

Tag: LLM-Safety

2 items

  • June 4, 2026

    HarmBench - A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

    • Benchmark
    • RedTeaming
    • LLM-Safety
    • Adversarial-Attack
    • Jailbreak
    • ASR
    • ICML2024
  • June 4, 2026

    JULI - Jailbreak Large Language Models by Self-Introspection

    • Jailbreak
    • LLM-Safety
    • Adversarial-Attack
    • Black-Box-Attack
    • Self-Introspection
    • BiasNet
    • AlignmentRobustness
    • Theory
