Juhyeon's Blog
Tag: llm-evaluation
4 items
June 4, 2026
BIG-bench - Beyond the Imitation Game - Quantifying and extrapolating the capabilities of language models
paper
benchmark
llm-evaluation
emergent-abilities
scaling
social-bias
few-shot
language-model
June 4, 2026
Chatbot Arena - An Open Platform for Evaluating LLMs by Human Preference
benchmark
human-preference
elo-rating
bradley-terry
pairwise-comparison
crowdsourcing
lmsys
chatbot-arena
llm-evaluation
icml-2024
June 4, 2026
Principled Personas - Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
persona
llm-evaluation
robustness
benchmark
prompting
emnlp2025
normative-evaluation
expertise
June 4, 2026
PromptBench - A Unified Library for Evaluation of Large Language Models
llm-evaluation
library
adversarial-prompt
dynamic-evaluation
prompt-engineering
benchmark
jmlr2024
microsoft