Juhyeon's Blog
Tag: Theory
14 items
June 4, 2026
Belief in the Machine - Investigating Epistemological Blind Spots of Language Models
LLM
Epistemology
Belief
Knowledge
KaBLE
Benchmark
TheoryOfMind
Factivity
FirstPerson
Self-Consciousness
Evaluation
Theory
June 4, 2026
C0-C1-C2 Theory(GNWT - Global Neuronal Workspace Theory)
consciousness
GNWT
GlobalWorkspace
C0C1C2
Metacognition
SelfMonitoring
Dehaene
NeuroscienceTheory
AIConsciousness
Theory
SC-TOM
June 4, 2026
Can LLMs Lie - Investigation beyond Hallucination
LLM
Deception
Hallucination
Safety
Interpretability
Steering
Alignment
Theory
June 4, 2026
Deep Learning and the Information Bottleneck Principle
Theory
InformationBottleneck
DeepLearningTheory
RepresentationLearning
MutualInformation
Generalization
Tishby
ITW2015
June 4, 2026
Deep Learning for Case-Based Reasoning through Prototypes- A Neural Network that Explains Its Predictions
XAI
Interpretability
PrototypeLearning
CaseBasedReasoning
Autoencoder
DeepLearning
AAAI2018
Theory
June 4, 2026
DeepSHAP- Explaining a Series of Models by Propagating Shapley Values
XAI
Interpretability
SHAP
DeepSHAP
ShapleyValue
ModelPipeline
DeepLearning
Theory
June 4, 2026
Grad-CAM- Visual Explanations from Deep Networks via Gradient-based Localization
XAI
Interpretability
GradCAM
CNN
ClassActivationMap
VisualExplanation
Theory
ICCV2017
June 4, 2026
Interpretability Beyond Feature Attribution- Quantitative Testing with Concept Activation Vectors (TCAV)
XAI
Interpretability
TCAV
ConceptActivationVector
ICML2018
Probing
Theory
June 4, 2026
JULI - Jailbreak Large Language Models by Self-Introspection
Jailbreak
LLM-Safety
Adversarial-Attack
Black-Box-Attack
Self-Introspection
BiasNet
AlignmentRobustness
Theory
June 4, 2026
Knowing What LLMs DO NOT Know - A Simple Yet Effective Self-Detection Method
LLM
Hallucination
SelfDetection
Uncertainty
Metacognition
NAACL2024
SelfKnowledge
Theory
June 4, 2026
LIME- “Why Should I Trust You”- Explaining the Predictions of Any Classifier
XAI
LIME
Interpretability
ModelAgnostic
LocalExplanation
SurrogateModel
KDD2016
Theory
June 4, 2026
Principles for Responsible AI Consciousness Research
AI-Ethics
AI-Consciousness
Moral-Status
Research-Ethics
Sentience
AI-Governance
Theory
Normative
Butlin2025
June 4, 2026
SHAP-A Unified Approach to Interpreting Model Predictions
XAI
SHAP
ShapleyValue
FeatureAttribution
Interpretability
GameTheory
Theory
NIPS2017
June 4, 2026
TreeSHAP- Consistent Individualized Feature Attribution for Tree Ensembles
XAI
SHAP
TreeSHAP
ShapleyValues
TreeEnsembles
FeatureAttribution
Interpretability
Theory