Juhyeon's Blog

Tag: paper

169 items

June 4, 2026
Chapter 1. Introducing cognitive neuroscience
- paper
June 4, 2026
Chapter 5 The lesioned brain
- paper
June 4, 2026
Chapter 6 The Seeing Brain
- paper
June 4, 2026
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
June 4, 2026
A Comprehensive Survey of Self-Evolving AI Agents - A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
June 4, 2026
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
June 4, 2026
A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories
June 4, 2026
A Path Towards Autonomous Machine Intelligence
June 4, 2026
A Simple Framework for Contrastive Learning of Visual Representations
June 4, 2026
A large annotated corpus for learning natural language inference
June 4, 2026
ACT_Agentic_Critical_Training_2026_Skill_LM
June 4, 2026
ALFWorld - Aligning Text and Embodied Environments for Interactive Learning
June 4, 2026
Adversarial NLI - A New Benchmark for Natural Language Understanding
June 4, 2026
AgentBench - Evaluating LLMs as Agents
June 4, 2026
AgentFold - Long-Horizon Web Agents with Proactive Context Management
June 4, 2026
Agentic Misalignment - How LLMs Could Be Insider Threats
June 4, 2026
Aligning AI With Shared Human Values
June 4, 2026
Alignment Faking in Large Language Models
June 4, 2026
An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale
June 4, 2026
Are Emergent Abilities of Large Language Models a Mirage?
June 4, 2026
Attention Residuals
June 4, 2026
Auto-Encoding Variational Bayes
June 4, 2026
AutoML - A Survey of the State-of-the-Art
June 4, 2026
Automatic Prompt Optimization with Gradient Descent and Beam Search
June 4, 2026
Axial Attention in Multidimensional Transformers
June 4, 2026
BBQ - A Hand-Built Bias Benchmark for Question Answering
June 4, 2026
Big Bench - Beyond the Imitation Game - Quantifying and extrapolating the capabilities of language models
June 4, 2026
BigBird - Transformers for Longer Sequences
June 4, 2026
BigCodeBench - Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
June 4, 2026
BoolQ - Exploring the Surprising Difficulty of Natural Yes-No Questions
June 4, 2026
Born Again Neural Networks
June 4, 2026
Brittle Minds Fixable Activations - Understanding Belief Representations in Language Models
June 4, 2026
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
June 4, 2026
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
June 4, 2026
Causal Reflection with Language Models
June 4, 2026
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
June 4, 2026
Cognitive Dissonance - Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness
June 4, 2026
CommonsenseQA - A Question Answering Challenge Targeting World Knowledge
June 4, 2026
Computing Machinery and Intelligence
June 4, 2026
Concept Incongruence - An Exploration of Time and Death in Role Playing
June 4, 2026
Core Knowledge
June 4, 2026
CrowS-Pairs - A Challenge Dataset for Measuring Social Biases in Masked Language Models
June 4, 2026
DROP - A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
June 4, 2026
Denoising Diffusion Probabilistic Models
June 4, 2026
Discovering Language Model Behaviors with Model-Written Evaluations
June 4, 2026
Distilling the Knowledge in a Neural Network
June 4, 2026
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
June 4, 2026
Does Learning Mathematical Problem-Solving Generalize to Broader Reasoning
June 4, 2026
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
June 4, 2026
Efficiently Modeling Long Sequences with Structured State Spaces
June 4, 2026
Emerging Properties in Self-Supervised Vision Transformers
June 4, 2026
Epistemic AI is Essential for ML Models to Truly Know When They Don't Know
June 4, 2026
Evaluating Large Language Models Trained on Code
June 4, 2026
FlashAttention - Fast and Memory-Efficient Exact Attention with IO-Awareness
June 4, 2026
FlashAttention-2 - Faster Attention with Better Parallelism and Work Partitioning
June 4, 2026
GAIA - A Benchmark for General AI Assistants
June 4, 2026
GLUE - A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
June 4, 2026
GPQA - A Graduate-Level Google-Proof Q&A Benchmark
June 4, 2026
Group Relative Policy Optimization (GRPO)
June 4, 2026
HellaSwag - Can a Machine Really Finish Your Sentence
June 4, 2026
Holistic Evaluation of Language Models
June 4, 2026
HotpotQA - A Dataset for Diverse, Explainable Multi-hop Question Answering
June 4, 2026
How Far Are We From AGI - Are LLMs All We Need
June 4, 2026
Hyena Hierarchy - Towards Larger Convolutional Language Models
June 4, 2026
If an LLM Were a Character Would It Know Its Own Story - Evaluating Lifelong Learning in LLMs
June 4, 2026
Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
June 4, 2026
Instruction-Following Evaluation for Large Language Models
June 4, 2026
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
June 4, 2026
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
June 4, 2026
Know What You Don't Know - Unanswerable Questions for SQuAD
June 4, 2026
LLM_as_Judge_GenToJudgment_2025_LLM_Evaluation
June 4, 2026
LLM_as_Judge_Survey_2025_LLM_Evaluation
June 4, 2026
LLaMA Models
June 4, 2026
Learning Multiple Layers of Features from Tiny Images
June 4, 2026
Learning and Leveraging World Models in Visual Representation Learning
June 4, 2026
Length-Controlled AlpacaEval - A Simple Way to Debias Automatic Evaluators
June 4, 2026
Linear Attention - Transformers are RNNs
June 4, 2026
LiveCodeBench - Holistic and Contamination Free Evaluation of Large Language Models for Code
June 4, 2026
Llama 2 - Open Foundation and Fine-Tuned Chat Models
June 4, 2026
Logic-RL - Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
June 4, 2026
Longformer - The Long-Document Transformer
June 4, 2026
LoraHub - Efficient Cross-Task Generalization via Dynamic LoRA Composition
June 4, 2026
LoraRetriever - Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild
June 4, 2026
MMLU-Pro - A More Robust and Challenging Multi-Task Language Understanding Benchmark
June 4, 2026
MMMU - A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
June 4, 2026
MQA - Fast Transformer Decoding with Multi-Query Attention
June 4, 2026
Mamba - Linear-Time Sequence Modeling with Selective State Spaces
June 4, 2026
Masked Autoencoders Are Scalable Vision Learners
June 4, 2026
MathVista - Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
June 4, 2026
Measuring Massive Multitask Language Understanding
June 4, 2026
Measuring Mathematical Problem Solving with the MATH Dataset
June 4, 2026
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
June 4, 2026
MemAgent - Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
June 4, 2026
MemGPT - Towards LLMs as Operating Systems
June 4, 2026
Mistral 7B - Sliding Window Attention
June 4, 2026
Motivation in Large Language Models
June 4, 2026
Natural Questions - A Benchmark for Question Answering Research
June 4, 2026
Neural Collaborative Filtering
June 4, 2026
Neural Network Acceptability Judgments
June 4, 2026
Neural Survival Recommender
June 4, 2026
Open LLM Leaderboard
June 4, 2026
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
June 4, 2026
PIQA - Reasoning about Physical Commonsense in Natural Language
June 4, 2026
PagedAttention - Efficient Memory Management for LLM Serving with vLLM
June 4, 2026
PaliGemma - A versatile 3B VLM for transfer
June 4, 2026
Performer - Rethinking Attention with Performers
June 4, 2026
Program Synthesis with Large Language Models
June 4, 2026
QuAC - Question Answering in Context
June 4, 2026
Quantifying Self-Preservation Bias in Large Language Models
June 4, 2026
R-Zero - Self-Evolving Reasoning LLM from Zero Data
June 4, 2026
RACE - Large-scale ReAding Comprehension Dataset From Examinations
June 4, 2026
RWKV - Reinventing RNNs for the Transformer Era
June 4, 2026
ReAct - Synergizing Reasoning and Acting in Language Models
June 4, 2026
RealToxicityPrompts - Evaluating Neural Toxic Degeneration in Language Models
June 4, 2026
Reasoning Models Struggle to Control their Chains of Thought
June 4, 2026
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
June 4, 2026
Reflexion - Language Agents with Verbal Reinforcement Learning
June 4, 2026
Reformer - The Efficient Transformer
June 4, 2026
RetNet - Retentive Network - A Successor to Transformer for LLMs
June 4, 2026
Revisiting Feature Prediction for Learning Visual Representations from Video
June 4, 2026
Revisiting the Platonic Representation Hypothesis - An Aristotelian View
June 4, 2026
Risks from Learned Optimization in Advanced Machine Learning Systems
June 4, 2026
SWE-bench - Can Language Models Resolve Real-World GitHub Issues
June 4, 2026
Scaling Laws for Neural Language Models
June 4, 2026
SciTaiL - A Textual Entailment Dataset from Science Question Answering
June 4, 2026
Self-Distillation Enables Continual Learning
June 4, 2026
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
June 4, 2026
SemEval-2017 Task 1 - Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
June 4, 2026
Sequence to Sequence Learning with Neural Networks
June 4, 2026
Social IQa - Commonsense Reasoning about Social Interactions
June 4, 2026
Social-R1 - Towards Human-like Social Reasoning in LLMs
June 4, 2026
Sparse Transformer - Generating Long Sequences with Sparse Transformers
June 4, 2026
StripedHyena - Moving Beyond Transformers with Hybrid Signal Processing Models
June 4, 2026
SuperGLUE - A Stickier Benchmark for General-Purpose Language Understanding Systems
June 4, 2026
Taken out of context - On measuring situational awareness in LLMs
June 4, 2026
Teaching Machines to Read and Comprehend (original) - Abstractive Text Summarization using Sequence-to-sequence RNNs (summary version)
June 4, 2026
TextArena
June 4, 2026
The Alignment Problem from a Deep Learning Perspective
June 4, 2026
The Consciousness Cluster - Preferences of Models that Claim to be Conscious
June 4, 2026
The Humean Theory of Motivation (Smith 1987)
June 4, 2026
The Humean Theory of Motivation Reformulated and Defended (Sinhababu 2009)
June 4, 2026
The LAMBADA dataset - Word prediction requiring a broad discourse context
June 4, 2026
The Moral Problem - Metaethics Triangle (Smith 1994)
June 4, 2026
The Platonic Representation Hypothesis
June 4, 2026
The Power of Scale for Parameter-Efficient Prompt Tuning
June 4, 2026
The Superintelligent Will - Motivation and Instrumental Rationality in Advanced Artificial Agents
June 4, 2026
Think Deep, Not Just Long - Measuring LLM Reasoning Effort via Deep-Thinking Tokens
June 4, 2026
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
June 4, 2026
Thinking with Nothinking Calibration - A New In-Context Learning Paradigm in Reasoning Large Language Models
June 4, 2026
Towards Ontology-Enhanced Representation Learning for Large Language Models
June 4, 2026
Training Compute-Optimal Large Language Models
June 4, 2026
Training language models to follow instructions with human feedback - InstructGPT
June 4, 2026
TriviaQA - A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
June 4, 2026
TruthfulQA - Measuring How Models Mimic Human Falsehoods
June 4, 2026
Tulu 3 - Pushing Frontiers in Open Language Model Post-Training
June 4, 2026
Uncertainty-Based Abstention in LLMs Improves Safety
June 4, 2026
Understanding deep learning requires rethinking generalization
June 4, 2026
Using cognitive psychology to understand GPT-3
June 4, 2026
Visual Instruction Tuning
June 4, 2026
Weak-to-Strong Generalization - Eliciting Strong Capabilities With Weak Supervision
June 4, 2026
WebArena - A Realistic Web Environment for Building Autonomous Agents
June 4, 2026
WebShop - Towards Scalable Real-World Web Interaction with Grounded Language Agents
June 4, 2026
WinoGrande - An Adversarial Winograd Schema Challenge at Scale
June 4, 2026
World Models
June 4, 2026
Evaluating Vision-Language Models for Emotion Recognition
June 4, 2026
Beyond Vision: How Large Language Models Interpret Facial Expressions from Valence-Arousal Values
June 4, 2026
Evaluation of cross-ethnic emotion recognition capabilities in multimodal large language models using the Reading the Mind in the Eyes Test
June 4, 2026
GPT-4 Emulates Average-Human Emotional Cognition from a Third-Person Perspective
June 4, 2026
LLMs_Do_Not_Simulate_Human_Psychology_2025

