Juhyeon's Blog

Attention Methods

June 4, 2026 · 1 min read

attention
index
taxonomy

Overview of Attention Variants

Each entry is wikilinked to the note for its representative paper.

HW-bottleneck (memory/IO efficiency)

Flash (I/O-aware):
- FlashAttention (Dao 2022)
- FlashAttention-2 (Dao 2023)
Paged (memory space):
- vLLM (Kwon et al. SOSP 2023)
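
For the I/O-aware entries above, the core trick is that softmax attention can be computed exactly while visiting keys and values one block at a time, using a running maximum and a running normalizer. The following is a minimal sketch of that rescaling idea for a single query in plain Python. It is not the GPU kernel from the papers, and the function names and `block_size` parameter are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_reference(q, keys, values):
    """Softmax attention for one query; holds every score in memory at once."""
    scores = [dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

def attention_tiled(q, keys, values, block_size=2):
    """Same output, but keys/values are visited one block at a time."""
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(probs)
        acc = [a * rescale + sum(p * v[d] for p, v in zip(probs, v_blk))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Only one block of scores exists at any time, which is what lets the real kernels keep the working set in fast on-chip memory instead of writing the full N×N score matrix out.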

Computational Efficiency (O(N²) → O(N))

Linear Attention (Katharopoulos ICML 2020)
Performer (FAVOR+) (Choromanski ICLR 2021)
State-Space-style (RWKV) (Peng 2023)
Retention (RetNet) (Sun 2023)
Selective SSM (Mamba) (Gu & Dao 2023)
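
The O(N²) → O(N) step in the linear-attention entry comes from replacing softmax with a kernel feature map φ and then reordering the product: instead of (φ(Q)φ(K)ᵀ)V, compute φ(Q)(φ(K)ᵀV). Below is a rough non-causal sketch in plain Python, assuming the elu(x) + 1 feature map from the Linear Attention paper. The function names are illustrative, and the result is kernelized attention, not a reproduction of softmax attention.

```python
import math

def feature_map(x):
    # elu(x) + 1: strictly positive, so the normalizer below is never zero
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_form(queries, keys, values):
    """O(N^2): every query is compared with every key."""
    fk = [feature_map(k) for k in keys]
    out = []
    for q in queries:
        fq = feature_map(q)
        w = [dot(fq, k) for k in fk]
        norm = sum(w)
        out.append([sum(wj * v[d] for wj, v in zip(w, values)) / norm
                    for d in range(len(values[0]))])
    return out

def linear_form(queries, keys, values):
    """O(N): keys/values are summarized once into S (D x Dv) and z (D)."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]   # sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                         # sum_j phi(k_j)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in queries:
        fq = feature_map(q)
        norm = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / norm
                    for j in range(d_v)])
    return out
```

Both functions return the same values; the second never forms the N×N weight matrix, so its cost grows linearly with sequence length for fixed head dimensions.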

Sparse (Pattern Constraint)

Longformer — Global + Local Window (Beltagy 2020)
BigBird — Random + Block Sparse (Zaheer NeurIPS 2020)
Reformer — LSH Bucketing (Kitaev ICLR 2020)
Sparse Transformer — Strided / Fixed Patterns (Child 2019)
Axial Attention (Ho 2019)
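
The pattern constraints in this group boil down to a boolean mask over (query, key) pairs. As a hedged illustration of the "Global + Local Window" idea listed for Longformer, the sketch below builds such a mask in plain Python. The function name and the `half_window` and `global_positions` parameters are assumptions for this note, not the library's API.

```python
def local_global_mask(n, half_window, global_positions):
    """mask[i][j] is True when query i may attend to key j.

    Local rule:  |i - j| <= half_window.
    Global rule: a global position attends everywhere and is attended by everyone.
    """
    g = set(global_positions)
    return [[abs(i - j) <= half_window or i in g or j in g
             for j in range(n)]
            for i in range(n)]

mask = local_global_mask(n=6, half_window=1, global_positions=[0])
allowed = sum(sum(row) for row in mask)  # number of (query, key) pairs kept
```

With a fixed window and a fixed number of global positions, the number of allowed pairs grows linearly in `n` rather than quadratically.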

Multi-head variants (adjusting the number of heads)

Multi-Query Attention (MQA) (Shazeer 2019)
Grouped-Query Attention (GQA) (Ainslie EMNLP 2023)
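
What MQA and GQA change is how many key/value heads the query heads share: one for MQA, a few groups for GQA, and one per query head in standard multi-head attention. A small sketch of that mapping and its effect on KV-cache size follows. The helper names are hypothetical, and the cache count assumes one key vector and one value vector per KV head per token.

```python
def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the shared key/value head used by a given query head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

def kv_cache_values_per_token(n_kv_heads, head_dim):
    """Numbers stored per token per layer: one K and one V vector per KV head."""
    return 2 * n_kv_heads * head_dim

mha = [kv_head_for(h, 8, 8) for h in range(8)]  # every query head has its own KV head
gqa = [kv_head_for(h, 8, 2) for h in range(8)]  # two groups of four query heads
mqa = [kv_head_for(h, 8, 1) for h in range(8)]  # all query heads share one KV head
```

With 8 query heads of dimension 64, going from 8 KV heads to 2 shrinks the per-token cache by 4×, and MQA shrinks it by 8×.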

Causal / Decoder-only (autoregressive efficiency)

Causal Attention (Masked Self-Attention) - the original Transformer (Vaswani NeurIPS 2017, Architecture/)
Sliding Window Attention (Mistral 7B) (Jiang 2023)
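
Both entries here can be read as masks over (query, key) positions: causal attention hides the future, and sliding-window attention additionally hides everything older than a fixed window. The sketch below is a minimal illustration in plain Python. Counting the current token as part of the window is a convention assumed here, and the function names are made up for this note.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def sliding_window_causal_mask(n, window):
    """Causal mask restricted to the most recent `window` positions,
    counting the current position itself."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

full = causal_mask(4)
windowed = sliding_window_causal_mask(4, window=2)
```

During decoding this means only the last `window` keys and values per layer need to stay in the cache, while stacked layers still let information propagate from further back.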

Hubs / Cross-references

Transformer Attention Variants Survey - unified survey (in a different folder, Architecture/)
Attention Is All You Need - the original definition of Self-Attention (Architecture/)

Properties

Category: Architecture
Description: Classification index of improved attention variants
Linked Bases: [[Attention-methods.base]]
Type: paper
