Overview of Attention Variants
Each entry is wikilinked to a note on its representative paper.
HW-bottleneck (memory/IO efficiency)
- Flash (I/O-aware):
  - FlashAttention (Dao 2022)
  - FlashAttention-2 (Dao 2023)
- Paged (memory space):
  - vLLM (Kwon et al. SOSP 2023)
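A rough illustration of the I/O-aware idea behind the Flash entries: the softmax for one query can be accumulated block by block over keys and values, so the full row of scores never has to be held at once. This is a minimal pure-Python sketch of that online-softmax rescaling, not the fused GPU kernel from the papers. The function names and the `block_size` parameter are hypothetical.

```python
import math

def attention_row_blockwise(q, keys, values, block_size=2):
    """Attention output for one query, streaming over K/V in blocks.

    Only a running max, a running normalizer, and a running weighted
    sum are kept between blocks (illustrative sketch, not a kernel).
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the previous max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

def attention_row_reference(q, keys, values):
    """Same result computed the standard way, with all scores in memory."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]
```

Both functions should agree up to floating-point error for any block size. The blockwise version only changes how much intermediate state is live at a time, which is the property the Flash papers exploit.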
Computational Efficiency (O(N²) → O(N))
- Linear Attention (Katharopoulos ICML 2020)
- Performer (FAVOR+) (Choromanski ICLR 2021)
- State-Space-style (RWKV) (Peng 2023)
- Retention (RetNet) (Sun 2023)
- Selective SSM (Mamba) (Gu & Dao 2023)
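To make the O(N²) → O(N) heading concrete, here is a small sketch of the reordering used in kernelized linear attention: with a positive feature map φ, the product φ(Q)(φ(K)ᵀV) can be computed by summarizing keys and values once, instead of forming the N×N similarity matrix. The feature map elu(x) + 1 follows Katharopoulos et al. Everything else (function names, the non-causal setting, plain Python lists) is an assumption made for illustration.

```python
import math

def phi(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(queries, keys, values):
    """Non-causal linear attention in O(N) time in sequence length."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_k                     # sum_j phi(k_j)
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = phi(q)
        denom = sum(fq[a] * k_sum[a] for a in range(d_k))
        out.append([sum(fq[a] * kv[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out

def quadratic_attention(queries, keys, values):
    """Same kernel, computed pairwise in O(N^2) for comparison."""
    d_v = len(values[0])
    out = []
    for q in queries:
        fq = phi(q)
        sims = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys]
        total = sum(sims)
        out.append([sum(s * v[b] for s, v in zip(sims, values)) / total
                    for b in range(d_v)])
    return out
```

The two functions compute the same values. The linear version keeps a d_k × d_v summary whose size does not depend on sequence length.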
Sparse (Pattern Constraint)
- Longformer — Global + Local Window (Beltagy 2020)
- BigBird — Random + Block Sparse (Zaheer NeurIPS 2020)
- Reformer — LSH Bucketing (Kitaev ICLR 2020)
- Sparse Transformer — Strided / Fixed Patterns (Child 2019)
- Axial Attention (Ho 2019)
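The pattern constraints listed above can be pictured as boolean masks over (query, key) pairs. The sketch below builds a Longformer-style mask that combines a local window with a few global tokens. The function name, the `window` parameter (taken here as a one-sided radius), and `global_idx` are hypothetical choices for illustration, not the paper's API.

```python
def local_global_mask(n, window, global_idx):
    """mask[i][j] is True when query i may attend to key j.

    A pair is allowed if the positions are within `window` of each
    other, or if either position is a designated global token.
    """
    g = set(global_idx)
    return [[abs(i - j) <= window or i in g or j in g
             for j in range(n)]
            for i in range(n)]

def allowed_pairs(mask):
    # Number of attended pairs, to compare against the dense n * n.
    return sum(sum(1 for ok in row if ok) for row in mask)
```

With a fixed window and a fixed number of global tokens, the number of allowed pairs grows linearly in `n` rather than quadratically.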
Multi-head variants (adjusting the head count)
- Multi-Query Attention (MQA) (Shazeer 2019)
- Grouped-Query Attention (GQA) (Ainslie EMNLP 2023)
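As a quick illustration of what "adjusting the head count" means here: MQA and GQA keep all query heads but share key/value heads across them, which shrinks the KV cache. The sketch assumes contiguous grouping of query heads, a common convention that this note does not specify. The function and parameter names are hypothetical.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the K/V head that a given query head reads from.

    n_kv_heads == n_q_heads gives standard multi-head attention,
    n_kv_heads == 1 gives MQA, anything in between gives GQA.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

def kv_cache_elements(seq_len, n_layers, n_kv_heads, head_dim):
    # Factor 2 accounts for storing both keys and values.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim
```

For example, with 8 query heads and 2 K/V heads, heads 0 to 3 share one K/V head and heads 4 to 7 share the other, and the cache is a quarter of the size needed with 8 K/V heads.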
Causal / decoder-only (autoregressive efficiency)
- Causal Attention (Masked Self-Attention): the original Transformer (Vaswani NeurIPS 2017, Architecture/)
- Sliding Window Attention (Mistral 7B) (Jiang 2023)
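Both entries in this group amount to restricting which earlier positions a token may attend to before the softmax. A minimal sketch, assuming the convention that a window of size `window` covers the current token plus the previous `window - 1` tokens (implementations differ on this off-by-one). The function names are hypothetical.

```python
import math

def causal_window_mask(n, window=None):
    """mask[i][j] is True when query i may attend to key j.

    window=None gives the plain causal mask; an integer additionally
    drops keys that are `window` or more positions in the past.
    """
    return [[j <= i and (window is None or i - j < window)
             for j in range(n)]
            for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over one row of scores, with disallowed keys set to 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)  # each row always keeps at least its own position
    exps = [math.exp(s - m) if ok else 0.0
            for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]
```

With a sliding window, each row has at most `window` nonzero weights, so per-token cost and cache size stop growing with sequence length.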
Hubs / cross-references
- Transformer Attention Variants Survey: unified survey (in a different folder, Architecture/)
- Attention Is All You Need: original definition of self-attention (Architecture/)