Juhyeon's Blog

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

February 11, 2026 · 1 min read

I. Introduction

II. Background

III. Framework

III-I. Unsupervised Pre-training

III-II. Supervised Fine-tuning

III-III. Task-specific Input Transformation

IV. Experiment

IV-I. Setup

Datasets

Model Specification

IV-II. Supervised Fine-tuning

Hyper-parameters

- LR: $6.25 \times 10^{-5}$
- lr-decay: 0.2 with warmup
- $\lambda$: 0.5
- batch-size: 32
- dropout: 0.1
- epochs: 3
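
The values above can be collected into a single config object, which makes it easier to see how the learning rate would evolve during fine-tuning. The following is a minimal sketch, not the original training code: the names `FineTuneConfig` and `lr_at_step` are hypothetical, the schedule shape (linear warmup followed by linear decay to zero) is an assumption, and the `.2` in the list is read here as a warmup over 0.2% of training steps (`0.002`), which the list itself does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyper-parameters copied from the list above (field names are hypothetical)."""
    lr: float = 6.25e-5             # peak learning rate
    warmup_fraction: float = 0.002  # ASSUMPTION: ".2" read as 0.2% of training steps
    lam: float = 0.5                # lambda from the list; its exact role is not stated here
    batch_size: int = 32
    dropout: float = 0.1
    epochs: int = 3


def lr_at_step(step: int, total_steps: int, cfg: FineTuneConfig) -> float:
    """Assumed schedule: linear warmup to cfg.lr, then linear decay to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= cfg.warmup_fraction < 1.0:
        raise ValueError("warmup_fraction must be in [0, 1)")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    if progress < cfg.warmup_fraction:
        # Warmup phase: ramp linearly from 0 up to the peak rate.
        return cfg.lr * progress / cfg.warmup_fraction
    # Decay phase: fall linearly from the peak rate to 0 at the final step.
    return cfg.lr * (1.0 - progress) / (1.0 - cfg.warmup_fraction)


if __name__ == "__main__":
    cfg = FineTuneConfig()
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 1, 2, 500, 1000):
        print(step, lr_at_step(step, total, cfg))
```

If `.2` is instead meant as a fraction of training (20%), only `warmup_fraction` needs to change; the rest of the sketch is unaffected.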

Loss (Objective)

V. Analysis

V-I. Impact of number of layers transferred

V-II. Zero-shot Behavior

V-III. Ablation Study

VI. Conclusion

Contribution
