
Juhyeon's Blog


SelfControl of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller


February 11, 2026 · 1 min read

Introduction

Proposes SelfControl, which controls LLM behavior using gradients from the model's own self-evaluation
Expresses the desired behavior as a natural-language suffix, with no human annotation required
Applies the gradients directly to latent representations to steer auto-regressive generation

Related Papers

Representation engineering
Inference-time intervention

Methods

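Compute the gradient of the self-evaluation suffix score with respect to the latent representations
SelfControl: apply that gradient directly during generation
SelfControl_Prefix: compress the representations learned from the gradients into a compact module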
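
The two steps above can be illustrated with a toy sketch. This is not the paper's implementation: the real method backpropagates through the LLM itself, whereas here a hypothetical logistic scorer (`suffix_score`, with made-up weights `w` and `b`) stands in for the log-probability that the model answers "Yes" to the suffix question. The "prefix controller" is also reduced, as a simplifying assumption, to a single shared offset vector fitted by `fit_prefix_offset`. All function names and numbers are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def suffix_score(h, w, b):
    """Toy stand-in for log P("Yes" | hidden state h, suffix question)."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return math.log(sigmoid(z))

def suffix_gradient(h, w, b):
    """Analytic gradient of suffix_score with respect to h."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    scale = 1.0 - sigmoid(z)  # derivative of log(sigmoid(z)) w.r.t. z
    return [scale * wi for wi in w]

def self_control(h, w, b, step_size=0.5, n_steps=10):
    """Gradient ascent on the hidden state; the scorer weights stay frozen."""
    h = list(h)
    for _ in range(n_steps):
        g = suffix_gradient(h, w, b)
        h = [hi + step_size * gi for hi, gi in zip(h, g)]
    return h

def fit_prefix_offset(originals, controlled, lr=0.1, n_steps=200):
    """Compress per-input edits into one shared offset (assumed simplification).

    Minimises the mean squared error between (h + offset) and the
    gradient-controlled hidden state for every input.
    """
    dim, n = len(originals[0]), len(originals)
    offset = [0.0] * dim
    for _ in range(n_steps):
        grad = [0.0] * dim
        for h, h_star in zip(originals, controlled):
            for i in range(dim):
                grad[i] += 2.0 * (h[i] + offset[i] - h_star[i]) / n
        offset = [o - lr * g for o, g in zip(offset, grad)]
    return offset

# Hypothetical scorer weights and hidden states, chosen only for illustration.
w, b = [1.0, -2.0, 0.5], 0.0
hidden_states = [[0.2, 0.4, -0.1], [-0.5, 0.3, 0.0], [0.1, 0.9, 0.2]]

controlled = [self_control(h, w, b) for h in hidden_states]
offset = fit_prefix_offset(hidden_states, controlled)
shifted = [[hi + oi for hi, oi in zip(h, offset)] for h in hidden_states]

for h, h_ctrl, h_shift in zip(hidden_states, controlled, shifted):
    print(suffix_score(h, w, b), suffix_score(h_ctrl, w, b), suffix_score(h_shift, w, b))
```

The per-input loop in `self_control` needs a gradient computation for every input, while the fitted offset is applied with a single addition. That trade-off is the motivation for compressing the gradients into a reusable module, although the actual controller in the paper is a learned module rather than a fixed vector.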

Results

Improvements of 8.3% on detoxification, 3.1% on truthfulness, 4-10% on emotion control, and 48.2% on privacy protection
Also applicable to data synthesis and to improving reasoning ability

Discussion

A method that directly exploits the LLM's self-evaluation ability
A practical approach that connects self-awareness to output control


Properties

Author: Min Cai et al.
Comment: Inference-time LLM behavior control using self-evaluation gradients; achieves SOTA across several domains
IsTargetPaper: true
Journal/Conference: arXiv
Published Year: 2024
Reading Status: Not Started
Review Date: 2026-02-01
Topic: LLM self-control, representation engineering, inference-time control
URL: https://arxiv.org/abs/2406.02721

