by Moonlight

  1. 🧠 ReAct proposes a new prompt-based paradigm that creates synergy in large language models (LLMs) by interleaving reasoning and acting.
  2. 💡 The approach uses reasoning traces to induce, track, and update action plans, and uses actions to interact with external environments, which mitigates the hallucination and error-propagation problems of Chain-of-Thought (CoT) reasoning and improves interpretability.
  3. 🚀 On a range of language and decision-making benchmarks such as HotpotQA, ALFWorld, and WebShop, ReAct substantially outperforms state-of-the-art baselines with only one or two in-context examples.

The Synergy of Reasoning and Acting: The ReAct Paradigm for Language Models

Digest: In large language models (LLMs), reasoning ability (Chain-of-Thought, step-by-step thinking) and acting ability (interaction with an external environment) have been studied separately. Because CoT reasons only inside the model, it suffers from severe **factual hallucination**, while action-only methods simply emit actions without high-level planning and fail on complex tasks. ReAct creates synergy by generating the two in an interleaved fashion within a single token stream. The key insight is to formalize a "thought" as a special action that does not affect the environment (Â = A ∪ L), which builds a loop where reasoning guides the action plan and the observations returned by actions update the reasoning in turn. As a result, the ReAct→CoT-SC combination reaches 35.1 EM on HotpotQA (Table 1), ReAct reaches a 71% success rate on ALFWorld (Table 3), and **hallucination accounts for 0% of the analyzed failure cases (versus 56% for CoT)**, a substantial gain in reliability. With only one or two in-context examples, ReAct outperforms imitation-learning and reinforcement-learning baselines trained on 10³ to 10⁵ task instances, and PaLM-62B fine-tuned on 3,000 examples surpasses every PaLM-540B prompting method, which also demonstrates data efficiency.


Section-by-Section Summary

Introduction

Humans solve problems by tightly coupling reasoning and acting. Drawing on Vygotsky's (1987) inner speech and Baddeley's (1992) working-memory model, the authors propose the ReAct paradigm, in which an LLM generates reasoning traces and task-specific actions in an interleaved manner. Existing CoT (Wei et al., 2022) is static reasoning cut off from the external world, so it suffers from hallucination and error propagation, while action-only approaches (WebGPT, SayCan, etc.) generate actions without high-level reasoning. ReAct addresses both limitations at once, and the visibility of its reasoning traces additionally provides interpretability and human-in-the-loop controllability.

Methods

The core of ReAct is an expansion of the agent's action space. The natural-language space L is added to the existing set of domain actions A, giving Â = A ∪ L.

The policy π(a_t | c_t) conditions on the context c_t = (o_1, a_1, ..., o_t) and generates one of two kinds of action at each timestep:

  • Thought (a_t ∈ L): triggers no feedback from the environment and only updates the context to c_{t+1} = (c_t, a_t). Thoughts handle goal decomposition, progress tracking, commonsense injection, exception handling, and similar roles.
  • Action (a_t ∈ A): interacts with the external environment and produces a new observation o_{t+1}.
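
The two cases can be written side by side as a compact formalization. This is a sketch in the notation of the framework diagram ($c_t$, $\pi$, $A$, $L$); the explicit context-update rules are an assumption based on the paper's formulation rather than something this summary spells out.

```latex
\begin{aligned}
c_t &= (o_1, a_1, \ldots, o_{t-1}, a_{t-1}, o_t), \qquad a_t \sim \pi(a_t \mid c_t), \qquad a_t \in \hat{A} = A \cup L \\
a_t \in L &: \quad c_{t+1} = (c_t, a_t) \quad \text{(thought: no observation is returned)} \\
a_t \in A &: \quad c_{t+1} = (c_t, a_t, o_{t+1}) \quad \text{(action: the environment returns } o_{t+1}\text{)}
\end{aligned}
```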

The density of thoughts varies with the task type:

| Property | Knowledge-intensive (HotpotQA, FEVER) | Decision-making (ALFWorld, WebShop) |
| --- | --- | --- |
| Thought density | Dense: between every pair of actions | Sparse: only at the most relevant positions |
| In-context examples | 3-6 | 1-3 (per task type) |
| Action space | search[entity], lookup[string], finish[answer] | Environment-specific interface (text game, web shopping) |
| Thought placement | Every step | Decided asynchronously by the model |
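
To make the knowledge-task action space concrete, here is a minimal sketch of how the `search[entity]`, `lookup[string]`, and `finish[answer]` strings could be parsed. The `ParsedAction` type, the `ACTION_PATTERN` regex, and the `parse_action` helper are hypothetical names introduced for illustration; they are not taken from the paper's code.

```python
import re
from typing import NamedTuple, Optional


class ParsedAction(NamedTuple):
    name: str      # one of "search", "lookup", "finish"
    argument: str  # text between the square brackets


# Hypothetical pattern for the name[argument] action strings listed in the table.
ACTION_PATTERN = re.compile(r"^(search|lookup|finish)\[(.*)\]$", re.IGNORECASE)


def parse_action(text: str) -> Optional[ParsedAction]:
    """Return the parsed action, or None if the text is not a known action."""
    match = ACTION_PATTERN.match(text.strip())
    if match is None:
        return None
    return ParsedAction(match.group(1).lower(), match.group(2).strip())
```

Anything that does not match, such as a free-form thought, returns `None`, which gives a caller a simple way to tell thoughts apart from environment actions.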

Prompting supplies a frozen LLM (PaLM-540B) with human-written trajectories in ReAct format as few-shot examples. It works without any additional training or reinforcement learning; for fine-tuning, trajectories generated by the model are used as bootstrapped data.

Results

Knowledge-intensive tasks (PaLM-540B prompting)

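| Method | HotpotQA (EM) | FEVER (Acc) |
| --- | --- | --- |
| Standard | 28.7 | 57.1 |
| CoT | 29.4 | 56.3 |
| CoT-SC (self-consistency) | 33.4 | 60.4 |
| Act (actions only) | 25.7 | 58.9 |
| ReAct | 27.4 | 60.9 |
| CoT-SC → ReAct | 34.2 | 64.6 |
| ReAct → CoT-SC | 35.1 | 62.0 |
| Supervised SoTA | 67.5 | 89.5 |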

ReAct alone scores 27.4 EM on HotpotQA, slightly below CoT (29.4), which is attributed to constraints of the action space (the limits of the Wikipedia API). Because the strengths of the two methods are complementary, the ReAct→CoT-SC combination achieves the best HotpotQA result at 35.1 EM, while CoT-SC→ReAct is best on FEVER at 64.6.
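
The two combinations can be sketched as simple back-off rules. The rules themselves (fall back to CoT-SC when ReAct returns no answer within a step budget; fall back to ReAct when the CoT-SC majority answer gets fewer than half of the votes) are an assumption based on the paper's description and are not stated in this summary. `run_react` and `sample_cot` are hypothetical callables standing in for model calls, and the default budgets are illustrative placeholders.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

ReactFn = Callable[[str, int], Optional[str]]  # (question, max_steps) -> answer or None
CotFn = Callable[[str, int], List[str]]        # (question, n_samples) -> sampled answers


def majority_vote(answers: List[str]) -> Tuple[str, int]:
    """Return the most common sampled answer and its vote count."""
    return Counter(answers).most_common(1)[0]


def react_then_cot_sc(run_react: ReactFn, sample_cot: CotFn, question: str,
                      max_steps: int = 7, n_samples: int = 21) -> str:
    """ReAct -> CoT-SC: back off when ReAct gives no answer within max_steps."""
    answer = run_react(question, max_steps)
    if answer is not None:
        return answer
    return majority_vote(sample_cot(question, n_samples))[0]


def cot_sc_then_react(run_react: ReactFn, sample_cot: CotFn, question: str,
                      max_steps: int = 7, n_samples: int = 21) -> str:
    """CoT-SC -> ReAct: back off when the majority answer has under half the votes."""
    answers = sample_cot(question, n_samples)
    answer, votes = majority_vote(answers)
    if votes >= len(answers) / 2:
        return answer
    react_answer = run_react(question, max_steps)
    # Keep the CoT-SC majority if ReAct also fails to produce an answer.
    return react_answer if react_answer is not None else answer
```

The design point is that each method is used where it tends to be more reliable: ReAct when internal reasoning is not confident, CoT-SC when external search gets stuck.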

In the error analysis (50 samples), 94% of ReAct's successful cases rested on correct reasoning, and hallucination accounted for 0% of its failure cases (versus 56% for CoT). ReAct's main failure causes were reasoning errors (47%) and missing search results (23%).

Decision-making tasks

| Method | ALFWorld success rate | WebShop Score / SR |
| --- | --- | --- |
| BUTLER (best of 8) | 37% | n/a |
| Act (best of 6) | 45% | 62.3 / 30.1% |
| IL+RL | n/a | 62.4 / 28.7% |
| ReAct (best of 6) | 71% | 66.6 / 40.0% |
| Human Expert | n/a | 82.1 / 59.6% |

On ALFWorld, ReAct achieves a success rate 34 percentage points higher than BUTLER (trained on 10³ to 10⁵ task instances), and on WebShop its success rate is more than 10 percentage points higher than IL+RL. The key point is that these results are obtained with only one or two in-context examples in the prompt.
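
The percentage-point gaps quoted here follow directly from the table. A short check, using only the success rates listed above (the `gap_pp` helper is a hypothetical name):

```python
# Success rates (%) copied from the decision-making table above.
alfworld = {"BUTLER": 37.0, "Act": 45.0, "ReAct": 71.0}
webshop_sr = {"Act": 30.1, "IL+RL": 28.7, "ReAct": 40.0}


def gap_pp(results: dict, method: str, baseline: str) -> float:
    """Absolute gap in percentage points between a method and a baseline."""
    return round(results[method] - results[baseline], 1)


print(gap_pp(alfworld, "ReAct", "BUTLER"))   # 34.0
print(gap_pp(webshop_sr, "ReAct", "IL+RL"))  # 11.3
```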

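ALFWorld success rate (%) by task type (Table 3):

| Method | Pick | Clean | Heat | Cool | Look | Pick2 | All |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Act (best of 6) | 88 | 42 | 74 | 67 | 72 | 41 | 45 |
| ReAct (best of 6) | 92 | 58 | 96 | 86 | 78 | 41 | 71 |
| BUTLER (best of 8) | 46 | 39 | 74 | 100 | 22 | 24 | 37 |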

Fine-tuning results: PaLM-62B fine-tuned on 3,000 ReAct trajectories outperforms all PaLM-540B prompting methods (Standard/CoT/Act/ReAct), showing that a smaller model can learn the behavior data-efficiently.

GPT-3 experiment: With GPT-3 (text-davinci-002), ReAct also records 30.8 EM on HotpotQA and a 78.4% success rate on ALFWorld, which indicates that the approach is not tied to a single model family.

Discussion

The authors explicitly discuss ReAct's limitations: (1) the prompting-based approach runs into context-length limits on long action sequences, (2) performance depends heavily on the quality of the in-context examples, and (3) a large gap remains to supervised SoTA (HotpotQA 67.5, FEVER 89.5). As future directions they propose (a) combining ReAct with reinforcement learning, (b) large-scale multi-task training, and (c) integrating more sophisticated retrieval tools.

Insights

  • Noteworthy (language space as actions): ReAct's most innovative idea is to define a "thought" not as a mere prompting device but as **a formal part of the action space (a_t ∈ L)**. This is a mathematically clean formalization that brings reasoning inside the environment loop, and it has since become a common framework across LLM agent research.
  • Noteworthy (cognitive-science motivation): Starting from Vygotsky's inner speech and Baddeley's working memory, the paper explicitly adopts the coupling of reasoning and acting, a characteristic human cognitive ability, as a design principle for AI systems.
  • Connections: ReAct sits at the intersection of CoT (Wei et al. 2022) and WebGPT (Nakano et al. 2021), and it is widely seen as a precursor to later work such as Toolformer, AutoGPT, LangChain's tool-use patterns, and the design of OpenAI's function calling API.
  • Implications: The human-in-the-loop experiment (correcting the task direction by editing two thoughts) has practical implications for research on AI safety and controllability.
  • Critical comment: That ReAct alone (27.4) scores below CoT (29.4) on HotpotQA suggests that the inflexibility of the action space can hinder the formation of complex multi-step reasoning structures. The simplicity of the Wikipedia API (exact search only) is also noted as a practical limitation.

Discussion Points

  • Point of contention (faithfulness of reasoning traces): Do ReAct's thoughts actually determine the actions causally, or are they post-hoc rationalization? If the model decides on an action first and then slots in a thought, the interpretability claim is fundamentally weakened. This is the problem that later "Reasoning Theater"-style studies raise head-on.
  • Assumption needing verification (placement of sparse thoughts): For ALFWorld/WebShop, the mechanism by which the model decides on its own when to generate a thought is not specified, and there is no systematic ablation of how the thought-placement strategy affects performance.
  • Follow-up research directions: (1) integrating ReAct with reinforcement learning, (2) multimodal ReAct (extending to visual and audio observations), (3) metacognitive ReAct (self-assessing the confidence of a thought to decide dynamically whether to search).

Metadata

| Item | Content |
| --- | --- |
| Title | ReAct: Synergizing Reasoning and Acting in Language Models |
| Authors | Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao |
| Affiliation | Princeton University, Google Research (Brain Team) |
| Year | 2023 (v1: 2022.10, v3 camera-ready: 2023.03) |
| Venue | ICLR 2023 |
| Links | arXiv, GitHub |
| Keywords | Reasoning, Acting, LLM Agent, Prompting, Chain-of-Thought, Tool Use |

@inproceedings{yao2023react,
  title={ReAct: Synergizing Reasoning and Acting in Language Models},
  author={Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

Why This Research?

Key Question

By generating reasoning and acting in an interleaved manner within a single LLM, can the characteristic limitations of each (hallucination, lack of planning) be overcome at the same time?

Limitations of Existing Approaches

| Limitation | Description |
| --- | --- |
| Hallucination in CoT | Reasons from internal knowledge alone, without external information, so it confidently generates incorrect facts (56% of failures are hallucinations) |
| Error propagation in CoT | A reasoning error at one step contaminates the entire rest of the chain |
| Lack of planning in action-only methods | Emits plain actions without high-level reasoning such as subgoal decomposition, progress tracking, or exception handling |
| Data inefficiency of existing agents | Imitation learning and reinforcement learning need 10³ to 10⁵ task instances, and transferring to a new task is costly |

Key Insight

In human cognition, reasoning and acting are not separate. Inner speech guides action, and the results of action feed back to update reasoning. To realize this loop in an LLM, thoughts must be formalized as special actions that do not affect the environment, so that reasoning and acting are unified within the same policy function.


Method

Framework Overview

graph TD
    A["User input<br>(question / task)"] --> B["Build context<br>c_t = (o₁, a₁, ..., oₜ)"]
    B --> C{"Policy π(aₜ|cₜ)<br>thought vs action?"}
    C -->|"Thought aₜ ∈ L"| D["Generate reasoning trace<br>• goal decomposition<br>• progress tracking<br>• commonsense injection<br>• exception handling"]
    C -->|"Action aₜ ∈ A"| E["Interact with external environment"]
    D -->|"Update context only<br>(no environment feedback)"| B
    E -->|"Receive observation oₜ₊₁"| B
    B --> F{"Task complete?"}
    F -->|"Yes"| G["Output final answer"]
    F -->|"No"| C

    style D fill:#e8f4fd,stroke:#2196F3
    style E fill:#fff3e0,stroke:#FF9800

Core Components

1. Expanded action space (Â = A ∪ L)

The mathematical core of ReAct is the expansion of the action space. The natural-language space L is added by set union to the existing set of domain actions A (for example search or click). Actions that belong to L (thoughts) cause no side effect on the environment and only update the agent's internal context.

2. Thought-Action-Observation loop

At each timestep t, the agent conditions on the context c_t and generates one of the following:

  • Thought: "I need to find X" → appended to the context; the environment does not change
  • Action: "search[X]" → calls the Wikipedia API → an observation is received
  • Observation: the search result is appended to the context
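
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `react_loop`, the scripted policy, and the dictionary-backed `fake_env` are hypothetical stand-ins for an LLM and a Wikipedia API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (kind, text), kind in {"thought", "action", "observation"}


def react_loop(policy: Callable[[List[Step]], Step],
               env: Callable[[str], str],
               question: str,
               max_steps: int = 10) -> Tuple[Optional[str], List[Step]]:
    """Run a thought/action/observation loop until finish[...] or max_steps."""
    context: List[Step] = [("observation", question)]
    for _ in range(max_steps):
        kind, text = policy(context)
        context.append((kind, text))
        if kind == "thought":
            continue  # thoughts only extend the context; the environment is untouched
        if text.startswith("finish[") and text.endswith("]"):
            return text[len("finish["):-1], context
        context.append(("observation", env(text)))  # actions yield a new observation
    return None, context


# Toy stand-ins (hypothetical): a scripted policy and a dictionary "environment".
SCRIPT: List[Step] = [
    ("thought", "I need to find X."),
    ("action", "search[X]"),
    ("thought", "The observation answers the question."),
    ("action", "finish[42]"),
]
FAKE_WIKI: Dict[str, str] = {"search[X]": "X is 42."}


def scripted_policy(context: List[Step]) -> Step:
    # Pick the next scripted step based on how many steps the agent has emitted.
    taken = sum(1 for kind, _ in context if kind in ("thought", "action"))
    return SCRIPT[taken]


def fake_env(action: str) -> str:
    return FAKE_WIKI.get(action, "No results found.")


answer, trace = react_loop(scripted_policy, fake_env, "What is X?")
print(answer)  # 42
```

Returning `None` when the step budget runs out leaves room for a caller to fall back to another method.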

3. Task-specific prompting strategy

In knowledge tasks, thoughts are placed densely between every pair of actions to guide multi-step information retrieval. In decision-making tasks, thoughts are placed sparsely, only at subgoal transitions or when exceptions occur, which avoids spending tokens unnecessarily during long-horizon planning.
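
The difference between dense and sparse placement can be made concrete with two toy exemplars. This is a sketch only: the exemplar texts are placeholders rather than prompts from the paper, and the numbered `Thought n / Action n / Observation n` layout and the `thought_density` measure are illustrative choices.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, text), kind in {"thought", "action", "observation"}


def render_exemplar(steps: List[Step]) -> str:
    """Render one few-shot trajectory as numbered Thought/Action/Observation lines."""
    lines, turn = [], 1
    for kind, text in steps:
        lines.append(f"{kind.capitalize()} {turn}: {text}")
        if kind == "observation":
            turn += 1  # a new turn starts after each observation
    return "\n".join(lines)


def thought_density(steps: List[Step]) -> float:
    """Thoughts per action: 1.0 when every action is preceded by a thought."""
    thoughts = sum(kind == "thought" for kind, _ in steps)
    actions = sum(kind == "action" for kind, _ in steps)
    return thoughts / actions


# Hypothetical exemplars; the texts are placeholders, not prompts from the paper.
dense = [
    ("thought", "I need to search X."), ("action", "search[X]"), ("observation", "X is ..."),
    ("thought", "That answers it."), ("action", "finish[answer]"),
]
sparse = [
    ("thought", "First find the mug, then heat it."),
    ("action", "go to cabinet 1"), ("observation", "You see a mug."),
    ("action", "take mug from cabinet 1"), ("observation", "You pick up the mug."),
    ("thought", "Now I need to heat the mug."),
    ("action", "go to microwave 1"), ("observation", "The microwave is closed."),
]

print(render_exemplar(dense))
print(thought_density(dense), thought_density(sparse))
```

In the sparse exemplar, thoughts appear only at the start and at the subgoal switch, so fewer tokens go to reasoning per action taken.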


Findings

Main Results

ReAct and CoT have complementary strengths. ReAct reduces hallucination by accessing external information, while CoT is strong at flexible multi-step reasoning. Their combinations (ReAct→CoT-SC, CoT-SC→ReAct) achieve the best performance, which is evidence that the "internal reasoning vs external action" dichotomy is a false one.

Key Findings

Hallucination in failures: ReAct's strongest result is not a performance number but the 0% hallucination rate among its analyzed failure cases (Table 2). For CoT, 56% of failures are hallucinations, whereas ReAct checks facts through external search, so hallucination did not appear as a failure cause in the sample. This is a qualitative difference in why each method gets things wrong.

Data efficiency: On ALFWorld, ReAct with only one or two in-context examples outperforms BUTLER, which was trained on 10³ to 10⁵ task instances, by 34 percentage points. This suggests that the synergy between the LLM's in-context learning ability and reasoning traces can reduce the need for large-scale data collection.

Human-in-the-loop control: When a human edits just two thoughts, the direction of the task is corrected. Being able to steer behavior at the level of the reasoning trace, without modifying any parameters, has practical value for AI safety and alignment.
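
One way to picture such an intervention is as an edit to the trajectory followed by regeneration from the edited prefix. This is a sketch under that assumption: the truncate-and-resume behavior and the `edit_thought` helper are hypothetical and are not described in this summary.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, text), kind in {"thought", "action", "observation"}


def edit_thought(trace: List[Step], index: int, new_text: str) -> List[Step]:
    """Replace the thought at `index` and drop all later steps.

    The returned prefix can be handed back to the policy so that it
    continues from the human-edited thought. Only thoughts may be edited,
    so past actions and observations stay untouched.
    """
    kind, _ = trace[index]
    if kind != "thought":
        raise ValueError("only thought steps can be edited")
    return trace[:index] + [("thought", new_text)]


# Hypothetical trajectory in which the second thought went off track.
trace = [
    ("observation", "Task: put a clean mug on the desk."),
    ("thought", "I should find a mug first."),
    ("action", "go to cabinet 1"),
    ("observation", "You see a mug."),
    ("thought", "I should put the mug on the desk now."),
    ("action", "go to desk 1"),
]
fixed = edit_thought(trace, 4, "The mug must be cleaned before I put it on the desk.")
print(fixed[-1])
```

No model parameters are involved: the correction lives entirely in the context that the policy conditions on.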


Theoretical Significance

Establishing a foundational paradigm for LLM agent research

ReAct is widely regarded as a conceptual foundation for many of the LLM agent frameworks and tool-use interfaces that followed (Toolformer, AutoGPT, LangChain, OpenAI function calling, etc.). The thought-action-observation loop has become a de facto standard for agent architectures, and extensions such as tool use, multi-agent collaboration, and long-term planning are being built on top of it.

Demonstrating the complementarity of reasoning and acting

The result that ReAct alone is not always better than CoT (HotpotQA 27.4 vs 29.4) is itself an important finding. It shows that "internal reasoning" and "external action" each have their own strengths, and that the best strategy is to switch between the two modes dynamically depending on the situation. The success of the CoT-SC→ReAct and ReAct→CoT-SC combinations is direct evidence of this complementarity.

From interpretability to controllability

The visibility of reasoning traces is not just a post-hoc explanation; it works as an interface through which a user can intervene and make corrections in real time. This is an early example showing that interpretability research can extend naturally into controllability.


Related Work

  • Chain-of-Thought Prompting: the internal-reasoning-only approach that ReAct aims to improve on. Combining CoT with ReAct gives the best performance.
  • WebGPT (Nakano et al. 2021): early work combining external search with an LLM. ReAct adds reasoning traces to the actions.
  • Inner Monologue (Huang et al. 2022b): compared directly in the ReAct-IM ablation, which shows that environment feedback alone is insufficient.
  • SayCan (Ahn et al. 2022): uses an LLM for robot action planning. ReAct is a more general framework.
  • Toolformer (Schick et al. 2023): takes the idea of tool use in a self-supervised learning direction.

Glossary of Key Terms

| Term | Definition |
| --- | --- |
| ReAct | A blend of Reasoning + Acting. An LLM prompting paradigm that generates reasoning traces and actions in an interleaved manner |
| Reasoning trace | A natural-language thought generated by the model. It does not affect the environment and only updates the context |
| Action space | The set of actions available to the agent. In ReAct it is expanded to Â = A ∪ L |
| Trajectory | The path taken to solve a task, made up of a chain of thoughts, actions, and observations |
| Chain-of-Thought (CoT) | A prompting technique that reaches the final answer by explicitly generating intermediate reasoning steps |
| Self-Consistency (SC) | An ensemble technique that samples several reasoning paths for the same question and selects the answer by majority vote |
| In-context learning | The ability of an LLM to perform a task from a few examples in the prompt, without additional training |
| Hallucination | The phenomenon where a model confidently generates information that is not factual |
| Inner Monologue (IM) | An approach that summarizes environment feedback in natural language and provides it to the model. Unlike ReAct, it has no reasoning of its own |
| Dense vs sparse thought | Dense thoughts are placed between every pair of actions. Sparse thoughts are placed asynchronously, only when needed |

Tags

paper #2023 Reasoning Acting LLM_Agent Prompting CoT Tool_Use ICLR