Abstract
[Write a concise abstract: (1) problem context and motivation, (2) limitation
of existing approaches, (3) your proposed method and key idea, (4) main experimental
results with specific numbers. Keep it within one paragraph, roughly 150–200 words.]
We address the problem of [PROBLEM] in LLM post-training. Existing methods such as
[RLHF/DPO/GRPO] suffer from [LIMITATION 1] and [LIMITATION 2]. In this work,
we propose YourMethod, a [training/inference-time] framework that [KEY IDEA]. Our
method enables the model to [CAPABILITY] by [MECHANISM]. Extensive experiments
on [N] benchmarks demonstrate that YourMethod achieves [RESULT 1] and [RESULT 2]
compared to [M] baselines. Specifically, our method (a) [Property 1]: [specific number],
(b) [Property 2]: [specific number], and (c) [Property 3]: [specific number].