KDD2026-Reviewing

Reviewing papers - KDD 2026

In academia, we submit our own papers and review papers from other researchers. This time, I have been asked to review six papers, and I would like to review them with you. Please take the time to analyze your assigned paper from various perspectives and write down your opinions. The basic format is provided in each subtab (click the toggle on the left). Team members can write the review together. Read the paper and fill out each section, then discuss it with me on the morning of March 6 (Fri). After the discussion, I will finalize and submit the reviews by March 9 (Mon).

Discussion time with Sundong: March 6 (Fri), Room 204 (Prof’s office)

  • 8:00am-8:30am, Heejun, Jiwon, Azamat (3509: SOAR)
  • 8:30am-9:00am, Mohammad, Geonwoo, Jeong Jun (1806: Dual-EV)
  • 9:00am-10:00am (during meeting), Jihwan, Hojun, Jaegyun (2317: n-step value est)
  • 10:00am-10:30am, Jaehyun, Minseo, Jinseo (1208: RationalSearch)
  • 10:30am-11:00am, Yunho, Hyunseok, Seongsoo, Juhyeon (2786: Context IRL, Twin)
  • 11:00am-11:30am, Jumyung, Kyungmin, Seokki (1815: COMET)

Paper and supplementary link (Download your paper here): KDD2026-Research

Reviewing Tips, Examples:

30 min for each team

  • Summarize the paper
  • Explain your review based on your thoughts
  • Clarify misunderstood parts
  • (To-do) After group discussion - est. 30 min
    • Brush up your review: strengthen the reasoning and turn your points into constructive questions so that the authors can prepare their rebuttal
    • (Sundong: I will sum them up, add my thoughts, and submit it before the deadline)

3509: SOAR

3509: SOAR: Source-Aware Reinforcement Learning with Precise Credit Assignment for Search Agents (Heejun, Minseo, Jinseo)

Summary*
Please briefly summarize the main points and contributions of this paper.

Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

Questions And Suggestions For Rebuttal* (TeX math commands are fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
Yes/No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

LLM Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.

1806: Dual-EV

1806: Dual-EV: Decoupling Exploration and Verification for Generalizable Reinforcement Learning (Mohammad, Geonwoo, JeongJun)

Summary*
Please briefly summarize the main points and contributions of this paper.

Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

Questions And Suggestions For Rebuttal* (TeX math commands are fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
Yes/No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

LLM Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.

2317: n-step value estimation

2317: Offline Reinforcement Learning with n-Step Categorical Value Estimation (Jihwan, Hojun, Jaegyun)

Summary*
Please briefly summarize the main points and contributions of this paper.

Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

Questions And Suggestions For Rebuttal* (TeX math commands are fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
Yes/No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

LLM Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.

1208: RationalSearch

1208: RationalSearch: Need-Aware Search Triggering with Reinforcement Learning (Jaehyun, Minseo, Seokki)

Summary*
Please briefly summarize the main points and contributions of this paper.

Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

Questions And Suggestions For Rebuttal* (TeX math commands are fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
Yes/No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

Llm Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.

2786: Context engineering

2786: A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP (Yunho, Hyunseok, Seongsoo)

Summary*
Please briefly summarize the main points and contributions of this paper.

This paper proposes DT-MDP-CE (Digital-Twin MDP Context Engineering), a framework for improving the inference-time performance of enterprise LLM agents. The key idea is to improve agent behavior through context engineering alone, without fine-tuning. The framework has three components: (1) Digital-Twin MDP: approximates the POMDP by a finite MDP through a deterministic abstraction; (2) Contrastive IRL: learns a reward model from mixed-quality offline trajectories, based on T-REX (Trajectory-ranked Reward EXtrapolation); (3) RL-guided Context Engineering: uses a CQL (Conservative Q-Learning) policy to intervene in the LLM prompt through three strategies, Suggest, Prune, and Prioritize. On the ITBench SRE diagnosis domain, experiments over 12 scenarios (819 trajectories) show improvements over the baselines (BC, RL-Sparse). Strategy III (Prioritizing) is the most effective single strategy, and the largest improvement is observed for medium-sized models.
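
For the discussion, the T-REX-style reward learning mentioned in the summary can be sketched as a pairwise ranking loss over trajectory returns. This is a minimal sketch of the generic T-REX objective, not the authors' implementation: `reward_fn`, the toy numeric states, and the function names are assumptions made for illustration.

```python
import math


def trajectory_return(reward_fn, trajectory):
    """Sum of predicted per-state rewards along one trajectory."""
    return sum(reward_fn(state) for state in trajectory)


def trex_pairwise_loss(reward_fn, worse, better):
    """Bradley-Terry ranking loss: -log P(`better` is preferred over `worse`).

    `worse` and `better` are trajectories (lists of states) whose ranking is
    given; `reward_fn` is the hypothetical learned reward model.
    """
    r_worse = trajectory_return(reward_fn, worse)
    r_better = trajectory_return(reward_fn, better)
    # Numerically stable log(exp(r_worse) + exp(r_better)).
    m = max(r_worse, r_better)
    log_z = m + math.log(math.exp(r_worse - m) + math.exp(r_better - m))
    return -(r_better - log_z)


# Toy usage: states are plain numbers and the reward is linear in the state.
reward = lambda s: 0.5 * s
low_quality = [0.0, 1.0]
high_quality = [2.0, 3.0]
print(trex_pairwise_loss(reward, low_quality, high_quality))
```

The loss is small when the reward model agrees with the given ranking and large when it disagrees, which is why ranked mixed-quality trajectories are enough to train it.
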
Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

1. Timely problem definition: Although the paper mainly addresses the SRE environment, it responds to a realistic need shared across domains, namely improving enterprise AI agents at inference time through RL-policy intervention. The practical setting, which intervenes only through context engineering without fine-tuning, is an efficient one.
2. Systematic pipeline: The pipeline from IRL → offline RL (CQL) → OPE (FQE) → context engineering is technically well structured.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

1. No theoretical guarantee for the POMDP→MDP approximation: The paper gives no bound or theoretical justification for how much optimality loss the deterministic abstraction of the DT-MDP incurs relative to the original POMDP.
2. Limited generalization of hand-crafted representations: All three state representations (Name/Name-type/Topology) are hand-designed. Applying the framework to a new domain requires expert intervention, which contradicts the claim of generality to "Enterprise AI".
3. Compute cost not reported: The total computational cost of the IRL + CQL + FQE pipeline is not reported, so its practicality relative to fine-tuning cannot be judged. No plan for releasing code or data is mentioned either.
Questions And Suggestions For Rebuttal* (Math command with TeX is fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

1. What criteria were used to select only 12 of the 102 ITBench scenarios? Are similar performance gains observed on the remaining scenarios?
2. What is the underlying reason that combining the three strategies (Suggest/Prune/Prioritize) yields no additional gain? Do the strategies interfere with one another?
3. How does the computational overhead of the entire pipeline (IRL + CQL + FQE + CE) compare with fine-tuning, and can the cost-performance trade-off be presented quantitatively?

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

→ 3: Moderate - The paper aligns well with context engineering within the Modern AI & Big Data track, or with Data Science Applications. The topic of optimizing enterprise AI agents is also likely to interest a broad range of enterprises.

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

→ 2: Low - The conceptual direction of systematizing inference-time context intervention through an RL policy is interesting, but the main contribution is a combination of existing algorithms such as T-REX (2019), CQL (2020), and FQE.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

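→ 2: Low - As the Introduction and Limitations note, the framework is proposed for situations where SFT is difficult, which is reasonable. However, performance is likely to matter greatly in the enterprise domain, and because no SFT models are included as baselines, it is hard to judge how practical the approach is.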

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

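→ 3: Moderate - The overall structure of the method is clear, but the term "Digital-Twin" is easily confused with the physical-simulation concept used in manufacturing; what the paper builds is closer to a "Learned/Abstracted MDP Model".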

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

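→ 2: Low - No plan for releasing code or data is mentioned, and compute cost is not reported. Implementation details of the domain-specific abstraction function used to build the DT-MDP are insufficient, so reproducing the results would require considerable additional effort.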

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

→ 2: Low - We understand LLM agents and context engineering, but our expertise in the detailed theory of IRL (Inverse RL) and offline RL (CQL, FQE) algorithms is limited.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
→ No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

N/A

Llm Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.

Claude was used to perform an initial analysis of the paper from several perspectives (representation learning, causality, technical soundness) and to draft the review based on that analysis. The final judgments and scores are the reviewers' own.

1815: COMET

1815: COMET: Correlated tasks via Orthogonal experts and Multi-head world models for Efficient Teacher distillation (Jumyung, Kyungmin, Azamat)

Summary*
Please briefly summarize the main points and contributions of this paper.

Paper Strengths*

Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers.

Paper Weaknesses*
Please provide a list of the weaknesses of this paper, including but not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader’s understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper’s novelty without providing detailed explanations.

Questions And Suggestions For Rebuttal* (Math command with TeX is fine)

Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors’ claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper’s quality.

Relevance*
4: High - The work is relevant to the Research track of KDD and is of broad interest to the community
3: Moderate - The work is somewhat relevant to the Research track of KDD and is of narrow interest to a sub-community
2: Low - The connection to KDD is weak
1: Poor - The work is irrelevant to KDD

Novelty*

4: High - The paper offers groundbreaking and transformative ideas or approaches that substantially advance the field or open up entirely new areas of research. The level of innovation is high, leading to major advancements and potentially inspiring further research and development.
3: Moderate - The paper introduces a new and interesting idea or approach that adds value to the field. The contribution is original and represents an advancement of existing knowledge, demonstrating solid innovation and creativity.
2: Low - The ideas are relatively minor and largely incremental. The work builds heavily on existing research.
1: Poor - The paper presents ideas and results that are well-known and have been extensively covered in previous research. There are no new contributions or unique perspectives.

Technical Quality*

4: High - The paper exhibits a high level of technical quality with a rigorous and well-executed methodology and analysis. The results are highly reliable, well-supported, and thorough. The work demonstrates technical excellence and sets a high standard for quality in the field.
3: Moderate - The paper demonstrates solid technical quality with a sound methodology and thorough analysis. The results are reliable and well-supported. There may be minor issues, but they do not significantly undermine the overall quality. The work is competently executed and meets acceptable standards.
2: Low - The paper has several technical weaknesses, such as minor methodological flaws, insufficient analysis, or unsupported conclusions. While the work shows some level of competence, it lacks thoroughness and precision. Improvements are necessary for it to be considered robust.
1: Poor - The paper has significant technical errors, methodological flaws, or incorrect conclusions. The work lacks rigor, and the results are unreliable. The overall quality is below acceptable standards, and the technical execution is weak.

Presentation*

4: High - The paper is well-organized and very clear. The writing is precise, engaging, and free of errors. Figures and tables are well-designed and seamlessly integrated into the text, enhancing the reader’s comprehension. The presentation is polished and professional, making the paper a pleasure to read and understand.
3: Moderate - The paper is organized and generally clear. The writing is mostly free of grammatical and typographical errors, making it easy to read. Figures and tables are effectively used to support the text. The presentation facilitates understanding and conveys the key points effectively.
2: Low - The paper has noticeable issues with clarity and coherence. The writing may contain several grammatical and typographical errors. Figures and tables are present but may not be well-integrated or effectively used. The presentation allows for understanding but requires effort from the reader.
1: Poor - The paper is poorly organized and difficult to follow. The writing is unclear, with numerous grammatical and typographical errors. Figures and tables, if present, are poorly designed or hard to understand. Overall, the presentation detracts significantly from the readability and comprehension of the work.

Reproducibility*

4: High - The paper offers a comprehensive and precise description of the methods, data, and procedures. Supplementary materials, including datasets and code, are complete, well-documented, and easily accessible. Reproducing the results would be straightforward and require minimal additional effort, ensuring high reproducibility.
3: Moderate - The paper provides a clear and detailed description of the methods, data, and procedures used. Supplementary materials, such as datasets and code, are available and sufficiently documented. Reproducing the results would be feasible with the provided information, though some effort may still be required.
2: Low - The paper includes some information about the methods, data, and procedures, but key details are missing. There may be supplementary materials, but they are incomplete or unclear. Reproducing the results would require significant effort and additional information.
1: Poor - The paper provides insufficient details about the methods, data, and procedures used. There are no available supplementary materials, and the description is so vague that reproducing the results would be extremely difficult or impossible.

Reviewer Confidence*

4: High - The reviewer is an expert in the subject area and has extensive knowledge of the research methods and context of the paper. They are highly confident in their ability to provide an accurate and thorough assessment. Their evaluation is based on deep expertise and a comprehensive understanding of the work.
3: Moderate - The reviewer has a good understanding of the subject area and is familiar with the research methods and context of the paper. They feel confident in their ability to accurately assess the quality and significance of the work. Their evaluation is based on a solid grasp of the content and context.
2: Low - The reviewer has some knowledge of the subject area and is somewhat familiar with the research methods and context of the paper. They understand the main points but may lack depth in certain areas. The reviewer is reasonably confident in their assessment but acknowledges some limitations in their expertise.
1: Poor - The reviewer has limited knowledge of the subject area and is not very familiar with the specific research methods or context of the paper. The reviewer is unsure about their ability to accurately assess the paper and may have had difficulty understanding key aspects of the work. Their evaluation should be considered with caution.

Ethics Review Flag*

Please select Yes if there are ethical issues with this paper, and specify the issue in the text box below. For guidance on when this is appropriate, refer to ACM Code of Ethics (https://www.acm.org/code-of-ethics), ACM Policy on Plagiarism, Misrepresentation, and Falsification (https://www.acm.org/publications/policies/plagiarism-overview), ACM Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). If in doubt, please enquire with the Program Chairs.
Yes/No

Ethics Review Description*

If you select Yes to the ethics review flag above please describe the issue.

Llm Usage Description*

Please describe in what ways you have used LLMs in this review. This is not disclosed to authors.