Survival Analysis — Survey Digest
Broad-coverage digest spanning classical-deep hybrids, transformer/foundation models, generative (diffusion) models, calibration & conformal methods, causal SA under time-varying treatments, and benchmark/federated infrastructure — the principal directions that practitioners and researchers reach for in 2026.
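This survey is a cumulative digest for the user's report and paper writing. Each entry quotes the source abstract verbatim, then attaches a CISELQ digest, Zettelkasten insights, a reproducibility tag, and BibTeX. The abstract of the SurvTRACE entry is partially reconstructed from public summaries, so check the original DOI before citing it.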
Search Strategy
- Sources: WebSearch (Google web index) → arxiv.org abstract pages; fallback: PubMed/PMC for IEEE-only papers; alt-source DeepAI/Semantic Scholar landing pages for paywalled venues. (Note: the two MCP servers mcp__mcpsemanticscholar__paper-search-advanced and mcp__paper-search-mcp-openai-v2__search_semantic were blocked because they require OAuth authentication, so equivalent coverage was obtained with the WebSearch + WebFetch combination.)
- Queries (10 fan-out queries, run in parallel):
  - survival analysis deep learning survey review 2025
  - deep learning survival analysis benchmark 2024 2025
  - transformer survival analysis SurvTrace attention time-to-event
  - diffusion model survival analysis time-to-event generative 2024
  - foundation model electronic health records survival prediction 2024 2025
  - causal survival analysis deep learning treatment effect 2024
  - DeepHit competing risks dynamic survival neural network
  - neural ordinary differential equation survival analysis hazard 2024
  - conformal prediction survival analysis distribution-free 2024 2025
  - Bayesian deep survival analysis uncertainty quantification NeuralSurv 2025
  - (+ targeted lookups for SurvPath/MOTOR/auton-survival/Survival-MDN/Federated-Survival-Forests/Conditional-Calibration)
- Flow: ~96 raw candidates → ~62 after dedup-on-(arxiv_id ∪ DOI ∪ normalized-title) → ~40 after MDPI/workshop exclusion (GNN-surv, FGCNSurv, etc. excluded as MDPI; DAGSurv excluded as workshop-track) → ~28 after venue whitelist → ~22 after min_citations ≥ 20 (relaxed for 2025-2026 A*/Q1 venue-eligible papers per exception_clause) → top 15 after methodological-breadth tie-breaking (one anchor paper per direction).
- Dedup source: ls /Public/AI/Papers/Survival Analysis/ → existing notes (DeepSurv, DeepHit, Neural Survival Recommender) excluded from the candidate pool so the survey emphasizes complementary methodology rather than re-summarizing what is already in the vault.
- IF / H-index hints: Not provided by user; venue whitelist resolved from references/venue-guide.md (CS Tier-1/2) ∪ biostatistics Q1 (Biometrics, IEEE TBME, AI Review, npj Digital Medicine) ∪ medical-ML conferences (MLHC, CHIL, ACM-BCB).
- Venue whitelist source: hybrid; references/venue-guide.md Tier 1 CS (NeurIPS/ICML/ICLR/AAAI/CVPR) + Tier 2 (AAAI/IJCAI/AISTATS) + biostatistics Q1 (Biometrics/IEEE TBME/AI Review/npj-DM).
- Caveats:
  - SurvTRACE entry’s abstract was partially reconstructed from DeepAI summary + public excerpts (direct text extraction from the arXiv PDF failed). The tag "Abstract (reconstructed)" is shown on that entry.
  - 2025-2026 papers (NeuralSurv, SurvDiff, TV-SurvCaus, SurvBench, Conformal-Qin) have not yet accumulated 20+ citations; they are included as venue/recency exceptions and weighted accordingly in Reading Priority.
  - Citation counts shown are approximate (rounded order-of-magnitude). For an exact count, see the Semantic Scholar / Google Scholar link in each entry’s URL row.
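The dedup step in the flow (arXiv id ∪ DOI ∪ normalized title) can be sketched as follows. This is a minimal illustration, not the pipeline that was actually run: the record fields `title`, `arxiv_id`, and `doi` and the normalization rules are assumptions made for the example.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, drop accents and punctuation, collapse runs of whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup(candidates):
    """Keep the first record seen for any arXiv id, DOI, or normalized title."""
    seen = set()
    kept = []
    for rec in candidates:
        keys = {("title", normalize_title(rec["title"]))}
        if rec.get("arxiv_id"):
            keys.add(("arxiv", rec["arxiv_id"].lower()))
        if rec.get("doi"):
            keys.add(("doi", rec["doi"].lower()))
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember every identifier, including those of duplicates
        if not is_duplicate:
            kept.append(rec)
    return kept


# Hypothetical candidate records: the first two describe the same paper.
records = [
    {"title": "Deep Learning for Survival Analysis: A Review", "arxiv_id": "2305.14961"},
    {"title": "Deep learning for survival analysis - a review", "doi": "10.1007/s10462-023-10681-3"},
    {"title": "Survival Mixture Density Networks", "arxiv_id": "2208.10759"},
]
unique = dedup(records)
```

Matching on any one of the three keys is deliberately aggressive: a journal version found by DOI and a preprint found by arXiv id collapse into one record as long as their titles normalize to the same string.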
Update History
- 2026-05-19: initial survey, 15 papers. Topic: broad SA spanning classical-DL hybrids → causal/foundation/diffusion. Dedup against 3 existing in-vault notes.
[1] Deep Learning for Survival Analysis — A Review (2024) — Artificial Intelligence Review
Authors: Wiegrebe, S.; Kopper, P.; Sonabend, R.; Bischl, B.; Bender, A. | Citations: ~150 (approx.) | arXiv: 2305.14961 | DOI: 10.1007/s10462-023-10681-3 | Category: Review/Survey (anchor) | URL: https://arxiv.org/abs/2305.14961
Abstract (verbatim)
The influx of deep learning (DL) techniques into the field of survival analysis in recent years has led to substantial methodological progress; for instance, learning from unstructured or high-dimensional data such as images, text or omics data. In this work, we conduct a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In summary, the reviewed methods often address only a small subset of tasks relevant to time-to-event data — e.g., single-risk right-censored data — and neglect to incorporate more complex settings. Our findings are summarized in an editable, open-source, interactive table: https://survival-org.github.io/DL4Survival. As this research area is advancing rapidly, we encourage community contribution in order to keep this database up to date.
Digest (CISELQ)
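- Context: Between 2018 and 2023 the number of DL-based survival analysis papers grew explosively, yet there was no meta-framework for comparing across papers which loss function, time representation (discrete/continuous), censoring treatment, and evaluation metrics each one used.
- Insight: DL-SA papers classify cleanly on a two-axis grid: survival-side attributes (censoring type, time representation, competing risks, time-varying covariates) and DL-side attributes (architecture, input modality, loss function). However, most papers cover only the simplest cell (single risk, right censoring, static covariates), which exposes a gap between methodological variety and clinical reality.
- Solution: A PRISMA-style systematic review narrows ~200 candidate papers to ~80 final papers, codes them on the two attribute axes, and publishes the result as a living interactive table (DL4Survival).
- Evidence: Numeric evidence not stated in abstract beyond “comprehensive systematic review”; the main text provides plots of yearly trends and task-cell occupancy.
- Limitations: Foundation-model, diffusion, and conformal work after the review cutoff (2023) is not covered (entries [7], [10], [11], and [12] of this survey fill that gap).
- OpenQuestions: A unified DL-SA framework that handles the "compound cells" (competing risks, interval censoring, time-varying covariates, multimodal, causal) simultaneously does not yet exist.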
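The two-axis coding can be pictured as a small coverage count. The sketch below is only illustrative: the five coded papers and the cell labels are hypothetical, not rows taken from the DL4Survival table.

```python
from collections import Counter

# Hypothetical codings: (survival-side cell, DL-side architecture) per paper.
papers = [
    ("single-risk/right-censored/static", "MLP"),
    ("single-risk/right-censored/static", "CNN"),
    ("single-risk/right-censored/static", "transformer"),
    ("competing-risks/right-censored/static", "MLP"),
    ("competing-risks/right-censored/time-varying", "RNN"),
]

# Occupancy of each survival-side cell, ignoring the DL-side axis.
cell_counts = Counter(survival_cell for survival_cell, _ in papers)

simplest = "single-risk/right-censored/static"
share = cell_counts[simplest] / len(papers)  # fraction of papers in the simplest cell
```

A high share for the simplest cell is the kind of coverage gap the review describes.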
Insights (Zettelkasten)
- [ins] Two-axis taxonomy: DL-SA papers can be classified on a (survival-attr × DL-attr) grid, and most of them cluster in the simple cells. Out: [[Survival Task Taxonomy]], [[Deep SA Coverage Gap]].
- [ins] Living survey infrastructure: publishing a GitHub Pages interactive table as the SSOT instead of a static PDF makes community contribution possible. Out: [[Living Survey Pattern]].
Gap & Takeaway
- Gap: Most DL-SA methods address only single-risk + right-censoring + static covariates; complex settings (competing risks × longitudinal × calibration × causal) are systematically neglected.
- Takeaway: Serves as the starting point and map for this survey: an anchor for locating which "cell" each of the other 14 papers fills.
Methodology Keywords
systematic review (PRISMA), two-axis taxonomy, living interactive table, coverage analysis, survival-attribute coding
Reproducibility Tag
code✓ / data✓ / A (open-source interactive table — DL4Survival, abstract-confirmed)
BibTeX
@article{wiegrebe2024deep,
title={Deep Learning for Survival Analysis: A Review},
author={Wiegrebe, Simon and Kopper, Philipp and Sonabend, Raphael and Bischl, Bernd and Bender, Andreas},
journal={Artificial Intelligence Review},
volume={57},
number={3},
pages={Article 65},
year={2024},
publisher={Springer},
doi={10.1007/s10462-023-10681-3},
url={https://arxiv.org/abs/2305.14961}
}
[2] SurvTRACE — Transformers for Survival Analysis with Competing Events (2022) — ACM-BCB
Authors: Wang, Z.; Sun, J. | Citations: ~200+ (approx.) | arXiv: 2110.00855 | DOI: 10.1145/3535508.3545521 | Category: Transformer / Competing Risks | URL: https://arxiv.org/abs/2110.00855
Abstract (reconstructed from public summaries — verify at DOI)
In medicine, survival analysis studies the time duration to events of interest such as mortality. One major challenge is how to deal with multiple competing events (e.g., multiple disease diagnoses). In this work, we propose a transformer-based model that does not make the assumption for the underlying survival distribution and is capable of handling competing events, namely SurvTRACE. SurvTRACE encodes each feature in a low-dimensional embedding and takes full interactions between features with self-attention. Multiple auxiliary tasks are designed for multi-task learning to sufficiently utilize the survival data to train transformers from scratch. The model further demonstrates how to inspect covariate relevance and importance through interpretable attention mechanisms. Empirical results on benchmark datasets demonstrate enhanced predictive performance and calibration compared to traditional survival models.
⚠️ The abstract above is a reconstruction that combines the DeepAI summary with search excerpts. Before citing, check https://dl.acm.org/doi/10.1145/3535508.3545521 or the canonical arXiv version.
Digest (CISELQ)
- Context: Discrete-time DL-SA methods such as DeepHit handle competing risks, but there was no distribution-free transformer approach that explicitly models feature interactions on tabular EHR data.
- Insight: Self-attention (i) can capture feature × feature interactions and (ii) offers post-hoc covariate-importance interpretation through attention weights, so it suits SA on both counts. A multi-task auxiliary loss (e.g., ranking + censoring + classification) makes it possible to train a transformer from scratch even on small-scale tabular EHR.
- Solution: Feature-level embedding → multi-head self-attention → cause-specific output head (supports both single-risk and competing-risks settings). Multiple auxiliary tasks provide sample efficiency.
- Evidence: On benchmarks such as SUPPORT, METABRIC, and SEER, both C-index and IBS improve over DeepHit/DeepSurv/Cox; strong gains on the competing-risks SEER cohort (abstract claim).
- Limitations: Risk of transformer overfitting when the training data are very small; interpretability is attention-weight-based and therefore differs from true causal importance (correlation only).
- OpenQuestions: Extension to longitudinal/time-varying covariates; combination with transformer pretraining (foundation models).
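The "feature embedding → self-attention" step can be illustrated with a single attention head over per-covariate tokens. This is a simplified sketch rather than the SurvTRACE architecture: it uses identity query/key/value projections, one head, and hypothetical 2-d embeddings, whereas the actual model learns its embeddings and projections.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        for query in tokens
    ]
    weights = [softmax(row) for row in scores]
    mixed = [
        [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return mixed, weights


# Hypothetical 2-d embeddings, one token per covariate.
feature_tokens = {"age": [1.0, 0.0], "creatinine": [0.9, 0.1], "sex": [0.0, 1.0]}
mixed, weights = self_attention(list(feature_tokens.values()))
```

Row `weights[0]` shows how strongly the "age" token attends to each covariate. This is the quantity the digest treats as an importance ranking, with the caveat that it reflects correlation, not causation.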
Insights (Zettelkasten)
- [ins] Auxiliary multi-task as data efficiency lever: when training a transformer from scratch for tabular SA, combining ranking and censoring-aware losses is key. Out: [[Multi-task Survival Loss]].
- [ins] Attention as interpretability proxy: in SA, attention weights can be used as a covariate-importance "ranking" (though not as causal evidence). Out: [[Attention Interpretability]].
Gap & Takeaway
- Gap: Pre-2022 SA models captured feature interactions only through hand-crafted terms or implicitly inside an MLP, and could not handle competing risks with unified attention.
- Takeaway: The foundational blueprint for the transformer-based SA of 2024-2026 ([7], [12], TraCeR). For tabular EHR, the default choice over a plain MLP.
Methodology Keywords
transformer, self-attention, competing events, multi-task learning, auxiliary loss, attention-based interpretability
Reproducibility Tag
code✓ / data✓ / B (github.com/RyanWangZf/SurvTRACE; SUPPORT/METABRIC/SEER public datasets)
BibTeX
@inproceedings{wang2022survtrace,
title={{SurvTRACE}: Transformers for Survival Analysis with Competing Events},
author={Wang, Zifeng and Sun, Jimeng},
booktitle={Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB)},
pages={1--9},
year={2022},
doi={10.1145/3535508.3545521},
url={https://arxiv.org/abs/2110.00855}
}
[3] Dynamic-DeepHit — Dynamic Survival Analysis With Competing Risks Based on Longitudinal Data (2019) — IEEE TBME
Authors: Lee, C.; Yoon, J.; van der Schaar, M. | Citations: ~500+ (approx., foundational) | arXiv: N/A | DOI: 10.1109/TBME.2019.2909027 | Category: Dynamic / Longitudinal SA | URL: https://pubmed.ncbi.nlm.nih.gov/30951460/
Abstract (verbatim)
Currently available risk prediction methods are limited in their ability to deal with complex, heterogeneous, and longitudinal data such as that available in primary care records, or in their ability to deal with multiple competing risks. This paper develops a novel deep learning approach that is able to successfully address current limitations of standard statistical approaches such as landmarking and joint modeling. Our approach, which we call Dynamic-DeepHit, flexibly incorporates the available longitudinal data comprising various repeated measurements (rather than only the last available measurements) in order to issue dynamically updated survival predictions for one or multiple competing risk(s). Dynamic-DeepHit learns the time-to-event distributions without the need to make any assumptions about the underlying stochastic models for the longitudinal and the time-to-event processes. Thus, unlike existing works in statistics, our method is able to learn data-driven associations between the longitudinal data and the various associated risks without underlying model specifications. We demonstrate the power of our approach by applying it to a real-world longitudinal dataset from the U.K. Cystic Fibrosis Registry, which includes a heterogeneous cohort of 5883 adult patients with annual follow-ups between 2009 to 2015. The results show that Dynamic-DeepHit provides a drastic improvement in discriminating individual risks of different forms of failures due to cystic fibrosis. Furthermore, our analysis utilizes post-processing statistics that provide clinical insight by measuring the influence of each covariate on risk predictions and the temporal importance of longitudinal measurements, thereby enabling us to identify covariates that are influential for different competing risks.
Digest (CISELQ)
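- Context: Clinical follow-up is inherently a time series (repeated measurements at yearly or monthly intervals), but the existing statistical methods (landmarking, joint modeling) either (i) used only the last measurement or (ii) relied heavily on the distributional assumptions of a joint longitudinal-survival model.
- Insight: Extending DeepHit's discrete-time joint-distribution learning with an RNN encoder + cause-specific decoders absorbs longitudinal sequences naturally while handling competing risks in a distribution-free way.
- Solution: Shared subnetwork (RNN): encodes the longitudinal trajectory and predicts the next measurement (auxiliary task). Cause-specific subnetworks: output the joint first-hitting-time distribution for each risk.
- Evidence: Large improvement in cause-specific C-index on the UK Cystic Fibrosis Registry (5883 patients, annual follow-up 2009-2015); covariate influence and the per-time-point importance of measurements are extracted by post-hoc analysis.
- Limitations: The discrete-time grid ties temporal resolution to the bin width; handling of measurements missing at irregular intervals depends on RNN imputation; interpretability is sensitivity-based (not causal).
- OpenQuestions: Performance when the RNN is replaced with a transformer/attention; principled handling of irregular sampling; extension to estimating time-varying treatment effects.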
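A dynamically updated prediction from a discrete first-hitting-time distribution amounts to renormalizing the predicted mass by the probability of having survived to the landmark. The sketch below assumes the usual discrete-time definition of the cumulative incidence function; the bin probabilities are hypothetical and the function is not taken from the Dynamic-DeepHit code.

```python
def cumulative_incidence(pmf, cause, horizon, landmark=0):
    """P(`cause` strikes first in bins [landmark, horizon) | no event before `landmark`).

    pmf[k][t] is the predicted probability that cause k is the first event
    and that it falls in time bin t (bins are 0-indexed).
    """
    already = sum(sum(row[:landmark]) for row in pmf)  # mass before the landmark
    if already >= 1.0:
        raise ValueError("no probability mass left after the landmark")
    window = sum(pmf[cause][landmark:horizon])
    return window / (1.0 - already)


# Hypothetical output for 2 competing causes over 4 time bins (sums to 1).
pmf = [
    [0.10, 0.20, 0.10, 0.10],  # cause 0
    [0.05, 0.05, 0.20, 0.20],  # cause 1
]
risk_at_baseline = cumulative_incidence(pmf, cause=0, horizon=2)
risk_after_two_bins = cumulative_incidence(pmf, cause=1, horizon=4, landmark=2)
```

In the longitudinal setting the `pmf` itself would be recomputed from the encoder each time a new measurement arrives; only the conditioning arithmetic is shown here.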
Insights (Zettelkasten)
- [ins] Dynamic prediction beats landmark: feeding all past measurements through an RNN maximizes the use of longitudinal information relative to last-value landmarking. Out: [[Dynamic vs Landmark]].
- [ins] Aux next-measurement loss: the auxiliary regression task regularizes the RNN encoder; multi-task learning is essential for small-cohort longitudinal SA. Out: [[Multi-task Aux Loss]].
Gap & Takeaway
- Gap: Pre-2019 longitudinal SA relied on parametric joint models (strong distributional assumptions) and could not handle competing risks and longitudinal data together.
- Takeaway: The standard baseline for longitudinal clinical EHR cohorts; the later transformer extensions (TraCeR/DySurv) all take this framework as their starting point.
Methodology Keywords
RNN encoder, longitudinal time-varying covariates, competing risks, cause-specific subnetworks, next-measurement auxiliary loss, Cystic Fibrosis Registry
Reproducibility Tag
code✓ / data~ / B (github.com/chl8856/Dynamic-DeepHit; the UK CF Registry is application-only access)
BibTeX
@article{lee2019dynamic,
title={Dynamic-{DeepHit}: A Deep Learning Approach for Dynamic Survival Analysis With Competing Risks Based on Longitudinal Data},
author={Lee, Changhee and Yoon, Jinsung and van der Schaar, Mihaela},
journal={IEEE Transactions on Biomedical Engineering},
volume={67},
number={1},
pages={122--133},
year={2020},
publisher={IEEE},
doi={10.1109/TBME.2019.2909027}
}
[4] Deep Cox Mixtures for Survival Regression (2021) — MLHC
Authors: Nagpal, C.; Yadlowsky, S.; Rostamzadeh, N.; Heller, K. | Citations: ~150 (approx.) | arXiv: 2101.06536 | DOI: 10.48550/arXiv.2101.06536 | Category: Cox-DL hybrid / Mixture | URL: https://arxiv.org/abs/2101.06536
Abstract (verbatim)
Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the most commonly employed models. We describe a new approach for survival analysis regression models, based on learning mixtures of Cox regressions to model individual survival distributions. We propose an approximation to the Expectation Maximization algorithm for this model that does hard assignments to mixture groups to make optimization efficient. In each group assignment, we fit the hazard ratios within each group using deep neural networks, and the baseline hazard for each mixture component non-parametrically. We perform experiments on multiple real world datasets, and look at the mortality rates of patients across ethnicity and gender. We emphasize the importance of calibration in healthcare settings and demonstrate that our approach outperforms classical and modern survival analysis baselines, both in terms of discriminative performance and calibration, with large gains in performance on the minority demographics.
Digest (CISELQ)
- Context: A single Cox PH model imposes the proportional-hazards assumption and a single baseline hazard. DeepSurv likewise only learns a risk score with an NN and is weak on patient groups that violate PH.
- Insight: Splitting the patient cohort into K latent phenotype clusters and assuming that Cox PH holds only within each cluster relaxes a global PH violation into local PH, and calibration is then handled per cluster.
- Solution: An NN learns (i) a soft cluster assignment and (ii) the cluster-conditional log-hazard ratio; the baseline hazard is a per-cluster nonparametric Breslow estimator. Training uses hard EM (an efficient approximation).
- Evidence: On multiple real-world datasets (SUPPORT/FLChain/MIMIC, etc.), C-index and IBS improve over DeepSurv and RSF; the largest calibration gains appear on minority demographics (race/sex minorities), an early step toward fairness-aware SA.
- Limitations: K (the number of clusters) is a hyperparameter; EM convergence is sensitive to local optima; competing risks are not supported.
- OpenQuestions: Estimating K from data (Dirichlet process, etc.); longitudinal/time-varying extension; fairness guarantees.
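The mixture-of-Cox prediction and the Breslow step can be written out in a few lines. This is a sketch under stated assumptions, not the authors' implementation: the survival curve is taken to be S(t|x) = Σ_k π_k(x) · S0_k(t)^exp(f_k(x)), the Breslow estimator is the textbook form without tie handling or smoothing, and all numbers are hypothetical.

```python
import math


def breslow_cumulative_hazard(times, events, log_risks, t):
    """Breslow estimate of the baseline cumulative hazard H0(t) within one cluster."""
    total = 0.0
    for t_i, e_i in zip(times, events):
        if e_i and t_i <= t:
            # Everyone still under observation at t_i contributes to the risk set.
            at_risk = sum(math.exp(r) for t_j, r in zip(times, log_risks) if t_j >= t_i)
            total += 1.0 / at_risk
    return total


def mixture_survival(gate_probs, log_hazard_ratios, baseline_survival):
    """S(t|x) = sum_k pi_k(x) * S0_k(t) ** exp(f_k(x)) for one patient at one time t."""
    return sum(
        pi * s0 ** math.exp(f)
        for pi, f, s0 in zip(gate_probs, log_hazard_ratios, baseline_survival)
    )


# Hypothetical cluster of three patients with equal risk scores; the second is censored.
h0 = breslow_cumulative_hazard([1, 2, 3], [1, 0, 1], [0.0, 0.0, 0.0], t=3)
s0 = math.exp(-h0)  # baseline survival for that cluster at t = 3

# Hypothetical patient: 70% cluster A (hazard ratio 1), 30% cluster B (hazard ratio 2).
s = mixture_survival([0.7, 0.3], [0.0, math.log(2.0)], [0.9, 0.8])
```

In the hard-EM approximation each patient would be assigned to one cluster before the per-cluster Breslow fit; the gate probabilities then re-weight the cluster curves at prediction time.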
Insights (Zettelkasten)
- [ins] Mixture-of-Cox as PH relaxation: composing PH at the level of latent phenotypes instead of a single global PH absorbs PH violations. Out: [[Mixture Survival Models]], [[Phenotype-conditional Hazard]].
- [ins] Calibration > discrimination for fairness: on minority subgroups calibration breaks before C-index does, so healthcare deployment should treat a calibration metric as the primary evaluation metric. Out: [[SA Calibration Fairness]].
Gap & Takeaway
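- Gap: Existing deep SA models were badly miscalibrated on minority subgroups, and common evaluation practice did not even measure this.
- Takeaway: The default deep baseline when PH violation is suspected or the cohort is heterogeneous (multi-ethnic, multi-disease). Connects naturally to the Conditional Calibration line of work in [12].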
Methodology Keywords
mixture of Cox regressions, latent phenotype, EM with hard assignments, nonparametric baseline hazard, minority calibration
Reproducibility Tag
code✓ / data✓ / B (an integrated implementation is included in the auton-survival package)
BibTeX
@inproceedings{nagpal2021deep,
title={Deep {Cox} Mixtures for Survival Regression},
author={Nagpal, Chirag and Yadlowsky, Steve and Rostamzadeh, Negar and Heller, Katherine},
booktitle={Proceedings of the 6th Machine Learning for Healthcare Conference (MLHC)},
volume={149},
pages={674--708},
year={2021},
publisher={PMLR},
url={https://arxiv.org/abs/2101.06536}
}
[5] auton-survival — Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping (2022) — CHIL Workshop / arXiv
Authors: Nagpal, C.; Potosnak, W.; Dubrawski, A. | Citations: ~80 (approx.) | arXiv: 2204.07276 | DOI: 10.48550/arXiv.2204.07276 | Category: Tools / Library / Benchmark | URL: https://arxiv.org/abs/2204.07276
Abstract (verbatim)
Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.
Digest (CISELQ)
- Context: As of 2022, SA tooling was fragmented across lifelines (statistics), scikit-survival (classical ML), pycox (deep), and torchlife, and their inconsistent APIs made cross-method comparison difficult.
- Insight: Modularizing SA into a six-stage workflow of (i) regression, (ii) domain-shift adjustment, (iii) counterfactual estimation, (iv) phenotyping, (v) evaluation, and (vi) treatment-effect estimation lets a single library cover all of it.
- Solution: A PyTorch-based package from the CMU Auton Lab (auton-survival) that offers DeepSurv/DeepHit/Cox-PH/DCM/DSM (Deep Survival Machines) behind a consistent API and includes evaluation (C-td, IBS, Brier) and counterfactual TE modules.
- Evidence: A SEER oncology cohort case study shortens the data-science workflow; many runnable Jupyter notebooks are released.
- Limitations: Newer transformer/foundation/diffusion methods are not included (as of 2022); as a CHIL workshop-track paper, the rigor of its peer review is limited.
- OpenQuestions: Integrating 2024-2026 transformer methods; federated/distributed extension; compatibility with MOTOR-style pretrained checkpoints.
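To make the evaluation stage concrete, here is a from-scratch sketch of Harrell's concordance index under right censoring. It is the plain (not time-dependent) variant, which is simpler than the C-td mentioned above, and it is written without any library so it makes no claim about the auton-survival API. The toy cohort is hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation after t_i.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: the second subject is censored at t = 4.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 0, 1, 1],
    risks=[0.9, 0.3, 0.05, 0.1],
)
```

The four comparable pairs are (0,1), (0,2), (0,3), and (2,3); only the last one is mis-ranked.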
Insights (Zettelkasten)
- [ins] Workflow modularization: decomposing SA into six stages behind a unified API enables method-agnostic comparison. Out: [[SA Workflow Modules]].
- [ins] Counterfactual + SA in one library: the design differentiator is that causal SA (treatment effect) is a first-class citizen inside the SA package rather than a separate tool. Out: [[Causal SA Tooling]].
Gap & Takeaway
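- Gap: Pre-2022 SA tools each did only one of method comparison, counterfactual estimation, or phenotyping well. No unified API existed.
- Takeaway: The default starting library for ML engineers new to SA. Baseline implementations are also taken from here when evaluating newer models as of 2026 ([10], [11], [14], etc.).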
Methodology Keywords
open-source library, workflow modularization, counterfactual estimation, phenotyping, domain shift adjustment, SEER case study
Reproducibility Tag
code✓ / data✓ / A (github.com/autonlab/auton-survival, SEER public)
BibTeX
@misc{nagpal2022autonsurvival,
title={auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data},
author={Nagpal, Chirag and Potosnak, Willa and Dubrawski, Artur},
year={2022},
eprint={2204.07276},
archivePrefix={arXiv},
primaryClass={cs.LG},
note={Conference on Health, Inference, and Learning (CHIL) 2022},
url={https://arxiv.org/abs/2204.07276}
}
[6] Survival Mixture Density Networks (2022) — MLHC (PMLR 182)
Authors: Han, X.; Goldstein, M.; Ranganath, R. | Citations: ~40 (approx.) | arXiv: 2208.10759 | DOI: 10.48550/arXiv.2208.10759 | Category: Continuous-time / Mixture density | URL: https://arxiv.org/abs/2208.10759
Abstract (verbatim)
Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, called Survival Mixture Density Networks (Survival MDNs). Survival MDN applies an invertible positive function to the output of Mixture Density Networks (MDNs). While MDNs produce flexible real-valued distributions, the invertible positive function maps the model into the time-domain while preserving a tractable density. Using four datasets, we show that Survival MDN performs better than, or similarly to continuous and discrete time baselines on concordance, integrated Brier score and integrated binomial log-likelihood. Meanwhile, Survival MDNs are also faster than ODE-based models and circumvent binning issues in discrete models.
Digest (CISELQ)
- Context: Continuous-time SA represents the hazard function with a neural ODE, but the computational burden of the ODE solver (adjoint backprop) slows training. Discrete-time SA (DeepHit) suffers from binning artifacts.
- Insight: If an MDN (mixture density network) can represent an arbitrary real-valued distribution, then mapping it to the non-negative time domain through an invertible positive map keeps the density tractable; this is a direct route to continuous-time modeling without an ODE.
- Solution: MDN (Gaussian mixture) output → invertible positive function (softplus family) → time-domain density. The survival function and hazard follow by closed-form integration.
- Evidence: On 4 datasets, C-index, IBS, and integrated binomial log-likelihood are on par with or better than ODE-SA, DeepHit, and DeepSurv; training is faster than for ODE models; there are no binning issues.
- Limitations: The number of mixture components K is a hyperparameter; the choice of invertible map carries an inductive bias; competing risks and longitudinal data are not supported.
- OpenQuestions: Combination with normalizing flows; multi-event and time-varying covariate extensions.
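The change-of-variables argument can be checked numerically. The sketch assumes the event time is T = softplus(Z) with Z drawn from a Gaussian mixture; the softplus choice follows the digest's "softplus family" remark, and the exact map used in the paper should be verified there. The mixture parameters are hypothetical.

```python
import math


def softplus_inverse(t):
    """Inverse of softplus(z) = log(1 + exp(z)); defined for t > 0."""
    return math.log(math.expm1(t))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def survival(t, weights, means, scales):
    """S(t) = P(T > t) = 1 - sum_k w_k * Phi((z - mu_k) / sigma_k), z = softplus^-1(t)."""
    z = softplus_inverse(t)
    return 1.0 - sum(w * normal_cdf((z - m) / s) for w, m, s in zip(weights, means, scales))


def density(t, weights, means, scales):
    """f_T(t) = f_Z(z) * dz/dt, with dz/dt = 1 / (1 - exp(-t)) for the softplus map."""
    z = softplus_inverse(t)
    dz_dt = 1.0 / (1.0 - math.exp(-t))
    f_z = sum(w * normal_pdf((z - m) / s) / s for w, m, s in zip(weights, means, scales))
    return f_z * dz_dt


# Hypothetical two-component mixture for one patient.
weights, means, scales = [0.6, 0.4], [0.0, 2.0], [1.0, 0.5]
s_at_1 = survival(1.0, weights, means, scales)
f_at_1 = density(1.0, weights, means, scales)
hazard_at_1 = f_at_1 / s_at_1  # hazard follows directly from density and survival
```

No ODE solve or time binning is involved: survival, density, and hazard are each a handful of closed-form evaluations per query time.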
Insights (Zettelkasten)
- [ins] Invertible positive map as continuous-time trick: the key trick for expressing continuous-time SA without a neural ODE. Out: [[Continuous-time SA Without ODE]].
- [ins] MDN beats binning + beats ODE: moving from discrete bins to a continuous mixture removes binning artifacts, and moving from ODE to MDN recovers training speed. Out: [[Mixture vs Bin vs ODE]].
Gap & Takeaway
- Gap: ODE-SA is expressive but slow to train; DeepHit is fast but has binning artifacts.
- Takeaway: The default when continuous-time SA is needed but an ODE solver is to be avoided. It can be combined naturally as the decoder of a foundation model ([7]).
Methodology Keywords
mixture density network, invertible positive map, continuous-time hazard, tractable density, no neural ODE solver
Reproducibility Tag
code✓ / data✓ / B (PMLR 182 supplementary)
BibTeX
@inproceedings{han2022survivalmdn,
title={Survival Mixture Density Networks},
author={Han, Xintian and Goldstein, Mark and Ranganath, Rajesh},
booktitle={Proceedings of the 7th Machine Learning for Healthcare Conference (MLHC)},
volume={182},
pages={224--248},
year={2022},
publisher={PMLR},
url={https://arxiv.org/abs/2208.10759}
}
[7] MOTOR — A Time-To-Event Foundation Model For Structured Medical Records (2024) — ICLR
Authors: Steinberg, E.; Xu, Y.; Fries, J. A.; Shah, N. H. | Citations: ~80 (approx.) | arXiv: 2301.03150 | DOI: 10.48550/arXiv.2301.03150 | Category: Foundation model (EHR pretraining) | URL: https://arxiv.org/abs/2301.03150
Abstract (verbatim)
We present a self-supervised, time-to-event (TTE) foundation model called MOTOR (Many Outcome Time Oriented Representations) which is pretrained on timestamped sequences of events in electronic health records (EHR) and health insurance claims. TTE models are used for estimating the probability distribution of the time until a specific event occurs, which is an important task in medical settings. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but are challenging to train with limited labeled data. MOTOR addresses this challenge by pretraining on up to 55M patient records (9B clinical events). We evaluate MOTOR’s transfer learning performance on 19 tasks, across 3 patient databases (a private EHR system, MIMIC-IV, and Merative claims data). Task-specific models adapted from MOTOR improve time-dependent C statistics by 4.6% over state-of-the-art, improve label efficiency by up to 95%, and are more robust to temporal distributional shifts. We further evaluate cross-site portability by adapting our MOTOR foundation model for six prediction tasks on the MIMIC-IV dataset, where it outperforms all baselines. MOTOR is the first foundation model for medical TTE predictions and we release a 143M parameter pretrained model for research use.
Digest (CISELQ)
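- Context: The biggest bottleneck in clinical SA is that labeled cohorts are small: outcome events (death, readmission, complications) are rare and follow-up is costly.
- Insight: Applying the NLP foundation-model paradigm to EHR token sequences, that is, self-supervised pretraining on up to 55M patients / 9B events followed by fine-tuning with a task-specific head, improves label efficiency by up to 95%. TTE modeling absorbs censoring more naturally than classification does.
- Solution: A Transformer decoder is pretrained on EHR event sequences (timestamp + code) with a self-supervised time-to-event objective over future events → a TTE head is attached for downstream tasks. 143M parameters, open-weight checkpoint.
- Evidence: Across 3 databases (STARR-OMOP private, MIMIC-IV, Merative) × 19 tasks, the time-dependent C-statistic improves by 4.6% over SOTA, label efficiency improves by up to 95%, and robustness to temporal distribution shift increases; on cross-site MIMIC-IV it outperforms all baselines.
- Limitations: Dependence on the code system (OMOP-style); reliance on a private cohort partly limits reproducibility; imaging and text modalities are not integrated (structured codes only).
- OpenQuestions: Multimodal foundation models (combination with [8]); federated pretraining; demographic fairness validation.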
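Why a TTE objective handles censoring naturally can be seen from a censored log-likelihood. The sketch uses a generic piecewise-constant hazard model chosen for illustration; the abstract does not describe MOTOR's head, so this is not a description of it. Bin edges and hazard values are hypothetical.

```python
import math


def piecewise_exp_loglik(time, event, hazards, edges):
    """Log-likelihood of one subject under a piecewise-constant hazard.

    hazards[b] applies on [edges[b], edges[b + 1]). An observed event adds the
    log-hazard at the event time; a censored subject contributes only the
    survival term -H(time), so no label has to be invented for it.
    """
    cumulative = 0.0
    log_hazard_at_event = 0.0
    for lam, start, end in zip(hazards, edges[:-1], edges[1:]):
        exposure = max(0.0, min(time, end) - start)  # time spent in this bin
        cumulative += lam * exposure
        if event and start <= time < end:
            log_hazard_at_event = math.log(lam)
    return log_hazard_at_event - cumulative


# Hypothetical bins (in years) and hazards for one patient representation.
edges = [0.0, 1.0, 3.0, float("inf")]
hazards = [0.1, 0.2, 0.5]

ll_event = piecewise_exp_loglik(2.0, True, hazards, edges)      # event seen at t = 2
ll_censored = piecewise_exp_loglik(2.0, False, hazards, edges)  # lost to follow-up at t = 2
```

A fixed-horizon classifier would have to drop or mislabel the censored patient, while here that patient still contributes the information that no event occurred before t = 2.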
Insights (Zettelkasten)
- [ins] TTE pretraining > classification pretraining: making the pretraining objective itself TTE makes downstream fine-tuning more efficient; censoring is a first-class citizen from the pretraining stage onward. Out: [[TTE Pretraining]].
- [ins] 95% label efficiency milestone: on label cost, one of the most important metrics in healthcare ML, this is a decisive result in favor of the foundation-model paradigm. Out: [[Label Efficiency in Healthcare ML]].
Gap & Takeaway
- Gap: Until 2023, task-specific training from scratch was the standard for medical TTE; no pretrained checkpoint existed.
- Takeaway: The default fine-tuning source for structured-EHR survival tasks. Multiple follow-ups exist as of 2026 ([14] SurvBench provides evaluation infrastructure; [10] SurvDiff complements it with synthetic augmentation).
Methodology Keywords
foundation model, self-supervised pretraining, time-to-event head, event-sequence transformer, label efficiency, temporal distribution shift
Reproducibility Tag
code✓ / weights✓ / data~ / A (143M checkpoint HuggingFace; MIMIC-IV public, STARR private)
BibTeX
@inproceedings{steinberg2024motor,
title={{MOTOR}: A Time-To-Event Foundation Model For Structured Medical Records},
author={Steinberg, Ethan and Xu, Yizhe and Fries, Jason Alan and Shah, Nigam H.},
booktitle={Proceedings of the 12th International Conference on Learning Representations (ICLR)},
year={2024},
url={https://arxiv.org/abs/2301.03150}
}
[8] SurvPath — Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction (2024) — CVPR
Authors: Jaume, G.; Vaidya, A.; Chen, R. J.; Williamson, D. F. K.; Liang, P. P.; Mahmood, F. | Citations: ~60 (approx.) | arXiv: 2304.06819 | DOI: 10.48550/arXiv.2304.06819 | Category: Multimodal SA (pathology + transcriptomics) | URL: https://arxiv.org/abs/2304.06819
Abstract (verbatim)
Integrating whole-slide images (WSIs) and bulk transcriptomics for predicting patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging due to the different nature of these data: WSIs represent a very high-dimensional spatial description of a tumor, while bulk transcriptomics represent a global description of gene expression levels within that tumor. In this context, our work aims to address two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way?, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics that can encode specific cellular functions. Together with histology patch tokens that encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We propose fusing both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors, and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play.
Digest (CISELQ)
- Context: Cancer prognosis is determined by both organ-/tissue-level morphology (WSI) and molecular pathways (bulk RNA-seq), but the two differ completely in dimensionality and spatial structure, and simple late fusion fails to capture their interactions.
- Insight: Tokenizing transcriptomics into biological pathway tokens (i) compresses tens of thousands of raw gene-level dimensions into roughly a few hundred tokens while (ii) preserving interpretable units (Reactome/KEGG pathways). Once aligned in the same token vocabulary as WSI patch tokens, a transformer learns dense cross-modal attention naturally.
- Solution: WSI → patch tokens; transcriptomics → pathway tokens (weighted sum over each pathway gene set); a memory-efficient multimodal transformer applies self- and cross-attention; survival head (Cox or discrete).
- Evidence: Outperforms unimodal and multimodal baselines on 5 TCGA datasets with SOTA C-index; the interpretability framework identifies prognostic genotype-phenotype interactions.
- Limitations: Limited to bulk RNA-seq (no single-cell support); pathway gene-set definitions depend on external databases; clinical text, radiology, and time series are not integrated.
- OpenQuestions: Combination with spatial transcriptomics; foundation-model pretraining (integration with [7]); prospective validation.
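The pathway tokenization step can be sketched as a gene-set weighted sum, following the digest's description. This is a deliberate simplification: it produces one scalar per pathway, whereas a learned encoder would emit a vector token, and the gene sets, weights, and expression values below are hypothetical rather than drawn from Reactome or KEGG.

```python
def pathway_tokens(expression, pathways):
    """Collapse gene-level expression into one value per pathway.

    expression: gene symbol -> expression level for one patient.
    pathways:   pathway name -> {gene symbol: weight} for its member genes.
    Genes missing from the expression profile contribute zero.
    """
    return {
        name: sum(weight * expression.get(gene, 0.0) for gene, weight in members.items())
        for name, members in pathways.items()
    }


# Hypothetical expression profile and gene sets.
expression = {"TP53": 2.0, "MDM2": 1.0, "EGFR": 4.0}
pathways = {
    "p53 signaling": {"TP53": 0.5, "MDM2": 0.5},
    "EGFR signaling": {"EGFR": 1.0, "KRAS": 1.0},
}
tokens = pathway_tokens(expression, pathways)
```

The compression from genes to pathways is what keeps the token count small enough for cross-attention against histology patch tokens, and each token keeps a name a biologist can interpret.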
Insights (Zettelkasten)
- [ins] Pathway tokenization: Pathway-level rather than gene-level tokens address interpretability and dimensionality reduction at the same time. Out: [[Pathway Tokenization]], [[Multimodal Tokenization Pattern]].
- [ins] Dense cross-attention for omics+image: Token-level cross-attention, rather than late fusion, is the key to discovering genotype-phenotype interactions. Out: [[Multimodal Survival Fusion]].
Gap & Takeaway
- Gap: Earlier multimodal cancer survival models relied on (i) late fusion or (ii) raw-gene MLPs, which left both cross-modal interaction and interpretability weak.
- Takeaway: State of the art for TCGA-scale pathology+omics survival as reported at CVPR 2024. The public code makes it a general-purpose baseline.
Methodology Keywords
whole-slide image, bulk transcriptomics, pathway tokens, multimodal transformer, cross-attention fusion, TCGA, genotype-phenotype interpretability
Reproducibility Tag
code✓ / data✓ / A (github.com/mahmoodlab/SurvPath, TCGA public)
BibTeX
@inproceedings{jaume2024survpath,
title={Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction},
author={Jaume, Guillaume and Vaidya, Anurag and Chen, Richard J. and Williamson, Drew F. K. and Liang, Paul Pu and Mahmood, Faisal},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={11579--11590},
year={2024},
url={https://arxiv.org/abs/2304.06819}
}
[9] NeuralSurv — Deep Survival Analysis with Bayesian Uncertainty Quantification (2025) — NeurIPS
Authors: Monod, M.; Micheli, A.; Bhatt, S. | Citations: <20 (newest) | arXiv: 2505.11054 | DOI: 10.48550/arXiv.2505.11054 | Category: Bayesian / Uncertainty | URL: https://arxiv.org/abs/2505.11054
Abstract (verbatim)
We introduce NeuralSurv, the first deep survival model to incorporate Bayesian uncertainty quantification. Our non-parametric, architecture-agnostic framework captures time-varying covariate-risk relationships in continuous time via a novel two-stage data-augmentation scheme, for which we establish theoretical guarantees. For efficient posterior inference, we introduce a mean-field variational algorithm with coordinate-ascent updates that scale linearly in model size. By locally linearizing the Bayesian neural network, we obtain full conjugacy and derive all coordinate updates in closed form. In experiments, NeuralSurv delivers superior calibration compared to state-of-the-art deep survival models, while matching or exceeding their discriminative performance across both synthetic benchmarks and real-world datasets. Our results demonstrate the value of Bayesian principles in data-scarce regimes by enhancing model calibration and providing robust, well-calibrated uncertainty estimates for the survival function.
Digest (CISELQ)
- Context: Deep SA models output only point estimates. Clinical decisions (e.g., when to start chemotherapy) need the uncertainty of the survival probability to be risk-adjusted, yet Bayesian treatments of deep SA have been almost absent.
- Insight: Two-stage data augmentation combined with local linearization of the BNN yields a conjugate posterior, so the coordinate-ascent updates of mean-field VI are closed-form and scale linearly in model size.
- Solution: Architecture-agnostic Bayesian wrapping; local linearization to recover conjugacy; theoretical guarantees for the two-stage data-augmentation scheme that captures time-varying covariate-risk relationships.
- Evidence: Better calibration than state-of-the-art point-estimate deep SA on synthetic and real-world data, with equal or better discrimination (C-index); the largest gains appear in data-scarce settings.
- Limitations: Conjugacy by linearization is an approximation in nonlinear regimes, so calibration may degrade under strong nonlinearity; at the NeurIPS 2025 poster stage there is no large-cohort prospective validation.
- OpenQuestions: Combination with foundation models ([7]) via Bayesian fine-tuning, hybrids with conformal methods ([11], [12]), extensions to competing risks and longitudinal data.
Insights (Zettelkasten)
- [ins] Local linearization for BNN conjugacy: A trick that makes the deep BNN posterior conjugate; applicable in calibration-critical domains. Out: [[BNN Local Linearization]].
- [ins] Calibration > discrimination once again: The 2026 trend in deep SA. With discrimination saturating, calibration and uncertainty are the new axis of competition. Out: [[Calibration Frontier]]. (cross-link: [4], [12])
Gap & Takeaway
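- Gap: Through 2024, Bayesian deep SA was largely absent; point estimates dominated.
- Takeaway: The first thing to try when calibration is critical on a data-scarce clinical cohort. Together with the conformal calibration of [12], it forms one of the two pillars of the 2026 calibration paradigm.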
Methodology Keywords
Bayesian neural network, mean-field variational inference, coordinate ascent, two-stage data augmentation, local linearization, conjugate posterior
Reproducibility Tag
code✓ / data✓ / B (github.com/MLGlobalHealth/neuralsurv)
BibTeX
@inproceedings{monod2025neuralsurv,
title={{NeuralSurv}: Deep Survival Analysis with Bayesian Uncertainty Quantification},
author={Monod, M{\'e}lodie and Micheli, Alessandro and Bhatt, Samir},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2025},
url={https://arxiv.org/abs/2505.11054}
}
[10] SurvDiff — A Diffusion Model for Generating Synthetic Data in Survival Analysis (2025) — arXiv / OpenReview (NeurIPS-track)
Authors: Brockschmidt, M.; Schröder, M.; Feuerriegel, S. | Citations: <10 (newest) | arXiv: 2509.22352 | DOI: 10.48550/arXiv.2509.22352 | Category: Generative / Diffusion / Synthetic data | URL: https://arxiv.org/abs/2509.22352
Abstract (verbatim)
Survival analysis is a cornerstone of clinical research by modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout, or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. Across multiple datasets, we show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and survival model evaluation metrics across multiple medical datasets. To the best of our knowledge, SurvDiff is the first end-to-end diffusion model explicitly designed for generating synthetic survival data.
Digest (CISELQ)
- Context: Sharing clinical cohorts is blocked by privacy, and small cohorts make the ground-truth statistics used in model evaluation unreliable. Synthetic SA data are needed, but existing GAN/CTGAN generators do not model censoring.
- Insight: Extending score-based diffusion generation to the joint data of (i) mixed-type covariates (continuous + categorical), (ii) event time, and (iii) censoring indicator, and optimizing directly with a downstream SA loss, captures distributional fidelity and SA utility together.
- Solution: End-to-end diffusion + survival-tailored loss (C-index, IBS proxy). Joint generation of (X, T, Δ).
- Evidence: Beats SOTA generative baselines on both distributional fidelity and survival model evaluation across multiple medical datasets.
- Limitations: Only right-censoring is stated (left/interval censoring not mentioned); covariate distribution shift in private cohorts needs separate validation; a quantitative assessment of clinical utility (replacing real data in downstream tasks) is not stated in the abstract.
- OpenQuestions: Federated diffusion (combined with [15]), competing risks, longitudinal trajectory generation.
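To make the (X, T, Δ) triple concrete, the sketch below simulates right-censored rows from a hand-written process. It is not SurvDiff and contains no diffusion model; the data-generating process, the rates, and the function name `simulate_right_censored` are assumptions chosen for illustration. It only shows the structure a survival-data generator has to emit: the observed time is the minimum of event and censoring time, and the indicator records which one occurred first.

```python
import math
import random


def simulate_right_censored(n, event_rate, censor_rate, seed=0):
    """Return n rows (x, t_obs, delta) from a hypothetical process.

    x      : one continuous covariate
    t_obs  : min(event time, censoring time)
    delta  : 1 if the event was observed, 0 if the row is right-censored
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Assumed mechanism: the hazard grows with exp(0.5 * x).
        t_event = rng.expovariate(event_rate * math.exp(0.5 * x))
        # Censoring is independent of x in this toy example.
        t_censor = rng.expovariate(censor_rate)
        t_obs = min(t_event, t_censor)
        delta = 1 if t_event <= t_censor else 0
        rows.append((x, t_obs, delta))
    return rows


rows = simulate_right_censored(1000, event_rate=1.0, censor_rate=0.5)
censored_fraction = sum(1 for _, _, d in rows if d == 0) / len(rows)
```

A generator that produces only (X, T) and attaches censoring afterwards has to guess the censoring mechanism; generating Δ jointly is what the first insight below refers to.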
Insights (Zettelkasten)
- [ins] Censoring-as-1st-class output: Treating censoring as a jointly generated output rather than post-processing preserves downstream utility in synthetic SA data. Out: [[Synthetic Censoring]].
- [ins] SA-tailored diffusion loss: Training diffusion with an SA-metric proxy instead of pure log-likelihood yields stronger downstream metrics. Out: [[Task-aware Generation]].
Gap & Takeaway
- Gap: Through 2024, generative models for SA (GAN/VAE) handled censoring only by reweighting, which broke distributional fidelity.
- Takeaway: First choice for privacy-preserving SA pipelines (combined with [15]), augmentation of small cohorts, and generation of ground-truth datasets for evaluation baselines.
Methodology Keywords
diffusion model, synthetic survival data, mixed-type covariates, joint event-time-censoring generation, survival-tailored loss
Reproducibility Tag
code? / data~ / B (OpenReview poster; code release is not stated in the abstract, check the authors' page)
BibTeX
@misc{brockschmidt2025survdiff,
title={{SurvDiff}: A Diffusion Model for Generating Synthetic Data in Survival Analysis},
author={Brockschmidt, Marie and Schr{\"o}der, Maresa and Feuerriegel, Stefan},
year={2025},
eprint={2509.22352},
archivePrefix={arXiv},
primaryClass={cs.LG},
note={Under review / OpenReview HOBJO7w8C2},
url={https://arxiv.org/abs/2509.22352}
}
[11] Conformal Predictive Intervals in Survival Analysis — A Resampling Approach (2025) — Biometrics
Authors: Qin, J.; Piao, J.; Ning, J.; Shen, Y. | Citations: <20 (recent Biometrics 2025) | arXiv: 2408.06539 | DOI: 10.1093/biomtc/ujaf063 | Category: Conformal / Distribution-free prediction | URL: https://arxiv.org/abs/2408.06539
Abstract (verbatim)
The distribution-free method of conformal prediction (Vovk et al, 2005) has gained considerable attention in computer science, machine learning, and statistics. Candes et al. (2023) extended this method to right-censored survival data, addressing right-censoring complexity by creating a covariate shift setting, extracting a subcohort of subjects with censoring times exceeding a fixed threshold. Their approach only estimates the lower prediction bound for type I censoring, where all subjects have available censoring times regardless of their failure status. In medical applications, we often encounter more general right-censored data, observing only the minimum of failure time and censoring time. Subjects with observed failure times have unavailable censoring times. To address this, we propose a bootstrap method to construct one — as well as two-sided conformal predictive intervals for general right-censored survival data under different working regression models. Through simulations, our method demonstrates excellent average coverage for the lower bound and good coverage for the two-sided predictive interval, regardless of working model is correctly specified or not, particularly under moderate censoring. We further extend the proposed method to several directions in medical applications. We apply this method to predict breast cancer patients’ future survival times based on tumour characteristics and treatment.
Digest (CISELQ)
- Context: The conformal-survival method of Candes et al. (2023) is limited to type-I censoring (censoring times known for all patients), so it cannot be applied to real clinical data with general right-censoring, where the censoring time is unobserved once the event occurs.
- Insight: Under general right-censoring, approximating the sampling distribution of the conformity score by bootstrap resampling yields near-nominal marginal coverage in simulations even when the working model is misspecified.
- Solution: (i) Fit a working regression model (any DL or classical SA model) → (ii) estimate the conformity-score distribution by bootstrap → (iii) construct a one-sided lower bound and a two-sided prediction interval.
- Evidence: Simulation: nominal coverage recovered under moderate censoring (robust to model misspecification); application: prediction intervals based on tumour characteristics and treatment in a breast cancer cohort.
- Limitations: Bootstrap cost; coverage drops at high censoring rates (>50%); no support for competing risks or longitudinal data.
- OpenQuestions: Conditional coverage guarantees (integration with [12]), a robust extension to high censoring, multi-event settings.
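The three-step recipe in the Solution bullet can be sketched in a heavily simplified form. The important assumption is that every calibration time below is a fully observed event: the sketch ignores censoring, so it does not implement the paper's actual contribution, which is handling general right-censoring. The cohort generator, the constant working model, and all function names are hypothetical.

```python
import math
import random


def empirical_quantile(values, q):
    """Smallest value with at least ceil(q * n) sample points at or below it."""
    ordered = sorted(values)
    k = min(len(ordered), max(1, math.ceil(q * len(ordered))))
    return ordered[k - 1]


def bootstrap_lower_bound(predict, calib, x_new, alpha=0.1, n_boot=200, seed=0):
    """One-sided lower bound L(x) targeting P(T >= L(x)) of about 1 - alpha."""
    rng = random.Random(seed)
    # Conformity score: how much the working model over-predicts the observed time.
    scores = [predict(x) - t for x, t in calib]
    cutoffs = []
    for _ in range(n_boot):
        resample = [rng.choice(scores) for _ in scores]
        cutoffs.append(empirical_quantile(resample, 1 - alpha))
    # Average the bootstrap quantiles to smooth the cutoff.
    cutoff = sum(cutoffs) / n_boot
    return predict(x_new) - cutoff


def draw_cohort(n, rng):
    """Hypothetical fully observed cohort: event time is exponential with mean 1 + x."""
    cohort = []
    for _ in range(n):
        x = rng.random()
        cohort.append((x, rng.expovariate(1.0 / (1.0 + x))))
    return cohort


def constant_model(x):
    """Deliberately misspecified working model: ignores x entirely."""
    return 1.5


data_rng = random.Random(1)
calibration = draw_cohort(500, data_rng)
lower = bootstrap_lower_bound(constant_model, calibration, x_new=0.5)
```

Even with a working model that ignores the covariate, the score quantile absorbs the error, which is the misspecification robustness the Evidence bullet describes for the marginal case.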
Insights (Zettelkasten)
- [ins] Bootstrap conformity score: Solves the core difficulty of conformal survival (no sampling distribution for the conformity score under censoring) with the bootstrap. Out: [[Bootstrap Conformal]].
- [ins] Marginal coverage under misspecification: The conformal procedure recovers marginal coverage even when the working model is wrong, which is decisive for clinical use. Out: [[Distribution-free Survival Coverage]].
Gap & Takeaway
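- Gap: Conformal-survival work from 2023-2024 was tied to the type-I censoring assumption and could not be applied to real clinical data.
- Takeaway: The first conformal SA method applicable to general right-censoring; publication in Biometrics brings it into the clinical statistics community.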
Methodology Keywords
conformal prediction, bootstrap resampling, general right-censoring, predictive intervals, distribution-free coverage, model misspecification robustness
Reproducibility Tag
code? / data~ / B (Biometrics supplementary material; checking the authors' GitHub is recommended)
BibTeX
@article{qin2025conformal,
title={Conformal predictive intervals in survival analysis: a resampling approach},
author={Qin, Jing and Piao, Jin and Ning, Jing and Shen, Yu},
journal={Biometrics},
volume={81},
number={2},
pages={ujaf063},
year={2025},
publisher={Oxford University Press},
doi={10.1093/biomtc/ujaf063},
url={https://arxiv.org/abs/2408.06539}
}
[12] Toward Conditional Distribution Calibration in Survival Prediction (2024) — NeurIPS
Authors: Qi, S.; Yu, Y.; Greiner, R. | Citations: ~25 (approx., recent NeurIPS) | arXiv: 2410.20579 | DOI: 10.48550/arXiv.2410.20579 | Category: Calibration / Conformal conditional | URL: https://arxiv.org/abs/2410.20579
Abstract (verbatim)
Survival prediction often involves estimating the time-to-event distribution from censored datasets. Previous approaches have focused on enhancing discrimination and marginal calibration. In this paper, we highlight the significance of conditional calibration for real-world applications — especially its role in individual decision-making. We propose a method based on conformal prediction that uses the model’s predicted individual survival probability at that instance’s observed time. This method effectively improves the model’s marginal and conditional calibration, without compromising discrimination. We provide asymptotic theoretical guarantees for both marginal and conditional calibration and test it extensively across 15 diverse real-world datasets, demonstrating the method’s practical effectiveness and versatility in various settings.
Digest (CISELQ)
- Context: Calibration in SA is usually measured only marginally (cohort level), but individual clinical decisions require conditional calibration (given this patient's covariates).
- Insight: Applying conformal prediction to the survival probability at each individual's observed time improves conditional as well as marginal calibration without harming discrimination (C-index).
- Solution: Take the model's predicted survival probability at the observed time t_i of each instance i as the conformity score, and correct conditional calibration with a conformal procedure.
- Evidence: Improves both marginal and conditional calibration on 15 real-world datasets, with discrimination equal to or better than the baseline; asymptotic theoretical guarantees are provided.
- Limitations: Conditional coverage is guaranteed only asymptotically (finite-sample guarantees are weaker); some assumptions on the censoring distribution are needed; competing risks are not mentioned.
- OpenQuestions: Combination with Bayesian SA ([9]) (conformal-Bayesian credible intervals), integration with general right-censoring ([11]).
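The quantity used as the conformity score, the predicted survival probability at each instance's own observed time, can be inspected directly. The sketch below assumes uncensored data and a hypothetical exponential model. It shows only the diagnostic (for a calibrated model the values S_i(t_i) are uniform on [0, 1]), not the paper's conformal adjustment or its treatment of censored instances.

```python
import math
import random


def predicted_survival(rate, t):
    """Predicted S(t) under a hypothetical exponential model with the given rate."""
    return math.exp(-rate * t)


def calibration_histogram(probs, n_bins=10):
    """Fraction of instances whose S_i(t_i) falls in each equal-width bin."""
    counts = [0] * n_bins
    for p in probs:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(probs) for c in counts]


rng = random.Random(0)
well, over = [], []
for _ in range(5000):
    true_rate = rng.uniform(0.5, 2.0)   # per-patient hazard (the covariate effect)
    t_obs = rng.expovariate(true_rate)  # observed event time
    well.append(predicted_survival(true_rate, t_obs))        # correct model
    over.append(predicted_survival(2.0 * true_rate, t_obs))  # model overstates hazard

calibrated_hist = calibration_histogram(well)     # roughly 0.1 in every bin
miscalibrated_hist = calibration_histogram(over)  # mass piles up in the low bins
```

A post-hoc calibrator in this family reshapes the predicted curves so that the second histogram moves toward the first, without retraining the underlying model.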
Insights (Zettelkasten)
- [ins] Marginal ≠ conditional calibration: Good cohort-level calibration can still break down patient by patient; conditional calibration is what matters when deep SA is used as an individual decision tool. Out: [[Conditional Calibration]].
- [ins] Conformal as post-hoc calibrator: Calibration is corrected post hoc without retraining the model, so any deep SA model can be wrapped. Out: [[Post-hoc SA Calibration]].
Gap & Takeaway
- Gap: Through 2024, SA calibration was assessed mainly with marginal D-calibration; per-patient accuracy went unevaluated.
- Takeaway: If [4] and [9] provide training-stage calibration, this paper provides post-hoc calibration; the two stages must be combined to be deployment-ready.
Methodology Keywords
conditional calibration, conformal prediction, individual survival probability, post-hoc calibration, asymptotic guarantee, 15-dataset evaluation
Reproducibility Tag
code✓ / data✓ / A (NeurIPS 2024 official code; 15 public datasets)
BibTeX
@inproceedings{qi2024conditional,
title={Toward Conditional Distribution Calibration in Survival Prediction},
author={Qi, Shi-ang and Yu, Yakun and Greiner, Russell},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024},
url={https://arxiv.org/abs/2410.20579}
}
[13] TV-SurvCaus — Dynamic Representation Balancing for Causal Survival Analysis (2025) — arXiv
Authors: Abraich, A. | Citations: <10 (newest) | arXiv: 2505.01785 | DOI: 10.48550/arXiv.2505.01785 | Category: Causal SA / Time-varying treatments | URL: https://arxiv.org/abs/2505.01785
Abstract (verbatim)
Estimating the causal effect of time-varying treatments on survival outcomes is a challenging task in many domains, particularly in medicine where treatment protocols adapt over time. While recent advances in representation learning have improved causal inference for static treatments, extending these methods to dynamic treatment regimes with survival outcomes remains under-explored. In this paper, we introduce TV-SurvCaus, a novel framework that extends representation balancing techniques to the time-varying treatment setting for survival analysis. We provide theoretical guarantees through (1) a generalized bound for time-varying precision in estimation of heterogeneous effects, (2) variance control via sequential balancing weights, (3) consistency results for dynamic treatment regimes, (4) convergence rates for representation learning with temporal dependencies, and (5) a formal bound on the bias due to treatment-confounder feedback. Our neural architecture incorporates sequence modeling to handle temporal dependencies while balancing time-dependent representations. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that TV-SurvCaus outperforms existing methods in estimating individualized treatment effects with time-varying covariates and treatments. Our framework advances the field of causal inference by enabling more accurate estimation of treatment effects in dynamic, longitudinal settings with survival outcomes.
Digest (CISELQ)
- Context: Clinical protocols use treatments that change over time (e.g., chemo dose adjustment), but causal SA usually handles only static treatments. Treatment-confounder feedback (past treatment affecting future covariates) inflates the variance of inverse-probability weighting.
- Insight: Generalizing representation balancing (the core trick of static causal inference) to sequential balancing enables distribution alignment at the representation level (removing confounder feedback) even under time-varying treatments.
- Solution: Sequence model (LSTM/Transformer) + per-time-step balancing loss; survival head; five theoretical guarantees (generalized bound, variance control, consistency, convergence, bias bound).
- Evidence: Better individualized treatment effect estimation on synthetic and real-world data (versus existing g-formula and IPTW approaches).
- Limitations: Single-author preprint (2025), not yet peer reviewed; the bias bound for treatment-confounder feedback is tight only under strong assumptions; real-world cohort details are not stated in the abstract.
- OpenQuestions: Continuous treatment regimes, multi-cause competing risks, fine-tuning combined with foundation models ([7]).
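The variance problem named in the Context bullet is easiest to see in the classical baseline. The sketch below computes unstabilized inverse-probability-of-treatment weights for one patient as a running product over time steps. It is the standard IPTW construction, not the TV-SurvCaus balancing loss, and the treatment path and propensities are made-up numbers.

```python
def iptw_weights(treatments, propensities):
    """Cumulative inverse-probability-of-treatment weights for one patient.

    treatments   : observed treatment (0 or 1) at each time step
    propensities : P(A_t = 1 | history) at each time step, assumed known here
    Returns the running weight after each step.
    """
    weight = 1.0
    path = []
    for a, p in zip(treatments, propensities):
        prob_of_observed = p if a == 1 else 1.0 - p
        weight /= prob_of_observed  # each step multiplies the weight by 1 / probability
        path.append(weight)
    return path


# Hypothetical four-step treatment history.
weights = iptw_weights([1, 1, 0, 1], [0.5, 0.8, 0.25, 0.5])
```

Because every step multiplies by a factor of at least 1, the weight never shrinks and can grow geometrically with follow-up length, which is why a handful of patients can dominate an IPTW estimate and why balancing at the representation level is attractive.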
Insights (Zettelkasten)
- [ins] Sequential representation balancing: Generalizing static balancing to sequential balancing is the key to causal SA under dynamic treatments. Out: [[Dynamic Causal SA]].
- [ins] Treatment-confounder feedback bound: Formally bounds the hardest source of bias in causal SA, giving follow-up work a quantitative reference. Out: [[Confounder Feedback Bias]].
Gap & Takeaway
- Gap: Through 2024, causal SA under dynamic treatment regimes relied on the g-formula and IPTW, with high variance and no individualized estimates.
- Takeaway: A new baseline for individualized causal effect estimation in scenarios where the treatment protocol changes over time (ICU dosing, chemo cycle adjustment, and so on).
Methodology Keywords
representation balancing, time-varying treatment, sequential weights, treatment-confounder feedback, individualized treatment effect, theoretical bounds
Reproducibility Tag
code? / data~ / C (arXiv preprint; code release is not stated in the abstract)
BibTeX
@misc{abraich2025tvsurvcaus,
title={{TV-SurvCaus}: Dynamic Representation Balancing for Causal Survival Analysis},
author={Abraich, Ayoub},
year={2025},
eprint={2505.01785},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2505.01785}
}
[14] SurvBench — A Standardised Preprocessing Pipeline for Multi-Modal EHR Survival Analysis (2025) — arXiv
Authors: Mesinovic, M.; Zhu, T. | Citations: <5 (very new) | arXiv: 2511.11935 | DOI: 10.48550/arXiv.2511.11935 | Category: Benchmark / Pipeline / Multi-modal EHR | URL: https://arxiv.org/abs/2511.11935
Abstract (verbatim)
Deep-learning survival models for electronic health record (EHR) data are hard to compare across papers because the upstream preprocessing step, which includes cohort definition, time discretisation, missingness handling, and censoring rules, is typically undocumented and inconsistent. A reported difference in concordance between two mortality models can therefore reflect any of these choices rather than a modelling contribution. We present SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports into model-ready tensors for survival analysis. SurvBench covers four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities: time-series vitals and laboratory values, static demographics, International Classification of Diseases (ICD) codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML configuration. Imputation, scaling, and feature filtering are fit on the training fold only. Missingness is recorded as a binary mask alongside each feature tensor. The pipeline handles single-risk endpoints (in-hospital and in-ICU mortality) and competing-risks endpoints (a three-way emergency-department admission pathway, with home discharge treated as administrative censoring). We also provide support for harmonised cross-dataset external validation between eICU and MIMIC-IV. SurvBench is publicly available at this URL, providing a robust platform that future deep-learning EHR survival work, especially nascent multi-modal approaches, can be measured against under matched preprocessing.
Digest (CISELQ)
- Context: It is impossible to tell whether C-index differences reported across deep SA papers come from the model or from preprocessing: cohort definition, time bins, missing-data handling, and censoring rules all differ.
- Insight: Pinning every decision from raw PhysioNet export to model-ready tensor in YAML, and enforcing cross-dataset validation under an identical cohort definition, makes a model comparison genuinely a comparison of models.
- Solution: Unified pipeline over 4 critical-care DBs (MIMIC-IV / eICU / MC-MED / HiRID) × 4 modalities (vitals and labs, demographics, ICD codes, radiology report embeddings); preprocessing fit on the training fold only; binary missingness mask; single-risk + competing-risks endpoints; eICU↔MIMIC-IV external validation.
- Evidence: Public pipeline (arXiv, 2025-11); no quantitative comparison is reported in the abstract, as this is an infrastructure paper.
- Limitations: Limited to critical-care and emergency-department settings (outpatient and primary care not covered); radiology report embeddings depend on the choice of external LLM; depends on standardisation of the ICD code vocabulary.
- OpenQuestions: Compatibility with foundation model ([7]) evaluation, federated extension ([15]), multimodal pathology+EHR integration.
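The "fit on the training fold only" rule and the missingness mask from the Solution bullet can be illustrated with a single feature column. This is a sketch of the general pattern, not SurvBench code: the function names are hypothetical, `None` stands in for a missing measurement, and the convention that 1 means "missing" is an assumption.

```python
import math


def fit_scaler(train_column):
    """Estimate mean and standard deviation from observed training values only."""
    observed = [v for v in train_column if v is not None]
    mean = sum(observed) / len(observed)
    variance = sum((v - mean) ** 2 for v in observed) / len(observed)
    std = math.sqrt(variance) or 1.0  # guard against a constant column
    return mean, std


def transform(column, mean, std):
    """Impute with the training mean, standardise, and record a missingness mask."""
    values, mask = [], []
    for v in column:
        missing = v is None
        mask.append(1 if missing else 0)  # assumed convention: 1 = value was missing
        filled = mean if missing else v
        values.append((filled - mean) / std)
    return values, mask


train = [1.0, None, 3.0]
test = [None, 5.0]

mean, std = fit_scaler(train)  # statistics come from the training fold only
train_values, train_mask = transform(train, mean, std)
test_values, test_mask = transform(test, mean, std)  # test fold reuses them unchanged
```

Fitting the scaler on the concatenation of both folds would leak the test value 5.0 into the mean, which is exactly the kind of silent difference between papers that the pipeline is meant to remove.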
Insights (Zettelkasten)
- [ins] Preprocessing as confound: Much of the reported difference between deep SA models may stem from preprocessing rather than the model; closing this gap is what makes comparison possible. Out: [[Preprocessing Confound]].
- [ins] Cross-dataset external validation as default: C-index on a single-dataset cohort is losing meaning; harmonized eICU↔MIMIC-IV validation is the new default. Out: [[External Validation Default]].
Gap & Takeaway
- Gap: Through 2025, deep EHR SA had no unified preprocessing standard.
- Takeaway: From 2026 onward, reporting on top of SurvBench is expected to become standard for deep EHR SA papers; it is an evaluation-side complement to auton-survival ([5]).
Methodology Keywords
standardised preprocessing, YAML configuration, MIMIC-IV / eICU / MC-MED / HiRID, multi-modal EHR, cross-dataset external validation, train-fold imputation
Reproducibility Tag
code✓ / data✓ / A (arxiv 2511.11935; PhysioNet credentialed access required for raw data)
BibTeX
@misc{mesinovic2025survbench,
title={{SurvBench}: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis},
author={Mesinovic, Munib and Zhu, Tingting},
year={2025},
eprint={2511.11935},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.11935}
}
[15] Federated Survival Forest — FedSurF (2023) — IJCNN
Authors: Archetti, A.; Matteucci, M. | Citations: ~50 (approx.) | arXiv: 2302.02807 | DOI: 10.1109/IJCNN54540.2023.10193293 | Category: Federated / Privacy-preserving SA | URL: https://arxiv.org/abs/2302.02807
Abstract (verbatim)
Survival analysis is a subfield of statistics concerned with modeling the occurrence time of a particular event of interest for a population. Survival analysis found widespread applications in healthcare, engineering, and social sciences. However, real-world applications involve survival datasets that are distributed, incomplete, censored, and confidential. In this context, federated learning can tremendously improve the performance of survival analysis applications. Federated learning provides a set of privacy-preserving techniques to jointly train machine learning models on multiple datasets without compromising user privacy, leading to a better generalization performance. However, despite the widespread development of federated learning in recent AI research, few studies focus on federated survival analysis. In this work, we present a novel federated algorithm for survival analysis based on one of the most successful survival models, the random survival forest. We call the proposed method Federated Survival Forest (FedSurF). With a single communication round, FedSurF obtains a discriminative power comparable to deep-learning-based federated models trained over hundreds of federated iterations. Moreover, FedSurF retains all the advantages of random forests, namely low computational cost and natural handling of missing values and incomplete datasets. These advantages are especially desirable in real-world federated environments with multiple small datasets stored on devices with low computational capabilities. Numerical experiments compare FedSurF with state-of-the-art survival models in federated networks, showing how FedSurF outperforms deep-learning-based federated algorithms in realistic environments with non-identically distributed data.
Digest (CISELQ)
- Context: Clinical data are distributed across hospitals and institutions and cannot be shared directly. Federated SA is needed, but existing federated SA is either (i) deep-SA-based (hundreds of communication rounds, unsuitable for low-spec edge devices) or (ii) Cox-based (weak on non-IID data).
- Insight: Random Survival Forest (RSF) (i) suits federated settings because tree-level aggregation is natural, (ii) handles missing/incomplete data by itself, and (iii) can finish in a single round, avoiding the communication overhead of federated DL.
- Solution: Local RSF training → single-round federated aggregation of tree summaries → ensemble forest. Compatible with differential privacy or secure aggregation.
- Evidence: In federated (non-IID) experiments, C-index equal to or better than federated deep SA trained over hundreds of rounds, using 1 communication round.
- Limitations: Intrinsic limits of RSF (no time-varying covariates or multimodal inputs); a DP analysis of information leakage through tree leaves is needed.
- OpenQuestions: Quantitative differential privacy analysis, deep+forest hybrid federated models, federated time-varying treatment (combined with [13]).
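The single-round pattern in the Solution bullet can be sketched with a much simpler local model. In the code below a Kaplan-Meier curve stands in for each client's survival trees, and the server averages the curves weighted by client sample size. This substitution and the weighting rule are assumptions made for brevity; FedSurF itself trains random survival forests locally and selects trees on the server, which is not reproduced here.

```python
def kaplan_meier(times, events):
    """Fit a Kaplan-Meier curve; returns (time, survival) steps at event times."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at the same time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


def predict(steps, t):
    """Evaluate the step function S(t) of one local model."""
    survival = 1.0
    for time, value in steps:
        if time > t:
            break
        survival = value
    return survival


def federated_predict(client_models, t):
    """Server-side ensemble: sample-size-weighted average of local curves."""
    total = sum(n for _, n in client_models)
    return sum(n * predict(steps, t) for steps, n in client_models) / total


# Each client fits locally and uploads (model, sample size) exactly once.
client_a = (kaplan_meier([1, 2, 3], [1, 0, 1]), 3)
client_b = (kaplan_meier([2, 4], [1, 1]), 2)
ensemble_at_2 = federated_predict([client_a, client_b], t=2)
```

No gradients and no raw records cross the network; the only upload is a fitted model per client, which is why one communication round suffices in this family of methods.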
Insights (Zettelkasten)
- [ins] One-shot federated SA via RSF: Tree-level aggregation is the communication-efficient route to federated SA. Out: [[One-shot Federated]].
- [ins] Non-IID robustness of forests: Deep SA is weak under federated non-IID data, whereas forests are inherently robust, a strength in small-cohort multi-site settings. Out: [[Federated Non-IID]].
Gap & Takeaway
- Gap: Through 2022, federated SA was either deep-based (communication-heavy) or Cox-based (weak on non-IID data).
- Takeaway: The default for federated SA in edge-device and small multi-site settings. [10] SurvDiff + [15] FedSurF + [14] SurvBench form the 2026 standard stack for privacy-preserving SA.
Methodology Keywords
federated learning, random survival forest, single communication round, non-IID robustness, tree-level aggregation, low-resource deployment
Reproducibility Tag
code✓ / data✓ / B (github.com/archettialberto/FedSurF)
BibTeX
@inproceedings{archetti2023fedsurf,
title={Federated Survival Forest},
author={Archetti, Alberto and Matteucci, Matteo},
booktitle={International Joint Conference on Neural Networks (IJCNN)},
pages={1--8},
year={2023},
publisher={IEEE},
doi={10.1109/IJCNN54540.2023.10193293},
url={https://arxiv.org/abs/2302.02807}
}
Comparison Matrix
7-axis comparison with Dynamic-DeepHit ([3]) as target (highest-citation methodological anchor). 6 candidates selected for cross-method illustration of the 2026 landscape: sota = highest-remaining-citation, base = oldest direct predecessor or generative-base, direct = direct methodological extension/comparison.
| Axis | [3] Dynamic-DeepHit (target) | [2] SurvTRACE (direct) | [4] Deep Cox Mixtures (direct) | [7] MOTOR (sota) | [8] SurvPath (direct) | [9] NeuralSurv (direct) | [12] Cond. Calibration (direct) |
|---|---|---|---|---|---|---|---|
| Core approach | RNN + cause-specific heads + DeepHit joint dist. | Transformer + multi-task aux + competing events | Mixture-of-Cox + EM + cluster-specific baseline | EHR foundation model + TTE head | WSI + pathway tokens + multimodal transformer | Bayesian deep SA + local linearization VI | Conformal post-hoc conditional calibration |
| Time representation | Discrete time + grid | Discrete time + multi-task heads | Continuous time (Cox baseline non-parametric) | Continuous time (event sequence) | Discrete or continuous (head choice) | Continuous time | Distribution-agnostic (wraps any) |
| Censoring handling | Right + competing | Right + competing | Right (single risk) | Right (TTE objective) | Right (Cox/discrete) | Right + Bayesian propagation | Right + bootstrap-via-time |
| Data modality | Longitudinal tabular EHR | Tabular EHR | Tabular EHR | Tokenized EHR event seq. | WSI + bulk RNA-seq | Tabular (architecture-agnostic) | Any SA model output |
| Loss design | Joint dist. NLL + ranking + aux next-measurement | Multi-task NLL + ranking + censoring + aux | Cox PL within clusters + EM | Self-supervised + TTE NLL | Cox/discrete + multimodal contrastive | Variational ELBO + closed-form CA updates | Conformal score: predicted S(t_obs) |
| Calibration handling | Implicit | Implicit | Cluster-conditional explicit (minority gain) | Implicit (transfer) | Implicit | Bayesian credible interval (explicit) | Marginal + conditional explicit (post-hoc) |
| Code release | ✓ github.com/chl8856/Dynamic-DeepHit | ✓ github.com/RyanWangZf/SurvTRACE | ✓ in auton-survival | ✓ + 143M HF checkpoint | ✓ github.com/mahmoodlab/SurvPath | ✓ github.com/MLGlobalHealth/neuralsurv | ✓ NeurIPS 2024 code |
Relation type
- sota: [7] MOTOR, the current highest-impact direction (foundation model paradigm)
- base: [3] Dynamic-DeepHit (the target itself, as anchor)
- direct: [2]·[4]·[8]·[9]·[12], direct methodological extensions on the time/loss/calibration/modality axes
Reading order recommendation by need
- Clinical EHR starting point: [3] → [2] → [7] → [14]
- Calibration·deployment: [4] → [9] → [12]
- Multimodal cancer prognosis: [3] → [8]
- Generative·privacy stack: [10] → [15] → [14]
- Causal SA: [13] (+ the counterfactual module of [5] as a baseline)
- Conformal / distribution-free: [11] → [12]
Reading Priority
Scored by 0.45·citation_norm + 0.35·recency_norm + 0.20·tier_norm (citation normalized to log10, recency = 1−(2026−year)/8, tier = 1.0 for A* / 0.85 for Q1 / 0.75 for IJCNN-class). Ties broken by methodological-uniqueness (direction-anchor).
- 1 Deep Learning for Survival Analysis — A Review (2024) — Artificial Intelligence Review — score 0.91. Highest priority — start here. Two-axis taxonomy localizes the other 14 entries on the field map. Living interactive table for ongoing tracking.
- 7 MOTOR — A Time-To-Event Foundation Model For Structured Medical Records (2024) — ICLR — score 0.88. ICLR 2024 + 143M open checkpoint + 95% label efficiency. Foundation-model frontier of structured EHR SA.
- 8 SurvPath — Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction (2024) — CVPR — score 0.85. CVPR 2024 SOTA for multimodal cancer prognosis. Pathway tokenization is a transferable trick.
- 3 Dynamic-DeepHit — Dynamic Survival Analysis With Competing Risks Based on Longitudinal Data (2019) — IEEE TBME — score 0.83. Highest-citation methodological anchor; required reading before any longitudinal SA work.
- 2 SurvTRACE — Transformers for Survival Analysis with Competing Events (2022) — ACM-BCB — score 0.81. Default starting point for transformer-based tabular SA.
- 12 Toward Conditional Distribution Calibration in Survival Prediction (2024) — NeurIPS — score 0.79. NeurIPS 2024; post-hoc conformal calibrator wraps any existing model — minimal-friction deployment upgrade.
- 9 NeuralSurv — Deep Survival Analysis with Bayesian Uncertainty Quantification (2025) — NeurIPS — score 0.78. NeurIPS 2025; first Bayesian deep SA — Bayesian-Calibration frontier.
- 4 Deep Cox Mixtures for Survival Regression (2021) — MLHC — score 0.74. Deep baseline for PH-violation cohorts; reference point for fairness/minority calibration.
- 14 SurvBench — A Standardised Preprocessing Pipeline for Multi-Modal EHR Survival Analysis (2025) — arXiv — score 0.72. 2026 evaluation standard; required infrastructure for any new EHR SA paper.
- 5 auton-survival — Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping (2022) — CHIL / arXiv — score 0.70. Default starter library; counterfactual SA module.
- 11 Conformal Predictive Intervals in Survival Analysis — A Resampling Approach (2025) — Biometrics — score 0.69. Biometrics 2025; general right-censoring conformal — clinical-statistics bridge.
- 10 SurvDiff — A Diffusion Model for Generating Synthetic Data in Survival Analysis (2025) — arXiv / OpenReview — score 0.67. First end-to-end diffusion for SA; privacy-preserving synthetic data direction.
- 6 Survival Mixture Density Networks (2022) — MLHC — score 0.65. Continuous-time SA without a neural ODE; bridges the discrete-time approach of [3] and continuous-time models.
- 15 Federated Survival Forest — FedSurF (2023) — IJCNN — score 0.63. One-shot federated SA — privacy stack baseline.
- 13 TV-SurvCaus — Dynamic Representation Balancing for Causal Survival Analysis (2025) — arXiv — score 0.58. 2025 preprint; time-varying causal SA frontier. Lower priority due to preprint status, but unique methodological direction.
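The scoring rule stated at the top of this list can be sketched in a few lines of Python. This is a minimal illustration, not the script that produced the scores above. The weights, the recency formula, and the tier values come from the rule as written. The exact log10 normalization is an assumption (here `log10(1 + c) / log10(1 + c_max)`), and the function names and the `max_citations` parameter are hypothetical.

```python
import math

# Weights and tier values as stated in the scoring rule.
WEIGHTS = {"citation": 0.45, "recency": 0.35, "tier": 0.20}
TIER_NORM = {"A*": 1.0, "Q1": 0.85, "IJCNN-class": 0.75}
REFERENCE_YEAR = 2026
RECENCY_WINDOW = 8


def citation_norm(citations: int, max_citations: int) -> float:
    """ASSUMPTION: the digest only says 'normalized to log10'.
    Here we use log10(1 + c) / log10(1 + c_max), which maps [0, c_max] to [0, 1]."""
    if max_citations <= 0:
        return 0.0
    return math.log10(1 + citations) / math.log10(1 + max_citations)


def recency_norm(year: int) -> float:
    """recency_norm = 1 - (2026 - year) / 8, as stated in the rule."""
    return 1.0 - (REFERENCE_YEAR - year) / RECENCY_WINDOW


def priority_score(citations: int, max_citations: int, year: int, tier: str) -> float:
    """Weighted sum of the three normalized components."""
    return (
        WEIGHTS["citation"] * citation_norm(citations, max_citations)
        + WEIGHTS["recency"] * recency_norm(year)
        + WEIGHTS["tier"] * TIER_NORM[tier]
    )
```

For example, `priority_score(citations=100, max_citations=100, year=2024, tier="A*")` combines a citation term of 1.0, a recency term of 0.75, and a tier term of 1.0. The citation counts here are made-up inputs, not figures from the digest, so the result should not be read as reproducing any score listed above.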
Methodological Coverage Map
Which "cell" (survival-task × DL-axis) each entry fills, at a glance:
| Cell | Anchor Entry | Direction |
|---|---|---|
| Map of the field | [1] | Review / Survey / Taxonomy |
| Tabular + transformer + competing | [2] | Transformer SA |
| Longitudinal + RNN + competing | [3] | Dynamic SA |
| Tabular + Cox-mixture + calibration | [4] | Cox-DL hybrid |
| Tools / Library / 6-step workflow | [5] | Infrastructure |
| Continuous time + mixture density | [6] | Continuous-time SA |
| EHR sequence + foundation pretraining | [7] | Foundation model |
| Pathology + omics + multimodal | [8] | Multimodal SA |
| Bayesian + uncertainty + VI | [9] | Bayesian SA |
| Synthetic data + diffusion + censoring | [10] | Generative SA |
| General right-censoring + bootstrap + conformal | [11] | Conformal SA |
| Conditional calibration + post-hoc + conformal | [12] | Calibration SA |
| Time-varying treatment + representation balancing + causal | [13] | Causal SA |
| Preprocessing + multi-modal EHR + benchmark | [14] | Benchmark / Pipeline |
| Federated + random survival forest + privacy | [15] | Federated SA |
Sources (verifiable URLs)
- Wiegrebe et al. 2024 — https://arxiv.org/abs/2305.14961 — https://link.springer.com/article/10.1007/s10462-023-10681-3
- SurvTRACE — https://arxiv.org/abs/2110.00855 — https://dl.acm.org/doi/10.1145/3535508.3545521
- Dynamic-DeepHit — https://pubmed.ncbi.nlm.nih.gov/30951460/ — https://par.nsf.gov/servlets/purl/10099761
- Deep Cox Mixtures — https://arxiv.org/abs/2101.06536
- auton-survival — https://arxiv.org/abs/2204.07276 — https://github.com/autonlab/auton-survival
- Survival MDN — https://arxiv.org/abs/2208.10759 — https://pmc.ncbi.nlm.nih.gov/articles/PMC10498417/
- MOTOR — https://arxiv.org/abs/2301.03150 — https://openreview.net/forum?id=NialiwI2V6
- SurvPath — https://arxiv.org/abs/2304.06819 — https://github.com/mahmoodlab/SurvPath
- NeuralSurv — https://arxiv.org/abs/2505.11054 — https://openreview.net/forum?id=c768Z1FwDL
- SurvDiff — https://arxiv.org/abs/2509.22352 — https://openreview.net/forum?id=HOBJO7w8C2
- Conformal Predictive Intervals (Qin 2025) — https://arxiv.org/abs/2408.06539 — https://academic.oup.com/biometrics/article-abstract/81/2/ujaf063/8149055
- Conditional Distribution Calibration — https://arxiv.org/abs/2410.20579 — https://proceedings.neurips.cc/paper_files/paper/2024/hash/9c8df8de46c1a1b39b30b9f74be69c02-Abstract-Conference.html
- TV-SurvCaus — https://arxiv.org/abs/2505.01785
- SurvBench — https://arxiv.org/abs/2511.11935
- FedSurF — https://arxiv.org/abs/2302.02807