Cost Prediction Using a Survival Grouping Algorithm: An Application to Incident Prostate Cancer Cases

From Latest Results for PharmacoEconomics, December 29, 2015



Prognostic classification approaches are commonly used in clinical practice to predict health outcomes, but they have rarely been applied to predicting costs. We applied a grouping algorithm designed for large-scale data sets and multiple prognostic factors to investigate whether it improves cost prediction among older Medicare beneficiaries diagnosed with prostate cancer.


We analysed the linked Surveillance, Epidemiology and End Results (SEER)-Medicare data, which included data from 2000 through 2009 for men diagnosed with incident prostate cancer between 2000 and 2007. We split the survival data into two data sets (D0 and D1) of equal size. We trained the classifier of the Grouping Algorithm for Cancer Data (GACD) on D0 and tested it on D1. The prognostic factors included cancer stage, age, race and performance status proxies. We calculated the average difference between observed D1 costs and predicted D1 costs at 5 years post-diagnosis with and without the GACD.
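The split-and-compare design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GACD itself is not reproduced here, so a simple per-group mean (grouped on a hypothetical `stage` field) stands in for the grouped prediction, and the overall training mean stands in for prediction without grouping. The random split, the field names `stage` and `cost`, and the helper names `split_half`, `fit_group_means` and `mean_abs_error` are all illustrative and not taken from the study.

```python
import random
from collections import defaultdict
from statistics import mean


def split_half(records, seed=0):
    """Randomly split records into two halves: D0 (training) and D1 (test).

    Assumption: the source only says the halves are of equal size;
    a seeded random shuffle is one plausible way to obtain them.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def fit_group_means(train, key):
    """Return (mean cost per group, overall mean cost) from the training half.

    Stand-in for a grouping classifier: NOT the GACD, just group averages.
    """
    costs_by_group = defaultdict(list)
    for record in train:
        costs_by_group[key(record)].append(record["cost"])
    group_means = {g: mean(c) for g, c in costs_by_group.items()}
    overall_mean = mean(r["cost"] for r in train)
    return group_means, overall_mean


def mean_abs_error(test, predict):
    """Average absolute difference between observed and predicted cost."""
    return mean(abs(r["cost"] - predict(r)) for r in test)


if __name__ == "__main__":
    # Toy records (invented values, for illustration only).
    d0 = [
        {"stage": "I", "cost": 10.0},
        {"stage": "I", "cost": 20.0},
        {"stage": "IV", "cost": 60.0},
    ]
    d1 = [
        {"stage": "I", "cost": 16.0},
        {"stage": "IV", "cost": 58.0},
    ]
    group_means, overall = fit_group_means(d0, key=lambda r: r["stage"])
    # Grouped prediction; falls back to the overall mean for unseen groups.
    grouped = mean_abs_error(d1, lambda r: group_means.get(r["stage"], overall))
    ungrouped = mean_abs_error(d1, lambda r: overall)
    print("MAE with grouping:", grouped)
    print("MAE without grouping:", ungrouped)
```

In the study the groups come from the GACD classifier trained on several prognostic factors at once; the single-factor grouping here only shows the shape of the train-on-D0, evaluate-on-D1 comparison.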


The sample included 110,843 men with prostate cancer. The median age of the sample was 74 years, and 10 % were African American. The average difference (mean absolute error [MAE]) per person between the observed and predicted total 5-year cost was US$41,525 (MAE US$41,790; 95 % confidence interval [CI] US$41,421–42,158) with the GACD and US$43,113 (MAE US$43,639; 95 % CI US$43,062–44,217) without the GACD. The 5-year cost prediction without grouping resulted in a sample overestimate of US$79,544,508.
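To make the reported quantities concrete, the sketch below computes an MAE with a 95 % confidence interval and a sample-level over- or underestimate from paired observed and predicted costs. The source does not say how its intervals were derived, so the normal-approximation interval used here is an assumption. The function names `mae_with_ci` and `aggregate_bias` are hypothetical, and the numbers in the example are invented rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mae_with_ci(observed, predicted, z=1.96):
    """Return (MAE, (lower, upper)) using a normal-approximation 95 % CI.

    Assumption: the interval is mean +/- z * standard error of the
    absolute errors; the study may have used a different method.
    """
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = mean(errors)
    half_width = z * stdev(errors) / sqrt(len(errors))
    return mae, (mae - half_width, mae + half_width)


def aggregate_bias(observed, predicted):
    """Sum of (predicted - observed) over the sample.

    A positive value means the predictions overestimate total cost.
    """
    return sum(p - o for o, p in zip(observed, predicted))


if __name__ == "__main__":
    # Invented toy costs, for illustration only.
    observed = [10, 20, 30, 40]
    predicted = [12, 18, 33, 44]
    mae, (lower, upper) = mae_with_ci(observed, predicted)
    print("MAE:", mae, "95% CI:", (lower, upper))
    print("Sample over/underestimate:", aggregate_bias(observed, predicted))
```

The per-person MAE ignores the sign of each error, whereas the aggregate figure lets over- and underestimates cancel, so a method can show a large MAE alongside a comparatively small total bias, or the reverse.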


The grouping algorithm developed for complex, large-scale data improves the prediction of 5-year costs. Prediction accuracy could be improved further by using a richer set of prognostic factors and refining the categorical specifications.