Authors
Peng Liu¹, Liuwen Li¹, Chen Yu² and Shumin Fei¹; ¹Southeast University, China and ²Western Medicine Jiangsu Cancer Hospital, China
Abstract
Cancer is one of the most common causes of death in the world, and gastric cancer has the highest incidence in Asia. Predicting gastric cancer patients' survivability can inform patient care decisions and help doctors prescribe personalized medicine. Classification techniques have been widely used to predict the survivability of cancer patients. However, very little attention has been paid to patients who are not expected to survive. In this research, we treat survival prediction as a two-stage problem. The first stage predicts the patient's five-year survivability. If the predicted outcome is death, the second stage predicts the patient's remaining lifespan. We propose a custom ensemble method that integrates multiple machine learning algorithms. It exhibits a significant predictive improvement in both stages of prediction compared with state-of-the-art machine learning techniques. The base machine learning techniques include Decision Trees, Random Forest, AdaBoost, Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), and LightGBM, a widely used GBM framework. The model is comprehensively evaluated on open-source cancer data provided by the Surveillance, Epidemiology, and End Results (SEER) Program. In the classification stage, it is evaluated in terms of accuracy, area under the curve, F-score, precision, recall, and training and prediction time; in the regression stage, in terms of root mean squared error, mean absolute error, and coefficient of determination (R²).
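The two-stage flow and the regression-stage metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ensemble: `toy_classifier`, `toy_regressor`, and the `risk_score` feature are hypothetical placeholders with no clinical meaning, used only to show how a stage-one classifier gates a stage-two regressor and how RMSE, MAE, and R² are computed.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Dict[str, float]


def two_stage_predict(
    features: Features,
    classify: Callable[[Features], bool],
    regress: Callable[[Features], float],
) -> Tuple[bool, Optional[float]]:
    """Stage 1 predicts five-year survival; stage 2 runs only on predicted death."""
    survives = classify(features)
    if survives:
        return True, None
    # Predicted non-survivor: estimate remaining lifespan (here, in months).
    return False, regress(features)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder models (assumptions for illustration only).
def toy_classifier(features: Features) -> bool:
    return features["risk_score"] < 0.5


def toy_regressor(features: Features) -> float:
    return max(0.0, 60.0 * (1.0 - features["risk_score"]))
```

In practice, the two callables would be replaced by trained models, for example an ensemble classifier for stage one and an ensemble regressor fitted only on non-surviving patients for stage two.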
Keywords
Gastric Cancer, Cancer Survival Prediction, Machine Learning, Ensemble Learning, SEER