Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision-making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact preprints@lancet.com.
Prediction of Tissue of Origin and Molecular Subtypes for Cancer of Unknown Primary Using Machine Learning
436 Pages Posted: 18 Feb 2020
Abstract
It is estimated that approximately 5% of all metastatic tumors have no defined primary site despite adequate diagnostic workup and are therefore classified as cancers of unknown primary (CUP). CUP patients are denied site-specific therapy and have a poor prognosis. Knowledge of a tumor's primary site and molecular subtype can potentially play a critical role in the choice of treatment regimen and in prognosis. We developed a deep learning method to identify the primary site using the transcriptional profiles of annotated primary tumors across 32 cancer types from The Cancer Genome Atlas (TCGA) project. Further, given a putative tissue of origin, we developed models to classify the molecular subtype of a sample for 11 primary cancer types. Our 1-D Inception convolutional neural network identifies the primary site with an overall top-1 accuracy of 97.20% in cross-validation and an overall top-1 accuracy of 92.64% in independent external validation on metastatic tumors with known primaries. Gene expression data are ordered by gene chromosomal coordinates as input to the 1-D CNN model, and the model applies multiple convolutional kernels with different configurations simultaneously to improve generality. The model has been optimized through extensive hyperparameter tuning, including different max pooling layer and dropout settings. This method to identify the primary site and molecular subtype will provide better therapeutic opportunities for CUP patients.
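The input layout and the parallel-kernel idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the gene names, coordinates, expression values, and kernel weights below are hypothetical toy values, the helper names (`order_by_coordinate`, `conv1d`, `inception_features`, `top1_accuracy`) are invented for illustration, and global max pooling per branch is a simplification assumed here. It only shows how expression values might be ordered by chromosomal coordinate, passed through several 1-D kernels of different widths at once, and scored with top-1 accuracy.

```python
# Toy sketch only: all names and values are hypothetical, not from the paper.

def chrom_rank(chrom):
    """Map a chromosome label to a sortable rank: 1..22, then X, then Y."""
    special = {"X": 23, "Y": 24}
    return special[chrom] if chrom in special else int(chrom)


def order_by_coordinate(annotations):
    """Return gene names sorted by (chromosome, start position)."""
    ranked = sorted(annotations, key=lambda a: (chrom_rank(a[1]), a[2]))
    return [gene for gene, _chrom, _start in ranked]


def conv1d(signal, kernel):
    """Unpadded 1-D cross-correlation followed by a ReLU."""
    width = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(width)))
        for i in range(len(signal) - width + 1)
    ]


def inception_features(signal, kernels):
    """Run every kernel on the same input in parallel branches, then
    concatenate one global-max-pooled value per branch (assumed pooling)."""
    return [max(conv1d(signal, kernel)) for kernel in kernels]


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return hits / len(labels)


# Hypothetical annotations: (gene, chromosome, start position).
ANNOTATIONS = [
    ("geneA", "2", 500),
    ("geneB", "1", 900),
    ("geneC", "X", 100),
    ("geneD", "1", 200),
    ("geneE", "10", 50),
]
# Hypothetical expression values for a single sample.
EXPRESSION = {"geneA": 1.0, "geneB": 0.0, "geneC": 2.0, "geneD": 3.0, "geneE": 0.5}

ordered = order_by_coordinate(ANNOTATIONS)
signal = [EXPRESSION[gene] for gene in ordered]

# Three branches with kernel widths 1, 2 and 3 (weights are arbitrary).
KERNELS = [[1.0], [0.5, 0.5], [1.0, -1.0, 1.0]]
features = inception_features(signal, KERNELS)

print(ordered)
print(features)
print(top1_accuracy([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [1, 0, 0]))
```

In a trained network the kernel weights would be learned and each branch would emit a feature map rather than a single pooled value; the sketch collapses that to keep the data flow visible.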
Funding Statement: Funding for the project was provided by Cancer Research UK and the British Columbia Cancer Agency Branch. This work was supported by the Leukemia Research Foundation New Investigator Grant, The Jackson Laboratory Cancer Center New Investigator Award, and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133562. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196.
Declaration of Interests: The authors declare no competing interests.
Ethics Approval Statement: Not required.
Keywords: Cancer; Classification; Machine Learning; Deep Learning; Cancer of Unknown Primary; Convolutional Neural Networks; TCGA; 1-D Inception Network