
Preprints with The Lancet is part of SSRN’s First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet’s FAQ page, and for any feedback please contact preprints@lancet.com.

Prediction of Tissue of Origin and Molecular Subtypes for Cancer of Unknown Primary Using Machine Learning

436 Pages Posted: 18 Feb 2020


Yue Zhao

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Sandeep Namburi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Ziwei Pan

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Carolyn A. Paisie

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Honey V. Reddi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Richard Tothill

University of Melbourne - Centre for Cancer Research

Jens Rueter

The Jackson Laboratory - Jackson Laboratory Cancer Center

Kanwal P. S. Raghav

University of Texas at Houston - MD Anderson Cancer Center

William F. Flynn

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Sheng Li

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

R. Krishna Murthy Karuturi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

Joshy George

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine


Abstract

It is estimated that approximately 5% of all metastatic tumors have no defined primary site despite adequate diagnostic workup and are therefore classified as cancers of unknown primary (CUP). CUP patients are denied site-specific therapy and have a poor prognosis. Knowledge of a tumor’s primary site and molecular subtype can potentially play a critical role in the choice of treatment regimen and in prognosis. We developed a deep learning method that identifies the primary site from the transcriptional profiles of annotated primary tumors across 32 cancer types from The Cancer Genome Atlas (TCGA) project. Further, given a putative tissue of origin, we developed models to classify the molecular subtype of a sample for 11 primary cancer types. Our 1-D Inception convolutional neural network identifies the primary site with an overall top-1 accuracy of 97.20% in cross-validation and 92.64% in independent external validation on metastatic tumors with known primaries. Gene expression values are ordered by the chromosomal coordinates of their genes before being passed to the 1-D CNN, and the model applies multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimized through extensive hyperparameter tuning, including different max-pooling layer and dropout settings. This method for identifying the primary site and molecular subtype will provide better therapeutic opportunities for CUP patients.
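The two input-side ideas in the abstract, ordering expression values by chromosomal coordinate and applying several convolutional kernels of different widths to the same input, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the gene names, coordinates, chromosome ranking, kernel widths, and fixed averaging weights are all hypothetical, and a real model would learn its kernel weights in a deep learning framework. The `top1_accuracy` helper shows only how the reported metric is conventionally computed.

```python
import numpy as np

# Assumed chromosome ranking (1..22, X, Y); the abstract does not specify one.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24})


def order_genes_by_position(genes):
    """Return indices that sort (name, chromosome, start) records by genomic position."""
    keys = [(CHROM_ORDER[chrom], start) for _, chrom, start in genes]
    return sorted(range(len(genes)), key=lambda i: keys[i])


def conv1d_same(x, kernel):
    """1-D cross-correlation, zero-padded so the output has the same length as x."""
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = np.pad(x, (left, right))
    return np.array([padded[i:i + k] @ kernel for i in range(len(x))])


def inception_block(x, kernels):
    """Apply kernels of different widths to the same input, then ReLU and stack."""
    branches = [np.maximum(conv1d_same(x, k), 0.0) for k in kernels]
    return np.stack(branches)  # shape: (n_branches, n_genes)


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))


# Hypothetical genes and expression values, given in arbitrary order.
genes = [("GENE_C", "2", 500), ("GENE_A", "1", 100),
         ("GENE_D", "X", 50), ("GENE_B", "1", 900)]
expression = np.array([3.0, 1.0, 4.0, 2.0])

order = order_genes_by_position(genes)
x = expression[order]  # expression vector in chromosomal order

# Illustrative fixed kernels of width 1, 3, and 5 (learned in a real network).
kernels = [np.ones(1), np.ones(3) / 3, np.ones(5) / 5]
features = inception_block(x, kernels)
```

Ordering by genomic position gives neighboring entries of the input vector a biological meaning (adjacent genes), which is what makes sliding kernels of several widths over it a plausible design.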

Funding Statement: Funding for the project was provided by Cancer Research UK and the British Columbia Cancer Agency Branch. This work was supported by the Leukemia Research Foundation New Investigator Grant, The Jackson Laboratory Cancer Center New Investigator Award, and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133562. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196.

Declaration of Interests: The authors declare no competing interests.

Ethics Approval Statement: Not required.

Keywords: Cancer; Classification; Machine Learning; Deep Learning; Cancer of Unknown Primary; Convolutional Neural Networks; TCGA; 1-D Inception Network

Suggested Citation

Zhao, Yue and Namburi, Sandeep and Pan, Ziwei and Paisie, Carolyn A. and Reddi, Honey V. and Tothill, Richard and Rueter, Jens and Raghav, Kanwal P. S. and Flynn, William F. and Li, Sheng and Karuturi, R. Krishna Murthy and George, Joshy, Prediction of Tissue of Origin and Molecular Subtypes for Cancer of Unknown Primary Using Machine Learning (February 7, 2020). Available at SSRN: https://ssrn.com/abstract=3534193 or http://dx.doi.org/10.2139/ssrn.3534193

Yue Zhao

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Sandeep Namburi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Ziwei Pan

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Carolyn A. Paisie

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Honey V. Reddi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Richard Tothill

University of Melbourne - Centre for Cancer Research

Victoria, 3010
Australia

Jens Rueter

The Jackson Laboratory - Jackson Laboratory Cancer Center

Bar Harbor, ME
United States

Kanwal P. S. Raghav

University of Texas at Houston - MD Anderson Cancer Center

1901 East Road, Unit 1950
Unit 1905
Houston, TX 77030
United States

William F. Flynn

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Sheng Li

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

R. Krishna Murthy Karuturi

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States

Joshy George (Contact Author)

The Jackson Laboratory - Jackson Laboratory for Genomic Medicine

10 Discovery Dr
Farmington, CT 06032
United States