Histotyping

Histotyping is a novel approach in which deep Convolutional Neural Networks (CNNs) predict patient outcome by analysing the tumour tissue in routine haematoxylin and eosin (HE) stained histological sections. Given its excellent results and great potential, Histotyping will play a central role at the Institute in the years to come.

Background

Deep learning is a machine learning technique that allows computers to do what comes naturally to humans: learn by example. With enough data, enough computing power and a well-designed experiment, deep learning networks can outperform competing techniques. Image analysis is one area with especially convincing results, and a rapidly increasing number of publications demonstrate high performance of CNNs in the field of medical diagnostics.

We first developed Histotyping in colorectal cancer and published the results in The Lancet in February 2020. The DoMore-v1-CRC biomarker was developed from scanned HE sections from 2500 colorectal cancer (CRC) patients. More than 12 000 000 image tiles from patients with a distinctly good or poor disease outcome from four cohorts were used to train a total of ten convolutional neural networks, purpose-built for classifying supersized heterogeneous images. A prognostic biomarker integrating the ten networks was determined using patients with a non-distinct outcome. The marker was tested on 920 patients with slides prepared in the UK, and then independently validated according to a predefined protocol in 1122 patients treated with single-agent capecitabine, using slides prepared in Norway. All cohorts included only patients with resectable tumours and a formalin-fixed, paraffin-embedded tumour tissue block available for analysis. The primary outcome was cancer-specific survival.

The DoMore v1 Network. This shows the set-up of the DoMore v1 network. First, the slide is scanned to produce a Whole Slide Image (WSI) and the tumour area is segmented. Each scan is analysed 5 times at 10x resolution and 5 times at 40x resolution. The image is split into tiles, each of which is analysed in the DoMore v1 network, giving a result relating to a good or bad prognosis. A result closer to "1" indicates a higher probability of a bad prognosis, while a result closer to "0" may predict a good prognosis. A patient prognosis is estimated based on the individual tiles’ prediction values.
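The last step of this set-up, going from tile prediction values to a patient prognosis, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the patient score is the mean of the tile values and that 0.5 separates good from poor prognosis; the text does not state the actual aggregation rule or cut-off, and the function names are hypothetical.

```python
from statistics import mean

def patient_score(tile_scores):
    """Average tile-level predictions (each in [0, 1]) into one patient-level score.

    Assumption: simple mean pooling; the real DoMore v1 aggregation may differ.
    """
    if not tile_scores:
        raise ValueError("at least one tile score is required")
    if any(not 0.0 <= s <= 1.0 for s in tile_scores):
        raise ValueError("tile scores must lie in [0, 1]")
    return mean(tile_scores)

def prognosis_label(score, threshold=0.5):
    """Map a patient score to a label; the threshold is a hypothetical choice."""
    return "poor" if score >= threshold else "good"

# Four hypothetical tile predictions for one patient.
tiles = [0.9, 0.8, 0.7, 0.2]
score = patient_score(tiles)
print(round(score, 2), prognosis_label(score))
```

Mean pooling is only one option; a median or a fraction of tiles above a cut-off would fit the same description.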

A total of 828 patients from four cohorts had a distinct outcome and were used as a training cohort to obtain clear ground truth. In total, 1645 patients had a non-distinct outcome and were used for tuning. The biomarker provided a hazard ratio for poor versus good prognosis of 3.84 (95% CI 2.72–5.43; p<0.0001) in the primary analysis of the validation cohort, and 3.04 (2.07–4.47; p<0.0001) after adjusting for the established prognostic markers that were significant in univariable analyses of the same cohort: pN stage, pT stage, lymphatic invasion, and venous vascular invasion.
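As a consistency check on the reported hazard ratios, the short Python sketch below recovers the standard error of the log hazard ratio, the z statistic and a two-sided p-value from each ratio and its confidence interval. It assumes the intervals are Wald intervals, symmetric on the log scale, which is common for Cox regression but is not stated in the text; the function name is hypothetical.

```python
import math

def hr_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Recover SE of log(HR), the Wald z statistic and a two-sided p-value
    from a hazard ratio and its 95% confidence interval.

    Assumption: the interval is a Wald interval, symmetric on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

se, z, p = hr_summary(3.84, 2.72, 5.43)              # primary analysis
se_adj, z_adj, p_adj = hr_summary(3.04, 2.07, 4.47)  # adjusted analysis
print(f"primary:  se={se:.3f}, z={z:.2f}, p<0.0001: {p < 1e-4}")
print(f"adjusted: se={se_adj:.3f}, z={z_adj:.2f}, p<0.0001: {p_adj < 1e-4}")
```

Under this assumption both analyses give p-values well below 0.0001, in line with the reported figures.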

Histotyping demonstration. 1) Samples are fed into the scanner. 2) The scanner scans the tissue sections, adjusting focus to get the best result possible. 3) The scans are sent to the deep learning models, running on GPUs, where 4) the tumour region in each scanned sample is automatically segmented, 5) the outlined tumour is further 6) divided into tiles, and 7) each tile is automatically assigned a probability of representing poor prognosis. Each tile is coloured blue for a good or red for a bad prognosis. 8) Finally, a report is generated, where patient prognosis is estimated based on the individual tiles’ prediction values.
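Steps 4 to 7 can be sketched in a few lines of Python. The example below takes a binary tumour mask, keeps the tiles that lie mostly inside the tumour, and colours a tile by its predicted probability. The non-overlapping tiling, the 50% coverage rule, the 0.5 colour cut-off and all function names are assumptions made for illustration, not details of the actual pipeline.

```python
def tile_coordinates(mask, tile_size, min_tumour_fraction=0.5):
    """Return (row, col) origins of tiles that lie mostly inside the tumour mask.

    mask: 2D list of 0/1 values, 1 marking segmented tumour pixels.
    Assumption: non-overlapping square tiles and a 50% tumour-coverage rule,
    both chosen for illustration only.
    """
    height, width = len(mask), len(mask[0])
    kept = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tumour_pixels = sum(
                mask[r][c]
                for r in range(top, top + tile_size)
                for c in range(left, left + tile_size)
            )
            if tumour_pixels / (tile_size * tile_size) >= min_tumour_fraction:
                kept.append((top, left))
    return kept

def tile_colour(probability_poor):
    """Red for a poor-prognosis tile, blue for a good one (cut-off 0.5 assumed)."""
    return "red" if probability_poor >= 0.5 else "blue"

# Toy 4x4 mask with tumour in the left half.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
tiles = tile_coordinates(mask, tile_size=2)
print(tiles)
```

In a real whole slide image the mask would come from the segmentation network and each kept tile would be passed to the prognosis network before colouring.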

A clinically useful prognostic marker was developed using deep learning allied to digital scanning of conventional haematoxylin and eosin-stained tumour tissue sections. The assay has been extensively evaluated in large, independent patient populations, correlates with and outperforms established molecular and morphological prognostic markers, and gives consistent results across tumour and nodal stage. The biomarker stratified stage II and III patients into sufficiently distinct prognostic groups that potentially could be used to guide the selection of adjuvant treatment by avoiding therapy in very low-risk groups and identifying patients who would benefit from more intensive treatment regimens.

Outcome prediction is based on individual predictions from ten different CNNs at two different magnification levels for robustness. The DoMore-v1-CRC marker is strongly associated with the tumour differentiation grading carried out by expert pathologists, but additionally classifies patients in the intermediate differentiation category into meaningful groups with increased and reduced risk of cancer-specific death. This relation to grading is important, since a common objection to methods based on neural networks is their “black box” nature, while the relation to grading shows that the DoMore-v1-CRC marker also detects what the pathologists detect, and something more.
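How predictions from several networks at two magnifications might be combined can be shown with a small Python sketch. It assumes five networks per magnification and equal weighting of the two magnification levels; the actual integration rule of the DoMore-v1-CRC marker is not described here, and the scores and function name are hypothetical.

```python
from statistics import mean

def ensemble_score(scores_10x, scores_40x):
    """Combine per-network patient scores from two magnifications.

    Assumption: scores are averaged within each magnification and the two
    averages are then averaged, so both magnifications weigh equally.
    """
    if not scores_10x or not scores_40x:
        raise ValueError("scores are required at both magnifications")
    return (mean(scores_10x) + mean(scores_40x)) / 2

# Hypothetical outputs of five networks at 10x and five at 40x for one patient.
scores_10x = [0.62, 0.70, 0.58, 0.66, 0.64]
scores_40x = [0.71, 0.75, 0.69, 0.73, 0.72]
combined = ensemble_score(scores_10x, scores_40x)
print(round(combined, 2))
```

Averaging over independently trained networks reduces the influence of any single network's errors, which is one way an ensemble can add robustness.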

The Histotyping methods are fully automatic and are applied to digital scans without human intervention. In the processing pipeline, the tumour regions are identified automatically by one deep learning network before they are analysed for prognosis by another network.

Validation of Histotyping in colorectal cancers

a) Non-pedunculated T1 Tumours treated with surgical resection

The introduction of population-based screening programs for colorectal cancer (CRC) has led to a higher incidence of early-stage CRCs [1]. Endoscopic resection is an attractive treatment for T1 CRC due to substantially lower morbidity and mortality rates compared with surgery. Whether endoscopic treatment can be considered curative depends on the risk of incomplete resection and lymph node metastasis (LNM). An important difference between non-pedunculated and pedunculated T1 CRCs is that pedunculated lesions are especially amenable to endoscopic treatment, because these polyps can be removed relatively easily with en-bloc snare polypectomy, with a risk of incomplete resection of <3%. The risk of LNM has been reported to be as low as 3%–7% in pedunculated T1 CRC, whereas this risk is about 7%–14% in non-pedunculated T1 CRC.

A marker is required to identify the patients with T1 colorectal tumours who are most likely not cured by polypectomy during endoscopy and should be referred for surgery.

From our collaborators in Utrecht, we have received 1065 HE sections from 432 patients with non-pedunculated T1 Tumours treated with surgical resection. We will first attempt to apply the DoMore-v1 marker directly to see if we are able to predict the outcome for these patients. As our models were not trained for this purpose and only validated in high-risk patients, it will not be surprising if the models fail. If so, we will attempt to re-train the model with T1 tumours, and design a study where this material is extended with T1 tumours from Oxford (we have identified around 1500 potential patient samples in Oxford).

b) Validation of DoMore-v1 marker on CRC from Indivumed

Indivumed is a company founded by scientists to establish a multiomics cancer platform for developing individualized therapy. They were introduced to us by Dr John Marshall, Chief Oncologist at Georgetown University Hospital, and with him and Prof David Kerr, we have established a data-sharing collaboration with Indivumed. Their multiomics data will be used in another project described herein (Learning from Deep Learning on page 76), and the collaboration starts with us receiving 650 CRC cases (unstained slides + scans of HE slides) that will be analysed with the DoMore-v1 network. The results will be made available to Indivumed, and we will receive multiomics data and clinical data for the same patients. Results will be written up by ICGI and published together. Validation should be done with the Windows application.

c) Validation of DoMore-v1 marker in the SCOT study

The SCOT study is an international, randomised, non-blinded, non-inferiority, phase 3 trial comparing six months versus three months of oxaliplatin and fluoropyrimidine adjuvant chemotherapy in patients with high-risk stage II or stage III colorectal cancer. We are receiving tissue sections from 2500 patients from this study, which will be used in our Prediction project. In this project, we will run a validation study of the DoMore-v1 network. The validation will be done with the Windows application, and stage and T will be combined in the final analysis. Alternatively, if the results from the invariance studies are ready, these should be applied to this material before validation.

d) CRC liver metastases (CRLM) Part I

Stage IV colorectal cancers have spread from the colon/rectum to distant organs and tissues. Colon cancer most often spreads to the liver, but it can also spread to other places such as the lungs, brain, peritoneum (the lining of the abdominal cavity), or distant lymph nodes. The 5-year survival for patients with stage IV CRC is 10-20%, and today there are no reliable biomarkers to identify the patients who will benefit from the different treatment options. In this project, we will investigate whether it is possible to teach the DoMore-v1 network to identify treatment response through studies of the metastatic tumour tissue in liver biopsies from patients with stage IV CRC. We have access to 2400 biopsies from about 500 patients with CRLM in three cohorts from Oslo, Liverpool and Rotterdam. Samples from Oslo and Liverpool have been collected and scanned (on two platforms), and the material from the Erasmus University in Rotterdam is expected during Q1 2021. The study will be designed as described in The Lancet publication, with the exception that here we have multiple samples per patient.

e) CRC liver metastases (CRLM) Part II

ICGI is setting up a biobank of liver metastases with matching primary CRCs together with the University of Liverpool, where the data collection is organized by Dr Robert Jones. So far, we have tissue from 57 patients with both primary tumour and metastasis, and we expect to receive WSIs for another 200 cases from the Erasmus University Rotterdam. We have started to analyse the DNA content in such paired samples and see that the DNA ploidy distribution is remarkably similar (often identical) in the primary tumour and the metastasis.

The aim of this project is to identify cellular features in both samples that may be used to predict a later metastasis by analysing the primary tumour at the time of diagnosis. Convolutional Neural Networks are a superb technology to identify such features, and we will initiate this work as soon as we have enough cases for a discovery dataset.

Histotyping as a prognostic biomarker in lung cancers

Lung cancer is the leading cause of cancer deaths worldwide. The average 5-year survival rate for all lung cancers is less than 20%. There are two main types: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Roughly 9 out of 10 cases are NSCLC, which has better overall survival than SCLC (average 5-year survival of 6%). About 50% of NSCLC are adenocarcinomas, 30% are squamous cell carcinomas arising most frequently in the bronchi in the central chest area, and the rest are either undifferentiated carcinomas (large cell carcinomas) or mixtures of different types. About 10-35% of lung cancers have not spread beyond the lung (Stage I or II) at the time of diagnosis and are removed by surgery. However, only roughly 50% of these patients are alive after five years. The aim of this project is to develop a prognostic biomarker for patients with resected lung cancer, using the DoMore-v1 network.

Materials available at ICGI:

OUS NSCLC cohort: 950 patients were resected for primary non-small-cell lung cancer (NSCLC) at Oslo University Hospital between March 2006 and December 2018 as a part of the primary treatment and prospectively included in a clinical database.

At ICGI, we have:

Whole slide images from 3427 routine slides (3424 Aperio AT2 scans and 3393 Hamamatsu XR scans)
Whole slide images from 3485 ICGI slides (for both Aperio AT2 scans and Hamamatsu XR)

442 (48%) of the 930 patients died, 288 of whom died from NSCLC. The median follow-up of patients alive at last follow-up was 5.8 years (interquartile range 4.3 to 8.1 years). Of the 930 patients, 410 (44%) experienced a recurrence of NSCLC.

Tromsø NSCLC cohort: Of 663 patients resected for primary stage I to III NSCLC at University Hospital of North Norway and Nordland Hospital between 1990 and 2010 as a part of the primary treatment, ICGI has received whole slide images of 522 HE slides (for both Aperio AT2 and NanoZoomer XR) from 522 patients. A death was recorded for 384 (74%) of the 522 patients, 206 of whom died from NSCLC. The median follow-up of patients alive at last follow-up was 6.9 years (interquartile range 4.5 to 10.1 years). A recurrence of the NSCLC was recorded for 243 (47%) of the 522 patients.

TCGA NSCLC cohort: Of 956 patients in The Cancer Genome Atlas (TCGA) projects TCGA-LUAD and TCGA-LUSC with ‘Diagnostic Slide’ available in National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/), ICGI has whole slide images of 1042 HE slides (for one scanner only) from 946 patients. A death was recorded for 372 (39%) of the 946 patients, of whom 190 died from NSCLC. The median follow-up of patients alive at last follow-up was 1.9 years (interquartile range 1.2 to 3.4 years).

These three cohorts will be used to allow the DoMore-v1 network to learn how to distinguish between poor and good prognosis after surgery. Test and validation cohorts will be obtained from University College London and Uppsala University.

Histotyping as a prognostic biomarker in prostate cancers

Prostate cancer is one of the most common cancers, with approximately 5000 new cases diagnosed in Norway each year. Although the survival rate is generally high, the incidence still makes this the third most common cause of cancer death in Norway. The combination of an expected increase in prostate cancer incidence as the population ages and a shortage of pathologist expertise [3], [4] creates a potential bottleneck in prostate cancer diagnostics and treatment. Another cause of increased workload for pathologists is that more patients with low-risk prostate cancer enter active surveillance programs, where their disease is monitored without treatment. These patients will typically have multiple biopsies taken while they are being monitored, and these biopsies must be assessed by a pathologist to decide whether the patient should stay in active surveillance or requires treatment. Implementing an artificial intelligence (AI) system could decrease the pathologist’s workload and help ensure good care for all patients.

The Gleason score describes how abnormal the tumour tissue appears compared to healthy prostate morphology and is a measure of the tumour aggressiveness. Although the Gleason score is a strong clinical marker for prostate cancer prognosis, it suffers from substantial inter-observer variation, meaning that two pathologists could give the same sample different Gleason scores. This has the unfortunate consequence that the quality of the pathology assessment, and therefore of the treatment selected, could be affected by the level of experience of the pathologist reviewing the case. This is particularly problematic when patients are treated at their local hospital, which may not have the same expertise as a larger specialist hospital. By implementing an AI system to aid the pathologists, we could not only decrease their workload but also ensure that every patient has access to the same level of expertise, regardless of the hospital at which they receive care.

Deep learning is an appropriate strategy for this kind of problem, as it allows training models without having to predefine the features to be extracted from the input data. However, this requires a large amount of annotated data, preferably from multiple hospitals. Because the Gleason score has shown success as a marker for prognosis and is an important part of current prostate cancer diagnostics, the published literature on digital pathology and deep learning in prostate cancer focuses mainly on automatic Gleason scoring. In the project described here, we propose to shift this focus to disease recurrence. Firstly, this will eliminate some of the uncertainty caused by the inter- and intra-observer variation in the Gleason score. Secondly, the risk of recurrence is what ultimately matters to the clinician determining which treatment to give and, of course, to the patient himself.

Histotyping explained

For more about Histotyping, please see the Norwegian Research Council Lighthouse Project: DoMore! website.

This text was last modified: 27.02.2025

Chief Editor: Tarjei S. Hveem, Interim Institute Director
Copyright Oslo University Hospital. Visiting address: The Norwegian Radium Hospital, Ullernchausséen 64, Oslo. Tel: 22 78 23 20