Karyotyping is the process of ordering the chromosomes of an organism to provide a genome-wide snapshot of an individual’s chromosomes. Karyotypes are prepared from mitotic cells that have been arrested in the metaphase or prometaphase stage of the cell cycle, when chromosomes are in their most condensed conformation. Analyses of karyotypes reveal changes in chromosome number (aneuploidy) and more subtle structural changes such as chromosomal deletions, duplications, translocations or inversions. Karyotyping is a source of diagnostic information for genetic disorders, specific birth defects and cancers, and for cancers it is also a source of prognostic information.
The best-known example from cancer is probably the Philadelphia (Ph) chromosome, discovered in bone marrow cells from chronic myeloid leukaemia patients. It was later shown that the Ph chromosome arose through a reciprocal translocation between chromosomes 9 and 22, t(9;22)(q34;q11), and that it recombined the genes BCR (from 22q11) and ABL1 (from 9q34) into a novel BCR/ABL1 fusion gene that produced a qualitatively new oncoprotein with abnormal tyrosine kinase properties (Fioretos et al., 2009). Since the discovery of the Ph chromosome more than 50 years ago, similar successes have followed repeatedly, and hundreds of cancer-specific acquired chromosomal abnormalities have now been detected in various neoplastic entities (Mitelman et al., 2020). Cancer cytogenetics has become an invaluable diagnostic tool.
The main steps in the analysis process are tumour cell cultivation, harvesting of cells in metaphase, and visual annotation and interpretation of metaphase chromosomes imaged with a microscope. Although samples from approximately 1200 newborns and around 3000 cancer patients are karyotyped yearly at Oslo University Hospital (OUH) alone, the method’s usability is limited because the visual interpretation of chromosomes requires specially trained personnel and is so time-consuming that it is carried out for at most 25 cells per patient sample. Whereas karyotyping is an integrated and necessary part of the diagnostic armamentarium in haematological malignancies, it has not been properly explored in the more common cancers of the lung, breast and prostate, where tumour heterogeneity would require analysis of far more than 25 cells.
The limited efficiency of the method is now also becoming a challenge within haematology. When genetic analyses were implemented in haematology more than 30 years ago, they were primarily of importance for the choice of consolidation therapy, particularly whether the patient should be considered a candidate for allogeneic stem cell transplantation as part of the primary treatment. Waiting 3-4 weeks for the results was then acceptable. Now that targeted therapy has become part of the initial treatment, clinicians need the results within a shorter timeframe. Examples are the use of all-trans retinoic acid in acute promyelocytic leukaemia and of imatinib in chronic myeloid leukaemia and Philadelphia-positive acute lymphoblastic leukaemia. The translocation t(15;17) is pathognomonic for the first of these conditions, while in the latter two the primary genetic aberration is the Philadelphia chromosome created through the t(9;22) translocation.
We are now implementing targeted therapy, based on genetic analyses, as part of the initial treatment in a constantly increasing number of diagnoses and patients. Accordingly, clinicians need the results of genetic analyses, including karyotyping, in less than a week rather than in 3-4 weeks.
A more efficient means of karyotyping is required, and we aim to automate the manual process of identifying and interpreting chromosome aberrations using artificial intelligence (AI). The use of AI to assist cancer diagnosis has been shown to be beneficial in a large number of studies. With automated karyotyping, time and cost savings will be substantial, resulting in shorter response times and allowing an increased number of cells to be studied for each patient. Automated karyotyping would thereby allow for more robust results and increased sensitivity for detecting small clones with chromosomal aberrations. To illustrate the potential cost savings, five cytogeneticist person-years are required annually to analyse the approximately 3000 patient samples at the karyotyping laboratory at ICGI.
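The gain in sensitivity from analysing more cells can be illustrated with a simple binomial calculation. The sketch below is an illustration only and not part of the project's method. It assumes that cells are sampled independently, that the clone makes up a fixed fraction of the metaphases, and that the clone counts as detected once a chosen minimum number of aberrant cells has been seen. The function name `detection_probability` and the parameter `min_cells` are hypothetical.

```python
from math import comb


def detection_probability(clone_fraction: float, n_cells: int, min_cells: int = 1) -> float:
    """Probability of seeing at least `min_cells` aberrant cells among `n_cells`.

    Simplifying assumption: each analysed cell independently belongs to the
    clone with probability `clone_fraction`.
    """
    if not 0.0 <= clone_fraction <= 1.0:
        raise ValueError("clone_fraction must lie in [0, 1]")
    if n_cells < 0 or min_cells < 0:
        raise ValueError("n_cells and min_cells must be non-negative")
    # Binomial probability of observing fewer than `min_cells` aberrant cells.
    p_miss = sum(
        comb(n_cells, k) * clone_fraction**k * (1.0 - clone_fraction) ** (n_cells - k)
        for k in range(min_cells)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Compare the current limit of 25 cells with larger, automated analyses
    # for a hypothetical clone present in 5% of metaphases.
    for n in (25, 100, 400):
        print(n, round(detection_probability(0.05, n, min_cells=2), 3))
```

Under these assumptions the detection probability rises with the number of cells analysed, which is the effect the paragraph above refers to. Real samples may deviate from the independence assumption, for example through culture selection effects.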
The study will include data from routine clinical diagnostics over the last ten years, comprising more than 26,000 patient results, with around 400,000 analysed cells and more than 15 million chromosomes. This represents a formidable data source for the research and development of an automated karyotyping system.
The project has three main activities: data acquisition, deep learning model research and development, and validation and dissemination of results. These have been organised into work packages (WPs) accordingly. The three activities require dedicated competence and will be implemented sequentially, with partial overlaps in time and personnel. Each WP has a leader and a co-leader who are responsible for reaching the milestones and keeping the deadlines for each phase of the project. The activities are summarised in Table 15. The project requires substantial research and software development in its different phases.
Data collection and organisation are the main focus of the first WP (WP1). The aim is to provide a representation of chromosomes with annotations of aberrations, and to trace the imaged chromosomes from metaphase spread to annotated karyogram. We will also develop methods to automatically segment and store images of each chromosome, to allow the training of deep learning models that predict the chromosome number and the presence of aberrations for individual chromosomes. Image registration and segmentation are required to accomplish this task. Software development is also required to support the digitisation of the handwritten karyotyping results.
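The segmentation step in WP1 can be sketched in its simplest form as thresholding followed by connected-component labelling. The example below is a toy illustration, not the method the project will use. It assumes dark chromosomes on a light background and chromosomes that neither touch nor overlap, which real metaphase spreads often violate. The names `segment_dark_objects` and `crop`, and the parameters `threshold` and `min_pixels`, are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # grey-level image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def segment_dark_objects(image: Image, threshold: int, min_pixels: int = 1) -> List[Box]:
    """Return bounding boxes of 4-connected regions darker than `threshold`."""
    n_rows = len(image)
    n_cols = len(image[0]) if image else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    boxes: List[Box] = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or image[r0][c0] >= threshold:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            top, left, bottom, right, size = r0, c0, r0, c0, 0
            while queue:
                r, c = queue.popleft()
                size += 1
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not seen[nr][nc] and image[nr][nc] < threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Discard regions too small to be a chromosome (e.g. debris).
            if size >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


def crop(image: Image, box: Box) -> Image:
    """Cut out the sub-image for one bounding box so it can be stored separately."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Each cropped sub-image could then be stored together with its annotation as one training example. Separating touching or overlapping chromosomes would need more elaborate methods than this sketch provides.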
The development and training of deep learning models for karyotyping will take place in WP2, and evaluation of system performance and dissemination of results are planned in WP3. Robust testing and validation are required before the system can be integrated into the routine clinical diagnostic pipeline. About 6500 (25%) of the patient karyotype samples are therefore held aside as a validation set, to which we will apply the final deep learning models and compare the predicted karyotypes with the manually assessed ground truth. All discrepancies will be evaluated by our experienced cytogeneticists. Results on the accuracy of the method will be published in international peer-reviewed journals. WP3 will also study how the detected abnormalities are affected when all available metaphase spreads are analysed, based on about 9000 collected samples.
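The hold-out validation described above can be sketched as a patient-level split followed by a comparison of predicted and manually assessed karyotypes. This is a minimal sketch under stated assumptions, not the project's evaluation protocol. Patient identifiers, function names and the example karyotype strings are hypothetical, and exact string comparison after whitespace removal is a simplification of how two karyotypes would be judged equivalent.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_patients(patient_ids: Sequence[str], validation_fraction: float = 0.25,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split at patient level so that no patient contributes to both sets."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)  # reproducible shuffle
    n_validation = round(len(unique_ids) * validation_fraction)
    return unique_ids[n_validation:], unique_ids[:n_validation]


def normalise(karyotype: str) -> str:
    """Remove whitespace so formatting differences are not counted as discrepancies."""
    return "".join(karyotype.split())


def find_discrepancies(predicted: Dict[str, str], truth: Dict[str, str]) -> List[str]:
    """Return patient IDs whose predicted karyotype is missing or differs from the truth."""
    return [
        pid for pid in sorted(truth)
        if normalise(predicted.get(pid, "")) != normalise(truth[pid])
    ]


def concordance(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of ground-truth karyotypes reproduced by the prediction."""
    if not truth:
        raise ValueError("truth must contain at least one patient")
    return 1.0 - len(find_discrepancies(predicted, truth)) / len(truth)


if __name__ == "__main__":
    ids = [f"P{i:05d}" for i in range(26000)]  # hypothetical patient identifiers
    training, validation = split_patients(ids)
    print(len(training), len(validation))

    # Illustrative karyotype strings only.
    truth = {"A": "46,XX", "B": "46,XY,t(9;22)(q34;q11)", "C": "47,XY,+8"}
    predicted = {"A": "46, XX", "B": "46,XY"}
    print(find_discrepancies(predicted, truth), concordance(predicted, truth))
```

The list returned by `find_discrepancies` corresponds to the cases that would be passed on for expert review.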
This text was last modified: 08.06.2021