Emerging Tools for Computer-Aided Diagnosis and Prognostication

Although vast amounts of data are collected at clinical presentation, ranging from macro-scale Magnetic Resonance Imaging (MRI) scans, to micro-scale pathology slides, to nano-scale proteins and genes, it remains challenging to analyze, combine, and correlate these data to make diagnostic, prognostic, and theranostic predictions [2–4]. Computerized image analysis and data integration methods have the potential to improve our understanding of the relationships among these heterogeneous multi-format, multi-scale data and thereby to better predict disease outcomes and treatment responses.

morphologic features, some of which were not previously recognizable using traditional quantitative pathology techniques. The molecular basis for the prognostically significant morphologic phenotypes has yet to be elucidated, and the effectiveness of computer-aided pathological interpretation has yet to be established on whole-slide images and tested on a diverse set of images. Even so, this approach is promising: it has predicted survival outcomes with a high degree of statistical significance and leaves room for further refinement. This example illustrates how automated, unbiased image analysis and machine-learning systems could produce standardized, objective, reproducible results that may eventually support clinical practice [8].
In one of the first applications to combine imaging and non-imaging (protein expression) data, Lee and Madabhushi developed a Generalized Fusion Framework (GFF) to integrate the micro-scale morphological features obtained from digital histopathology slides with nano-scale protein expression measurements from mass spectrometry [13]. The GFF was created to test whether quantitative integration of image-based signatures from digital histopathology slides with corresponding peptide measurements from mass spectrometry could be used to differentiate prostate cancer progressors from prostate cancer nonprogressors. The challenge of integrating these multi-scale, multi-modal, multi-protocol data was overcome by combining the 3 data modalities (architectural histopathology features, morphological histopathology features, and m/z mass spectrometry features in 51, 100, and 570 dimensions, respectively) into a common low-dimensional meta-space projection with 3 dimensions using principal component analysis. This projection was then normalized, concatenated, and reduced a second time with principal component analysis to yield the low-dimensional integration product of the original high-dimensional data. The results supported the suitability of the GFF for integrating heterogeneous multi-format, multi-scale data to differentiate between patients with different disease profiles.
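
The two-stage reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that each modality is projected separately to 3 dimensions before concatenation, that "normalized" means z-scoring each projected column, and that the final product also has 3 dimensions. The helper names (pca_project, zscore, fuse), the cohort size of 40 patients, and the random placeholder data are all hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores on the leading components

def zscore(Z):
    """Scale each column to zero mean and unit variance (assumed normalization)."""
    std = Z.std(axis=0)
    std[std == 0] = 1.0                          # avoid division by zero
    return (Z - Z.mean(axis=0)) / std

def fuse(modalities, n_components=3):
    """Reduce each modality, normalize, concatenate, then reduce again."""
    projected = [zscore(pca_project(X, n_components)) for X in modalities]
    stacked = np.hstack(projected)               # one row per patient
    return pca_project(stacked, n_components)

# Placeholder data only: random values with the feature counts from the text.
rng = np.random.default_rng(0)
n_patients = 40                                  # hypothetical cohort size
architectural = rng.normal(size=(n_patients, 51))
morphological = rng.normal(size=(n_patients, 100))
mass_spec = rng.normal(size=(n_patients, 570))

fused = fuse([architectural, morphological, mass_spec])
print(fused.shape)  # (40, 3)
```

In practice, the fused representation would then be passed to a classifier trained to separate progressors from nonprogressors; that step is omitted here because the source does not specify it.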

Future Directions
While computer-based image analysis, heterogeneous data integration methods, and computer-aided prognostics are currently demonstrating their efficacy in the pre-operative or pre-therapeutic cancer population, they are likely to prove applicable in other fields as well.
In cardiovascular medicine, for instance, large amounts of macro-scale heart morphology and phenotype data (from MRI, hemodynamics, and echocardiograms), micro-scale whole-slide imaging data (from biopsies, donors, explants, and device placements), and nano-scale gene expression and transcriptome data are being collected at several institutions for clinical and research purposes [17]. Because typical cardiac pathology scoring systems are rather rudimentary, such as the Dallas criteria for myocarditis [18] and the International Society for Heart and Lung Transplantation scoring of rejection in cardiac allografts [19], there is rich opportunity for computer-aided interpretation and multi-modality integration to provide new insights into myocardial disease mechanisms, severity, and prognosis. As with the oncology applications described above, a key step in these myocardial applications will be correlation with clinical outcomes and current clinical reference standards. As heterogeneous data integration tools become increasingly sophisticated and validated, they could provide a rational basis for identifying the interpatient distinctions necessary for greater individualization of therapeutics.
Computers are becoming increasingly ready to supplement and enhance imaging (MRI, ultrasound), morphologic information (tissue), and molecular classification (whole-genome sequencing, expression profiling, proteomics, and metabolomics) with diagnostic, prognostic, and theranostic predictions [8]. These computer-based tools for heterogeneous data integration have begun to demonstrate their effectiveness in large retrospective studies and will soon be ready for prospective, multi-institutional validation studies as the next step before adoption into clinical practice.