Integration of machine learning and genome-scale metabolic modeling identifies multi-omics biomarkers for radiation resistance (2021) | Abstract summary

Explaining an application of machine learning in medicine, line-by-line

Published in One Minute Machine Learning · 5 min read · Oct 15, 2021

Lewis, J.E., Kemp, M.L. Integration of machine learning and genome-scale metabolic modeling identifies multi-omics biomarkers for radiation resistance. *Nat Commun* **12,** 2700 (2021). https://doi.org/10.1038/s41467-021-22989-1

This article, published in Nature Communications in May 2021, has a lot of biological jargon that makes it hard to read for the average machine learning practitioner. Here is a breakdown of all the sentences of the abstract, hopefully providing enough background information and context to understand the general idea completely.

Abstract line-by-line

Resistance to ionizing radiation, a first-line therapy for many cancers, is a major clinical challenge.

For many people diagnosed with cancer, one of the first suggested treatments is “ionizing radiation”, often called “radiotherapy”. This treatment aims beams of high-energy radiation (such as X-rays) at cancerous areas in order to damage the DNA of cancer cells and stop them from growing and dividing.
However, this treatment does not work for every cancer patient (about one-fifth), and recommending it to people for whom it doesn’t work is bad for their health and wastes both the patient’s precious time and the hospital’s resources.

Personalized prediction of tumor radiosensitivity is not currently implemented clinically due to insufficient accuracy of existing machine learning classifiers.

Although scientists are developing machine learning models that could predict whether radiotherapy is worthwhile for a given patient/tumor, the current models are not accurate enough to be used in actual hospital settings. Current machine learning models use things like tumor histology (tissue sample images), tumor grade (how abnormal the cancer cells look under a microscope), and genomic information (the specific patient’s DNA), but none combine them with metabolomic data (the concentrations of small molecules in the tumor).

Despite the acknowledged role of tumor metabolism in radiation response, metabolomics data is rarely collected in large multi-omics initiatives such as The Cancer Genome Atlas (TCGA) and consequently omitted from algorithm development.

Genomics is the study of DNA and genetic information within a cell, transcriptomics is the study of RNA and differences in mRNA expression, proteomics is the study of protein activity, and metabolomics is the study of small molecules (like the substrates or products of metabolism). The types and concentrations of small molecules are influenced both by genetics and the cell environment (like a tumor environment).
The Cancer Genome Atlas (TCGA) is a huge database for researchers, made up of the genomic, epigenomic, transcriptomic, and proteomic data of many different cancer samples (and their matched normal tissue samples). It is extremely useful for cancer research. Unfortunately, it does not have much metabolomic data, especially for individual samples.

In this study, we circumvent the paucity of personalized metabolomics information by characterizing 915 TCGA patient tumors with genome-scale metabolic Flux Balance Analysis models generated from transcriptomic and genomic datasets.

In this study, the authors were able to calculate the levels of small molecules (metabolites) for individual patients/samples by combining the transcriptomic and genomic data of each sample (from the TCGA) with metabolic information about humans in general. They did this with a mathematical approach called Flux Balance Analysis. Strictly speaking, Flux Balance Analysis predicts reaction rates (fluxes), such as how quickly a cell can produce a given metabolite, rather than measuring concentrations directly.
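To make the idea of Flux Balance Analysis more concrete, here is a minimal sketch on a toy network. Everything in it is hypothetical and chosen for illustration: the five reactions, the bounds, and the assumption that a tumor's enzyme expression simply caps the flux through one reaction. It is not the authors' pipeline, which works on a genome-scale human model. The sketch uses `scipy.optimize.linprog` to maximize the export of one metabolite while keeping every internal metabolite at steady state.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network, not taken from the paper:
#   R1: (outside) -> A     nutrient uptake
#   R2: A -> B             enzyme-catalyzed step
#   R3: A -> C             competing step
#   R4: B -> (outside)     export of metabolite B
#   R5: C -> (outside)     export of metabolite C
# Stoichiometric matrix S: rows are metabolites A, B, C; columns are R1..R5.
S = np.array([
    [1, -1, -1,  0,  0],
    [0,  1,  0, -1,  0],
    [0,  0,  1,  0, -1],
], dtype=float)


def max_b_production(r2_capacity, uptake_limit=10.0):
    """Maximize export of B (flux through R4) subject to steady state S @ v = 0."""
    bounds = [
        (0.0, uptake_limit),  # R1: limited nutrient supply
        (0.0, r2_capacity),   # R2: assumed to scale with enzyme expression
        (0.0, 1000.0),        # R3
        (0.0, 1000.0),        # R4
        (0.0, 1000.0),        # R5
    ]
    objective = np.zeros(S.shape[1])
    objective[3] = -1.0  # linprog minimizes, so negate to maximize R4
    result = linprog(objective, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[3]


# Two hypothetical tumors that differ only in how much R2 enzyme they express.
high_expression_tumor = max_b_production(r2_capacity=8.0)
low_expression_tumor = max_b_production(r2_capacity=3.0)
print(high_expression_tumor, low_expression_tumor)
```

The two tumors share the same network but get different predicted production of B, because the sample-specific bound on R2 changes the optimum. That is the sense in which per-sample transcriptomic data can "personalize" a general metabolic model.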

Metabolic biomarkers differentiating radiation-sensitive and -resistant tumors are predicted and experimentally validated, enabling integration of metabolic features with other multi-omics datasets into ensemble-based machine learning classifiers for radiation response.

After the authors calculated the concentrations of various small molecules (metabolomic biomarkers) for all of their 915 samples, they identified which of them were highly present or absent in the tumor samples that responded well to radiotherapy (radiation-sensitive) vs. the tumor samples that were resistant to radiotherapy (radiation-resistant).
They also validated that their Flux Balance Analysis results were reasonable by analyzing four samples, measuring the actual small molecule concentrations, and comparing them with their calculated concentrations.
Then, they trained several machine learning classifiers separately. One took the calculated small molecule concentrations (metabolomic information) of all the samples as its input. The others took clinical, genomic, or transcriptomic information. Each classifier output the probability that a sample is radiotherapy-sensitive or radiotherapy-resistant given its inputs. Finally, they combined the outputs of the individual classifiers to get the overall prediction of radiation response (this is ensemble learning).
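The combining step can be sketched in a few lines. The example below assumes the simplest possible ensemble, a (weighted) average of each classifier's predicted probability. This is an illustrative choice, and the paper's actual combination scheme may differ. The function name, the probabilities, and the 0.5 threshold are all made up for the example.

```python
def ensemble_probability(base_probs, weights=None):
    """Weighted average of per-classifier probabilities that a tumor is resistant.

    base_probs: dict mapping classifier name -> predicted probability.
    weights:    optional dict with the same keys; defaults to equal weights.
    """
    if weights is None:
        weights = {name: 1.0 for name in base_probs}
    total_weight = sum(weights[name] for name in base_probs)
    weighted_sum = sum(weights[name] * prob for name, prob in base_probs.items())
    return weighted_sum / total_weight


# Hypothetical outputs of four separately trained classifiers for one patient.
patient = {
    "clinical": 0.30,
    "genomic": 0.55,
    "transcriptomic": 0.60,
    "metabolic": 0.85,
}

probability = ensemble_probability(patient)
label = "resistant" if probability >= 0.5 else "sensitive"
print(probability, label)
```

Here three of the four classifiers lean towards "resistant", so the averaged probability ends up above the threshold even though the clinical classifier alone would have said "sensitive".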

These multi-omics classifiers show improved classification accuracy, identify clinical patient subgroups, and demonstrate the utility of personalized blood-based metabolic biomarkers for radiation sensitivity.

The classifier strategy described above (combining the outputs from the clinical, metabolomic, genomic, and transcriptomic classifiers) provided the best results so far (AUROC = 0.904) for predicting whether individual patients would respond well to radiotherapy. AUROC refers to the Area Under the Receiver Operating Characteristic curve, a measure for classification models where 1 is a perfect score and 0.5 is no better than random guessing.
The authors also took it a step further and were able to identify where the results came from for each patient. For example, for some patients the clinical information contributed the most to the final radiotherapy-response-prediction, whereas for other patients the metabolomic information was important. By identifying these patient subgroups, the authors showed that radiation-response can be predicted for some patients easily if they have certain clinical features (e.g. certain cancer types, cancer grade, etc), but others might need more tests.
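For readers who have not met AUROC before, one way to think about it is as the chance that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The labels and scores are invented for illustration and are unrelated to the paper's data.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. labels are 1 (positive) or 0 (negative).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = radiation-resistant, 0 = radiation-sensitive.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# 8 of the 9 positive/negative pairs are ranked correctly, so this is 8/9.
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration. For real datasets, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.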

The integration of machine learning with genome-scale metabolic modeling represents a significant methodological advancement for identifying prognostic metabolite biomarkers and predicting radiosensitivity for individual patients.

The new technique of calculating metabolomic data (the small molecule concentrations) with Flux Balance Analysis was shown to work well for figuring out which small molecules are associated with whether a patient will respond well to radiotherapy, and it helped make machine learning models for this problem more accurate in general.
