Google PageRank Applied To Cancer Outcome Prediction
While PageRank may still be a huge part of Google’s search algorithm, some feel the model is outdated and are looking for new approaches to web search. That’s not stopping scientists from finding interesting applications for PageRank, however.
Earlier this year, we looked at a story about Washington State University chemistry professor Aurora Clark, who said she had adapted Google’s PageRank algorithm for use in moleculaRnetworks, a tool designed to enable scientists to determine molecular shapes and chemical reactions “without the expense, logistics and occasional danger of lab experiments.”
In fact, we also interviewed her.
More recently, a study published in the Public Library of Science journal PLoS Computational Biology looked at improving outcome prediction for cancer patients through network-based ranking of marker genes, using Google’s PageRank concept. The abstract for the study says:
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google’s PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice.
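For readers who want a feel for what "ranking genes using both expression and network information in a manner similar to PageRank" could look like, here is a minimal sketch. It is not the study's actual algorithm. It assumes a personalized PageRank in which each gene's starting weight is the absolute Pearson correlation between its expression and survival time, and that weight is then spread across an undirected gene network. The function names (`pearson`, `network_rank`), the damping value of 0.3, the gene labels, and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def network_rank(expression, survival, edges, damping=0.3, iterations=100):
    """Hypothetical PageRank-style gene ranking (an assumption, not the paper's code).

    expression: dict mapping gene -> list of expression values, one per patient
    survival:   list of survival times, one per patient
    edges:      iterable of (gene_a, gene_b) pairs in an undirected network
    """
    genes = sorted(expression)

    # Starting weight per gene: |correlation with survival|, normalized to sum to 1.
    prior = {g: abs(pearson(expression[g], survival)) for g in genes}
    total = sum(prior.values())
    if total == 0:
        prior = {g: 1.0 / len(genes) for g in genes}
    else:
        prior = {g: v / total for g, v in prior.items()}

    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    rank = dict(prior)
    for _ in range(iterations):
        # Score held by genes with no neighbors is handed back via the prior.
        dangling = sum(rank[g] for g in genes if not neighbors[g])
        new_rank = {}
        for g in genes:
            # Each neighbor splits its current score evenly among its own neighbors.
            flow = sum(rank[n] / len(neighbors[n]) for n in neighbors[g])
            new_rank[g] = (1 - damping) * prior[g] + damping * (flow + dangling * prior[g])
        rank = new_rank
    return rank


# Toy data, invented for this example: four genes measured in four patients.
expression = {
    "A": [1, 2, 3, 4],     # rises with survival
    "B": [4, 3, 2, 1],     # falls with survival
    "HUB": [5, 5, 5, 5],   # flat, but linked to A and B
    "LONE": [5, 5, 5, 5],  # flat and unconnected
}
survival = [10, 20, 30, 40]
edges = [("A", "HUB"), ("B", "HUB")]

ranks = network_rank(expression, survival, edges)
for gene, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 3))
```

In this toy network, HUB shows no correlation with survival on its own, yet it picks up a score from its two correlated neighbors, while LONE stays at zero. That is the general idea behind using the network: a gene's neighbors can lift its ranking in a way that a correlation-only ranking would miss.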
The Author Summary says:
Why do some people with the same type of cancer die early and some live long? Apart from influences from the environment and personal lifestyle, we believe that differences in the individual tumor genome account for different survival times. Recently, powerful methods have become available to systematically read genomic information of patient samples. The major remaining challenge is how to spot, among the thousands of changes, those few that are relevant for tumor aggressiveness and thereby affecting patient survival. Here, we make use of the fact that genes and proteins in a cell never act alone, but form a network of interactions. Finding the relevant information in big networks of web documents and hyperlinks has been mastered by Google with their PageRank algorithm. Similar to PageRank, we have developed an algorithm that can identify genes that are better indicators for survival than genes found by traditional algorithms. Our method can aid the clinician in deciding if a patient should receive chemotherapy or not. Reliable prediction of survival and response to therapy based on molecular markers bears a great potential to improve and personalize patient therapies in the future.
I’m not going to pretend I understand the ins and outs of this complex study or try to dissect it here, but the full paper is there if you want to dig through it.