Mapping the depths of the genome

(Illustration: Ray Oranges)

Using algorithms to analyse the whole-genome sequence of a tumour can make treatment more successful - and can even help determine how cells become cancerous.

Detailed genetic analysis of tumour tissue samples has become standard practice at a small number of the world’s leading hospitals specialising in cancer treatment. Experts extract DNA from the samples and use it to sequence the whole cancer genome. Together with information on the activity of individual genes, this helps doctors define the type of cancer more precisely and predict which treatment options and drugs the patient will respond to best.

Yet whole-genome sequencing of a patient’s tumour produces several hundred gigabytes of raw data that first have to be analysed. This would not be possible without efficient machine learning algorithms, says Niko Beerenwinkel, Professor of Computational Biology at the Department of Biosystems Science and Engineering, who specialises in the analysis of high-throughput molecular biological data.

Modern DNA sequencers may be fast and powerful, but they deliver "noisy" raw data that can only be interpreted by advanced computer analysis. "Algorithms reduce the noise by comparing the raw data of a genomic analysis with a multitude of other genomic analyses and deciding what is most probably noise and what isn’t," says Beerenwinkel.
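The idea of separating noise from signal by comparison with other analyses can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used by Beerenwinkel's group: it assumes a hypothetical panel of other samples, given as (variant reads, total reads) pairs at one genome position, estimates a background error rate from that panel, and then asks whether the variant reads seen in the tumour sample could plausibly be explained by that error rate alone. The function names, the pseudocount and the threshold `alpha` are all assumptions made for this example.

```python
from math import comb


def background_error_rate(panel, pseudocount=1.0):
    """Estimate a per-position error rate from other samples.

    `panel` is a list of (variant_reads, total_reads) pairs (hypothetical input).
    The pseudocount keeps the estimate above zero when no errors were observed.
    """
    variant_reads = sum(v for v, _ in panel)
    total_reads = sum(t for _, t in panel)
    return (variant_reads + pseudocount) / (total_reads + 2 * pseudocount)


def binomial_tail(k, n, p):
    """Probability of seeing at least k successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_probably_noise(variant_reads, total_reads, error_rate, alpha=1e-3):
    """True if sequencing error alone plausibly explains the variant reads."""
    return binomial_tail(variant_reads, total_reads, error_rate) >= alpha


# Hypothetical read counts at one position in three other samples.
panel = [(1, 100), (0, 120), (2, 90)]
rate = background_error_rate(panel)

print(is_probably_noise(2, 100, rate))   # a couple of variant reads
print(is_probably_noise(20, 100, rate))  # many variant reads
```

With these made-up numbers, two variant reads out of 100 are compatible with the background error rate, whereas 20 out of 100 are not. Real pipelines use far richer statistical models, but the underlying comparison is the same.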

Finding a needle in a haystack

But that’s only the start of the analysis process. "In many cases, thousands of small changes will have accrued in the tumour genome, only a few of which are relevant," says Beerenwinkel. "What’s more, some of these changes might be insignificant in themselves from a medical standpoint but play a decisive role in combination with other changes." Once again, computer algorithms can help extract medically relevant information from these large amounts of data.

Equally important is the fact that tumours are composed of cells that differ from each other, both genetically and in terms of their function. Tumours contain not only cancer cells but also a variety of other cells, including blood vessel and immune cells. And because the genome of cancer cells evolves so rapidly, a tumour contains several genetically distinct populations of these malignant cells, which will generally respond differently to the same drug.

Together with his research group, Beerenwinkel is developing machine learning methods and software that can identify and interpret the significant genetic diversity in tumours. "Current cancer therapies tend to take only the most frequently occurring cell populations into account - future forms of treatment will be able to address all of them," says Beerenwinkel.

Prognosis and therapies

Valentina Boeva, Professor of Biomedical Informatics in the Department of Computer Science at ETH Zurich, also uses machine learning algorithms. One focus of her research is on epigenetic changes in tumour cells. These are temporary and reversible changes in the genome; they are not permanent genetic alterations.

"One result of these epigenetic changes is that different genes are active in the tumour cells than in the original healthy cells, and different proteins are produced," says Boeva. She uses databases of anonymised patient data that have been made available to researchers and analyses these using computer algorithms. In one of her as yet unpublished research papers, she was able to show why epigenetic changes are associated with increased aggressiveness in certain tumours: the changes enable tumours to evade the body’s immune response. Since these changes can be reversed with drugs, her findings could offer useful pointers for potential new treatment options.

Another example is the search for genome segments that regulate gene activity. Mutations in these segments are also a relevant factor in cancer development. These segments are often located in close proximity to the gene that they regulate. But if they are further away, they can be hard to find. "Another challenge is to find out which gene this type of segment regulates," says Boeva. She turned to a modern machine learning method that was originally developed in computational linguistics to determine the meaning of a text. Using this method, Boeva analysed genomic data to determine the "meaning" of individual genome segments. In this way, she successfully uncovered previously unknown regulatory sequences.
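The article does not name the method Boeva used, so the following Python sketch only illustrates the general idea of treating DNA like text. It makes two assumptions of its own: that a sequence is cut into short overlapping "words" of k letters (k-mers), and that the words appearing near each other are counted, much as language models learn the meaning of a word from its neighbours. The function names and parameters are hypothetical.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Cut a DNA sequence into overlapping 'words' of length k."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cooccurrence_counts(tokens, window=2):
    """Count how often each word is followed by another within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            counts[(word, tokens[j])] += 1
    return counts


# A made-up stretch of DNA, used only for illustration.
tokens = kmer_tokens("ACGTACGTTA", k=3)
print(tokens)
print(cooccurrence_counts(tokens, window=1).most_common(3))
```

Counts of this kind are a typical starting point for methods that assign each word a numerical representation of its context. Applied to genomes, segments that occur in similar contexts receive similar representations, which is one way of giving a segment a "meaning".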

But Boeva doesn’t always need cutting-edge analytical methods for her work. "Sometimes I get the results I need using statistical methods that scientists developed decades ago," she says. There are plenty of methods to choose from, and it is often hard to tell beforehand which one is most likely to solve a particular problem, so trying out multiple options is crucial. "But machine learning is getting better all the time," she adds - so there may well be algorithms in the future that can select the best machine learning method automatically.

Crucial career skills

There is a great deal of interest in machine learning among students. The pharmaceutical industry has also identified machine learning and artificial intelligence as core technologies. As well as playing a role in Beerenwinkel’s and Boeva’s field of molecular biomarkers, these technologies are also being used to develop new drug molecules. "I’m already seeing significant interest from industry in working with us on research projects and employing our graduates," says Beerenwinkel.

If Valentina Boeva succeeds in finding new cancer-relevant genome segments, it’s not only patients in state-of-the-art hospitals who will benefit. Even less specialised hospitals are increasingly carrying out limited genetic analyses for cancer patients. Instead of tackling the whole genome, these analyses look at just a few dozen segments. These are the segments and mutations that Boeva, Beerenwinkel and many other researchers all over the world have discovered with the help of machine learning - and whose function they have successfully decoded.

Fabio Bergamin