Nature study uses machine learning to predict Autism genes

The Princeton study discovered several new candidate genes linked to autism.

Although researchers estimate there are hundreds of autism-linked genes, only a fraction have actually been identified with strong experimental evidence. A study published today in Nature aims to change this by using a big-data and machine-learning approach to make a genome-wide prediction of autism spectrum disorder (ASD) genes. The first author of the study, Arjun Krishnan, told us how their results could help with the early diagnosis and treatment of ASD.

RG: Could you briefly summarize your study?

Arjun Krishnan: Autism spectrum disorder (ASD) has a strong genetic basis, but currently only about 65 autism genes out of an estimated 400-1000 have been found. Because ASD is so complex, sequencing or genetics studies alone are severely underpowered to uncover the genetic basis of autism. So we decided to take a complementary data-driven approach to address this challenge. The approach we developed is based on learning patterns in how previously known ASD genes are connected in a human brain-specific gene network, and we used these patterns to identify novel ASD genes.

The key result is the prediction of a comprehensive complement of autism-associated genes across the genome. In the rest of the study, using these genome-wide ASD candidate genes and our brain network, we have identified the stages and regions of brain development, and the specific cellular functions that might be disrupted in autism. We have also built an interactive web portal where any biomedical researcher or clinician can access and investigate our results.

RG: What is the significance of the results you found?

Krishnan: We predict hundreds of 'novel' candidate genes, those that have never been identified or implicated in previous genetic studies of autism. For geneticists, this means they can use our predictions to direct future sequencing studies, enabling much faster and cheaper discovery of autism genes. Researchers can use them to prioritize and interpret results of whole-genome sequencing studies of ASD. Lastly, biomedical researchers can use them and subsequent analyses to home in on novel autism genes and study their autism-associated functional, developmental, and anatomical effects.

RG: Could you explain to us the technology that made this machine-learning approach possible?

Krishnan: The technology is in some basic way akin to how, say, Facebook uses the ‘social network’ of how people are related to each other in a social context. It might suggest a middle-school buddy for you to 'friend' by first finding who your friends are in the social network and then identifying other people in the network that are also linked to those same friends of yours.

We have built a brain-specific gene network that is a map of how genes are functionally related to each other in the brain. Using this network, we employ a similar idea to suggest novel ASD genes – first, we find functional partners of known ASD genes in the brain network, and then we identify other genes in the network that are also linked with those same partners.

This idea, along with a number of others, is formulated into a machine-learning framework that we use to make systematic predictions.
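The shared-partner idea described above can be sketched in a few lines of Python. This is only an illustration of the "friend of a friend" intuition, not the study's actual framework: the toy network, the gene and partner labels, the link weights, and the `suggest_candidates` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network: node -> {neighbor: functional-link weight in [0, 1]}.
# All labels and weights are invented for illustration only.
# P1..P3 stand in for functional partner genes.
NETWORK = {
    "A": {"P1": 0.9, "P2": 0.8},
    "B": {"P1": 0.7},
    "X": {"P1": 0.8, "P2": 0.6},
    "Y": {"P2": 0.2},
    "Z": {"P3": 0.9},
    "P1": {"A": 0.9, "B": 0.7, "X": 0.8},
    "P2": {"A": 0.8, "X": 0.6, "Y": 0.2},
    "P3": {"Z": 0.9},
}
KNOWN = {"A", "B"}  # stand-ins for previously known disease genes


def suggest_candidates(network, known):
    """Score genes that share functional partners with the known genes."""
    # Step 1: find partners of the known genes and add up link strengths.
    partner_strength = defaultdict(float)
    for gene in known:
        for partner, weight in network.get(gene, {}).items():
            if partner not in known:
                partner_strength[partner] += weight

    # Step 2: find other genes linked to those same partners.
    scores = defaultdict(float)
    for partner, strength in partner_strength.items():
        for gene, weight in network.get(partner, {}).items():
            if gene not in known:
                scores[gene] += strength * weight

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = suggest_candidates(NETWORK, KNOWN)
for gene, score in ranked:
    print(f"{gene}: {score:.2f}")
```

In this toy network, gene X shares both partners with the known genes and ranks first, Y shares one weakly linked partner and ranks below it, and Z shares none and is not suggested at all.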

RG: What method did you use to reach these results?

Krishnan: The method we used to make ASD-gene predictions is a machine-learning approach that learns how known autism genes are connected to other genes in a gene network, and then uses these patterns to predict novel ASD genes. The gene network we used represents how genes function together in cellular pathways in the brain, or, intuitively, a molecular-level functional map of the brain.

We gathered genes previously linked to autism from all possible sources, ranging from those with strong experimental evidence to those with weak circumstantial evidence, while keeping track of how reliable the evidence is for each gene. We then built a network-based evidence-weighted disease-gene classifier that learns the connectivity patterns of known ASD genes in this brain network (taking into account the level of evidence for each gene) and then uses these data-driven patterns to predict the level of potential ASD association for every gene in the genome.
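To make the idea of an evidence-weighted classifier concrete, here is a minimal sketch in plain Python. It swaps in a simple weighted logistic regression, which is not necessarily the classifier used in the study. The feature values (each gene's link strengths to three hypothetical network genes), the evidence weights, and the function names are all assumptions made for illustration.

```python
import math


def train_weighted_logistic(features, labels, weights, epochs=500, lr=0.5):
    """Fit logistic regression by gradient descent, scaling each gene's
    contribution to the gradient by its evidence weight."""
    n_features = len(features[0])
    coef = [0.0] * n_features
    bias = 0.0
    total_weight = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * n_features
        grad_bias = 0.0
        for x, y, w in zip(features, labels, weights):
            z = bias + sum(c * v for c, v in zip(coef, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = w * (p - y)  # low-evidence genes pull less on the fit
            for j, v in enumerate(x):
                grad[j] += err * v
            grad_bias += err
        coef = [c - lr * g / total_weight for c, g in zip(coef, grad)]
        bias -= lr * grad_bias / total_weight
    return coef, bias


def predict(coef, bias, x):
    """Return the predicted probability of disease association for one gene."""
    z = bias + sum(c * v for c, v in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical training data: one row per gene, one column per network gene.
features = [
    [0.9, 0.8, 0.1],  # positive, strong evidence
    [0.8, 0.9, 0.2],  # positive, strong evidence
    [0.7, 0.6, 0.3],  # positive, weak circumstantial evidence
    [0.1, 0.2, 0.9],  # negative
    [0.2, 0.1, 0.8],  # negative
    [0.3, 0.2, 0.7],  # negative
]
labels = [1, 1, 1, 0, 0, 0]
evidence_weights = [1.0, 1.0, 0.25, 1.0, 1.0, 1.0]  # assumed weighting scheme

coef, bias = train_weighted_logistic(features, labels, evidence_weights)

# Score two unlabeled genes with invented connectivity profiles.
score_like = predict(coef, bias, [0.85, 0.7, 0.15])
score_unlike = predict(coef, bias, [0.15, 0.1, 0.85])
print(score_like, score_unlike)
```

The design point is that the weakly supported gene still contributes to training, just with less influence, rather than being discarded outright.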

RG: How does this approach differ from previous gene-prediction methods?

Krishnan: There are two major contributions we have made to traditional approaches to gene prediction. The first is the use of a genome-scale tissue-specific network. Human diseases have origins and manifestations in specific tissues and cell types in the human body, for example, hypertension in the kidney or autism in the brain. Therefore, to accurately characterize which genes are linked to a disorder like autism, we need to understand and predict these genes in the context of what happens specifically in the brain, not just generally in the human body. We achieved this by using a brain-specific network of genes across the human genome, built by integrating brain-specific signals from thousands of genomic experiments.

The second contribution is the use of an evidence-weighted classifier. We have carefully curated a set of genes linked to ASD from a number of sources, keeping track of how reliable those sources were, and used their level of evidence as part of our machine-learning approach to make new predictions. Predictions made this way are significantly more accurate than predictions based on the high-confidence genes alone, demonstrating that we can take advantage of diffuse yet valuable biological signal when the problem is formulated in the right way.

RG: What could your findings mean for people with autism spectrum disorder?

Krishnan: We critically need a genetic or molecular test to diagnose ASD and to introduce drugs or other therapeutic interventions as early in brain development as possible, based on a person's genetic makeup. Our findings take us one step closer towards these goals by helping researchers efficiently narrow down the genetic underpinnings of ASD and focus future genetic screens and laboratory experiments on these candidates.

RG: Where do you see the biggest potential for machine learning within the field of medical research?

Krishnan: The biggest potential that I see for machine learning is in the grand challenge of accurately predicting aspects of an individual's state of health and disease based on their genetic makeup. Our work is a step in this direction for one major disease, helping us figure out the genetic "features" that might define the disease, which can, hopefully, be used to make predictions about the disease. What is remarkable in the pursuit of this goal is the rapid advance not just in the separate fields of machine learning or biomedical research but in how much computational and medical practitioners appreciate the potential in marrying these two fields.

RG: What are the next steps in this research?

Krishnan: One of the most exciting next steps a number of us are thinking about is how we can use these predictions to interpret whole-genome sequencing studies of autism patients. Sequencing whole genomes is going to throw up a deluge of variants along the genome. Our predictions can guide the interpretation of these results by helping researchers focus on variants that fall in or close to genes that we rank highly as candidate ASD genes.
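One way such prioritization might look in practice is sketched below. The gene names, coordinates, prediction scores, variant positions, and the 1,000-base flanking window are all invented for illustration. The `prioritize` function is a hypothetical helper, not part of the study's tools.

```python
# Hypothetical genes: name -> (chromosome, start, end, predicted ASD score).
GENES = {
    "GENE1": ("chr1", 1000, 5000, 0.95),
    "GENE2": ("chr1", 20000, 26000, 0.10),
    "GENE3": ("chr2", 3000, 9000, 0.80),
}

# Hypothetical variants from a sequencing study: (chromosome, position).
VARIANTS = [("chr1", 5400), ("chr1", 21000), ("chr2", 50000), ("chr2", 8000)]


def prioritize(variants, genes, flank=1000):
    """Rank variants by the score of the best-scoring gene they fall in or near.

    A variant counts as "near" a gene if it lies within `flank` bases of the
    gene's start or end. Variants near no gene are dropped.
    """
    ranked = []
    for chrom, pos in variants:
        best = None
        for name, (g_chrom, start, end, score) in genes.items():
            if g_chrom == chrom and start - flank <= pos <= end + flank:
                if best is None or score > best[1]:
                    best = (name, score)
        if best is not None:
            ranked.append((chrom, pos, best[0], best[1]))
    # Variants near the highest-ranked candidate genes come first.
    return sorted(ranked, key=lambda row: row[3], reverse=True)


prioritized = prioritize(VARIANTS, GENES)
for chrom, pos, gene, score in prioritized:
    print(f"{chrom}:{pos} near {gene} (score {score:.2f})")
```

Here the variant just downstream of the top-scoring hypothetical gene is listed first, and the variant far from any gene is left out of the ranking.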

Featured image courtesy of Christiaan Colen.