Disease Detective Part 3: Machine Learning and phenotype triangulation

2nd March 2017

Author: SciBite Team

Welcome to the final part of our blog trilogy for Rare Disease Day.  In parts one and two we explored:

Today, we’ll take Part 2’s topic a little further and zoom in on how we can find new relationships between diseases where direct evidence is sparse.  This is particularly important in the rare disease arena, where the volume of research is much lower than for more common conditions.

Getting closer to the ideal

Imagine if you could tap in a condition and get a ranking of related diseases based on the phenotypes they have in common.  Impossible?  Too time intensive?  Hugely expensive?  Think again.

Here, we’ll describe a method we’ve developed for quantification of disease similarity based on phenotypic signatures text-mined from Medline.  We bring together machine learning algorithms and our super fast TERMite engine to map rare diseases linked by their common phenotypes.  It’s an area fraught with many complexities, not least because of the difficulties we mentioned in part 1 surrounding the spread of the research and the semantics of the terminology involved.

Triangulation

The triangulation comes in when we infer relationships between two nodes on a network that are indirectly connected via other nodes. The more intermediate connections that two disconnected nodes have in common, the more likely that there is some sort of relationship between them.
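As a rough illustration of the idea, the sketch below counts how many intermediate nodes each pair of nodes has in common. The network, the node names and the `shared_intermediates` helper are all hypothetical and chosen purely for illustration; this is not SciBite's implementation.

```python
from itertools import combinations

# Hypothetical toy network: each node of interest maps to the set of
# intermediate nodes it is directly connected to.
links = {
    "A": {"x", "y", "z"},
    "B": {"x", "y", "z"},
    "C": {"z"},
}


def shared_intermediates(links):
    """Count the intermediate nodes that each pair of nodes has in common."""
    counts = {}
    for a, b in combinations(sorted(links), 2):
        counts[(a, b)] = len(links[a] & links[b])
    return counts


shared = shared_intermediates(links)

# Pairs with the most shared intermediates come first.
for pair, n in sorted(shared.items(), key=lambda kv: -kv[1]):
    print(pair, n)
```

In this toy network, A and B are never linked directly, yet they share three intermediates, so they would be the first pair to investigate.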

Phenotype Triangulation

Diseases 1 and 2 are strongly linked via uniquely shared phenotypes; the relationship between Diseases 2 and 3 is weak because their single shared phenotype, Phenotype 4, is also shared with a high number of other diseases.

In the case of Phenotype Triangulation, we compare diseases based on their shared phenotype profiles. Where there is strong overlap in phenotype signatures, we can hypothesize that a disease pair could share an underlying mechanistic relationship. Further weight is added to the hypothesis through overlaying known genetic associations where available, as we described in part 2.
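One simple way to capture the point made in the figure caption, that a phenotype shared by many diseases says less than one shared by few, is to down-weight common phenotypes. The sketch below uses an inverse-frequency weight borrowed from information retrieval. The weighting scheme, the toy data and the function names are assumptions for illustration only, not the weighting SciBite uses.

```python
import math

# Hypothetical toy data loosely modelled on the figure: Diseases 1 and 2
# share three phenotypes that no other disease has, while Phenotype 4 is
# shared by Diseases 2 and 3 and by many other diseases.
disease_phenotypes = {
    "Disease 1": {"Phenotype 1", "Phenotype 2", "Phenotype 3"},
    "Disease 2": {"Phenotype 1", "Phenotype 2", "Phenotype 3", "Phenotype 4"},
    "Disease 3": {"Phenotype 4"},
}
for i in range(4, 10):
    disease_phenotypes[f"Disease {i}"] = {"Phenotype 4"}


def phenotype_weights(disease_phenotypes):
    """Weight each phenotype by log(N / n): N diseases in total, n of which
    show the phenotype. Widely shared phenotypes get weights close to zero."""
    total = len(disease_phenotypes)
    counts = {}
    for phenotypes in disease_phenotypes.values():
        for p in phenotypes:
            counts[p] = counts.get(p, 0) + 1
    return {p: math.log(total / n) for p, n in counts.items()}


def link_strength(a, b, disease_phenotypes, weights):
    """Sum the weights of the phenotypes that two diseases share."""
    shared = disease_phenotypes[a] & disease_phenotypes[b]
    return sum(weights[p] for p in shared)


weights = phenotype_weights(disease_phenotypes)
strong = link_strength("Disease 1", "Disease 2", disease_phenotypes, weights)
weak = link_strength("Disease 2", "Disease 3", disease_phenotypes, weights)
print(f"Disease 1 - Disease 2: {strong:.2f}")
print(f"Disease 2 - Disease 3: {weak:.2f}")
```

With these toy numbers, the Disease 1 and Disease 2 link scores far higher than the Disease 2 and Disease 3 link, matching the intuition in the figure.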

Machine Learning 

At SciBite, we’ve developed Machine Learning algorithms to apply weightings to these relationships and so predict how scientifically “interesting” they are. The method is described in more detail in our previous blog.

Calculating the scores

For every disease-phenotype pair, we count how often both entities appear in the same sentence and set this against counts of how often each member of the pair appears independently. These values are then plugged into a statistical algorithm to generate a relationship score. You can find more background on similar techniques on Wikipedia. This score can then be ranked against all other disease-phenotype co-occurrence scores, so that the less interesting relationships can be filtered out.
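The post does not name the statistic used, so the sketch below substitutes pointwise mutual information (PMI), one common co-occurrence measure, purely as an assumption. It compares how often a pair appears in the same sentence with how often that would be expected if the two terms occurred independently. All counts and phenotype names are invented for illustration.

```python
import math


def pmi(pair_count, disease_count, phenotype_count, total_sentences):
    """Pointwise mutual information (in bits) from sentence-level counts.

    pair_count:      sentences mentioning both the disease and the phenotype
    disease_count:   sentences mentioning the disease
    phenotype_count: sentences mentioning the phenotype
    total_sentences: sentences in the corpus
    """
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    observed_vs_expected = (pair_count * total_sentences) / (
        disease_count * phenotype_count
    )
    return math.log2(observed_vs_expected)


# Hypothetical counts for one disease against two phenotypes.
TOTAL = 1_000_000
DISEASE = 1_000
candidates = {
    "specific phenotype": {"pair": 50, "phenotype": 500},
    "common phenotype": {"pair": 100, "phenotype": 100_000},
}

scores = {
    name: pmi(c["pair"], DISEASE, c["phenotype"], TOTAL)
    for name, c in candidates.items()
}

# Rank phenotypes from most to least strongly associated.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

Note that the common phenotype co-occurs with the disease more often in absolute terms, but no more often than chance would predict, so it ranks below the specific phenotype.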

An extension of the method is to measure similarity between diseases based on their phenotype signatures.
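One way to sketch this extension is to treat each disease's phenotype signature as a vector of relationship scores and compare the vectors with cosine similarity. The choice of cosine similarity, the signatures and the names below are assumptions for illustration; the post does not say which similarity measure is used.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two phenotype signatures, each given as a
    dict mapping phenotype name to a non-negative relationship score."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[p] * sig_b[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty signature is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical signatures: phenotype -> relationship score.
signatures = {
    "disease_a": {"phenotype_1": 4.0, "phenotype_2": 3.0, "phenotype_3": 1.0},
    "disease_b": {"phenotype_1": 3.5, "phenotype_2": 2.5, "phenotype_4": 0.5},
    "disease_c": {"phenotype_5": 5.0},
}

query = "disease_a"
similarities = {
    other: cosine_similarity(signatures[query], sig)
    for other, sig in signatures.items()
    if other != query
}

# Rank the other diseases by how closely their signatures match the query.
for other in sorted(similarities, key=similarities.get, reverse=True):
    print(f"{query} vs {other}: {similarities[other]:.3f}")
```

Because the comparison uses the scores rather than simple presence or absence, two diseases that share their highest-scoring phenotypes come out as more similar than two that only share weakly associated ones.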

Here’s an example of this ranking comparing Insulin Resistance (IR) and Alzheimer’s Disease (AD), also sometimes called Type 3 Diabetes. Based on the extracted phenotype signatures, the computer has been trained to recognise that these two diseases are associated at some level and this is backed up in the literature.

[Figure: phenotype-based ranking comparing Insulin Resistance (IR) and Alzheimer’s Disease (AD)]

As this example shows, with no prior knowledge of IR or AD, SciBite’s algorithms can effectively extract themes from the scientific literature without any human intervention.

What does this all mean for rare diseases?

Through our three parts on Rare Disease Day, we’ve brought you ideas and examples of how the SciBite Platform can be applied in the real world to help solve the challenges that scientists researching rare diseases face:

  • Facilitating collaboration with other researchers across the globe in relevant areas
  • Enabling deeper research at a faster rate through making connections between diseases at a mechanistic level
  • Discovering relationships between diseases in light of potentially sparse evidence

Together, these elements could drive forward the research journey towards new therapies and treatments for rare diseases, all the while helping to avoid duplication and encouraging pooling of resources. 

We’ve written a White Paper on how we used Machine Learning to liberate data.  To find out more about our work and how we could best help you, please contact us with your name, contact details and your organisation.  We’d love to hear from you.
