Disease Detective Part 3: Machine Learning and phenotype triangulation
2nd March 2017
Author: SciBite Team
- How pharmaceutical companies can identify relevant research centres across the globe
- Faster, deeper research into the mechanistic behaviour of rare diseases
Today, we’ll take Part 2’s topic a little further and zoom in on how we can find new relationships between diseases where direct evidence is sparse. This is particularly important in the rare disease arena, where far less research is published than for more common conditions.
Getting closer to the ideal
Imagine if you could tap in a condition and get a ranking of all the phenotypes it has in common with other diseases. Impossible? Too time intensive? Hugely expensive? Think again.
Here, we’ll describe a method we’ve developed for quantifying disease similarity based on phenotypic signatures text-mined from Medline. We bring together machine learning algorithms and our super-fast TERMite engine to map rare diseases linked by their common phenotypes. It’s an area fraught with complexities, not least because of the difficulties we mentioned in Part 1 surrounding the spread of the research and the semantics of the terminology involved.
The triangulation comes in when we infer relationships between two nodes on a network that are indirectly connected via other nodes. The more intermediate connections that two disconnected nodes have in common, the more likely that there is some sort of relationship between them.
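To make the idea concrete, here is a minimal Python sketch of counting shared intermediate nodes. It is an illustration only, not SciBite’s implementation: the network, the node names and the helper functions `shared_neighbours` and `rank_partners` are all hypothetical.

```python
# Hypothetical toy network: each disease maps to the set of phenotypes
# it is linked to. The names are placeholders, not real data.
links = {
    "disease_a": {"phenotype_1", "phenotype_2", "phenotype_3"},
    "disease_b": {"phenotype_2", "phenotype_3", "phenotype_4"},
    "disease_c": {"phenotype_5"},
}


def shared_neighbours(network, node_x, node_y):
    """Return the intermediate nodes that both node_x and node_y connect to."""
    return network.get(node_x, set()) & network.get(node_y, set())


def rank_partners(network, node):
    """Rank every other node by how many intermediate nodes it shares with `node`."""
    counts = {
        other: len(shared_neighbours(network, node, other))
        for other in network
        if other != node
    }
    # Most shared connections first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for partner, shared in rank_partners(links, "disease_a"):
        print(partner, shared)
```

A raw count like this treats every shared connection as equally informative, which is why a weighting step is needed on top of it.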
In the case of Phenotype Triangulation, we compare diseases based on their shared phenotype profiles. Where there is strong overlap in phenotype signatures, we can hypothesize that a disease pair could share an underlying mechanistic relationship. Further weight is added to the hypothesis through overlaying known genetic associations where available, as we described in Part 2.
At SciBite, we’ve developed Machine Learning algorithms to apply weightings to these relationships and so predict how scientifically “interesting” they are. The method is described in more detail in our previous blog.
Calculating the scores
For every indication-phenotype pair, we count how often both entities appear in the same sentence and set this against counts of how often each member of the pair appears on its own. These values are then plugged into a specific statistical algorithm to generate a relationship score. You can find more background to similar techniques on Wikipedia. Each score can then be ranked against all other disease-phenotype co-occurrence scores, so that the less interesting relationships can be filtered out.
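The exact statistic isn’t named here, so the sketch below uses pointwise mutual information (PMI), a common co-occurrence measure, purely as a stand-in. The function names `pmi` and `rank_pairs`, and any counts you pass in, are assumptions for illustration rather than part of the SciBite method.

```python
import math


def pmi(pair_count, disease_count, phenotype_count, total_sentences):
    """Pointwise mutual information (log base 2) from sentence-level counts.

    pair_count: sentences mentioning both the disease and the phenotype
    disease_count / phenotype_count: sentences mentioning each one at all
    total_sentences: size of the corpus in sentences
    """
    if min(pair_count, disease_count, phenotype_count, total_sentences) <= 0:
        return float("-inf")  # no co-occurrence evidence to score
    p_pair = pair_count / total_sentences
    p_disease = disease_count / total_sentences
    p_phenotype = phenotype_count / total_sentences
    return math.log2(p_pair / (p_disease * p_phenotype))


def rank_pairs(pair_counts, entity_counts, total_sentences):
    """Score every (disease, phenotype) pair and sort with the highest score first."""
    scored = [
        (
            disease,
            phenotype,
            pmi(n, entity_counts[disease], entity_counts[phenotype], total_sentences),
        )
        for (disease, phenotype), n in pair_counts.items()
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)
```

A score above zero means the pair turns up together more often than chance would predict from the individual counts. One known weakness of PMI is that it tends to overrate pairs with very low counts, so a minimum-count threshold is often applied alongside it.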
An extension of the method is to measure similarity between diseases based on their phenotype signatures.
Here’s an example of this ranking, comparing Insulin Resistance (IR) and Alzheimer’s Disease (AD), the latter sometimes called Type 3 Diabetes. Based on the extracted phenotype signatures, the computer has been trained to recognise that these two diseases are associated at some level, and this is backed up in the literature.
As you can see from this method, with no prior knowledge of IR or AD, SciBite’s algorithms can effectively extract themes from the scientific literature without any human intervention.
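One simple way to compare two phenotype signatures is cosine similarity over their weighted phenotype scores. The sketch below assumes each signature is a mapping from phenotype name to relationship score. That representation, the choice of cosine similarity and the function name `cosine_similarity` are our illustrative assumptions, not a description of the trained model.

```python
import math


def cosine_similarity(signature_a, signature_b):
    """Cosine similarity between two {phenotype: score} signatures.

    Returns 0.0 when either signature is empty or all-zero.
    """
    shared = signature_a.keys() & signature_b.keys()
    dot = sum(signature_a[p] * signature_b[p] for p in shared)
    norm_a = math.sqrt(sum(w * w for w in signature_a.values()))
    norm_b = math.sqrt(sum(w * w for w in signature_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder signatures with made-up weights, for demonstration only.
    first = {"phenotype_1": 2.0, "phenotype_2": 1.0}
    second = {"phenotype_2": 1.5, "phenotype_3": 0.5}
    print(cosine_similarity(first, second))
```

With non-negative scores the result lies between 0 (no phenotypes in common) and 1 (identical profiles), which makes disease pairs straightforward to rank against each other.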
What does this all mean for rare diseases?
Through our three parts on Rare Disease Day, we’ve brought you ideas and examples of how the SciBite Platform can be applied in the real world to help solve the challenges that scientists researching rare diseases face:
- Facilitating collaboration with other researchers across the globe in relevant areas
- Enabling deeper research at a faster rate through making connections between diseases at a mechanistic level
- Discovering relationships between diseases in light of potentially sparse evidence
Together, these elements could drive forward the research journey towards new therapies and treatments for rare diseases, all the while helping to avoid duplication and encouraging pooling of resources.
We’ve written a White Paper on how we used Machine Learning to liberate data. To find out more about our work and how we could best help you, please contact us with your name, contact details and your organisation. We’d love to hear from you.