Today, we’ll take Part 2’s topic a little further and zoom in on how we can find new relationships between diseases where direct evidence is sparse. This is particularly important in the rare disease arena, where far less research is published than for more common conditions.
Imagine if you could type in a condition and get a ranking of the diseases that share its phenotypes, along with the phenotypes they have in common. Impossible? Too time-intensive? Hugely expensive? Think again.
Here, we’ll describe a method we’ve developed for quantifying disease similarity based on phenotypic signatures text-mined from Medline. We bring together machine learning algorithms and our super fast TERMite engine to map rare diseases linked by their common phenotypes. It’s an area fraught with complexities, not least the difficulties we mentioned in Part 1: how widely the research is spread and the semantics of the terminology involved.
Triangulation comes in when we infer a relationship between two nodes in a network that are not directly connected but are linked via other nodes. The more intermediate nodes two unconnected nodes have in common, the more likely it is that some relationship exists between them.
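To make the idea concrete, here is a minimal Python sketch of the simplest form of triangulation: counting the phenotypes (the intermediate nodes) that each pair of diseases shares in a disease-phenotype network. The disease and phenotype names are hypothetical toy data, and using raw shared-neighbour counts is an assumption for illustration only; it is not SciBite's implementation, which applies learned weightings to the relationships.

```python
from itertools import combinations

# Hypothetical toy data: each disease maps to the set of phenotypes
# linked to it in the literature. Names are illustrative only.
disease_phenotypes = {
    "disease_A": {"seizures", "hypotonia", "developmental delay"},
    "disease_B": {"seizures", "hypotonia", "ataxia"},
    "disease_C": {"rash", "fever"},
}


def shared_neighbour_counts(network):
    """Count the phenotypes (intermediate nodes) shared by every disease pair."""
    counts = {}
    for a, b in combinations(sorted(network), 2):
        counts[(a, b)] = len(network[a] & network[b])
    return counts


def rank_pairs(network):
    """Return disease pairs ordered from most to fewest shared phenotypes."""
    counts = shared_neighbour_counts(network)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


for pair, shared in rank_pairs(disease_phenotypes):
    print(pair, shared)
```

Here `disease_A` and `disease_B` come out on top because they share two phenotypes, while `disease_C` shares none with either.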
In the case of Phenotype Triangulation, we compare diseases based on their shared phenotype profiles. Where there is strong overlap in phenotype signatures, we can hypothesize that a disease pair could share an underlying mechanistic relationship. Further weight is added to the hypothesis by overlaying known genetic associations where available, as we described in Part 2.
At SciBite, we’ve developed Machine Learning algorithms to apply weightings to these relationships and so predict how scientifically “interesting” they are. The method is described in more detail in our previous blog post.
For every indication-phenotype pair, we count how often both entities appear in the same sentence and compare this with how often each member of the pair appears independently. These values are then fed into a statistical algorithm to generate a relationship score. You can find more background on similar techniques on Wikipedia. Each score can then be ranked against all other disease-phenotype co-occurrence scores, so that less interesting relationships can be filtered out.
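The post doesn't name the statistical algorithm, so the sketch below uses pointwise mutual information (PMI), one common co-occurrence measure, purely as a stand-in. PMI compares how often a pair co-occurs in sentences with how often it would co-occur by chance if the two entities appeared independently. The sentence counts, corpus size and `THRESHOLD` value are all hypothetical.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (base 2) of a disease-phenotype pair.

    n_xy:    sentences mentioning both entities
    n_x:     sentences mentioning the disease
    n_y:     sentences mentioning the phenotype
    n_total: total sentences in the corpus
    """
    if n_xy == 0:
        return float("-inf")  # never co-occur: no evidence of association
    # log2( P(x,y) / (P(x) * P(y)) ), rearranged to avoid tiny probabilities
    return math.log2((n_xy * n_total) / (n_x * n_y))


# Hypothetical counts: (disease, phenotype) -> (n_xy, n_x, n_y)
N_SENTENCES = 1_000_000
counts = {
    ("disease_A", "seizures"): (120, 2_000, 15_000),
    ("disease_A", "fever"): (5, 2_000, 40_000),
}
THRESHOLD = 1.0  # hypothetical cut-off for "interesting" pairs

scores = {pair: pmi(*c, N_SENTENCES) for pair, c in counts.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
interesting = [(pair, s) for pair, s in ranked if s > THRESHOLD]
print(interesting)
```

With these made-up numbers, seizures co-occur with `disease_A` four times more often than chance (a PMI of 2), whereas fever co-occurs less often than chance and is filtered out.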
An extension of the method is to measure similarity between diseases based on their phenotype signatures.
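As an illustration of how phenotype signatures might be compared, the sketch below treats each disease's signature as a sparse vector of phenotype scores and ranks the other diseases by cosine similarity to a query disease. Cosine similarity is our assumed choice here, not necessarily the measure SciBite uses, and the signatures are hypothetical; in practice the weights could come from co-occurrence scores, with negative scores dropped.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse phenotype signatures.

    Each signature maps phenotype -> non-negative association score.
    """
    shared = sig_a.keys() & sig_b.keys()
    dot = sum(sig_a[p] * sig_b[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, signatures):
    """Rank every other disease by signature similarity to the query disease."""
    scores = [
        (other, cosine_similarity(signatures[query], sig))
        for other, sig in signatures.items()
        if other != query
    ]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Hypothetical signatures: disease -> {phenotype: association score}
signatures = {
    "disease_A": {"seizures": 2.0, "hypotonia": 1.5, "developmental delay": 1.0},
    "disease_B": {"seizures": 1.8, "hypotonia": 1.2, "ataxia": 0.9},
    "disease_C": {"rash": 2.2, "fever": 1.1},
}

for disease, score in rank_similar("disease_A", signatures):
    print(disease, round(score, 3))
```

Because each signature is normalised, a disease with many weakly scored phenotypes does not automatically outrank one with a few strong, shared phenotypes.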
Here’s an example of this ranking comparing Insulin Resistance (IR) and Alzheimer’s Disease (AD), which is sometimes called “Type 3 Diabetes”. Based on the extracted phenotype signatures, the algorithm has been trained to recognise that these two diseases are associated at some level, a link that is backed up in the literature.
As this example shows, with no prior knowledge of IR or AD, SciBite’s algorithms can effectively extract themes from the scientific literature without any human intervention.
Across our three-part Rare Disease Day series, we’ve brought you ideas and examples of how the SciBite Platform can be applied in the real world to help solve the challenges faced by scientists researching rare diseases.
Together, these approaches could drive the research journey forward towards new therapies and treatments for rare diseases, while helping to avoid duplication and encouraging the pooling of resources.
We’ve written a White Paper on how we used Machine Learning to liberate data. To find out more about our work and how we could best help you, please contact us with your name, contact details and organisation. We’d love to hear from you.
© SciBite Limited / Registered in England & Wales No. 07778456