One of the biggest headaches a researcher faces is the sheer volume of published literature they want to mine. The conundrum is how to get to the most important and relevant points quickly. Fast distillation is key.
Now, text mining is already out there, so you may be wondering what SciBite can bring to the semantic analytics party.
We offer a two-pronged solution. It starts with our high-quality VOCabs: hand-curated ontologies, tailored to the scientific domain. We then pair these with our super-fast TERMite engine to liberate data that might otherwise have remained buried.
And the results?
1) It enables you to find direct links in literature more readily
2) You’re able to find new links which may have never been previously (or explicitly) stated
3) You gain a better understanding of the mechanisms behind disease – unravelling how and why someone gets it, its behaviour, development, what it looks like and its weak spot.
Then you have the start of a journey that could lead on to applying gene therapy and, eventually, to a potential treatment.
So let’s demonstrate this technology on a real-life rare disease and its related conditions.
Friedreich’s Ataxia is a debilitating disorder with a heartbreaking degeneration. It’s described on the Rare Disease Day website as:
“…a genetic, progressive, neurodegenerative movement disorder, with a mean age of onset between 10 and 15 years. Initial symptoms may include unsteady posture, frequent falling, and progressive difficulty walking due to impaired ability to coordinate voluntary movements (ataxia).”
What we’re aiming for here is a better characterisation of this rare disease based on its similarities to more widely understood conditions.
We ran TERMite across 25 million MEDLINE abstracts and extracted co-occurring pairs of conditions and clinical signs.
TERMite results from MEDLINE abstracts
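To make the co-occurrence step concrete, here is a minimal sketch in Python. It does not use the TERMite API; it assumes each abstract has already been annotated and reduced to a hypothetical dictionary with `conditions` and `signs` lists. The toy records are illustrative only, and "Condition B" and "sign X" are placeholders.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(annotated_abstracts):
    """Count the abstracts in which a condition and a clinical sign appear together."""
    pair_counts = Counter()
    for doc in annotated_abstracts:
        # Sets ensure a pair is counted once per abstract, however often it is mentioned.
        conditions = set(doc["conditions"])
        signs = set(doc["signs"])
        for condition, sign in product(conditions, signs):
            pair_counts[(condition, sign)] += 1
    return pair_counts


# Toy input in an assumed format (not real TERMite output).
abstracts = [
    {"conditions": ["Friedreich's Ataxia"], "signs": ["ataxia", "unsteady posture"]},
    {"conditions": ["Friedreich's Ataxia", "Condition B"], "signs": ["ataxia"]},
    {"conditions": ["Condition B"], "signs": ["sign X", "ataxia", "ataxia"]},
]

pair_counts = count_cooccurrences(abstracts)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Counting per abstract rather than per mention keeps a single, repetitive abstract from dominating the totals.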
We then performed a statistical analysis of the results to identify the most scientifically interesting relationships.
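The post does not say which statistic was used, so the following is only one plausible illustration: pointwise mutual information (PMI), which rewards pairs that co-occur more often than their individual frequencies would predict. The function name and all counts are made up for the example.

```python
import math


def pmi(n_pair, n_condition, n_sign, n_total):
    """Pointwise mutual information (log base 2) for one condition-sign pair.

    n_pair:      abstracts mentioning both the condition and the sign
    n_condition: abstracts mentioning the condition
    n_sign:      abstracts mentioning the sign
    n_total:     abstracts in the corpus
    """
    if min(n_pair, n_condition, n_sign) == 0:
        return float("-inf")  # no evidence of association at all
    return math.log2((n_pair * n_total) / (n_condition * n_sign))


# Made-up counts: same condition and sign frequencies, different overlap.
n_total = 1000
score_specific = pmi(n_pair=40, n_condition=50, n_sign=100, n_total=n_total)
score_generic = pmi(n_pair=5, n_condition=50, n_sign=100, n_total=n_total)
print(score_specific, score_generic)
```

A score near zero means the pair co-occurs about as often as chance would suggest, so it is unlikely to be an interesting relationship. PMI is known to inflate very rare pairs, so a minimum-count filter or a significance test would normally sit alongside it.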
We then loaded the results into a graph database, providing us with scalable and flexible retrieval.
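As a rough sketch of what loading could look like, the snippet below turns scored pairs into parameterised Cypher statements. The node labels (`Condition`, `Phenotype`), the relationship type (`HAS_PHENOTYPE`) and the `score` property are assumptions made for illustration, not the schema actually used, and the scores are invented.

```python
def build_load_statements(scored_pairs):
    """Build (query, parameters) tuples that upsert condition-phenotype edges."""
    # MERGE creates the node or relationship only if it does not already exist,
    # so the load can be re-run without producing duplicates.
    query = (
        "MERGE (c:Condition {name: $condition}) "
        "MERGE (p:Phenotype {name: $phenotype}) "
        "MERGE (c)-[r:HAS_PHENOTYPE]->(p) "
        "SET r.score = $score"
    )
    return [
        (query, {"condition": condition, "phenotype": phenotype, "score": score})
        for condition, phenotype, score in scored_pairs
    ]


# Illustrative rows only; the scores are invented.
statements = build_load_statements([
    ("Friedreich's Ataxia", "ataxia", 3.0),
    ("Condition B", "ataxia", 1.2),
])
for query, params in statements:
    print(params)
```

Each tuple can then be passed to a database session. Passing values as parameters rather than formatting them into the query string avoids quoting problems with names such as "Friedreich's Ataxia".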
Here you can see an initial visualisation of that graph database using Linkurious. The image below shows the major phenotypes associated with Friedreich’s Ataxia.
Now, let’s interrogate this knowledge base.
How Friedreich’s Ataxia shares multiple phenotypes with Huntington’s Disease
Now that we can calculate the major phenotypes associated with thousands of conditions, we can compare their phenotype profiles and apply similarity scoring algorithms.
The next image shows the conditions that have the most similar phenotype profiles to Friedreich’s Ataxia:
Indications related by similar phenotype profiles. The numbers on the grey lines represent the relative similarity score for each pair of conditions
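The similarity algorithm itself is not named in this post. As a stand-in, the sketch below uses Jaccard similarity (shared phenotypes divided by all phenotypes across the two profiles) to rank conditions against a target. The profiles are toy data: the Friedreich's Ataxia signs come from the description quoted earlier, and everything else is a placeholder.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two phenotype profiles."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, profiles, top_n=3):
    """Rank every other condition by similarity to the target's profile."""
    scores = [
        (other, jaccard(profiles[target], phenotypes))
        for other, phenotypes in profiles.items()
        if other != target
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))[:top_n]


# Toy profiles; "Condition B/C/D" and "sign X/Y" are placeholders.
profiles = {
    "Friedreich's Ataxia": {"ataxia", "unsteady posture", "frequent falling"},
    "Condition B": {"ataxia", "frequent falling", "sign X"},
    "Condition C": {"sign X", "sign Y"},
    "Condition D": {"ataxia"},
}

ranked = most_similar("Friedreich's Ataxia", profiles)
for condition, score in ranked:
    print(f"{condition}: {score:.2f}")
```

Jaccard treats every phenotype equally. A weighted measure, for example cosine similarity over association scores, would let strongly associated phenotypes count for more.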
We can also export the data as a list of the related indications and their major shared phenotypes (from the Neo4j interface into Excel).
If you’re an expert in the field, you may be thinking that many of these indications are well known, but keep scanning down the list: less well-known relationships may become apparent.
Let me make this clear – this was all worked out by the computer with no prior knowledge of the condition: a computer which can now also characterise thousands of other conditions in the same way.
Exploiting the power of this analysis
So now it’s time to explore the associated genes for these phenotypically related conditions. By doing this, we’ll get an idea of where there are knowledge gaps for how these conditions might be mechanistically related. We can also show potential areas where these gaps might be filled.
By overlaying gene association data from DisGeNET, we can see some conditions with many known gene associations. However, for Friedreich’s Ataxia, there is only one – frataxin (FXN).
Are there any conditions with lots of gene associations? Yes – you can see Peripheral Neuropathies has a huge number of associated genes – these are linked because of the sheer amount of research done in this area.
By contrast, take a look at Friedreich’s Ataxia. There are clearly huge gaps in mechanistic understanding, and we can see that comparatively little investigation has been done.
Going back to FXN, and to help get an idea of where it might fit in with the other gene/protein entities displayed on the graph, we added in protein-protein interaction data from iRefIndex. This fills in some of the gaps from the above image and we now see FXN interacting with several genes that are known to be associated with phenotypically related conditions. In doing so, we’re building up a picture of related conditions and their underlying genetic mechanisms.
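The overlay described above boils down to a simple question: which interaction partners of FXN are also associated with phenotypically related conditions? Here is a minimal sketch, assuming the interaction and gene-association data have been reduced to plain Python structures. The function name is hypothetical, `GENE_X`, `GENE_Y`, `GENE_Z` and "Condition B/C" are placeholders, and the toy rows are not taken from iRefIndex or DisGeNET.

```python
def bridging_genes(seed_gene, interactions, gene_to_conditions, related_conditions):
    """Map each interaction partner of seed_gene to the related conditions it is tied to."""
    # Interactions are undirected, so check both ends of every pair.
    partners = set()
    for gene_a, gene_b in interactions:
        if gene_a == seed_gene:
            partners.add(gene_b)
        elif gene_b == seed_gene:
            partners.add(gene_a)
    partners.discard(seed_gene)  # ignore self-interactions

    related = set(related_conditions)
    bridges = {}
    for gene in sorted(partners):
        hits = sorted(related & set(gene_to_conditions.get(gene, ())))
        if hits:
            bridges[gene] = hits
    return bridges


# Toy data for illustration only.
interactions = [
    ("FXN", "PASK"),
    ("SDHA", "FXN"),
    ("FXN", "GENE_X"),
    ("GENE_Y", "GENE_Z"),
]
gene_to_conditions = {
    "PASK": ["Peripheral Neuropathies"],
    "SDHA": ["Condition B", "Condition C"],
    "GENE_X": ["Unrelated Condition"],
}
related_conditions = ["Peripheral Neuropathies", "Condition B"]

bridges = bridging_genes("FXN", interactions, gene_to_conditions, related_conditions)
print(bridges)
```

Genes that come back from a query like this are candidates for closer inspection, not findings in themselves.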
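The incredibly useful thing about this method is that we’ve brought together three sets of data: condition and phenotype co-occurrences mined from the literature with TERMite, gene association data from DisGeNET, and protein-protein interaction data from iRefIndex.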
Once some interesting and plausible hypotheses have been derived from the graphs, an individual can help to drive research in new directions.
For example, the gene entity PASK (PAS domain containing serine/threonine kinase), seen in the image above, interacts with FXN and is also known to be associated with Peripheral Neuropathies, which the analysis identified as one of the conditions most phenotypically similar to Friedreich’s Ataxia. Likewise, SDHA (succinate dehydrogenase complex, subunit A; you can see why it’s shortened!) is linked to a number of related conditions.
Could this be a new area of research?
What we love at SciBite about using our software in this way is exactly that – opening up new possibilities. And opening them up quickly, leaving researchers more time to, well, research.
We’ve written a White Paper on how we used Machine Learning to liberate data. To find out more about our work and how we could best help you, please contact us with your name, contact details and your organisation. We’d love to hear from you.
In celebration of Rare Disease Day on 28th Feb, we have a 3-part blog post looking into some of the challenges and analysis techniques involved in the research process.