Could semantic analysis help us to enjoy a worry-free slice of toast?
In defence of toast…
2nd February 2017
Authors: Monica Kulkarni and Michael Hughes
Ah, toast. Never met someone who doesn’t like toast. When you’re looking for a quick snack, toast. When you’re ill, toast. With butter, jam, cheese, Marmite (not all at once). And acrylamide.
“Whoa there! Acrylamide? I didn’t put that on my shopping list!” I hear you cry. Well, yes you did, depending on how you toast your toast, according to Denise Lewis, the face of the FSA’s ‘Go for Gold’ campaign. Have you forgotten already how scorching, I mean, caramelising your starches creates acrylamide? It was only last week that Denise and the FSA were very worried about its links to cancer and wanted to warn us all. Which got me thinking – if it’s that prevalent, surely someone is looking at reducing acrylamide exposure? But how could I find that without trawling through thousands of articles?
Enter SciBite’s TERMite software – semantic analysis made simple
How would TERMite help? Let me explain. An often-quoted estimate is that about 80% of data is unstructured – no tags, lots of synonyms, all over the place, different formats, you get the idea. With only a small fraction of it manually curated, there’s plenty of potential to miss something. Something that could be extremely useful…
We could take acrylamide alone here as our example, but to be honest, I’m like Denise and more interested in its relationship with my diet (alas, this is the only way I’m like the Olympic champion Denise Lewis). Anyway, let’s see what happens if we expand this out to acrylamide AND food.
A set of review articles was exported from PubMed and then TERMite was set to work. This resulted in a set of articles annotated with scientific concepts. The image below shows how this is displayed for the human reader; for those involved in large-scale processing of data, SciBite also provides computer-readable formats.
Here we see a paper focused on l-asparaginase and its value to the food industry for mitigating the formation of acrylamide. The TERMite Hits section has been trimmed to fit the page, but what we see is a sample of the extracted scientific entities, each normalised to a unique identifier (e.g. see how ALL is paired up with acute lymphoblastic leukemia). This ability to extract uniquely identified scientific concepts at scale opens up a world of possibilities for gaining value from the masses of unstructured information available on the web or on your internal systems.
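If you’re wondering what “normalised to a unique identifier” looks like in practice, here is a toy sketch in Python. To be clear, this is not TERMite’s code or output format: the tiny vocabulary, the `EX:` identifiers and the `annotate` function are all made up for illustration, and a real vocabulary would be vastly larger. The idea is simply that every synonym in the text is mapped back to one concept, and that an abbreviation like ALL has to be matched case-sensitively so it isn’t confused with the word “all”.

```python
import re

# Hypothetical mini-vocabulary: identifier -> (preferred name, synonyms).
# The EX: identifiers are invented for this sketch, not real ontology codes.
VOCAB = {
    "EX:0001": ("acute lymphoblastic leukemia",
                ["ALL", "acute lymphoblastic leukaemia"]),
    "EX:0002": ("acrylamide", ["2-propenamide"]),
    "EX:0003": ("asparaginase", ["L-asparaginase"]),
}


def build_patterns(vocab):
    """Compile one regex per term, longest terms first."""
    patterns = []
    for concept_id, (preferred, synonyms) in vocab.items():
        for term in [preferred, *synonyms]:
            # All-caps abbreviations (e.g. ALL) are matched case-sensitively
            # so they are not confused with ordinary words (e.g. "all").
            flags = 0 if term.isupper() else re.IGNORECASE
            # Do not match inside a longer word or hyphenated compound.
            regex = re.compile(
                r"(?<![\w-])" + re.escape(term) + r"(?![\w-])", flags
            )
            patterns.append((len(term), regex, concept_id))
    patterns.sort(key=lambda p: -p[0])
    return patterns


def annotate(text, vocab=VOCAB):
    """Return the concepts found in text, in order of appearance."""
    hits, taken = [], []
    for _, regex, concept_id in build_patterns(vocab):
        for m in regex.finditer(text):
            # Skip spans already claimed by a longer term.
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append(m.span())
            hits.append({
                "start": m.start(),
                "text": m.group(0),
                "id": concept_id,
                "name": vocab[concept_id][0],
            })
    return sorted(hits, key=lambda h: h["start"])


# A made-up sentence, not a quote from the paper in the image.
sentence = ("l-Asparaginase is used to treat ALL and to cut "
            "acrylamide in all kinds of food.")
for hit in annotate(sentence):
    print(hit["text"], "->", hit["id"], hit["name"])
```

Three mentions come back, each tied to a single identifier, and the lowercase “all” is left alone. Once every mention carries an identifier rather than a string, counting, linking and searching across documents becomes a much easier job.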
Working with DOCStore to enrich your data
To look more deeply into the research around l-asparaginase and acrylamide, we used another tool in our armoury, namely DOCStore.
DOCStore is a platform that enables the user to interrogate millions of documents from multiple sources through a simple, easy-to-use interface.
Here we show how DOCStore’s semantic indexing can quickly direct the user to major topics associated with acrylamide, such as Indications, Adverse Events, Biological Processes, Foods and many more.
Our semantic indexing means that a simple search for “Acrylamide and Asparaginase” automatically captures articles mentioning these terms or any of a wealth of associated synonyms. The image shows how results can be sorted by date or relevance. It also shows one of the major features of this tool: the top associated topics, listed in the interactive panel on the right. Additionally, we can see that the semantically enriched search results were pulled from a set of more than 27 million documents in 131 milliseconds.
That’s faster than you can say Asparaginase.
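For the curious, the synonym-aware part of such a search can be sketched in a few lines of Python. This is an illustration of the general idea only, not how DOCStore is built: the synonym lists, the three mini “documents” and the function names are all assumptions made up for this example, and a real system would use an index rather than scanning every text.

```python
import re

# Hypothetical synonym lists, invented for this example.
SYNONYMS = {
    "acrylamide": ["acrylamide", "2-propenamide", "acrylic amide"],
    "asparaginase": ["asparaginase", "l-asparaginase",
                     "asparagine amidohydrolase"],
}

# Three made-up documents standing in for a real collection.
DOCS = {
    "doc1": "L-asparaginase lowers acrylamide levels in fried potato products.",
    "doc2": "Asparagine amidohydrolase pre-treatment reduced 2-propenamide in bread.",
    "doc3": "Toast is best served with butter and Marmite.",
}


def mentions(text, concept):
    """True if the text contains any synonym of the concept as a whole word."""
    lowered = text.lower()
    return any(
        re.search(r"(?<!\w)" + re.escape(synonym) + r"(?!\w)", lowered)
        for synonym in SYNONYMS[concept]
    )


def search_all(concepts, docs=DOCS):
    """Return the ids of documents that mention every requested concept."""
    return [
        doc_id for doc_id, text in docs.items()
        if all(mentions(text, concept) for concept in concepts)
    ]


print(search_all(["acrylamide", "asparaginase"]))
```

A plain keyword search for the two words would find only `doc1`. With the synonyms expanded, `doc2` turns up as well, even though it never uses either word.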
Phew. So no need to panic. Although, maybe I should have just listened to Sir David Spiegelhalter, Professor of the Public Understanding of Risk at Cambridge University, who says “there is no good evidence of harm from humans consuming acrylamide in their diet.”
Oh well, at least I got to show you some fun data extraction.
Enhance, not replace, your current data infrastructure
The lovely thing about all this software is that it slots right into your existing systems. Thanks to a straightforward API, TERMite can be embedded into other applications or run in the end-user interface. Implement either of these options and voilà! You’ve opened up semantic analytics to even more colleagues in your organisation.
Our suite of products and TERMite’s family
Think of it as a full English breakfast rather than just toast.