Zika virus is a mosquito-borne virus first identified in the 1940s and 1950s [1]. More recently, the world’s attention has been drawn to this mostly harmless infection because of its potentially serious implications for neonates. Initially observed in Africa and the Pacific region, the virus has spread to South and Central America and is predicted to continue spreading.
Over the last 18 months, there has been a predictable surge in research on the Zika virus as the scientific community tries to better understand the disease area. We decided to take a look at this topic to see how much research was being done across the globe and which phenotypes and symptoms have been mentioned to date.
Have a play with the interactive analysis below and see what insight you can uncover.
Manually reviewing and analysing the text from over 1,000 articles requires a significant investment of time and effort. Here’s where our semantic analytic technologies can help. Our tools facilitate the rapid scanning and extraction of key terms from documents such as publications, transforming raw text into scientifically relevant, machine-readable data.
Parsing structured XML
Our Inxights module allows users to perform complex mining of more structured documents such as Excel or XML. Having the ability to select individual fields within a file and extract any combination of terms from within enables users to quickly create valuable datasets with minimal effort. Starting with a Zika virus XML download from PubMed, we used Inxights to extract the phenotypic terms from the abstract, the institution name from the affiliation field and the publication date for each document within the corpus.
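For readers who want a feel for the field-level extraction described above, here is a minimal sketch using only the Python standard library. It is not how Inxights works internally; it simply pulls the abstract, affiliation and publication year out of a PubMed-style XML export. The element names follow the usual PubMed layout, while the sample record, the `SAMPLE_XML` constant and the `extract_records` function are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder record in the usual PubMed export layout (not real data).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate><Year>2016</Year></PubDate>
          </JournalIssue>
        </Journal>
        <Abstract>
          <AbstractText>Placeholder abstract mentioning microcephaly.</AbstractText>
        </Abstract>
        <AuthorList>
          <Author>
            <AffiliationInfo>
              <Affiliation>Example Institute, Example City, Brazil.</Affiliation>
            </AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Return one dict per article with its abstract, affiliations and year."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        # An abstract may be split into several labelled sections; join them.
        abstract = " ".join(
            "".join(node.itertext()).strip()
            for node in article.iter("AbstractText")
        )
        affiliations = [
            (node.text or "").strip() for node in article.iter("Affiliation")
        ]
        # Some records carry a free-text date instead of a Year element,
        # in which case this is None.
        year = article.findtext(".//PubDate/Year")
        records.append(
            {"abstract": abstract, "affiliations": affiliations, "year": year}
        )
    return records


records = extract_records(SAMPLE_XML)
print(records[0]["year"], records[0]["affiliations"])
```

In practice you would read the downloaded file from disk rather than an inline string, and handle records where the abstract or affiliation is missing.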
Phenotypic extraction and normalisation
We scanned the XML using our phenotypic vocabulary (containing over 1.5 million terms) and extracted all the matching terms within the abstracts. There are multiple ways to describe any phenotype (microcephaly, small skull, small head, etc.); our VOCabs are designed to manage the synonymous language found in scientific literature and normalise the results with ease.
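To illustrate the idea of synonym normalisation, here is a toy sketch in Python. It is an assumption-laden simplification, not the VOCabs implementation: a tiny hand-written dictionary (the hypothetical `SYNONYMS` table, seeded with the three variants mentioned above) maps each surface form to a preferred label, and a case-insensitive regular expression finds the variants in text.

```python
import re

# Toy vocabulary: preferred label -> surface forms seen in text.
# A real vocabulary would hold far more terms and handle ambiguity.
SYNONYMS = {
    "Microcephaly": ["microcephaly", "small skull", "small head"],
}


def build_matcher(synonyms):
    """Compile one regex over all variants and a variant -> label lookup."""
    lookup = {}
    for preferred, variants in synonyms.items():
        for variant in variants:
            lookup[variant.lower()] = preferred
    # Try longer phrases first so multi-word variants are not cut short.
    ordered = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(variant) for variant in ordered)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern, lookup


def extract_phenotypes(text, synonyms=SYNONYMS):
    """Return the preferred label for every variant mentioned in the text."""
    pattern, lookup = build_matcher(synonyms)
    return [lookup[match.group(1).lower()] for match in pattern.finditer(text)]


hits = extract_phenotypes("Infants presented with a small head; Microcephaly was confirmed.")
print(hits)
```

Both mentions resolve to the same label, which is what makes counting a phenotype across hundreds of differently worded abstracts possible.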
Geo-location of publications
We’ve recently added a GEO library to our VOCabs, meaning you can now add location to semantic searches where institutions or addresses are provided.
Using Tableau, we created the following interactive visualisation: a view over time of where the publication powerhouses for the Zika virus sit, and of the emergence of microcephaly as the predominantly mentioned phenotype (look at publications prior to 2015; it wasn’t always the case).
Remember, this analysis stemmed from unstructured text extracted in a single XML file from a PubMed search. SciBite technologies provide you with the ability to transform individual documents into semantically enriched scientific data which can be built into powerful visualisations supporting a wide range of use cases from disease exploration to identification of emerging centres of excellence for your specific research fields.
Get in touch if you’d like to know more.
© SciBite Limited / Registered in England & Wales No. 07778456