Sir Tim Berners-Lee, the inventor of the World Wide Web, defined a 5-star deployment scheme for open data. In recent customer discussions, we’ve talked about a similar scheme to describe the status of data across their organisation and how text analytics can help contextualise unstructured data.
Using the table below, how much of the information in your company sits in each bucket?
|★|Unstructured content locked in proprietary formats or systems|
|★★|As above but in open, accessible formats (PDF, Office)|
|★★★|Semi-structured data with basic definition (XML, Excel)|
|★★★★|Semantic, structured data where elements are described to some formal specification (RDF, Ontologies)|
|★★★★★|Linked, interoperable semantic data|
Try plotting out how the data in your organisation falls into the five categories above.
A utopian vision would be to have all available data indexed, structured and linked together. Clearly we are not there yet, but it’s not as much of a pipe dream as you might think. Follow me on a short journey from 1★ to 5★ of data structure and learn how SciBite text analytics and semantic technologies can help transform your data.
★ and ★★ Silos
Let’s face it: large swathes of scientific content are, by their very nature, unstructured. Publications, conference records, news articles, even internal presentations: this valuable scientific data is often spread across many locations and multiple formats. There simply isn’t the time and money available to manually process the ever-increasing volume and velocity of data being generated. Technology needs to lend a hand.
★★★ Moving to Contextualised Data
Understanding the content and focus of each document speeds up the process of filtering through large volumes of information for the right data. Ambiguity and synonymy are commonplace in scientific text and complicate simple keyword extraction techniques. Controlled vocabularies and extensive ontologies help to group related terms, but who manages and updates the ones you need? Designed to understand the complexity of scientific text, SciBite’s Named Entity Recognition engine, TERMite, calls on a reference library of millions of scientific synonyms stored in multiple ontologies to transform documents of any type into semantically enriched, machine-readable data.
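To make the idea of synonym normalisation concrete, here is a minimal sketch of dictionary-based entity recognition in plain Python. It is not how TERMite works internally and does not use any SciBite API. The two-entry vocabulary, the identifiers such as `GENE:TP53`, and the function names are all assumptions made up for illustration, and the sketch does no disambiguation at all.

```python
import re

# Toy vocabulary (assumption): hypothetical entity ID -> known synonyms.
VOCABULARY = {
    "GENE:TP53": ["TP53", "p53", "tumor protein p53"],
    "DRUG:ASPIRIN": ["aspirin", "acetylsalicylic acid"],
}


def build_pattern(vocabulary):
    """Compile one case-insensitive regex that matches any synonym."""
    lookup = {}
    for entity_id, synonyms in vocabulary.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = entity_id
    # Longest synonyms first, so multi-word terms win over their substrings.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def annotate(text, vocabulary=VOCABULARY):
    """Return each synonym hit in the text, normalised to its entity ID."""
    pattern, lookup = build_pattern(vocabulary)
    return [
        {
            "id": lookup[match.group(0).lower()],
            "text": match.group(0),
            "start": match.start(),
            "end": match.end(),
        }
        for match in pattern.finditer(text)
    ]
```

Calling `annotate` on a sentence that mentions both "aspirin" and "acetylsalicylic acid" returns two hits carrying the same identifier, which is the normalisation step described above. A real engine also has to decide which meaning of an ambiguous term is intended, which a plain lookup like this cannot do.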
Once the entities have been identified and disambiguated, and their related synonyms normalised, TERMite can output results in multiple structured formats, including RDF, NoSQL, graph (Neo4j) and many more. Building graph data from what was once plain text really starts to open up the exploratory potential of this data through text analytics and forms the foundation of many current data integration strategies.
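As a rough illustration of what moving from annotations to RDF can look like, the sketch below writes "document mentions entity" statements as N-Triples using only the Python standard library. It does not reproduce TERMite's actual output: the `http://example.org/` base URI, the `mentions` predicate and the function name are hypothetical placeholders.

```python
from urllib.parse import quote


def to_ntriples(doc_id, entity_ids, base="http://example.org/"):
    """Serialise 'document mentions entity' statements as N-Triples lines.

    The base URI and the 'mentions' predicate are placeholders (assumptions).
    """
    subject = f"<{base}doc/{quote(doc_id, safe='')}>"
    predicate = f"<{base}vocab/mentions>"
    lines = []
    # De-duplicate and sort so the output is stable between runs.
    for entity_id in sorted(set(entity_ids)):
        obj = f"<{base}entity/{quote(entity_id, safe='')}>"
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)
```

Each line is one subject, predicate, object statement, so the same annotations could equally be loaded into a triple store or mapped onto nodes and edges in a graph database.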
★★★★★ A Single View of Many Slices
The final stage in our 5★ journey through data structure is linking together multiple sources of information. Here, we have linked data from PubMed, ClinicalTrials.gov and Orphanet, all in the same database ready for analysis.
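The linking step can be sketched in a few lines: once records from different sources carry the same normalised entity identifiers, those identifiers become the join key. The record layout, field names and function names below are assumptions chosen for illustration, not a description of any SciBite product or of the database mentioned above.

```python
from collections import defaultdict


def link_by_entity(records):
    """Group records from any source under the entity IDs they mention.

    Each record is assumed to be a dict with 'source', 'id' and 'entities'.
    """
    index = defaultdict(set)
    for record in records:
        for entity_id in record["entities"]:
            index[entity_id].add((record["source"], record["id"]))
    return index


def shared_entities(index, min_sources=2):
    """Return entity IDs mentioned by at least `min_sources` distinct sources."""
    return sorted(
        entity_id
        for entity_id, hits in index.items()
        if len({source for source, _ in hits}) >= min_sources
    )
```

Entities that appear in two or more sources are the points where otherwise separate datasets connect, which is what turns a collection of annotated documents into a single linked view.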
SciBite’s technologies are designed to plug into many of the current major systems for 4★ and 5★ data, including Neo4j, OpenLink Virtuoso, Cambridge Semantics’ Anzo platform, Spotfire, Linkurious and many more.
Regardless of the source data, or end applications used, the results should be linked in a manner that lets the science speak for itself.
Scientific knowledge can be represented as relationships between things. Thousands or millions of such relationships make a knowledge graph or network analysis. SciBite technology enables extraction of these relationships, and in doing so, can uncover knowledge that might otherwise have remained hidden.
Get in touch with us to find out how we can transform your data
© SciBite Limited / Registered in England & Wales No. 07778456