The Challenge of Data Mining
Like it or loathe it, plain text is a goldmine of information. The challenge is that mining it is often complicated by ambiguity. Sure, identifying, disambiguating and extracting scientific terms is a big challenge, but we’ve got it covered.
Fit for purpose
Our 80+ hand-curated vocabularies, containing more than 20 million synonyms and enriched beyond any publicly available alternative, are just what’s needed for the job.
Combine them with our entity extraction engine, which analyses over 1 million words per second (that’s the entire Harry Potter collection every second), and you have a pretty powerful solution in front of you.
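To make the idea of synonym-based extraction concrete, here is a minimal Python sketch. It is not the engine described above: it assumes a tiny hypothetical vocabulary (the terms, the IDs and the `extract_entities` function are invented for illustration) and uses a naive greedy longest-match lookup with no disambiguation step.

```python
import re

# Hypothetical toy vocabulary: each lower-cased synonym maps to one canonical ID.
# Terms and IDs are illustrative placeholders, not real vocabulary entries.
VOCAB = {
    "aspirin": "DRUG:0001",
    "acetylsalicylic acid": "DRUG:0001",
    "heart attack": "INDICATION:0001",
    "myocardial infarction": "INDICATION:0001",
}

# Longest synonym, in words, so we know how far ahead to look.
MAX_WORDS = max(len(synonym.split()) for synonym in VOCAB)


def extract_entities(text):
    """Return (matched_text, canonical_id) pairs using greedy longest match."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in VOCAB:
                hits.append((candidate, VOCAB[candidate]))
                i += n
                break
        else:
            # No synonym starts at this token; move on.
            i += 1
    return hits


sentence = "Acetylsalicylic acid lowers the risk of a heart attack."
print(extract_entities(sentence))
```

Because both "aspirin" and "acetylsalicylic acid" map to the same ID, different spellings of one concept end up as the same entity. A real system would also need to decide between competing meanings of an ambiguous term, which this sketch does not attempt.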
The end result
Take Medline as an example: it holds over 24 million articles of unstructured plain text. Using just 6 vocabularies, we identified 121 million individual, disambiguated assertions in 4.5 hours (did we mention our tools are fast?), and that was just on a laptop.
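For a sense of scale, the Medline figures can be turned into rates. The short Python sketch below uses only the numbers quoted above; the variable names are illustrative, and because the article count is given as "over 24 million", the results are approximate.

```python
# Figures quoted in the text above.
articles = 24_000_000      # "over 24 million", so treat as a lower bound
assertions = 121_000_000
hours = 4.5

seconds = hours * 3600     # 4.5 hours = 16,200 seconds

articles_per_second = articles / seconds
assertions_per_second = assertions / seconds
assertions_per_article = assertions / articles

print(f"{articles_per_second:,.0f} articles per second")      # 1,481
print(f"{assertions_per_second:,.0f} assertions per second")  # 7,469
print(f"{assertions_per_article:.1f} assertions per article") # 5.0
```

That works out to roughly 1,500 articles and 7,500 assertions every second, or about five assertions per article.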
Back to the topic of data mining: learn more about our text analysis engine.
Sir Tim Berners-Lee, the inventor of the World Wide Web, defined a 5-star deployment scheme for open data. In recent customer discussions, we’ve talked about a similar scheme to describe the status of data across their organisation and how text analytics can help contextualise unstructured data.
Get in touch with us to find out how we can transform your data
© SciBite Limited / Registered in England & Wales No. 07778456