The Challenge of Datamining
Like it or loathe it, plain text is a goldmine of information. The challenge is that datamining is often complicated by ambiguity. Identifying, disambiguating and extracting scientific terms is a big challenge, but we’ve got it covered.
Fit for purpose
Our 80+ hand-curated vocabularies, containing more than 20 million synonyms and enriched many-fold over any publicly available alternative, are just what’s needed for the job.
Combined with our entity extraction engine analysing >1 million words/sec (that’s the entire Harry Potter collection every second), you have a pretty powerful solution in front of you.
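To make the idea concrete, here is a minimal Python sketch of dictionary-based entity extraction: synonyms are mapped to a canonical identifier and matched in plain text. It is not our engine. The vocabulary, the identifiers and the `extract_entities` function are all hypothetical and chosen purely for illustration, and the sketch makes no attempt at disambiguation.

```python
import re

# Hypothetical toy vocabulary: synonym (lowercase) -> canonical entity ID.
VOCAB = {
    "aspirin": "DRUG:aspirin",
    "acetylsalicylic acid": "DRUG:aspirin",
    "heart attack": "INDICATION:myocardial_infarction",
    "myocardial infarction": "INDICATION:myocardial_infarction",
}

# Try longer synonyms first so multi-word terms win over shorter overlaps.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(s) for s in sorted(VOCAB, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_entities(text):
    """Return (matched text, canonical ID, start offset) for each hit."""
    return [
        (m.group(0), VOCAB[m.group(0).lower()], m.start())
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Acetylsalicylic acid lowers the risk of heart attack."
    for hit in extract_entities(sample):
        print(hit)
```

Different synonyms resolve to the same identifier, which is what lets assertions be counted and compared across documents. Handling ambiguous terms needs surrounding context and is outside the scope of this sketch.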
The end result
Take Medline as an example: over 24 million articles of unstructured plain text. Using just 6 vocabularies, we identified 121 million individual, disambiguated assertions in 4.5 hours (did we mention our tools are fast?), and that was just on a laptop…
Back to the topic of datamining, what was it you were looking for again?
Sir Tim Berners-Lee, the inventor of the World Wide Web, defined a 5-star deployment scheme for open data. In recent customer discussions, we’ve talked about a similar scheme to describe the status of data across their organisation and how text analytics can help contextualise unstructured data.
Get in touch with us to find out how we can transform your data
© SciBite Limited / Registered in England & Wales No. 07778456