

Loving the Data Others Don’t


The Challenge of Data Mining

Like it or loathe it, plain text is a goldmine of information. The challenge is that data mining is often complicated by ambiguity. Sure, identifying, disambiguating and extracting those scientific terms is hard, but we've got it covered.
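Here is a concrete case of the ambiguity problem. The abbreviation "MS" can mean multiple sclerosis or mass spectrometry, and only the surrounding words reveal which. The toy sketch below assumes a simple cue-word heuristic. The `SENSES` table, its cue words and the `disambiguate` function are hypothetical illustrations, not a description of how our engine works.

```python
from typing import Dict, Optional, Set

# Hypothetical sense inventory: one ambiguous abbreviation, two candidate
# meanings, each with a few cue words that tend to appear nearby.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "MS": {
        "multiple sclerosis": {"patients", "relapse", "lesions", "demyelination"},
        "mass spectrometry": {"peptides", "spectra", "ionization", "chromatography"},
    },
}


def disambiguate(term: str, sentence: str) -> Optional[str]:
    """Return the sense whose cue words overlap most with the sentence.

    Returns None if the term is unknown or no cue word appears, i.e. the
    mention stays ambiguous. Ties go to the sense listed first.
    """
    senses = SENSES.get(term)
    if not senses:
        return None
    # Lower-case the words and strip simple punctuation before comparing.
    words = {w.strip(".,;:()").lower() for w in sentence.split()}
    best_sense, best_score = None, 0
    for sense, cues in senses.items():
        score = len(words & cues)
        if score > best_score:  # strict '>' keeps the earlier sense on ties
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("MS", "Peptides were identified by MS and their spectra compared."))
# mass spectrometry
print(disambiguate("MS", "Patients with MS showed new lesions."))
# multiple sclerosis
```

Real text is much messier than this. Cue words overlap, abbreviations are defined once and reused, and many terms have more than two senses. Hand-picked cue lists alone do not scale to that.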


Fit for purpose

Our 80+ hand-curated vocabularies contain more than 20 million synonyms, a many-fold enrichment over any publicly available alternative, and are just what's needed for the job.
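A synonym-rich vocabulary supports extraction in a straightforward way. The sketch below maps every synonym to a single concept ID and scans text with greedy longest matching, so "aspirin" and "acetylsalicylic acid" resolve to the same concept. The `VOCAB` entries, the made-up IDs and the `extract` function are hypothetical. This is a minimal dictionary-matching sketch, not our production engine.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-vocabulary: several synonyms map to one concept ID.
# The IDs are made up for illustration.
VOCAB: Dict[str, str] = {
    "aspirin": "DRUG:0001",
    "acetylsalicylic acid": "DRUG:0001",
    "heart attack": "DISEASE:0001",
    "myocardial infarction": "DISEASE:0001",
}


def extract(text: str, vocab: Dict[str, str] = VOCAB) -> List[Tuple[str, str]]:
    """Find vocabulary terms in text using greedy longest match.

    Returns (matched text, concept ID) pairs in order of appearance.
    """
    # Index synonyms as token tuples so multi-word terms can be matched.
    index = {tuple(s.lower().split()): cid for s, cid in vocab.items()}
    max_len = max(len(key) for key in index)
    tokens = [t.strip(".,;:()").lower() for t in text.split()]

    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                hits.append((" ".join(key), index[key]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts here; move on one token
    return hits


print(extract("Aspirin may lower the risk of a heart attack."))
# [('aspirin', 'DRUG:0001'), ('heart attack', 'DISEASE:0001')]
```

Every synonym resolves to the same ID, so downstream queries can ask for the concept rather than for each spelling. The more synonyms a vocabulary holds, the more mentions this kind of lookup can catch.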

Combine them with our entity extraction engine, which analyses more than 1 million words per second (that's the entire Harry Potter collection every second), and you have a pretty powerful solution in front of you.


The end result

Take Medline as an example: over 24 million articles of unstructured plain text. Using just 6 vocabularies, we identified 121 million individual, disambiguated assertions in 4.5 hours (did we mention our tools are fast?), and that was just on a laptop…
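The Medline figures imply some simple rates. The snippet below works them out. It treats "over 24 million" articles as exactly 24 million, so the articles-per-second figure is a lower bound and all values are approximate.

```python
# Back-of-envelope rates implied by the Medline run described above.
# "Over 24 million" is treated as exactly 24 million (an approximation).
ARTICLES = 24_000_000
ASSERTIONS = 121_000_000
HOURS = 4.5

seconds = HOURS * 3600  # total run time in seconds
articles_per_sec = ARTICLES / seconds
assertions_per_sec = ASSERTIONS / seconds
assertions_per_article = ASSERTIONS / ARTICLES

print(f"{seconds:,.0f} s elapsed")                          # 16,200 s elapsed
print(f"~{articles_per_sec:,.0f} articles/s")                # ~1,481 articles/s
print(f"~{assertions_per_sec:,.0f} assertions/s")            # ~7,469 assertions/s
print(f"~{assertions_per_article:.1f} assertions per article")  # ~5.0 assertions per article
```

That works out to roughly 1,500 articles and 7,500 assertions per second, or about five assertions per article.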

Back to the topic of data mining: what was it you were looking for again?



How could the SciBite semantic platform help you?

Get in touch with us to find out how we can transform your data.
