Unlock the value of scientific text in seconds with our named entity recognition (NER) and extraction engine

How do you scan millions of publications, patents, reports and any other document type to get at the information you need most?

More and more of the fundamental science content critical to the innovation process is locked up inside electronic documents.

TERMite (TERM identification, tagging & extraction) is the ultra-fast named entity recognition (NER) and extraction engine at the heart of our semantic analytics software suite.

Coupled with our hand-curated VOCabs, it can recognise and extract relevant terms found in scientific text, transforming unstructured content into rich, machine-readable data.

Information Professionals

You Are: A life science professional whose job involves hunting for key facts in literature, patents, grants and internal documents.
We Offer: The ability to data-mine millions of documents to identify critical mentions and relationships.

Enterprise Search

You Are: A company wishing to make its internal search portals more accurate.
We Offer: The ability to enhance your existing search tool to find key biological entities more accurately, making your users happier and more productive!

Solution Provider

You Are: Anyone who produces textual content in the life sciences or supplies IT systems that contain such text (ELNs, project management tools, industry databases, etc.).
We Offer: The opportunity to enrich your content for search and navigation, and to significantly increase its value to your consumers.

Get in touch with the team to learn more or download the TERMite datasheet.

Download datasheet

Key product highlights

  • Rapid

    Rapid Start-Up

    Get up and running quickly, with no pre-indexing or complex set-up required

  • Robust


    Enterprise-grade and scalable to billions of documents, with the ability to run large-scale document processing on systems such as Hadoop

  • Accurate


    Precisely tag and disambiguate scientific terms in unstructured scientific text using SciBite’s VOCabs, which contain more than 20 million synonyms across more than 80 life science topics, including genes, drugs, diseases and adverse events

  • Fast


    Process millions of documents, such as the entire Medline database or large numbers of patents or internal documents, in minutes

Want to learn more about TERMite?

Get in touch with us to find out how we can transform your data

Contact us

Use cases

Biomarker Discovery in Literature

The identification and application of biomarkers in basic and clinical research is an almost mandatory part of any productive pharmaceutical pipeline. Validated biomarkers play a crucial role in the prediction of clinical outcome and support the translation from candidate discovery to successful clinical treatment.

A wealth of valuable biomarker-related information is available in the biomedical literature. However, the process of discovering and validating new biomarkers depends on the ability to extract insight from this resource effectively.

SciBite uses semantic enrichment to unlock the value of unstructured text and simplify the identification of new potential biomarker leads from scientific text.

Read the full use case

Eliminating the Data Preparation Burden

For most pharmaceutical companies, extracting insight from heterogeneous and ambiguous data remains a challenge. The era of data-driven R&D is motivating investment in technologies such as machine learning to provide deeper insights into new drug development strategies.

The quality of data directly impacts the accuracy and reliability of the results of computational approaches. However, the work required to achieve clean, high-quality data can be costly, often prohibitively so, requiring data scientists to spend the majority of their time as ‘data janitors’ rather than actually analysing data.

SciBite provides an integrated, cost-effective solution to significantly reduce the time and cost associated with data cleansing, normalisation and annotation. The output ensures that downstream integration and discovery activities are based on high-quality, contextualised data.

Read the full use case

More Than FAIR: Unlocking the Value of Your Bioassay Data

Databases dedicated to managing bioassay data contain an amazing wealth of R&D knowledge and, as such, provide a rich resource to mine for answers to both scientific and operational questions. However, most pharmaceutical companies are unable to realise the true value of their data because of the way it has been captured and/or managed.

A wider scientific community initiative has resulted in the establishment of principles to ensure that data is Findable, Accessible, Interoperable and Reusable. Although initially focused on the accessibility of public domain data, the FAIR principles are rapidly gaining interest from the pharmaceutical industry.

SciBite’s unique combination of retrospective and prospective semantic enrichment immediately brings scientifically intelligent search to any bioassay platform, enabling the wealth of information within it to be unlocked and exploited effectively and efficiently.

Read the full use case

SciBite and Hadoop: Transforming Big Data

With the rise in machine learning and artificial intelligence approaches to big data, systems that can integrate into the complex ecosystem typically found within large enterprises are increasingly important.

Hadoop systems can hold billions of data objects but suffer from the common problem that such objects can be hard to organise due to a lack of descriptive metadata. SciBite can improve the discoverability of this vast resource by unlocking the knowledge held in unstructured text to power next-generation analytics and insight.

Here we describe how the combination of Hadoop and SciBite brings significant value to large-scale processing projects.

Read the full use case

Semantics in Enterprise Search

To become more information-driven, pharmaceutical companies are turning to enterprise search technologies to make faster, more informed decisions based on the most relevant information available to them. Enterprise search platforms provide the scalable, high-performance infrastructure to enable secure access to millions of documents from across the whole organisation and deliver content analytics from a single portal.

However, users can typically only search for exactly what was written by the author of a document. The inconsistent use of synonyms during data entry makes it difficult to identify and collate all relevant data related to a topic of interest.

Through semantic enrichment, SciBite brings scientific understanding to enterprise search, enabling it to ‘understand’ scientific concepts within unstructured text. This opens unparalleled access to drug discovery intelligence and vast amounts of knowledge and ensures users are better informed, without overloading them with information.
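
As a rough illustration of the synonym problem described above, the sketch below maps different spellings of the same concept to one identifier, so a search for one synonym can find documents that use another. It is a toy example only: the `VOCAB` entries, the identifiers and the function names are all hypothetical, and it does not represent TERMite's API, SciBite's VOCabs or how the product works internally.

```python
import re

# Hypothetical toy vocabulary: concept ID -> synonyms (illustrative only,
# not taken from SciBite's VOCabs).
VOCAB = {
    "GENE:TP53": ["TP53", "p53", "tumor protein p53"],
    "DRUG:ASPIRIN": ["aspirin", "acetylsalicylic acid"],
}


def build_matcher(vocab):
    """Return a compiled pattern and a lower-cased synonym -> concept ID lookup."""
    lookup = {}
    for concept_id, synonyms in vocab.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = concept_id
    # Try longer synonyms first so multi-word names win over shorter ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag(text, pattern, lookup):
    """Return (matched text, concept ID) pairs found in the text."""
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


def matches_concept(document, query, pattern, lookup):
    """True if the document and the query mention at least one shared concept."""
    doc_ids = {cid for _, cid in tag(document, pattern, lookup)}
    query_ids = {cid for _, cid in tag(query, pattern, lookup)}
    return bool(doc_ids & query_ids)


PATTERN, LOOKUP = build_matcher(VOCAB)
```

With this sketch, a query for "aspirin" would match a document that only mentions "acetylsalicylic acid", because both resolve to the same hypothetical concept ID. A plain keyword search would miss it.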

Read the full use case

Related articles

  1. Are Ontologies relevant in a Machine Learning-centric world?

    SciBite CSO and Founder Lee Harland shares his views on why ontologies are relevant in a machine learning-centric world and are essential to help "clean up" scientific data in the Life Sciences industry.

  2. Exploring ontology visualisation techniques for biological data

    What’s the most useful way to visualise an ontology? SciBite CTO James Malone gives his views on answering this commonly asked question regarding ontology visualisation techniques.

