
TERMite Text Analysis Engine

Unlock the value of scientific text in seconds with our named entity recognition (NER) and extraction engine

How do you scan millions of publications, patents, reports and any other document type to get at the information you need most?

More and more of the fundamental science content critical to the innovation process is locked up inside electronic documents.

TERMite (TERM identification, tagging & extraction) is the ultra-fast named entity recognition (NER) and extraction engine at the heart of our semantic analytics software suite.

Coupled with our hand-curated VOCabs, it can recognise and extract relevant terms found in scientific text, transforming unstructured content into rich, machine-readable data.

Information Professionals

You Are: A life science professional whose job involves hunting for key facts in literature, patents, grants and internal documents.
We Offer: The ability to data-mine millions of documents to identify critical mentions and relationships.

Enterprise Search

You Are: A company wishing to make its internal search portals more accurate.
We Offer: The ability to enhance your existing search tool to find key biological entities more accurately, making your users happier and more productive!

Solution Provider

You Are: Anyone who produces textual content in the life sciences or supplies IT systems that contain such text (ELNs, project management tools, industry databases, etc.).
We Offer: The opportunity to enrich your content for search and navigation, significantly increasing its value to your consumers.

Latest features of TERMite

The TERMite 6.4 release brings a number of features and updates aimed at making your research smarter and faster. The latest features include:

  • TERMite now integrates with SciBiteAI – Access up to 23 NER machine learning models
  • Bundle Editor – Create and manage your advanced TExpress pattern search queries
  • TERMite-CENtree one click – Edit publicly available ontologies in CENtree and upload them easily into TERMite
  • New Security Manager – Assign and use roles to control access to your TERMite domains
  • Vocab hot-reloading – Set your TERMite server to scan for the latest updates
  • Enhanced Server monitor tool
  • Regular expressions as synonyms
  • Parallelisation in Java API
  • Improved Python support
  • Parallel server-batch mode
  • Enhanced scripting support

Get in touch with the team to learn more or download the TERMite datasheet.

Download TERMite datasheet

VOCabs – Premade Expert Ontologies


Computational approaches help to sift through and identify relevant material from multiple sources, but they struggle to deal with the ambiguity of scientific literature. Multiple terms can be used to describe the same topic, which makes any keyword search difficult.

Our high-quality vocabularies and ontologies provide the critical foundation which enables SciBite’s TERMite engine to accurately detect important topics within biomedical text.
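To illustrate the general idea of vocabulary-driven tagging, here is a minimal Python sketch. It is not TERMite's implementation or API: the `VOCAB` entries, the entity IDs and the `tag` function are hypothetical and exist only to show how several synonyms can be resolved to a single entity.

```python
import re

# Hypothetical mini-vocabulary: entity ID -> (preferred label, synonyms).
# Illustrative only; these entries are not taken from any SciBite VOCab.
VOCAB = {
    "GENE:TP53": ("TP53", ["p53", "tumor protein p53"]),
    "DRUG:ASPIRIN": ("aspirin", ["acetylsalicylic acid", "ASA"]),
}


def build_matcher(vocab):
    """Build one case-insensitive regex covering every label and synonym."""
    lookup = {}
    for entity_id, (label, synonyms) in vocab.items():
        for name in [label] + synonyms:
            lookup[name.lower()] = entity_id
    # Longest names first, so multi-word synonyms win over their substrings.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag(text, vocab=VOCAB):
    """Return every vocabulary hit in the text, with its entity ID and offsets."""
    pattern, lookup = build_matcher(vocab)
    hits = []
    for match in pattern.finditer(text):
        entity_id = lookup[match.group(1).lower()]
        hits.append({
            "id": entity_id,
            "label": vocab[entity_id][0],
            "text": match.group(1),
            "start": match.start(),
            "end": match.end(),
        })
    return hits


if __name__ == "__main__":
    for hit in tag("Acetylsalicylic acid (ASA) may modulate p53 activity."):
        print(hit)
```

A plain lookup like this treats every occurrence of a short synonym such as "ASA" as the drug, which is exactly the kind of ambiguity that curated vocabularies and disambiguation are meant to address.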

Each vocabulary is enhanced by a combination of our experienced in-house ontologists and biocurators and our proprietary ontology enrichment software.

Our VOCabs cover many more topics in far greater depth than publicly available ontologies such as MeSH, UniProt and MedDRA.

If you’re not using SciBite VOCabs, you’re not going to capture the information your users need.

Get in touch with the team to learn more or download the VOCabs datasheet.

Download VOCabs datasheet

Key product highlights

  • Rapid Start-Up

    Get up and running quickly, with no pre-indexing or complex set-up required

  • Robust

    Enterprise-grade and scalable to billions of documents, with the ability to run large-scale document processing on systems such as Hadoop

  • Accurate

    Precisely tag and disambiguate scientific terms in unstructured scientific text using SciBite’s VOCabs, which contain more than 20 million synonyms across more than 80 life science topics, including genes, drugs, diseases and adverse events

  • Ultra-Fast

    Process millions of documents, such as the entire Medline database or large numbers of patents or internal documents, in minutes

Want to learn more about TERMite?

Get in touch with us to find out how we can transform your data

Contact us

Use cases

Biomarker Discovery in Literature

The identification and application of biomarkers in basic and clinical research is almost mandatory in any productive pipeline of a pharmaceutical organisation. Validated biomarkers play a crucial role in the prediction of clinical outcome and support the translation from candidate discovery to successful clinical treatment.

A wealth of valuable biomarker-related information is available in the biomedical literature. However, the process of discovering and validating new biomarkers depends on the ability to extract insight from this resource effectively.

SciBite uses semantic enrichment to unlock the value of unstructured text and simplify the identification of new potential biomarker leads from scientific text.

Read the full use case

Eliminating the Data Preparation Burden

For most pharmaceutical companies, extracting insight from heterogeneous and ambiguous data remains a challenge. The era of data-driven R&D is motivating investment in technologies such as machine learning to provide deeper insights into new drug development strategies.

The quality of data directly impacts the accuracy and reliability of the results of computational approaches. However, the work required to achieve clean, high-quality data can be costly, often prohibitively so, requiring data scientists to spend the majority of their time as ‘data janitors’ rather than actually analysing data.

SciBite provides an integrated, cost-effective solution to significantly reduce the time and cost associated with data cleansing, normalisation and annotation. The output ensures that downstream integration and discovery activities are based on high-quality, contextualised data.

Read the full use case

More Than FAIR: Unlocking the Value of Your Bioassay Data

Databases dedicated to managing bioassay data contain an amazing wealth of R&D knowledge and, as such, provide a rich resource for mining with both scientific and operational questions. However, most pharmaceutical companies are unable to realise the true value of their data because of the way it has been captured and/or managed.

A wider scientific community initiative has resulted in the establishment of principles to ensure that data is Findable, Accessible, Interoperable and Reusable. Although initially focused on the accessibility of public domain data, the FAIR principles are rapidly gaining interest from the pharmaceutical industry.

SciBite’s unique combination of retrospective and prospective semantic enrichment immediately brings scientifically intelligent search to any bioassay platform, enabling the wealth of information within it to be unlocked and exploited effectively and efficiently.

Read the full use case

SciBite and Hadoop: Transforming Big Data

With the rise in machine learning and artificial intelligence approaches to big data, systems that can integrate into the complex ecosystem typically found within large enterprises are increasingly important.

Hadoop systems can hold billions of data objects but suffer from the common problem that such objects can be hard to organise due to a lack of descriptive metadata. SciBite can improve the discoverability of this vast resource by unlocking the knowledge held in unstructured text to power next-generation analytics and insight.

Here we describe how the combination of Hadoop and SciBite brings significant value to large-scale processing projects.

Read the full use case

Semantics in Enterprise Search

To become more information-driven, pharmaceutical companies are turning to enterprise search technologies to make faster, more informed decisions based on the most relevant information available to them. Enterprise search platforms provide the scalable, high-performance infrastructure to enable secure access to millions of documents from across the whole organisation and deliver content analytics from a single portal.

However, users can typically only search for exactly what was written by the author of a document. The inconsistent use of synonyms during data entry makes it difficult to identify and collate all relevant data related to a topic of interest.

Through semantic enrichment, SciBite brings scientific understanding to enterprise search, enabling it to ‘understand’ scientific concepts within unstructured text. This opens unparalleled access to drug discovery intelligence and vast amounts of knowledge and ensures users are better informed, without overloading them with information.
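The gap between literal keyword search and synonym-aware search can be sketched in a few lines of Python. This is an illustration of the general technique only, not how SciBite or any particular search platform implements it: the `SYNONYM_SETS` data and the `expand` and `search` functions are hypothetical.

```python
import re

# Hypothetical synonym sets; a real vocabulary would be far larger and curated.
SYNONYM_SETS = [
    {"paracetamol", "acetaminophen", "APAP"},
    {"heart attack", "myocardial infarction"},
]


def expand(term):
    """Return the lower-cased term together with all of its known synonyms."""
    lowered_term = term.lower()
    for group in SYNONYM_SETS:
        lowered_group = {name.lower() for name in group}
        if lowered_term in lowered_group:
            return lowered_group
    return {lowered_term}


def search(term, documents, use_synonyms=True):
    """Return the sorted IDs of documents mentioning the term or a synonym."""
    terms = expand(term) if use_synonyms else {term.lower()}
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    return sorted(
        doc_id for doc_id, text in documents.items() if pattern.search(text)
    )


if __name__ == "__main__":
    docs = {
        "doc1": "Acetaminophen overdose can cause liver injury.",
        "doc2": "Paracetamol was given every six hours.",
        "doc3": "Ibuprofen reduced inflammation.",
    }
    print(search("paracetamol", docs, use_synonyms=False))
    print(search("paracetamol", docs))
```

With synonyms switched off, only the document that uses the author's exact wording is returned; with expansion on, the document that mentions acetaminophen is found as well.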

Read the full use case

Related articles

  1. SciBite releases new version of ultra-fast term identification, tagging & extraction engine – TERMite 6.3

    SciBite releases a new version of its industry-leading, ultra-fast named entity recognition (NER) and extraction engine, TERMite 6.3, which delivers a range of new enhancements, including simplified connectivity to third-party systems.

    Read
  2. Enhanced Clinical Vocabularies – Inclusion of CDISC in latest TERMite 6.3 release

    SciBite's latest TERMite 6.3 release includes a new set of clinical ontologies as it introduces a set of CDISC vocabularies.

    Read
