
Ontology Services

Extensive range of tools & techniques to rapidly develop new ontologies or assist in managing existing resources

SciBite’s Ontology Services provide a cost-effective route to augment your data science team and access SciBite’s team of experienced ontologists and biocurators.

Our expert team combine many years of expertise and are actively engaged with Life Sciences initiatives, such as Pistoia Alliance, OBO Foundry, ISB and ICBO.

We provide thought leadership based on our unique and valuable combination of extensive industry experience, coupled with unparalleled expertise in ontology development, standards and biocuration.

With an extensive range of tools and techniques at our disposal, we deliver significant value across a range of use cases, including:

  • Building new vocabularies
  • Extending (augmenting) SciBite vocabularies
  • Developing bespoke semantic search queries
  • Data cleansing
  • Mapping internal lists to standard vocabularies
  • Building gold-standard training sets for Machine Learning and Artificial Intelligence systems

SciBite’s Ontology Services enable you to realise the full potential of the SciBite platform.

Key product highlights

  • Efficiency

    Flex your capacity and extend the capabilities of your team quickly and easily

  • Experience

    Extensive Life Science experience and active engagement with industry initiatives

  • Expertise

    Unparalleled expertise in ontology development, standards and biocuration

Want to learn more about our Ontology Services?

Get in touch with us to find out how we can transform your data

Contact us

Use cases

Building a New, Robust Domain-Specific Vocabulary

The Business Challenge:

A global bioscience company wanted to standardise its use of scientific terminology. However, they found that most publicly available ontologies focused on pharmaceutical terminology and did not provide appropriate coverage relevant to their business, which is to develop solutions for the food, nutritional and agricultural industries. The manual development and review of a new vocabulary would have consumed significant internal resource.

The SciBite Solution:

We started with several existing ontologies, including publicly available bacterial species names, which were enriched using internal terminology commonly used within the client’s organisation, including bacterial strain names and biosafety terms. We also leveraged machine learning techniques to support the enrichment process in a controlled way.

Read the full use case

Developing Bespoke Semantic Queries to Enable Comprehensive Monitoring

The Business Challenge:

In an attempt to identify articles of interest amongst the background ‘noise’, LifeArc’s Scientific Horizon Scanning Team members used to manually scan through PubMed, grant information and a range of biotech-focussed news websites for potentially interesting articles. This process was resource intensive, limiting the coverage, depth and frequency of review possible.

The SciBite Solution:

To provide the foundation for semantic enrichment and to complement SciBite’s existing vocabularies, we created several new vocabularies containing bespoke terms tailored to LifeArc’s business, including those relating to novelty and specific technologies such as biomarkers and diagnostics.

Read the full use case

Building Gold-Standard Training Sets for Machine Learning and Artificial Intelligence Systems

The Business Challenge:

Many applications of AI involve pattern recognition, but their accuracy is highly dependent on the data being unambiguous. Machine learning models can be used to identify sentences describing positive and negative relations between entities (i.e. X has some relation with Y). However, in order to train such models, it is vital to have as clean a dataset as possible. For example, without prior semantic enrichment of the text, a machine learning model would not be able to correctly identify that the phrase “...the binding of repaglinide to HSA in human plasma...” refers to an interaction between a drug and a protein, rather than between two proteins.

The SciBite Solution:

We created a tool that uses SciBite’s Named Entity Recognition (NER) engine, TERMite, to accurately identify and categorise sentences that mention protein-protein interactions. First, all sentences mentioning entity type 1 and entity type 2 were extracted from MEDLINE. In the case of protein-protein interactions, we were looking for two GENE mentions in a sentence. These sentences were then surfaced to a curator, along with related metadata. The curator then assigned each sentence to one of three sets: i) sentences that describe a positive interaction, ii) sentences that describe a negative interaction, or iii) coincidental mentions. This data can then be used to train machine learning models to automate the extraction of sentences describing a relation of interest.
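
As a rough illustration of the candidate-selection step described above, the following Python sketch keeps only sentences that mention two genes and leaves a label slot for the curator. It does not use TERMite or reflect its API: `TOY_GENE_LEXICON`, `find_genes`, `Candidate` and `candidate_sentences` are hypothetical names, the dictionary lookup is a stand-in for a real NER engine, and requiring two distinct genes is an assumption made for this example.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Hypothetical stand-in for an NER engine: a tiny dictionary of gene symbols.
# A real pipeline would call an NER service instead of this lookup.
TOY_GENE_LEXICON = {"TP53", "MDM2", "BRCA1"}


class Label(Enum):
    """The three curation sets described in the text."""
    POSITIVE = "positive interaction"
    NEGATIVE = "negative interaction"
    COINCIDENTAL = "coincidental mention"


@dataclass
class Candidate:
    sentence: str
    genes: List[str]
    label: Optional[Label] = None  # assigned later by a human curator


def find_genes(sentence: str) -> List[str]:
    """Return distinct gene symbols found in the sentence, in order of appearance."""
    found: List[str] = []
    for token in re.findall(r"[A-Za-z0-9]+", sentence):
        if token in TOY_GENE_LEXICON and token not in found:
            found.append(token)
    return found


def candidate_sentences(sentences: List[str]) -> List[Candidate]:
    """Keep sentences with at least two distinct gene mentions (an assumption)."""
    candidates = []
    for sentence in sentences:
        genes = find_genes(sentence)
        if len(genes) >= 2:
            candidates.append(Candidate(sentence, genes))
    return candidates


if __name__ == "__main__":
    # Invented example sentences, used only to exercise the filter.
    corpus = [
        "MDM2 binds TP53 and promotes its degradation.",
        "TP53 is frequently mutated in cancer.",
        "BRCA1 did not interact with MDM2 in this assay.",
    ]
    for candidate in candidate_sentences(corpus):
        print(candidate.genes, "|", candidate.sentence)
```

Only the first and third sentences pass the filter. A curator would then set `label` on each candidate, and the labelled set becomes the training data.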

Read the full use case

Data Cleansing to Unlock the Potential of Bioassay Data

The Business Challenge:

A global pharmaceutical company recognised the potential of the huge volumes of bioassay data that they had generated, but struggled to gain insights from this valuable resource. A lack of standardisation across their data repositories, including LIMS and other bioassay databases, had resulted in different ways of describing the same thing, for example ‘mouse’, ‘mice’, ‘Mus musculus’ and ‘m. musculus’, making it hard to collate data for a particular species. This was compounded by the fact that some database fields were sparsely populated while others contained useful information buried in long assay descriptions.

The SciBite Solution:

We enriched our species, gene and bioassay vocabularies with customer-specific terms and synonyms to ensure all relevant information would be recognised. We then analysed the assay names from the legacy database and extracted the different entities within each one. Each entity was then mapped to a single, standard vocabulary term to normalise the data.
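
The normalisation idea can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the project: the synonym table reuses the species variants quoted above, while `SPECIES_SYNONYMS`, `build_lookup` and `extract_terms` are hypothetical names and the choice of ‘Mus musculus’ as the preferred label is an assumption.

```python
import re
from typing import Dict, List

# Hypothetical synonym table: preferred label -> variants seen in the data.
SPECIES_SYNONYMS: Dict[str, List[str]] = {
    "Mus musculus": ["mouse", "mice", "Mus musculus", "m. musculus"],
}


def build_lookup(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every case-folded variant to its preferred label."""
    lookup = {}
    for preferred, variants in synonyms.items():
        for variant in variants:
            lookup[variant.casefold()] = preferred
    return lookup


def extract_terms(text: str, lookup: Dict[str, str]) -> List[str]:
    """Find known variants in free text and return distinct preferred labels."""
    # Try longer variants first so multi-word names are not split up.
    variants = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(v) for v in variants) + r")(?!\w)",
        re.IGNORECASE,
    )
    found: List[str] = []
    for match in pattern.finditer(text):
        preferred = lookup[match.group(1).casefold()]
        if preferred not in found:
            found.append(preferred)
    return found


if __name__ == "__main__":
    lookup = build_lookup(SPECIES_SYNONYMS)
    # Invented assay description, used only to exercise the lookup.
    print(extract_terms("Cytotoxicity assay in mice (M. musculus) liver cells", lookup))
```

Both ‘mice’ and ‘M. musculus’ resolve to the same preferred label, so records that used different spellings can be collated under one term.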

Read the full use case

Mapping Vocabularies to Enable Effective Data Mining

The Business Challenge:

A leading business intelligence company had developed and acquired a range of life sciences databases. However, each database was indexed differently, resulting in silos of data that had to be searched independently.

The SciBite Solution:

As an initial step, we mapped the client's internal lists of indexing terms to standard controlled life sciences vocabularies, including the Indication branch of MeSH (Medical Subject Headings) and Drugs from ChEMBL. This resulted in a single, consistent means to index the client's databases. With the index mapping in place, connections could be made between entries in previously disconnected databases, enabling users to seamlessly navigate the content within them.

Read the full use case

Related articles

  1. How ontologies are unlocking the full potential of biomedical data

    Our latest blog explains how SciBite's Ontologies team takes public biomedical ontologies and tailors them so that they can be used for named entity recognition (NER).

  2. Keeping up with the Life Sciences literature – How Semantic Enrichment is changing the way we search

    In our latest blog we discuss the challenges life sciences companies, like LifeArc, face in keeping up-to-date with scientific literature, and how semantic enrichment technology can automate this process to reduce the time spent mining data by up to 80%.

