SciBiteAI in depth

An artificial intelligence (AI) platform combining deep learning with powerful semantic algorithms to enable our customers to exploit life science data and accelerate its downstream use in research and development.

Within the pharmaceutical industry, the combination of Artificial Intelligence (AI) and big data is triggering a revolution across the entire drug development lifecycle, from the way new drugs and treatments are discovered to identifying opportunities to repurpose those already on the market.

For a sector that has seen the cost of bringing a new drug to market rise from $1.2bn to $2bn in the last ten years, and its return on investment drop from 10% to under 2% over the same period, AI has the potential to deliver unprecedented productivity improvements and drive better outcomes for both pharmaceutical companies and patients.

The challenges posed by big data are commonly summarised as the so-called 5 V’s, an extension of Doug Laney’s benchmark definition based on volume, velocity and variety:-

  • Volume: The quantity of data produced across all sources.
  • Velocity: The speed at which new data are created, collected and analysed.
  • Variety: The different types of data being created, including structured data sets, semi-structured data and unstructured text.
  • Veracity: The trustworthiness of the data collected. Is it of good quality? Did it come from a trusted source?
  • Value: The worth of the data being collected.

Within the pharmaceutical and healthcare sectors, big data represents an even greater hurdle, as approximately 80% of clinical data is stored as unstructured text. AI techniques such as text mining and Natural Language Processing (NLP) are therefore required to identify concepts, entities and relationships within the document corpus.

While the volume and variety of big data represent a major technical challenge to any pharmaceutical organisation, the payoffs are also substantial: patterns and trends can be identified that inform decision making at all stages of the drug development process.


SciBite’s Artificial Intelligence platform (SciBiteAI) combines deep learning with our powerful semantic algorithms to enable customers to exploit life science data and accelerate its downstream use in research and development.

Implemented as a server-based application and deployed via Docker, SciBiteAI enables users to rapidly load and run deep learning models.

SciBiteAI’s application programming interface (API) provides customers with a simple, consistent interface for both users and applications, insulating them from the complexities of the underlying implementation.

Building on SciBite’s wealth of experience in data preparation and standards mapping, we also offer consulting services to help select, train and test machine learning models for specific use-cases.

Download the SciBiteAI datasheet or SciBiteAI use case to learn more.

Download Datasheet     Download Use Case

Cornerstone Components

SciBiteAI provides a framework for leveraging AI and deep learning models alongside our award-winning semantic technologies to unlock insights into your data.

Making Data AI-Ready (SciBiteAI .prepare)

“If your data is bad, your machine learning tools are useless…” (Harvard Business Review).

Even today, 80% or more of an organisation’s data is held in unstructured text such as Word documents, PowerPoint slides and PDFs. This is also true of external data sources such as patents, blogs, clinical notes, call centre scripts, literature databases and the growing body of experimental data typically entered via online forms or electronic laboratory notebooks (ELNs).

SciBite’s standards-based semantic tools enable Findable, Accessible, Interoperable and Reusable (FAIR) data across the entire enterprise, a crucial prerequisite to obtaining the high-quality training data required by machine learning models.

Our powerful ontology management builds on the FAIR data approach, turning “strings into things” and delivering a dataset capable of sophisticated operations such as synonym-independent searches (e.g. Viagra or Sildenafil), ontology searches (e.g. “Find projects on Kinases…”) and connection searches (e.g. “Drugs that reduce inflammation…”).
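The idea of turning “strings into things” can be illustrated with a minimal sketch. It is not SciBite’s implementation: the concept identifiers, the toy vocabulary and the sample documents below are all invented for illustration. Each surface form is mapped to a concept identifier, so a search for either synonym retrieves the same documents.

```python
import re

# Toy vocabulary: the identifiers and synonym lists are invented for illustration.
CONCEPTS = {
    "DRUG:0001": {"label": "sildenafil", "synonyms": {"sildenafil", "viagra"}},
    "DRUG:0002": {"label": "ibuprofen", "synonyms": {"ibuprofen", "advil"}},
}

# Reverse index from lower-cased surface form ("string") to concept identifier ("thing").
SURFACE_TO_ID = {
    synonym: concept_id
    for concept_id, concept in CONCEPTS.items()
    for synonym in concept["synonyms"]
}

# Hypothetical documents standing in for an enterprise corpus.
DOCUMENTS = {
    "doc1": "Patients received Viagra once daily.",
    "doc2": "Sildenafil was well tolerated in the trial.",
    "doc3": "Ibuprofen reduced reported pain scores.",
}


def annotate(text):
    """Return the set of concept identifiers mentioned in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {SURFACE_TO_ID[token] for token in tokens if token in SURFACE_TO_ID}


def search(term):
    """Return the ids of documents mentioning the same concept as the query term."""
    concept_id = SURFACE_TO_ID.get(term.lower())
    if concept_id is None:
        return []
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if concept_id in annotate(text)
    )
```

In this sketch, search("Viagra") and search("Sildenafil") return the same two documents, because both strings resolve to one concept before any matching takes place.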

This next-generation, ontology-based FAIR data is the essential bedrock for AI in all its forms.

Training Machine Learning Models (SciBiteAI .model)

Among the deep learning models employed within biomedicine, three of the most important are named-entity recognition (NER), semantic relationship extraction and question answering based on semantic structures.

At SciBite, we have in-depth experience in building all three of these models, and our consultancy service offers you the opportunity to work with our experts in creating, refining and deploying sophisticated deep learning models for your project.

With first-hand experience of industry-leading models such as BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), LSTM (Long Short-Term Memory) and Word2vec, we can help you select the right algorithm for your data.

Our practical experience within life sciences means we can also assist in planning and costing your project – right down to calculating the number of training samples required to prepare a deep learning model for a specific application.

The models can detect context-specific relationships within text and distinguish a simple ‘mention’ of entities from a sentence that asserts a relationship between them. For instance, protein-protein interactions appear in sentences where two proteins are mentioned, but not every sentence that mentions two proteins describes an interaction. Determining whether a relationship is actually being described requires an understanding of the sentence. The following is an example of a protein interaction, expressed in complicated language, that SciBiteAI can identify:-

That NPM1 promotes PEDV growth is due to N protein inhibition of caspase-3-mediated cleavage of NPM1, which prevents proteolytic cleavage of NPM1 and enhances host cell survival.

By contrast, the following text mentions proteins but does not describe an interaction, and was not identified as one by the SciBiteAI protein-protein interaction (PPI) model:-

In order to identify cellular RNAs that stimulate mutant MDA5, Ahmad et al. recently described an RNase protection assay where total RNA extracted from cells is mixed in the test tube with recombinant MDA5 protein bearing a mutation in its helicase domain.
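To see why co-mention alone is not enough, consider a minimal sketch of the naive approach. It is not the SciBiteAI PPI model: the small protein lexicon is a toy assumption, and the code only lists pairs of lexicon terms that occur in the same sentence. Both example sentences above produce a candidate pair, so a trained model is still needed to decide which sentence actually asserts an interaction.

```python
import re
from itertools import combinations

# Toy lexicon for illustration only; a real system would use a curated vocabulary.
PROTEIN_LEXICON = ["NPM1", "caspase-3", "MDA5", "RNase"]

INTERACTION_SENTENCE = (
    "That NPM1 promotes PEDV growth is due to N protein inhibition of "
    "caspase-3-mediated cleavage of NPM1, which prevents proteolytic cleavage "
    "of NPM1 and enhances host cell survival."
)

NON_INTERACTION_SENTENCE = (
    "In order to identify cellular RNAs that stimulate mutant MDA5, Ahmad et al. "
    "recently described an RNase protection assay where total RNA extracted from "
    "cells is mixed in the test tube with recombinant MDA5 protein bearing a "
    "mutation in its helicase domain."
)


def protein_mentions(sentence):
    """Return the lexicon entries found in the sentence, in lexicon order."""
    found = []
    for name in PROTEIN_LEXICON:
        # Match the name only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, sentence):
            found.append(name)
    return found


def candidate_pairs(sentence):
    """Return every pair of distinct lexicon entries co-mentioned in the sentence."""
    return list(combinations(protein_mentions(sentence), 2))
```

Calling candidate_pairs on either sentence returns one pair, which shows that co-occurrence cannot separate the two cases. A relationship extraction model would take such candidates as input and classify each one using the full sentence context.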

Deploying Machine Learning Models (SciBiteAI .deploy)

There are several machine learning language models now in the public domain, the best-known of these being BERT, BioBERT, ELMo and Word2vec. While these represent a genuine leap forward in our ability to process natural language, they do not fully address real-world use-cases:-

  • They are algorithms, not services, making them cumbersome to install and integrate.
  • The code for machine learning models (e.g. a Python script) can be difficult to maintain and distribute within an organisation, a significant constraint as these models change frequently.
  • Machine learning models can be difficult to re-train using internal data, and many organisations struggle to achieve the metrics reported for models trained on public data.
  • To realise their full potential, these models still require domain-specific ontologies at the training, validation and interpretation stages.

At SciBite, we understand these limitations and recognise that customers need simple, deployable machine learning services. Our solution therefore separates the API from the implementation and does not require labour-intensive Python coding.

SciBiteAI is a Docker container-based application for serving multiple models via a simple REST API, enabling you to leverage the power of deep learning models across the whole enterprise.

Amazon SageMaker can also be used to train and deploy machine learning models created using SciBiteAI, allowing customers to develop their models within an AWS cloud environment.

Connecting Machine Learning Output (SciBiteAI .connect)

SciBiteAI offers a powerful Representational State Transfer (REST) API for leveraging the power of deep learning models across your enterprise.

The API provides a consistent, easy-to-use interface that can be quickly adapted to new architectures and that shields users from implementation issues associated with the underlying machine learning models.
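As a rough illustration of what calling a model behind a REST interface might look like, the sketch below builds an HTTP request using only the Python standard library. The route, the JSON payload, the model name and the port are all hypothetical and are not taken from the SciBiteAI API documentation. The point is only that the caller sees one uniform request shape regardless of which model runs behind it.

```python
import json
import urllib.request


def build_prediction_request(base_url, model_name, text):
    """Build (but do not send) a POST request asking a served model to process text.

    The "/models/<name>/predict" route and the {"text": ...} payload are
    hypothetical placeholders, not the documented SciBiteAI API.
    """
    url = f"{base_url.rstrip('/')}/models/{model_name}/predict"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: the same call shape would serve any model name.
request = build_prediction_request(
    "http://localhost:8000/",  # assumed address of a locally running container
    "ppi",                     # hypothetical model name
    "NPM1 promotes PEDV growth.",
)
```

Sending the request would be a matter of passing it to urllib.request.urlopen against a running server. Swapping the model name is the only change needed to target a different model, which is the kind of insulation from implementation detail described above.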

The API is also integrated into TERMite 6.4.

Leverage our Experience of Deep Learning

At SciBite we have experience in developing and deploying semantic deep learning models that perform a wide variety of functions:-

  • Named Entity Recognition (NER): Identifying concepts not covered by existing vocabularies;
  • Context-Specific Detection: The detection of concepts only in certain contexts. Examples include new vs pre-existing conditions and the anatomical sites of tumours;
  • Relationship Identification: Identifying complex relationships between concepts, such as protein-protein interactions and reports of adverse drug events (learn more about SciBiteAI Relationship Extraction models);
  • Assisted ontology development: The use of AI to suggest new terms, identify inconsistencies and accelerate ontology development and quality control;
  • Predictors: Spotting patterns in data that help predict future outcomes;
  • Clustering and classification: Grouping documents and concepts based on their underlying data relationships.

Download the SciBiteAI datasheet, SciBiteAI use case or get in touch with the team to learn more about SciBiteAI.


How could the SciBite semantic platform help you?

Get in touch with us to find out how we can transform your data
