Is text mining an important part of your regular analytic techniques? TExpress is a semantic regular expression engine for text mining. It identifies and extracts semantic patterns of biomedical entities within sentences.

An entity can be a specific physical entity (a gene or drug), a class of entity (such as kinase receptor), or any other term in our VOCabs (including biomedical verbs).

We have curated a specific vocabulary of scientific verbs for TExpress. This vocabulary can be augmented, edited, or replaced with your own as required.

Documents are first passed through TERMite which identifies the individual entities in the text. Once marked up, they are then passed to TExpress to match semantic patterns defined in the search as in the example below.

The patterns can be as broad or narrow as needed depending on the specificity of the entities used.
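To make the idea of broad versus narrow patterns concrete, here is a minimal Python sketch. It is not TExpress syntax or the TExpress API: the `Annotation` and `Slot` classes, the `match_pattern` function, and the type labels and identifiers are all hypothetical, chosen only to illustrate matching an ordered pattern against a sentence whose entities have already been marked up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One marked-up entity in a sentence (illustrative structure only)."""
    text: str          # surface text as it appears in the sentence
    entity_type: str   # hypothetical type label, e.g. "GENE", "VERB", "DISEASE"
    entity_id: str     # hypothetical identifier, not a real vocabulary ID


@dataclass(frozen=True)
class Slot:
    """One position in a pattern: a type, optionally pinned to one entity."""
    entity_type: str
    entity_id: Optional[str] = None  # None accepts any entity of this type

    def accepts(self, ann: Annotation) -> bool:
        if ann.entity_type != self.entity_type:
            return False
        return self.entity_id is None or self.entity_id == ann.entity_id


def match_pattern(annotations: List[Annotation],
                  pattern: List[Slot]) -> Optional[List[Annotation]]:
    """Return the first in-order set of annotations filling every slot, else None.

    Other annotations may sit between matched ones; only the order matters.
    """
    matched: List[Annotation] = []
    for ann in annotations:
        if len(matched) < len(pattern) and pattern[len(matched)].accepts(ann):
            matched.append(ann)
    return matched if len(matched) == len(pattern) else None


# "BRCA1 mutations cause breast cancer", already marked up with entities.
sentence = [
    Annotation("BRCA1", "GENE", "gene:BRCA1"),
    Annotation("cause", "VERB", "verb:cause"),
    Annotation("breast cancer", "DISEASE", "disease:breast_cancer"),
]

# Broad pattern: any gene, any verb, any disease.
broad = [Slot("GENE"), Slot("VERB"), Slot("DISEASE")]

# Narrow pattern: one specific gene, any verb, any disease.
narrow = [Slot("GENE", "gene:TP53"), Slot("VERB"), Slot("DISEASE")]

broad_hit = match_pattern(sentence, broad)
narrow_hit = match_pattern(sentence, narrow)
```

In this sketch the broad pattern matches the sentence, while the narrow pattern does not because the sentence mentions a different gene. Pinning a slot to a single entity is what narrows the search.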

Outputs can be delivered in multiple formats including JSON, XML, HTML and TSV.
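As a rough illustration of how the same extracted matches might look in two of these formats, the Python sketch below serializes a small list of records to JSON and TSV using only the standard library. The record fields (`gene`, `verb`, `disease`) and the helper names are assumptions made for the example; they are not the actual TExpress output schema.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical match records; the field names are illustrative only.
matches: List[Dict[str, str]] = [
    {"gene": "BRCA1", "verb": "cause", "disease": "breast cancer"},
    {"gene": "TP53", "verb": "associated with", "disease": "lung cancer"},
]


def to_json(records: List[Dict[str, str]]) -> str:
    """Serialize records as a pretty-printed JSON array."""
    return json.dumps(records, indent=2)


def to_tsv(records: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(matches)
tsv_text = to_tsv(matches, ["gene", "verb", "disease"])
```

JSON suits downstream programs that need nested structure, while TSV loads directly into spreadsheets and data-frame tools.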

Three of the most common use cases

  1. Sentence co-occurrence (SCC). Find me all the entities that co-occur with other entities.
  2. Sentence annotation. Identify the entities and verbs in each sentence (perhaps to identify the most “active” sentences in the text).
  3. Specific event detection. For instance, genes linked to diseases in a gene-verb-disease relationship.
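The first use case, sentence co-occurrence, can be sketched in a few lines of Python. This is a simplified illustration, not how TExpress computes it: the input is assumed to be one list of entity identifiers per sentence, and the identifiers and the `cooccurrence_counts` function are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def cooccurrence_counts(sentences: List[List[str]]) -> Counter:
    """Count, for each unordered entity pair, the sentences containing both."""
    counts: Counter = Counter()
    for entities in sentences:
        # De-duplicate and sort so each pair is counted once per sentence
        # and always appears in the same order.
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical entity identifiers found in three sentences.
sentences = [
    ["gene:BRCA1", "disease:breast_cancer"],
    ["gene:BRCA1", "drug:olaparib", "disease:breast_cancer"],
    ["gene:TP53", "disease:breast_cancer"],
]

counts = cooccurrence_counts(sentences)
top_pair: Tuple[str, str] = counts.most_common(1)[0][0]
```

Here the pair of `disease:breast_cancer` and `gene:BRCA1` appears in two sentences, so it ranks highest. Ranking pairs this way is one simple route to spotting candidate relationships worth a closer look.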
