TExpress - Identifying semantic patterns in text mining

Is text mining an important part of your regular analysis?  TExpress is a semantic regular expression engine for text mining that identifies and extracts semantic patterns of biomedical entities within sentences.

An entity can be a specific physical entity (a gene or drug), a class of entity (kinase receptor) or any other term in our VOCabs (including biomedical verbs).  We have curated a specific vocabulary of scientific verbs for TExpress; it can be augmented, edited or replaced with your own collection as required.

Documents are first passed through TERMite, which identifies the individual entities in the text.  Once marked up, the documents are passed to TExpress, which matches the semantic patterns defined in the search, as in the example below.  The patterns can be as broad or as narrow as needed, depending on the specificity of the entities used.

[Figure: Identifying semantic patterns in text mining]
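The idea of matching a pattern over entity types rather than raw characters can be sketched in a few lines of Python. This is an illustration only, not TExpress pattern syntax or the TERMite output format: the (text, type) token layout, the type names, the single-letter codes and the function names are all assumptions made for the example.

```python
import re

# Hypothetical single-letter codes for entity types (not TExpress syntax).
TYPE_CODES = {"GENE": "G", "VERB": "V", "DISEASE": "D"}


def encode(tagged):
    """Turn a list of (text, entity_type) tokens into a string of type codes.

    Any type without a code is written as 'o' (other).
    """
    return "".join(TYPE_CODES.get(entity_type, "o") for _, entity_type in tagged)


def find_gene_verb_disease(tagged, max_gap=3):
    """Return (gene, verb, disease) triples found in one tagged sentence.

    Up to max_gap untyped tokens are allowed between the pattern elements.
    """
    pattern = re.compile(r"(G)o{0,%d}(V)o{0,%d}(D)" % (max_gap, max_gap))
    encoded = encode(tagged)
    triples = []
    for match in pattern.finditer(encoded):
        # Each character position in the encoded string is a token index.
        triples.append(tuple(tagged[match.start(i)][0] for i in (1, 2, 3)))
    return triples


# Assumed shape of an entity-tagged sentence, for illustration only.
sentence = [
    ("BRCA1", "GENE"),
    ("mutations", "OTHER"),
    ("cause", "VERB"),
    ("hereditary", "OTHER"),
    ("breast cancer", "DISEASE"),
]
matches = find_gene_verb_disease(sentence)
print(matches)
```

Widening or narrowing max_gap, or swapping a specific entity for a whole class, mirrors how a pattern can be made broader or narrower.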

Outputs can be delivered in multiple formats including JSON, XML, HTML and TSV.

Three of the most common use cases are as follows.

  • Sentence co-occurrence (SCC). Find all the entities that co-occur with other entities in the same sentence.
  • Sentence annotation. Identify the entities and verbs in each sentence (perhaps to identify the most “active” sentences in the text).
  • Specific event detection. For instance, find genes linked to diseases in a gene-verb-disease relationship.
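As a rough illustration of the first use case, sentence co-occurrence can be counted once each sentence has been reduced to the list of entities found in it. The sketch below assumes that reduction has already happened; the input layout and the function name are hypothetical and do not reflect how TExpress computes or reports SCC.

```python
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(sentences):
    """Count how many sentences each unordered pair of entities shares.

    sentences is a list of lists, one list of entity names per sentence.
    """
    counts = Counter()
    for entities in sentences:
        # De-duplicate within the sentence and sort so each pair has one key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Assumed per-sentence entity lists, for illustration only.
annotated = [
    ["BRCA1", "breast cancer"],
    ["BRCA1", "TP53", "breast cancer"],
    ["TP53", "lung cancer"],
]
pairs = sentence_cooccurrence(annotated)
for pair, count in pairs.most_common():
    print(pair, count)
```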

If you would like to discuss how and where TExpress can be applied to your text mining activities, contact us now for a demonstration.