DOCstore 1.2 is released and live
22nd January 2018
Author: Phil Verdemato
It’s here! DOCstore v1.2 is ready with a whole host of new features, including an all-new Connectors package. Powered by Elastic, DOCstore provides faceted, semantic search for unstructured data. An ideal tool for a range of roles, from bench scientists to business analysts, it allows you to:
- Create a highly enriched, more analytical in-house version of Medline
- Combine multiple data sources such as grants, trials and literature into a broad literature search tool
- Create bespoke project team databases and organise your team’s documents in a relevant way
- Build an intelligent business/competitive intelligence platform to share information across your organisation
We thought it would be the perfect time to share the new features with you, so here’s a rundown of what you can expect.
- Advanced Search
- Customisation of the User Interface
- Optional Google Analytics Integration
- Better memory profile
- DOI field now searchable
- API additions
- Text attributes added to the data model
- Connectors – automated data management pipeline
These developments are a direct result of our customers’ feedback. As always at SciBite, we love hearing from clients about how we can make things even better.
Let’s look at each one a bit more closely.
Advanced Search
Search just got a whole lot more powerful with the ability to combine multiple queries on multiple fields. For example, to find documents that mention ‘PDE5A’ in the title and ‘university’ in the organisation field, I can now use the Advanced Search feature.
I can also filter on publication date or document index date.
And that’s not all. Additional filters exist for the SubSource, Project ID and SubProject ID fields if they’re populated. Each of these fields represents a way to sub-categorise a document. They can be set at DOCstore load time and allow you to hold multiple copies of the same underlying document, perhaps indexed in different ways, under different contexts.
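As a rough illustration of how several field queries and a date filter can combine into one search, here is a minimal sketch that builds an Elasticsearch-style `bool` query. The function `build_advanced_query` and the field name `publication_date` are assumptions made for this example; the query DOCstore actually generates is not described in this post.

```python
from datetime import date


def build_advanced_query(field_terms, published_from=None, published_to=None):
    """Combine several field/term pairs into one Elasticsearch-style bool query."""
    # Every field/term pair has to match, so each one becomes a "must" clause.
    must = [{"match": {field: term}} for field, term in field_terms.items()]

    # Optional publication date window, applied as a filter.
    filters = []
    date_range = {}
    if published_from is not None:
        date_range["gte"] = published_from.isoformat()
    if published_to is not None:
        date_range["lte"] = published_to.isoformat()
    if date_range:
        # "publication_date" is a hypothetical field name.
        filters.append({"range": {"publication_date": date_range}})

    return {"query": {"bool": {"must": must, "filter": filters}}}


# 'PDE5A' in the title AND 'university' in the organisation field,
# restricted to documents published from 2015 onwards.
query = build_advanced_query(
    {"title": "PDE5A", "organisation": "university"},
    published_from=date(2015, 1, 1),
)
```

Adding another field to the search is just another entry in the dictionary passed to the function.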
Customisation of the User Interface
You can now customise the user interface by supplying custom HTML snippets in the configuration. Examples include adding bespoke links in the ‘Explore’ dropdown, or adding an icon to the results panels that links back to the source document.
Additionally, DOCstore is now able to serve static content such as original PDF files. If you’ve indexed PDF files, you can now link to the originals and have them open in your browser.
Optional Google Analytics Integration
If you want to integrate Google Analytics into your DOCstore server to monitor patterns of usage, you can. It’s extremely useful for seeing what DOCstore is being used for and when, which we know matters to organisations.
Better memory profile
DOCstore can now be run in 2 GB of memory with Medline and ct.gov data. That’s about a 6x saving compared to DOCstore v1.1 for this dataset.
DOI field now searchable
The DOI field is now included in the list of searched fields, making it possible to search for articles by their DOI.
API additions
The REST API now has more comprehensive input validation and better error messages. You can also now include unique document identifiers in Co-occurrence Matrix API calls for the top 200 documents (sorted on publication date) that fulfil the co-occurrence criteria.
You can also run document or sentence level searches and retrieve only document metadata, such as IDs and sources, rather than the entire documents themselves. This reduced payload option is ideal if you only need a small part of the data from each returned result: you get just the data you require, without the extra information that would cost transfer time and slow down your processing.
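To make the idea of a reduced payload concrete, here is a minimal client-side sketch of what a metadata-only result looks like next to a full one. The function `metadata_only` and the keys `id`, `source`, `title` and `body` are hypothetical and used only for illustration; in DOCstore the trimming happens on the server.

```python
def metadata_only(documents, keep=("id", "source")):
    """Return a slimmed-down copy of each hit containing only the chosen keys."""
    return [{key: doc[key] for key in keep if key in doc} for doc in documents]


# Two made-up full hits, each carrying a large text payload.
hits = [
    {"id": "doc-1", "source": "Medline", "title": "Example", "body": "long text"},
    {"id": "doc-2", "source": "ct.gov", "body": "long text"},
]

# Only the identifiers and sources survive; titles and bodies are dropped.
slim = metadata_only(hits)
```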
A new operation was added to support this at the sentence level:
and a new parameter for the document level:
Text Attributes added to the data model
Attributes in the ‘attributes’ section of the TERMite output are now stored as key-value text in DOCstore. They’re not yet searchable (you’ll have to wait for the next version for that), but they’re returned in the data, and if you apply the user interface customisations described above, you can see them there.
Colour-coded entities in the User Interface
There are now visual cues as to the entity types in the User Interface, such as different coloured underlines.
Connectors – automated data management pipeline
Now you can take the pain out of maintaining your data pipeline into DOCstore. Imagine being able to automate the management of that pipeline without manually setting up, checking and approving each update, or having to outsource the task.
Without Connectors, scripting is necessary to maintain the load pipeline. Usually, this involves separate scripts to:
- Fetch the data (e.g. Medline)
- Annotate (run TERMite on it)
- Load that output to DOCstore
With Connectors, this is all handled via a web-based user interface with no command line access required.
And that’s not all. The parameters required to run TERMite vary according to the data source. Connectors helps here too, by suggesting the appropriate parameters for each data source.
Connectors moves the burden of loading DOCstore from the hands of the IT technician to the scientist.
Other features include:
- Run updates regularly or every day, with the option to schedule more precisely
- Control for you – you define the pipeline and the number of steps
- Intelligent – utilising a checkpoint system, Connectors will return to the last sound update point, should anything go awry
- Non-expert and expert modes
- Extensible architecture – simply plug your own code into the pipeline
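The checkpoint idea in the list above can be sketched in a few lines. This is only an illustration of the general technique, assuming a pipeline of named steps and a small JSON checkpoint file; `run_pipeline` and the step functions are hypothetical and do not reflect how Connectors is actually implemented.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, checkpoint_path):
    """Run named steps in order, skipping any already recorded as complete."""
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else []
    for name, step in steps:
        if name in done:
            continue  # completed in an earlier run
        step()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        path.write_text(json.dumps(done))
    return done


# Hypothetical fetch / annotate / load steps; "annotate" fails on its first try.
log = []
attempts = {"annotate": 0}


def fetch():
    log.append("fetch")


def annotate():
    attempts["annotate"] += 1
    if attempts["annotate"] == 1:
        raise RuntimeError("simulated failure")
    log.append("annotate")


def load():
    log.append("load")


steps = [("fetch", fetch), ("annotate", annotate), ("load", load)]

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    try:
        run_pipeline(steps, checkpoint)
    except RuntimeError:
        pass  # the first run stops at the failing step
    # The second run resumes after "fetch" instead of starting over.
    completed = run_pipeline(steps, checkpoint)
```

Because the checkpoint is written after every successful step, a rerun never repeats the fetch that already succeeded.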
And that’s DOCstore 1.2. Faster, more efficient search, allowing you to cut out the noise of unwanted information and customise what you see. And with Connectors, once again, our developments are democratising data management for the life sciences.