The era of data-driven R&D is motivating investment in technologies such as machine learning and natural language processing to provide deeper insights into new drug development strategies. Despite major advances in technology, many computational approaches struggle to deal with the complexity and variability of unstructured scientific language.
One fundamental of data science remains unchanged: the accuracy and reliability of results both depend critically on clean, high-quality data.
However, the data cleansing and annotation work required to achieve clean, high-quality data can be costly, often prohibitively so. For example, data scientists spend almost 80% of their time as ‘data janitors’, collecting, cleaning, formatting and linking data, and only 20% of their time actually analysing it.
Furthermore, for most data scientists, data preparation is the least enjoyable part of their role. This presents a real risk: when people spend much of their time on a task they don’t enjoy, mistakes are bound to occur.
For most pharmaceutical companies, extracting insight from heterogeneous and ambiguous data remains a challenge, consuming a significant share of their already constrained data science resources.
Identifying and applying biomarkers in basic and clinical research is an almost mandatory step in the pipeline of any productive biopharmaceutical organisation.
Validated biomarkers play a crucial role in predicting clinical outcomes and support the translation from candidate discovery to successful clinical treatment.
The process of discovering and validating new biomarkers depends on effective methodologies, which often call on text mining approaches to extract insight from the biomedical literature.
The following white paper evaluates SciBite’s capabilities in identifying new gene biomarkers in breast cancer against a published methodology.
Given the wealth of information available in the biomedical literature, it is important to be aware of all existing biomarkers, as well as other biomolecules that may be suitable as new biomarkers.
One of the most valuable assets for any organisation is its data. However, most pharmaceutical companies are unable to realise its true value, as a result of i) deploying a data management system that is geared towards entering rather than mining data and/or ii) replacing such systems over time, resulting in silos of legacy data.
The way in which an organisation captures and manages its data is fundamental to addressing this problem. A wider scientific community initiative has resulted in the establishment of the FAIR principles [1] to ensure that data is Findable, Accessible, Interoperable and Reusable. Although initially focused on the accessibility of public domain data, the FAIR principles are rapidly gaining interest from the pharmaceutical industry [2].
The benefits of FAIR can be illustrated using the example of bioassay data management. A significant proportion of the pre-clinical data accumulated by every pharmaceutical company results from conducting a range of biological assays to characterise drug targets and evaluate potential therapeutic molecules. Databases dedicated to managing bioassay data contain a wealth of R&D knowledge and, as such, provide a rich resource to mine when addressing both scientific and operational questions.
© SciBite Limited / Registered in England & Wales No. 07778456