The International Society of Biocuration (ISB) has taken an interesting approach to its 14th Annual Biocuration Conference, not only making it virtual but also spreading it out as a series of free online talks and workshops throughout the year. Our CTO was invited to take part in the first session, held on 13 April, where he shared his views in a thought-provoking discussion on “The Future of Biocuration.”
As we have argued in the blog, curators are the heroes of bioinformatics. In fact, SciBite boasts its own in-house team of scientific curators, managed by head of ontologies, Jane Lomax, who is also a member of the ISB Executive Committee. Although curators often do their meticulous work with little fanfare, the life science world relies on them to perform the essential task of translating and integrating biomedical information into interoperable databases that are vital to researchers.
In the ISB panel discussion, chaired by Genentech’s Rama Balakrishnan, panelists started by sharing some of the ways that they define the job. “When I think about it, it’s applying semantic standards to ensure data findability and aggregation,” explained Carol Bult of the Jackson Laboratory in Bar Harbor, Maine. Kambiz Karimi of Myriad Women’s Health in San Francisco talked about how it involves structuring content and using a controlled vocabulary, also noting that “cleaning up” the content is a significant part of the effort.
The clean-up duties of data curation connect to the critical issue of data quality. Rama asked the panelists what “quality data” means and which key metrics ensure that quality. For our CTO, the answer was “Testing, testing, testing.” He explained that SciBite utilises comprehensive, gold-standard tests, and also pointed out that the end product should be “in the right form so you can consume it and use it”.
Carol also clarified that “Data quality and annotation accuracy are two different things” and are approached with different processes. Some of the panelists described methods of ensuring accuracy that are quite hands-on, including reaching out to authors, publishers and laboratories directly to fact-check information.
On the related matter of quality control, Kambiz explained that his company, which specialises in genetic testing and personalised medicine, has a peer review process and maintains 30 curators on its team to ensure rigorous checking and double-checking for errors and omissions, supported by some automated processes.
Looking at the future of biocuration, AI and machine learning loom large. “It will enhance our work by making some bottom-level decisions for us,” suggested Sandra Orchard of the European Bioinformatics Institute, who doesn’t envision machine learning replacing manual curation. Although she expects machine learning to become increasingly important as it grows more powerful, she thinks research papers will continue to require human interpretation to understand what the humans who wrote them meant.
Our CTO told conference attendees that SciBite is putting a lot of effort into developing techniques and building models, and that those models can work quite well, but there is still a lot of work to do to really start exploiting deep learning advances. He does see a future in training AI models to help with curation, but predicts that, “It will become an assistant; it will not replace subject matter experts.”
Carol Bult identified one particular danger of AI and ML: a broad misunderstanding about what they can actually do. “Trying to get funding for biocuration is challenging because of the perception that machine learning can do most of it. We’re working on the technology, but it’s not going to replace biocurators.” She feels that biocurators need to tackle this misperception and articulate a framework for how AI and biocuration go hand in hand.
Importantly, SciBite highlighted the link between data science and curation, noting that much of a data scientist’s work is wrangling and cleaning data, and so a big chunk of what they do is, essentially, curation. Carol argued that biocurators need to make sure people realise how important their discipline is to data science, because the value of data science is already recognised across industries.
“If we frame biocuration in the context of data science, I think that will help,” she said. “We have to get better at explaining what the return on investment is. What can you do—because data are quality controlled and curated—that you wouldn’t be able to do if it wasn’t curated?”
Our CTO believes that the growing prevalence of AI will actually shine a light on the value of curation. The more commonplace AI becomes, and the more that approaches such as data lakes and knowledge graphs are at the forefront of decision makers’ minds, the more they will appreciate the value of well-labelled data. After all, he says, your models are only as good as your data, and the same is true for data lakes and knowledge graphs, showing that biocuration has never been more relevant.
Learn more about upcoming sessions of the Biocuration Conference.