Michael and I are just back from a visit to the Healtex conference, held in Manchester last week. It’s organised as part of the network that brings together folks interested in healthcare NLP from across healthcare providers, academia and commercial organisations. Before I get to the science, I would encourage anyone in this space who is not already part of the network to give it a look. The conference was attended by a great bunch of folks from across Europe and beyond. Kudos to the organisation team, who kept things running incredibly well: even after a mass evacuation which required calling in the bomb squad, they still managed to keep us mostly to time!
What made the conference particularly refreshing for me was a heavy dose of practicality and realism in the face of the growing hype around Machine Learning and AI, particularly from some sections of the commercial community. Moreover, this wasn’t some deep academic exercise, but rather a walk-through of the successes and remaining challenges in some real-world use cases. In my brief summary below I won’t review every talk, but will highlight some general themes that caught my attention and pick up on some points from a couple of key presentations.
There was a lot of discussion on the high variability of clinical text, which makes NLP challenging in this space. Things like the vast, vast number of different spellings generated by fast-typing clinicians, colloquial abbreviations (vets using “DUDE” for defecating, urinating, drinking and eating, and see here for challenges in that) and contradictions within the same text will be familiar to those in the field. An interesting nugget for me was that, when asked why we couldn’t use autocomplete-based controlled vocabularies to make life easier, one speaker highlighted legal regulations requiring that data entry not be “unduly influenced”, so clinicians are forced to use free text to enter exactly what they mean. The discussions highlighted instances where physicians clearly copy and paste between patient records (with the acceptance that clinicians must do whatever they need to in often difficult circumstances) and also repetition within the same report, both of which can of course affect how we do NLP.
One positive theme from the conference was something we have blogged about often here at SciBite: the use of both data cleansing and ontologies as a prerequisite to machine learning methods. A number of groups presented on the value of data preparation for ML tasks as well as how ontologies and machine learning could be combined, either as features or, in one case, as a mechanism to create new ontological relationships. In fact, ontologies such as SNOMED and ICD-10 featured heavily in the presentations, given that a lot of the work revolves around “coding” (i.e. converting free text to ontological codes). The challenge of “drift” over time was a recurring issue: comparing a report coded 3 years ago with one from today is not straightforward because the ontologies are constantly changing. This was particularly pertinent to us at SciBite as we’re actively building systems to help our customers manage this in a better way. We also had some interesting talks on how to address the well-known “too-many-B’s” issue with Swanson literature-based discovery, and some nice work on the use of tries for addressing the major issue of misspelling in physicians’ notes.
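For readers who haven’t met tries in this context, here is a minimal sketch of one common way they can help with misspellings. It is not the method presented at the conference (I don’t have those details), and the names `build_trie` and `fuzzy_lookup` and the toy vocabulary are my own assumptions for illustration. The idea is to store the vocabulary in a trie and walk it while tracking Levenshtein edit distance, abandoning any branch that can no longer come within the allowed number of edits.

```python
from __future__ import annotations


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.term: str | None = None  # set when a vocabulary term ends here


def build_trie(terms):
    """Insert every vocabulary term into a character trie."""
    root = TrieNode()
    for term in terms:
        node = root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.term = term
    return root


def fuzzy_lookup(root, word, max_dist=1):
    """Return (term, distance) pairs within max_dist Levenshtein edits of word."""
    results = []
    # Distance from the empty prefix to each prefix of `word`.
    first_row = list(range(len(word) + 1))

    def walk(node, ch, prev_row):
        # Compute one row of the edit-distance table for this trie character.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            row.append(min(
                row[i - 1] + 1,                         # insertion
                prev_row[i] + 1,                        # deletion
                prev_row[i - 1] + (word[i - 1] != ch),  # substitution or match
            ))
        if node.term is not None and row[-1] <= max_dist:
            results.append((node.term, row[-1]))
        # Prune: descend only if some prefix is still within the budget.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


# Hypothetical toy vocabulary, purely for illustration.
vocab_trie = build_trie(["pneumonia", "pneumothorax", "asthma"])
matches = fuzzy_lookup(vocab_trie, "pnuemonia", max_dist=2)
```

Because shared prefixes are only scored once and hopeless branches are cut early, this tends to be much cheaper than comparing a misspelt word against every term in a large vocabulary. Plain Levenshtein distance counts a transposed pair of letters as two edits, which is why the example allows a distance of 2.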
The talk that I enjoyed the most was Dr Wendy Chapman’s keynote entitled “NLP Out of the Box: Is it Possible?”. For anyone who doesn’t know, Dr Chapman is a renowned researcher in this field, particularly well known for the “NegEx” algorithm for detection of negative findings in clinical text. Her presentation (which doesn’t look to be online, but I hope they put it up) walked through a theoretical example of developing an NLP system for surveillance of breathing problems indicated in A&E (ER) reports, then transferring that same system to monitor for pneumonia occurrence and, finally, moving that system to a different hospital. This excellent interactive talk identified a significant number of challenges, such as how high-precision NLP rules are very use-case dependent, and how the nature of record storage at different hospitals would have a massive impact on processes and algorithms. In the end, an e-poll of the audience showed that most (including me) believed we were some distance away from the nirvana of “out-of-the-box NLP for healthcare”.
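To give a flavour of what negation detection involves, here is a heavily simplified, NegEx-style sketch. It is not Dr Chapman’s implementation: the trigger and terminator lists, the five-token window and the function name `find_negated` are illustrative assumptions, and it only handles single-word concepts. The idea is that a negation trigger opens a short scope over the tokens that follow, and a conjunction such as “but” closes it.

```python
import re

# Illustrative lists only; the published algorithm uses far larger phrase lists.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # number of tokens after a trigger that fall inside its scope


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def find_negated(sentence, concepts):
    """Label each single-word concept found as 'negated' or 'affirmed'."""
    status = {}
    remaining = 0  # tokens left in the current negation scope
    for tok in tokenize(sentence):
        if tok in NEGATION_TRIGGERS:
            remaining = WINDOW  # open (or reset) the negation scope
            continue
        if tok in TERMINATORS:
            remaining = 0       # a terminator closes the scope
            continue
        if tok in concepts:
            status[tok] = "negated" if remaining > 0 else "affirmed"
        if remaining > 0:
            remaining -= 1
    return status


report = "Patient denies cough but reports fever."
labels = find_negated(report, {"cough", "fever"})
```

Even this toy version hints at why such rules are use-case dependent: the right triggers, window size and concept list all depend on how a particular set of reports is written.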
What I took away from this was that there remain huge challenges in delivering “off the shelf” NLP systems for clinical decision making (and one should be very wary of anyone claiming otherwise!), but that NLP is already delivering value in specific, tailored use cases. As Dr Chapman’s talk highlighted, we may be some way from simple, customisable, repurposable solutions, but the tools we have now can pave, and are paving, the way to an exciting future. However, the key inhibitory factor most people agreed on was the availability of data. Given recent events, the public are naturally sceptical already, and the audience agreed that full anonymisation was either very hard or even impossible. Workarounds such as “moving the algorithm to the data” are common, but suffer from the same issue if only a small amount of training data is available. Interestingly, as I headed back to Cambridge on the train, I caught up with the news about a new medical record sharing initiative, so maybe we will start to address this and open up the field to a new wave of research.
All in all, an excellent short conference. It helped that the weather was inexplicably perfect, but I think it showed how valuable the Healtex network is, and our thanks go to them for a great couple of days.