Case Study: Using Natural Language Processing for Healthcare Summaries
A leading healthcare organization recently engaged Manceps to help them bring Machine Learning solutions to their case preparation process.
By using natural language processing and state-of-the-art language models to integrate their wealth of data into a scalable system, the company was able to automatically structure complex case files into single-page medical narratives.
In most complex medical claims, insurers and patients have the right to request a medical review of prescribed treatment from an independent reviewer. Our client is such a reviewer, acting as a mediator between payers and providers for medical necessity reviews and preauthorizations.
Once our client receives the details of a case, it must either validate or overturn the insurer's decision.
Validating treatment plans is just one of many ways that this organization helps at the intersection of the insurer, physician, and patient. In addition to providing an appeal mechanism, our client also handles treatment preauthorizations that insurance providers outsource to it.
When a case is brought before this healthcare organization, it receives, through its application portal, hundreds (if not thousands) of pages of pertinent medical documents that it must interpret in order to render a verdict.
For liability purposes, this information tends to be overwhelmingly comprehensive. Not only will the organization receive information about the case, such as the patient’s medical records and test results, but it will also receive documentation relating to the insurance company, its policies, and other ancillary details.
Further complicating matters, the information can come in a variety of formats, such as printed text, scanned handwritten notes, images, and computer-generated EHR dumps, any of which can be inaccurate or incomplete.
It is the job of our client and its clinical staff to transform this poorly-organized data into a decision — one that must be made quickly and accurately.
Manceps built a scalable, containerized data engineering system that structures patients’ files and uses Natural Language Processing (NLP) to summarize each case, drastically reducing the number of hours the in-house medical team had to spend evaluating case files.
Our first step was to organize the crush of content they receive and convert it into a normalized, structured data set that our Artificial Intelligence system could eventually interpret.
To do this, we built a service that pulls out embedded text through digital extraction and scanned text through OCR (optical character recognition), so that every word on every page could be read, tagged, and understood by our AI system.
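The routing between those two extraction paths can be sketched as follows. This is a minimal illustration, not Manceps’ implementation: the `Page` record, the `min_chars` threshold, and the `ocr` callable (which in practice could wrap an OCR engine such as Tesseract) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Page:
    # Hypothetical page record: embedded_text holds a PDF's text layer (if any),
    # image_bytes holds the scanned or rendered page image (if any).
    number: int
    embedded_text: Optional[str] = None
    image_bytes: Optional[bytes] = None


def extract_page_text(page: Page, ocr: Callable[[bytes], str],
                      min_chars: int = 20) -> Tuple[str, str]:
    """Return (text, method) for a single page."""
    # Prefer the embedded text layer when it contains a meaningful amount of text.
    if page.embedded_text and len(page.embedded_text.strip()) >= min_chars:
        return page.embedded_text.strip(), "digital"
    # Otherwise fall back to OCR on the page image, if one is available.
    if page.image_bytes is not None:
        return ocr(page.image_bytes).strip(), "ocr"
    # Nothing usable on this page.
    return "", "empty"


def fake_ocr(image: bytes) -> str:
    # Hypothetical stand-in for a real OCR engine.
    return "Handwritten note: patient reports left knee pain."


pages = [
    Page(1, embedded_text="Discharge summary: patient admitted for knee surgery."),
    Page(2, image_bytes=b"<scanned image>"),
    Page(3),
]
results = [extract_page_text(p, fake_ocr) for p in pages]
```

The `min_chars` check covers pages that carry only a thin text layer, such as a typed page number over a scanned body, which would otherwise skip OCR.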
During this process, we also built an exhaustive set of intelligent validators to verify the case materials, ensuring that every record was associated with the correct patient and the case at hand.
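A validator of this kind can be sketched as a set of checks over per-document metadata. The `Document` fields and the two checks below are hypothetical examples; the article does not describe which validators were actually built.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Document:
    # Hypothetical identifiers extracted from each uploaded document.
    doc_id: str
    patient_id: Optional[str]
    case_id: Optional[str]


def validate_documents(docs: List[Document], expected_patient_id: str,
                       expected_case_id: str) -> List[Tuple[str, str]]:
    """Return (doc_id, problem) pairs; an empty list means every check passed."""
    problems = []
    for doc in docs:
        # Every document must name the patient under review.
        if doc.patient_id is None:
            problems.append((doc.doc_id, "missing patient id"))
        elif doc.patient_id != expected_patient_id:
            problems.append((doc.doc_id, f"patient id {doc.patient_id} does not match"))
        # A case id is optional, but if present it must match the current case.
        if doc.case_id is not None and doc.case_id != expected_case_id:
            problems.append((doc.doc_id, f"belongs to case {doc.case_id}"))
    return problems


docs = [
    Document("lab-001", "P-100", "C-7"),
    Document("note-002", "P-200", "C-7"),
    Document("scan-003", None, None),
]
issues = validate_documents(docs, expected_patient_id="P-100", expected_case_id="C-7")
```

Collecting every problem instead of stopping at the first failure lets a reviewer see all misfiled documents in a case at once.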
The core challenge of any NLP project is that people understand sequences of words while computers understand sequences of numbers. By translating words, sentences, and language into numbers — or vectors, as Data Scientists call them — computers are able to map the relationships words have to one another.
These word relationships are the key to understanding language. Only by associating the word “leopard” with words such as “wild”, “cat”, and “spots” can humans begin to understand what a leopard is. It is in this way that Natural Language Processing becomes Natural Language Understanding. Instead of associating “leopard” with “cat” holistically, however, computers do it mathematically: each word becomes a vector, and words with related meanings end up close to one another in that numerical space.
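The idea that related words sit close together can be made concrete with cosine similarity, a common way to compare word vectors. The three-dimensional vectors below are hand-picked toy values for illustration only; vectors from a real language model are learned from data and have hundreds of dimensions that do not map to human-readable meanings.

```python
import math

# Hand-picked toy vectors, for illustration only.
vectors = {
    "leopard": [0.9, 0.8, 0.0],
    "cat": [0.8, 0.9, 0.0],
    "truck": [0.0, 0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


leopard_cat = cosine_similarity(vectors["leopard"], vectors["cat"])      # about 0.99
leopard_truck = cosine_similarity(vectors["leopard"], vectors["truck"])  # 0.0
```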
The most important part of any NLP implementation is finding the right language model for translating text into such vectors, while keeping a reliable link between each piece of text and the vector that represents it.
Fortunately, state-of-the-art pre-trained language models are available to perform these tasks with deep-learning-powered language processing.
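One way to keep the link between text and vectors is to store each vector alongside the sentence it came from and that sentence’s location in the case file. In the sketch below, `toy_embed` is a hypothetical stand-in (a hashed bag of words) so the example runs without downloading a model; the article does not name the pre-trained model used, and in practice that model’s encoding function would be passed in as `embed`.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EmbeddedSentence:
    # The vector never travels without its source text and location.
    case_id: str
    page: int
    sentence_index: int
    text: str
    vector: List[float]


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a pre-trained model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def embed_page(case_id: str, page: int, sentences: List[str],
               embed: Callable[[str], List[float]] = toy_embed) -> List[EmbeddedSentence]:
    """Embed every sentence on a page, keeping case, page, and position with each vector."""
    return [EmbeddedSentence(case_id, page, i, s, embed(s))
            for i, s in enumerate(sentences)]


records = embed_page("C-7", 4, ["Patient reports knee pain.", "MRI ordered."])
```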
Once we had built our data pipeline to properly extract and stream text, we were able to use it in two ways: to provide indexed text for dynamic end-user interaction, and to feed language embeddings into our ML models for training and inference.
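The first of those two uses, indexed text for end-user search, can be sketched with a small inverted index that maps each word to the sentences containing it. The tokenizer, the AND-style query semantics, and the sample sentences are assumptions for illustration; the article does not say which indexing technology was used.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sentences: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sid, text in sentences.items():
        for token in tokenize(text):
            index[token].add(sid)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of sentences containing every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


sentences = {
    "s1": "MRI of the left knee ordered.",
    "s2": "Knee pain persists after physical therapy.",
    "s3": "Policy excludes experimental procedures.",
}
index = build_index(sentences)
```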
This enabled our Deep Learning models to determine whether particular sections or sentences of the case file were relevant to the medical procedures under review. Relevant information was then routed through the system to the appropriate stakeholders.
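The article does not describe the relevance models’ architecture. As a hedged sketch of the final decision step only, the code below scores each sentence embedding with a logistic-regression head, sigmoid(w · x + b), and routes sentences above a threshold as relevant; the weights, bias, and 0.5 threshold are hypothetical placeholders for values a trained model would supply.

```python
import math
from typing import List, Sequence, Tuple


def relevance_probability(vector: Sequence[float], weights: Sequence[float],
                          bias: float) -> float:
    """Logistic-regression head: sigmoid(w . x + b), computed without overflow."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def route_relevant(records: List[Tuple[str, Sequence[float]]], weights: Sequence[float],
                   bias: float, threshold: float = 0.5) -> Tuple[List[str], List[str]]:
    """Split (sentence_id, embedding) pairs into relevant and non-relevant ids."""
    relevant, other = [], []
    for sid, vector in records:
        if relevance_probability(vector, weights, bias) >= threshold:
            relevant.append(sid)
        else:
            other.append(sid)
    return relevant, other


# Hypothetical 2-dimensional embeddings and weights, for illustration only.
relevant, other = route_relevant(
    [("s1", [1.0, 0.0]), ("s2", [0.0, 1.0])],
    weights=[2.0, -1.0],
    bias=0.0,
)
```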
By layering the language model onto our client’s data, our Machine Learning system could now understand the story of the case file and begin to summarize it.
Pragmatically speaking, using natural language processing to summarize dense text requires two steps. The first is to extract the relevant information. The second is to rewrite that extracted information into a coherent narrative. Because the source material for this project was exceedingly long, Manceps performed multiple extraction passes to produce the best results.
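The two-step structure, with extraction repeated over several passes before a final rewrite, can be sketched as a small pipeline. The pass and rewrite functions here are hypothetical placeholders: `keep_first` simply truncates, whereas a real pass would rank sentences by relevance, and joining sentences stands in for a language-generation model.

```python
from typing import Callable, List

ExtractPass = Callable[[List[str]], List[str]]


def summarize(sentences: List[str], extract_passes: List[ExtractPass],
              rewrite: Callable[[List[str]], str]) -> str:
    """Apply each extraction pass in turn, then rewrite whatever survives."""
    selected = sentences
    for extract in extract_passes:
        selected = extract(selected)  # each pass narrows the candidate set
    return rewrite(selected)


def keep_first(n: int) -> ExtractPass:
    # Hypothetical placeholder pass: keeps the first n sentences unchanged.
    return lambda sents: sents[:n]


summary = summarize(
    ["Knee pain since 2019.", "MRI shows a torn meniscus.", "Policy excerpt.", "Fax cover page."],
    extract_passes=[keep_first(3), keep_first(2)],
    rewrite=" ".join,  # placeholder for a Natural Language Generation step
)
```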
First, our system dug through the original case file and extracted the 500 most important sentences, ranked according to the priorities set for the project.
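Keeping the highest-scoring sentences is a top-k selection. The sketch below assumes each sentence already has an importance score (how those scores were computed is not described in the article) and returns the survivors in their original order, so later steps still read the case in document order.

```python
import heapq
from typing import List, Sequence


def top_k_sentences(sentences: Sequence[str], scores: Sequence[float],
                    k: int = 500) -> List[str]:
    """Keep the k highest-scoring sentences, returned in original document order."""
    if len(sentences) != len(scores):
        raise ValueError("expected exactly one score per sentence")
    best = heapq.nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(best)]


kept = top_k_sentences(["a", "b", "c", "d"], [0.1, 0.9, 0.4, 0.8], k=2)
```

`heapq.nlargest` avoids sorting the whole file when k is much smaller than the number of sentences.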
In a second extraction pass, our system reduced the word count further, choosing 10 of those 500 sentences to serve as the most concise summary possible. Here, we tuned the system to prioritize capturing all of the information contained in the source material, even if that meant some information was repeated.
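The article does not say how the 10 sentences were chosen. One swapped-in technique that matches the stated priority is greedy coverage selection: repeatedly pick the sentence that adds the most not-yet-covered content words, without penalizing overlap with earlier picks. The stopword list and the word-level notion of “information” below are simplifying assumptions.

```python
import re
from typing import List, Set

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "is", "for", "on", "with"}


def content_words(sentence: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS}


def select_by_coverage(sentences: List[str], k: int = 10) -> List[str]:
    """Greedily pick up to k sentences that together cover the most distinct content words."""
    covered: Set[str] = set()
    chosen: List[int] = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        # Gain = new content words; overlap with earlier picks is allowed, not penalized.
        best = max(remaining, key=lambda i: len(content_words(sentences[i]) - covered))
        if not content_words(sentences[best]) - covered:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= content_words(sentences[best])
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]


sents = [
    "Patient has knee pain.",
    "Knee pain started in 2019.",
    "MRI shows a torn meniscus.",
    "Patient has knee pain.",
]
picked_two = select_by_coverage(sents, k=2)
picked_all = select_by_coverage(sents, k=10)
```

With room for more sentences, the second sentence is kept even though it repeats “knee pain”, because it adds the onset year; the verbatim duplicate adds nothing new and is skipped.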
Finally, once the system had reduced the case file to a single page, we used Natural Language Generation (NLG) tools to rewrite those 10 sentences into a concise, comprehensive narrative.
Our system has already saved this organization thousands of hours. By automatically organizing and summarizing case file information, its physicians are now able to quickly understand case elements so they can make informed, medically accurate, and timely determinations.
24.02.2020