Contributed to PharmaLeaders by Ana Bargo, M.S.; Jeannine Cain, MSHI, RHIA, CPHI; Shannon Fee, B.A.; Mark Yap, FNP-C
In 2016, a major transformation occurred in how we evaluate clinical data for real-world effectiveness. The 21st Century Cures Act was signed into law that year, directing the FDA to establish a program to evaluate the use of real-world evidence in regulatory decisions. It signaled a paradigm shift in medicine by formally recognizing the importance of real-world data (RWD) in bringing medical innovation to patients. But the promise of RWD comes with hurdles. Simply having massive quantities of data at your disposal does not equate to having meaningful answers, especially when considering the sheer length and complexity of medical records.
Medical records, in their raw form, are far from being set up in a consistently organized and tabulated format. Due to inconsistencies, redundancies, and format variations, it’s difficult to identify relevant aspects of a patient’s medical journey easily. To gain deeper clinical insight from information that holds nuanced and critical details about patient care, we must rely on sophisticated machine learning approaches like natural language processing (NLP). We must be able to process and make sense of the information-rich, natural language that exists throughout medical records to understand a patient’s health journey – regardless of where that information is stored.
The Challenges of Medical Data
The most important, insightful clinical data exist in narrative form, capturing the medical journey of a patient as a series of events. Many important elements are hard to contextualize while maintaining patient privacy and data security. The processes of inputting information are inherently flawed because electronic medical records (EMRs) come from administrative systems designed for reimbursement purposes, not research.
Medical record challenges are directly caused by the overall lack of standardization, clear labeling and enforcement of data consistency, but it’s not possible to retroactively apply standardization. Much of this variability comes from medical notes and the individual style in which people fill in medical information. Inconsistencies are commonplace as multiple healthcare providers input content and aren’t mandated to apply consistent standards.
Clinical content is harder to decipher when the stated intent of care does not match what the actual documentation reflects. The challenge lies in the gap between everything a healthcare provider has thought about and done in caring for a patient and how that information is captured in the record. Therefore, you must consider how the coding of notes is impacted by human interpretation.
There is also the volume of information existing in patient records. Consider that one patient can make dozens of visits to different providers over several years. Each visit can generate different levels of data – the amount of medical information created becomes immense. As medical record data continue to grow exponentially, hospitals continue storing it in their own siloed way, based on individual workflows and operations. The result makes cross-comparisons difficult when, for example, contradictory outcomes emerge from the same procedures or hospitals use outdated data formats and modes of information sharing, such as faxes, scanned PDFs and pathology reports with handwritten scribbles. The trick is to bring order to this scattered data and highlight key elements that can lead to deeper insights into the nuances of data in a patient’s healthcare journey.
Machine Learning, Natural Language Processing and the Complexity of Health Data
To handle the Pandora’s box of health data that lives within medical records, we need to harness deep learning. Deep learning is a subset of machine learning that uses multi-layer neural networks to solve complex problems. NLP, which today relies heavily on machine learning, helps process and understand human language in a way that gets at the heart of what matters in the data.
By highlighting specific, clinically relevant content, the “noise” captured for regulatory purposes becomes less prominent and convoluted. The resulting data sets can be tailored based on the area of focus. Machine learning and NLP transform data into an output from which we can draw insights. For instance, NLP can classify sections of medical records so that they’re more searchable, allowing us to “read and summarize” thousands of text pages faster than if we depended strictly on human methods of extraction.
Setting Machine Learning Up for Success
Machine learning is a voracious tool for data processing. But it requires time and training. Machines don’t have the same level of cognitive reasoning as humans, so they need human experts to “teach” basic rules to follow by labeling the data to ensure that the correct information is extracted from records. By training the machine to “read,” humans are guiding it with examples, themes and relevant concepts within the medical record text to create a coded (or structured) representation. With time, the process becomes more efficient at extracting nuanced information, vetted by human expertise, ensuring that it accurately aligns with the clinical question being asked.
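The labeling-and-training loop described above can be illustrated with a toy example. The sketch below is a minimal bag-of-words Naive Bayes classifier written with only the Python standard library. It is not the method used by any particular platform, and the labels (`lab_result`, `discharge_summary`), the sample snippets and the function names `train` and `predict` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_examples):
    """Count words per label from (text, label) pairs supplied by annotators."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written snippets. Real training data would be
# de-identified record text labeled by clinical experts.
LABELED = [
    ("hemoglobin 13.2 g/dL, platelets within normal range", "lab_result"),
    ("sodium 140 mmol/L, potassium 4.1 mmol/L", "lab_result"),
    ("patient discharged home in stable condition", "discharge_summary"),
    ("discharged with follow-up in two weeks", "discharge_summary"),
]

model = train(LABELED)
predicted = predict(model, "potassium 3.9 mmol/L")
```

Adding more labeled examples changes only the counts inside the model, which is a rough analogue of how expert-vetted labels improve extraction over time.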
A Deeper Dive With NLP
Extracting meaningful information from jumbled medical records depends on a tool that can understand individual medical records’ unique “grammar.” Critical questions that we often ask include: How do we differentiate various medical record sections and classify the document types stored within? How do we distinguish a patient’s history from their physical, discharge summary, lab results, visits and the like?
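One simple way to approach the section question is to look for known header lines and group the text beneath them. The sketch below does this with a regular expression. The header vocabulary in `SECTION_HEADERS`, the section names and the `split_sections` function are hypothetical, and the sketch assumes each header sits on its own line, which real records often violate.

```python
import re

# Hypothetical header vocabulary; real records use many more variants.
SECTION_HEADERS = {
    "history of present illness": "history",
    "hpi": "history",
    "physical exam": "physical",
    "labs": "lab_results",
    "laboratory results": "lab_results",
    "discharge summary": "discharge_summary",
}

# Match a line that consists only of a known header, with an optional colon.
HEADER_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")\s*:?\s*$",
    re.IGNORECASE,
)


def split_sections(note):
    """Group non-empty lines under the most recent recognized header."""
    sections = {}
    current = "unlabeled"  # text that appears before any known header
    for line in note.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = SECTION_HEADERS[match.group(1).lower()]
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections


NOTE = """HPI:
Patient reports three days of cough.
Physical Exam
Lungs clear bilaterally.
LABS:
WBC 7.2
"""

sections = split_sections(NOTE)
```

A rule-based splitter like this is brittle on its own; in practice it would more likely serve as a baseline or as a source of features for a trained model.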
NLP relies on machine learning to be sensitive to the inconsistencies in how information is documented and the multiple ways a medical concept can be expressed, abbreviated or mistakenly written. NLP must also be adaptive to the constant, high-paced evolution of medicine, our disease understanding, how we test for diseases and the new terms and updated lexicon that reflect this change. In other words, NLP is at the heart of navigating semi-structured and unstructured data in medical records, and it begins interrelating what it extracts.
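The problem of one concept appearing under many spellings and abbreviations can be sketched as a lookup with a fuzzy fallback. The example below uses `difflib.get_close_matches` from the Python standard library. The `CANONICAL` table, the `normalize` function and the 0.85 cutoff are illustrative assumptions, not a clinical terminology, and a production system would more likely map to a standard vocabulary and use context to disambiguate.

```python
import difflib

# Hypothetical synonym table mapping surface forms to one canonical concept.
CANONICAL = {
    "myocardial infarction": "myocardial infarction",
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "hypertension": "hypertension",
    "htn": "hypertension",
    "high blood pressure": "hypertension",
}


def normalize(term, cutoff=0.85):
    """Map a written term to its canonical concept, or None if unknown."""
    # Collapse case and repeated whitespace before looking the term up.
    key = " ".join(term.lower().split())
    if key in CANONICAL:
        return CANONICAL[key]
    # Fall back to the closest known form to catch misspellings. The cutoff
    # is an assumed value and would need tuning; short abbreviations are
    # especially prone to false matches.
    close = difflib.get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else None


abbreviated = normalize("HTN")
misspelled = normalize("hypertenson")
unknown = normalize("diabetes")
```

Keeping the table separate from the matching logic means new terms can be added as the medical lexicon evolves without changing the code.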
The Richness of RWD Insight
Tackling real-world health data and extracting rich insights depends on knowing the exact information that is being sought. It’s crucial to understand how one piece of information fits into the larger picture of the patient’s medical journey—stating clear goals when building models provides richer results from RWD.
Formulating relevant questions has never been so crucial, especially amid our evolving understanding of how the virus that causes COVID-19 works and impacts people’s health. This includes knowing the types and names of diagnostic tests patients have taken or whether a ventilator was used during treatment. The source, or point of information, also factors into the reliability of data, which is why it’s important from both an epidemiological and a treatment perspective to rely on rich, large-scale data for answers.
Where Health Data Will Take Medicine
Medical records are among the most valuable resources for giving healthcare providers and clinical researchers meaningful answers. This insight helps to paint a fuller medical picture of patient care. Identifying and summarizing the full patient journey depends on sophisticated NLP that quickly extracts and makes sense of information buried within an enormous mound of medical records. Combining every bit of relevant, de-identified patient data can inform a full timeline and profile. Achieving this allows us to automate and quickly access rich summaries of patient journeys to effectively inform medical innovation. Converting unstructured data into structured formats will provide research organizations with data at a scale that can accelerate medical research and improve our ability to predict risk and effectively treat challenging conditions.
To learn more about Ciox’s DataFit Platform, visit us at cioxrwd.com.