Blended Natural Language Processing Solutions Combine Best of LLM and Deterministic Approaches to Improve Drug Development


Jane Reed - Director NLP for Safety & Regulatory; Constantinos Katevatis - Associate Director NLP for R&D, IQVIA.

The impressive ability of general-purpose chatbots such as ChatGPT and Bard, powered by large language models (LLMs), to generate realistic text has prompted many life science organizations to re-evaluate their approach to natural language processing (NLP).

Recently, there has been substantial innovation in NLP, including the widespread adoption of machine learning (ML) models capable of generating high-quality, human-like text. These new tools answer questions, translate text, generate summaries, and more, through conversational interactions that help scale human interaction with knowledge.

In specialized and sensitive domains such as healthcare, and particularly in drug development, it is important to understand where these newer technologies can be applied safely, effectively, and ethically in regulated processes and patient care, for example in product safety, care delivery, or health-related evidence generation. In that context, NLP tools should be evaluated to demonstrate high accuracy and reliability, whether the task is coding an adverse event, identifying a biomarker or mutation, extracting a medication dosage or a lab measurement, or describing a general patient profile.

Accuracy, however, isn’t the only key consideration. For many applications, NLP tools must also provide evidence of where the information originated in the source data and allow for human review of the original context or record. As we work to integrate the power of LLMs into drug development workflows and healthcare processes, responsible innovation with these new technologies is imperative if we are to understand where they deliver meaningful benefits and where they leave risks or gaps.

Blended Solutions: The Best of Both Worlds

Healthcare information extraction tasks require accuracy, with high stakes for patients and providers. Full automation of chart reviews for a clinical study, for example, means that uncertainty in NLP performance puts the scientific integrity of the study at risk. LLMs have been shown to “hallucinate,” meaning that they can produce plausible but factually incorrect statements, making it essential to benchmark and compare the accuracy of different NLP approaches before evaluating their appropriateness for certain tasks.

The U.S. Food and Drug Administration has provided guidance stating that measuring the accuracy of NLP software that is used to extract and transform data is one necessary component of demonstrating reliability. When measuring algorithm quality, recall and precision are two of the most important and commonly used performance metrics.
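
As a concrete illustration of these metrics: precision is the fraction of extracted items that are correct, and recall is the fraction of gold-standard items that were found. The short Python sketch below compares a system's output against a human-annotated gold standard; the documents, entities, and sets are invented for the example and do not come from any specific tool.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall for an extraction task.

    extracted: set of (document_id, entity) pairs produced by the NLP system
    gold:      set of (document_id, entity) pairs from human annotation
    """
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Invented gold standard and system output for two short documents
gold = {("doc1", "nausea"), ("doc1", "headache"), ("doc2", "rash")}
extracted = {("doc1", "nausea"), ("doc2", "rash"), ("doc2", "fatigue")}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```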

LLMs are trained broadly and extensively to accommodate a wide range of use cases, rather than to provide an optimal solution for any single one. This can present challenges when a specific and specialized task needs to be performed, such as entity normalization from clinical or scientific text. To improve performance on specialized tasks, a range of NLP techniques can be combined with state-of-the-art LLMs to create performant “blended” solutions.
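
To make entity normalization concrete, the minimal sketch below maps surface mentions in a clinical note to preferred terms via a hand-written synonym dictionary. The dictionary is a tiny, invented stand-in for a full standardized terminology such as MedDRA; the terms, abbreviations, and mappings are assumptions for illustration only.

```python
import re

# Illustrative synonym dictionary: surface form -> preferred term.
# A real system would use a full standardized terminology, not this
# hand-written stand-in.
SYNONYMS = {
    "heart attack": "Myocardial infarction",
    "mi": "Myocardial infarction",
    "high blood pressure": "Hypertension",
    "htn": "Hypertension",
}

def normalize_mentions(text):
    """Find known surface forms in text and map them to preferred terms."""
    found = []
    for surface, preferred in SYNONYMS.items():
        # Case-insensitive, word-boundary match of the surface form
        if re.search(rf"\b{re.escape(surface)}\b", text, flags=re.IGNORECASE):
            found.append((surface, preferred))
    return found

note = "Pt has hx of heart attack and HTN."
print(normalize_mentions(note))
# [('heart attack', 'Myocardial infarction'), ('htn', 'Hypertension')]
```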

Generally, off-the-shelf versions of the latest LLMs achieve good performance on general extraction tasks, but inferior performance compared to blended, domain-specific solutions on tasks that require sophisticated and specialized knowledge. Effective modern NLP offerings combine deterministic approaches (e.g., semantics and ontologies) with sophisticated ML algorithms, such as BERT models fine-tuned for sub-tasks, trained on domain-specific data and built with a deep understanding of the client problem and regulatory environment. For instance, in clinical and scientific documents, standardized terminologies can be used to search for and extract mentions of an indication, drug, dosage, or population information from specific document regions. Linguistic patterns can be used to normalize mentions of relevant relationships, and machine learning models can extract novel adverse events mentioned with the drug. LLMs can also be trained to create summaries or fine-tuned for extraction from complex or inconsistent text formats.
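
As a rough illustration of how such a blend might be orchestrated, the sketch below runs a deterministic, terminology-driven pass that codes known terms with high precision, alongside a placeholder ML pass standing in for a fine-tuned model that surfaces candidate mentions (such as novel adverse events) for human review. The structure, function names, and codes are assumptions for illustration, not a description of any particular product.

```python
def dictionary_extract(text, terminology):
    """Deterministic pass: high-precision lookup against a terminology."""
    lowered = text.lower()
    return [(term, code) for term, code in terminology.items()
            if term in lowered]

def ml_extract(text):
    """ML pass: placeholder for a fine-tuned transformer (e.g., a BERT NER
    model) that surfaces mentions the terminology does not cover, such as
    novel adverse events. A real system would call a trained model here."""
    return [("new-onset metallic taste", "ADVERSE_EVENT")]  # hypothetical output

def blended_extract(text, terminology):
    """Combine both passes: deterministic hits carry codes, while ML hits
    are routed to human review before being accepted."""
    return {
        "coded": dictionary_extract(text, terminology),
        "needs_review": ml_extract(text),
    }

# Illustrative terminology fragment (term -> code); the codes are invented.
terminology = {"nausea": "C-0001", "rash": "C-0002"}
text = "Patient reported nausea and a new-onset metallic taste after dosing."
print(blended_extract(text, terminology))
# {'coded': [('nausea', 'C-0001')],
#  'needs_review': [('new-onset metallic taste', 'ADVERSE_EVENT')]}
```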

Maximizing Value Beyond NLP Performance

Of course, NLP platforms do not deliver much value to users unless they are user-friendly and facilitate human review and validation. Use cases such as targeted literature reviews, for example, extract multiple complex, interdependent data points; curating gene-disease or gene-pathway associations is often necessary before the extracted data is visualized through knowledge graphs or analytics. In these cases, human review plays an important role in driving the speed and data quality of the end-to-end process. To meet this need, leading NLP platforms provide rich out-of-the-box functionality alongside tooling for validation and gold standard creation, as well as workflow integration.

Choosing the Right NLP

There are many considerations when selecting which products, technologies, and vendors will be used to extract insights and information from vast unstructured healthcare data sources. Factors to examine when identifying the most appropriate NLP solution include:

  • What level of accuracy is required? If the NLP is a component of a regulated process, you will need clear performance metrics on standard benchmarks.
  • Can you use hybrid techniques? Large language models combined with existing NLP techniques yield strong performance.
  • Can you justify the cost of base LLM development? This includes training, re-processing, deployment, maintenance, and support.
  • Do established low-cost techniques solve the problem? Tried-and-true NLP methods, e.g., rule-based approaches and ontologies, can effectively solve many tasks at a fraction of the cost.
  • Can you validate extractions, with human-in-the-loop review in a workflow, to meet audit needs? Review and curation tools can create fit-for-purpose solutions and aid adoption by end users.
  • Do you need to trace extractions back to the underlying source text? Marking up extractions in the source documents can be essential for certain types of use cases.

While the emergence of LLMs is prompting many drug developers to explore how this new technology can streamline drug development, most off-the-shelf LLMs are not yet ready for domain-specific work that depends upon deep expertise. That’s why, for most healthcare applications today, the best approach involves a “blended” solution that features a combination of NLP technologies, including LLMs, to extract and capture critical data. Ultimately, patients will see the largest benefits from this new technology when scientists, medical experts, clinicians, and engineers work together to implement LLMs and NLP effectively, ethically, and safely.

Publication Detail
This article appeared in American Pharmaceutical Review, Vol. 27, No. 1 (Jan/Feb), pp. 36-37.

