Advancing Regulatory Compliance with Natural Language Processing


Jane Reed, Director Life Science Safety Regulatory Quality, IQVIA

The pharmaceutical industry is one of the world’s most heavily regulated industries, and the traditional document-centric approach poses challenges in terms of efficiency, collaboration, and compliance. Regulatory frameworks, guidelines, and reporting requirements are continuously evolving, and it is crucial for drug developers to keep pace with these regulatory changes, to avoid compliance issues and concerns.

Regulatory affairs are often perceived as traditionally conservative, with a heavily manual workload requiring human input for many repetitive tasks during operations. Accessing and analyzing the key data needed from the vast amounts of documents to develop submissions, keep labels up to date, understand guidelines, and maintain compliance with constantly shifting regulations requires significant resources in the form of time, money, and effort – activities that add to pharmaceutical companies’ costs but do not necessarily enhance revenue.

To overcome these barriers, drug developers are looking for digital transformations within regulatory disciplines to move from a document-driven to a data-driven approach. Regulatory teams need innovative technologies and systems that enable them to discover key data within regulatory documents, and extract and standardize these attributes, for use in downstream processes such as reporting, labeling, master data management, or structured content authoring.

Figure 1

As a result, many pharma regulatory teams are turning to artificial intelligence (AI) within regulatory data management systems to automate processes, improve efficiency, and ensure accuracy in reporting. A key AI technology that can be applied is Natural Language Processing (NLP), which can transform unstructured text into structured data that can be rapidly analyzed, loaded into databases, or visualized.

What is NLP and Why is it Important?

NLP is a branch of AI that focuses on the interaction between computers and human language. NLP involves the understanding, interpretation, and generation of natural language by machines. It enables computers to process, analyze, and derive meaning from human language in a way that is similar to how humans do, from a vast range of text types (emails, tweets, literature, reports, reviews, and more).

NLP enables pharmaceutical companies to examine large collections of documents to discover new information or help answer specific research questions. The process is useful for identifying facts, relationships, and assertions that would otherwise remain buried in huge volumes of textual data. After this information is converted into a structured form that can be further analyzed, it can be integrated into databases, data warehouses, or business intelligence dashboards and used for descriptive, prescriptive, or predictive analytics.
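As a rough illustration of turning free text into structured records, the sketch below pairs drug and adverse event terms that co-occur in the same sentence. It is a deliberately simple dictionary-and-regex stand-in, not a production NLP pipeline: the vocabularies `DRUGS` and `EVENTS`, the `Mention` record, and the sample report text are all hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-vocabularies; a real system would use curated ontologies.
DRUGS = {"aspirin", "ibuprofen", "warfarin"}
EVENTS = {"nausea", "headache", "bleeding", "rash"}


@dataclass(frozen=True)
class Mention:
    doc_id: str
    sentence: str
    drug: str
    event: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_mentions(doc_id: str, text: str) -> list[Mention]:
    """Return one record per (drug, event) pair co-occurring in a sentence."""
    records = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for drug in sorted(tokens & DRUGS):
            for event in sorted(tokens & EVENTS):
                records.append(Mention(doc_id, sentence, drug, event))
    return records


# Made-up report text, for illustration only.
report = (
    "Patient reported nausea after starting ibuprofen. "
    "No issues were noted at follow-up. "
    "Warfarin was stopped due to bleeding and rash."
)
rows = extract_mentions("report-001", report)
for row in rows:
    print(row.doc_id, row.drug, row.event)
```

Each `Mention` is a flat record that could be loaded into a database table or dashboard. Sentence-level co-occurrence does not establish that the drug caused the event; a real system would add relation extraction and negation handling.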

Though the technology has existed for decades, NLP has matured in recent years, with products such as Siri, Alexa, and Google’s voice search employing NLP to understand and respond to user requests. More recently, the development of Large Language Models (LLMs, as made popular by ChatGPT) has invigorated discussions around the application of NLP technologies in fields as diverse as medical research, risk management, customer care, insurance fraud detection, and contextual advertising.

Modern NLP solutions can analyze very large volumes of text-based data without fatigue and in a consistent, unbiased manner. These platforms can understand concepts within complex contexts and decipher the ambiguities of language to extract key facts and relationships, as well as provide summaries.

NLP has become an essential technology across pharma drug discovery, development, and commercialization, in part due to its ability to outperform traditional methods of search. Although traditional search engines like Google now offer refinements such as synonyms and auto-completion, the vast majority of search results point only to the location of documents, leaving searchers with the problem of having to spend hours manually extracting the necessary data by reading through individual documents.

Given the huge quantity of unstructured data that are produced every day throughout the healthcare industry, from EHRs to social media posts to medical images, NLP has become a critical tool for the industry to analyze text-based data efficiently. NLP can help improve the efficiency, accuracy, and innovation of drug discovery and development processes, by leveraging the vast amount of unstructured data available in the pharma industry.

In drug discovery, for example, NLP-powered literature mining identifies potential drug targets by extracting relevant information from scientific publications, accelerating target identification. In safety, NLP is used for automated extraction and analysis of adverse event reports, enabling faster identification of safety trends and signals for regulatory reporting. In medical affairs, NLP can analyze topics and trends in communications with healthcare professionals, patients, regulators, and others on medical information, education, and evidence.

How NLP Helps with Compliance

Leading drug developers are now using NLP within regulatory affairs for a variety of purposes: to speed compliance activities, streamline labeling processes, standardize regulatory data, map to master data management systems, and drive digital transformation in regulatory processes.

NLP provides substantial value across several regulatory disciplines, including:

  • Regulatory labeling: Access to drug labels from some of the larger regulatory authorities is important to help labeling teams find reference information for disease and symptom terms, contraindications, adverse events, special populations, and more.
  • Regulatory intelligence: Access to the landscape of regulatory updates, with integrated data flows to consume textual documents, both internal (such as corrective and preventive actions) and external (such as regulatory guidelines and FDA letters), is essential for regulatory teams.
  • Regulatory mapping: Compliance teams need a means of finding key data attributes from unstructured text documents and mapping that data to standards, such as Identification of Medicinal Products (IDMP), a set of international standards that define the rules that uniquely identify medicinal products.

Figure 2

NLP in Action: How Pharma is Using the Technology

Below are some real-world use cases that highlight how pharmaceutical companies are using NLP to improve regulatory compliance and processes:

Internal and external risk management: A top 10 pharmaceutical company’s product development and supply team needed a way to improve its understanding of internal and external risk management data to optimize the formulations, commercial supply, and post-market regulatory compliance of its products.

To fuel the initiative, the team developed a data lake to capture important internal and external feeds. Internal feeds included deviations, corrective and preventive actions (CAPAs), risks, and responses to questions (RTQs). External feeds included FDA warning letters, biologics license application (BLA) review reports, white papers, and industry benchmark repositories.

The team employed NLP to structure and generate this intelligence data, extracting concepts, relationships, and sentiments embedded in the information. The data’s value to the team is further enhanced by easy-to-understand visualizations, enabling end-users to drill down and navigate the information. These data pipelines and workflows are updated automatically and deliver sustainable and scalable reporting of the regulatory landscape, featuring key risks and recommendations to act upon.

Semi-automated regulatory intelligence tracking: Often, compliance teams depend on manual methods to monitor regulatory affairs, such as having individual team members regularly perform checks of relevant agency websites or subscribe to industry emails, to stay up-to-date on recent guidelines, public consultations, and meeting conclusions.

Although the process is important because it provides compliance teams with essential intelligence to identify key concerns, deadlines, events, and regulatory decisions for compounds of interest, it is generally costly in terms of resources and time.

One pharmaceutical company surmounted these barriers by using NLP to create a workflow to semi-automate information acquisition and summaries. A key feature of the company’s approach involved the integration of NLP technology with Large Language Models (LLMs), which served to enhance human teams’ abilities and drive more effective decision-making.

With these tools, the company used a combination of AI and human capabilities to create a regulatory intelligence assistant, which provided team members with user-friendly question-and-answer access to updated regulatory information and risk categorization for substances of interest. By employing this model, the team delivers dynamic insights into various regulatory fields, highlighting major areas of risk, by extracting, summarizing, and classifying information for user-specified substances.
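The triage step in such a workflow can be pictured with the minimal sketch below, which filters incoming updates for user-specified substances and tags each hit with a risk level. This is not the company's implementation: simple keyword rules stand in for the NLP and LLM components, and the rule set `RISK_RULES`, the `Update` record, the source names, and the sample items are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword rules, checked in order; the first match wins.
RISK_RULES = [
    ("high", ("recall", "warning letter", "withdrawal")),
    ("medium", ("label change", "consultation", "guideline")),
]


@dataclass
class Update:
    source: str
    title: str
    body: str


def categorize(update: Update) -> str:
    """Assign a coarse risk level from keywords in the title and body."""
    text = f"{update.title} {update.body}".lower()
    for level, keywords in RISK_RULES:
        if any(k in text for k in keywords):
            return level
    return "low"


def triage(updates, substances):
    """Group updates by substance of interest, tagging each with a risk level."""
    result = {s: [] for s in substances}
    for u in updates:
        text = f"{u.title} {u.body}".lower()
        for s in substances:
            if s.lower() in text:
                result[s].append((categorize(u), u.source, u.title))
    return result


# Made-up feed items with placeholder substance and source names.
updates = [
    Update("agency-a", "Recall notice for compound-a tablets", "Impurity found in several lots."),
    Update("agency-b", "Draft guideline on compound-b labelling", "Public consultation open."),
    Update("agency-a", "Meeting minutes", "No substance-specific decisions."),
]
digest = triage(updates, ["compound-a", "compound-b"])
for substance, hits in digest.items():
    print(substance, hits)
```

In a semi-automated setup, a digest like this would go to a human reviewer rather than trigger action directly, which keeps the final judgment with the regulatory team.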

Access to drug labels for more effective authoring: A leading pharmaceutical company is utilizing NLP technology to explore drug label data efficiently. Various teams within the company, including global labeling, regulatory affairs, medical, and safety, faced the challenge of identifying and accessing labels and label content from diverse sources and in multiple languages. To address this, a labeling intelligence “hub” was implemented, incorporating FDA Drug Labels, EMA Drug Labels, and local European databases, powered by NLP. The tool enables users to conduct customized searches, refine results, and export data for further analysis. Additionally, users can compare specific labels through an interactive view and access original documents directly. This solution streamlines the process of developing new labels, updating existing ones, and expediting regulatory approval, ultimately saving time for the teams involved.

NLP for identification of medicinal products and regulatory master data management: IDMP (IDentification of Medicinal Products) is a set of international standards, developed by the ISO, to define the rules that uniquely identify medicinal products and the relevant elements to identify them. IDMP is being adopted globally by health regulatory agencies and provides a common language to connect currently siloed data across R&D, safety and regulatory, and supply chain systems. Many pharma companies are using IDMP implementation to assist with master data management (MDM) across the enterprise. But one of the key challenges is that many of the data entities required are buried in unconnected silos of unstructured text.

A top 10 pharma consumer division used NLP to find and extract over 140 IDMP data attributes from a range of regulatory documents, including Summaries of Product Characteristics and regulatory dossiers (eCTD sections 3.2.S and 3.2.P). The challenges are familiar to anyone involved in IDMP compliance: varied document sets, some up to 50 years old, in mixed formats (Doc, Docx, image, and text PDFs) and different languages, along with the need to integrate with SPOR vocabularies. The output needed to be mapped to the regulatory MDM schema for the division’s IDMP submission and internal business use. NLP was highly effective for data extraction and document processing, saving the team significant time and resources.
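The mapping step, where an extracted free-text value is aligned to a controlled vocabulary, might look something like the sketch below. It assumes the attribute (here, a dose form) has already been extracted from the document. The vocabulary `DOSE_FORMS`, its synonyms, and the `DF-...` identifiers are invented for illustration and are not real SPOR or IDMP codes.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> (made-up ID, synonyms).
DOSE_FORMS = {
    "Tablet": ("DF-001", {"tablet", "tablets", "tab"}),
    "Film-coated tablet": ("DF-002", {"film-coated tablet", "film coated tablet", "fct"}),
    "Solution for injection": ("DF-003", {"solution for injection", "injection solution"}),
}


def normalize(value: str) -> str:
    # Lowercase, trim, and collapse internal whitespace.
    return re.sub(r"\s+", " ", value.strip().lower())


# Reverse index from each normalized synonym to its vocabulary entry.
SYNONYM_INDEX = {
    normalize(syn): (term_id, preferred)
    for preferred, (term_id, synonyms) in DOSE_FORMS.items()
    for syn in synonyms
}


def map_dose_form(extracted: str) -> dict:
    """Map one extracted string to a vocabulary entry, or flag it for review."""
    key = normalize(extracted)
    if key in SYNONYM_INDEX:
        term_id, preferred = SYNONYM_INDEX[key]
        return {"raw": extracted, "term_id": term_id,
                "preferred": preferred, "status": "mapped"}
    return {"raw": extracted, "term_id": None,
            "preferred": None, "status": "needs_review"}


# Made-up extracted values, as they might appear in older documents.
extracted_values = ["Film  coated Tablet", "tablets", "oral lyophilisate"]
mapped = [map_dose_form(v) for v in extracted_values]
for row in mapped:
    print(row)
```

Values that match no synonym are flagged as `needs_review` rather than guessed, so that a data steward can resolve them before they reach the MDM schema.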

Conclusions

Constant changes in regulations mean that companies require new tools and solutions to assist with regulatory review and compliance, to respond in an effective and timely manner. In some cases, meeting the regulators’ requirements is straightforward, while in other cases, accessing the necessary data can take a significant amount of time, money, and effort, while not necessarily increasing revenue. The inefficient, laborious, and error-prone nature of traditional manual search processes has led many pharmaceutical companies to use AI-based technologies to provide relief to compliance teams. Due to its ability to transform a wealth of internal and external data into high-value, actionable insights, NLP is among the primary AI-based technologies pharmaceutical companies are using to synthesize information from many sources to deliver critical supporting evidence for business decisions. Utilizing AI/ML tools, such as Natural Language Processing, enables digital transformation to improve efficiencies in regulatory disciplines. These innovative tools bring agility to regulatory teams, enabling them to rapidly address critical business issues across regulatory affairs.
