Unlocking the Potential of Medical Imaging as Real-World Data


Elif Sikoglu, Senior Director of Life Sciences Solutions, Flywheel

The FDA’s definition of “real-world data” (RWD) includes “data derived from electronic health records, medical claims data, data from product or disease registries, and data gathered from other sources (such as digital health technologies) that can inform on health status.” Under this definition, medical imaging can certainly be understood as a type of RWD. However, in the drug development process, where RWD is playing an increasingly important role, medical imaging is not often among the data that teams are actively leveraging.

Standardized imaging data in the Digital Imaging and Communications in Medicine (DICOM) format, spanning different imaging modalities, can be a powerful type of RWD. These images are large, complex objects that provide rich information, often in 3D and often with a fourth, longitudinal dimension. This rich data can deliver important insights in drug discovery and development.

Information acquired via different imaging modalities can be used to better characterize disease mechanisms and provide insights into therapeutic responses. This information could directly affect both the speed of delivery and the precision of much-needed therapeutics. For example, larger and more realistic data sets may lead to more accurate predictive models, which in turn can inform clinical trial protocols that select more targeted patient cohorts and can help identify novel biomarkers as objective study endpoints. Historically, however, leveraging medical imaging for R&D has been challenging. This is due to several factors:

  • Data is not acquired utilizing a standardized protocol across sites and subjects and therefore may not be suitable for population analysis due to confounding factors.
  • Collecting the data from different sources may not be trivial when teams lack the tools or a clear plan with associated instructions.
  • The data prep required to leverage imaging is often significant. Imaging data must be de-identified, harmonized and uniformly curated. Using traditional methods, performing these tasks at the scale required for R&D can take weeks or months.
  • Scale poses another challenge for data storage and computation. Medical imaging datasets are very large, and running computations on them can be time-consuming and costly without cutting-edge infrastructure. A single MRI exam can be a gigabyte in size, while X-rays are in the megabyte range. The processing power these datasets require was not commonly accessible before the current cloud era.
  • Finally, expertise is required to assess and analyze the images so that they are usable for research. Targeted research questions, a detailed study plan and rigorous execution are needed. To this end, tools (including viewers, reader forms and analysis packages) should be accessible to researchers.
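
To make the scale point in the list above concrete, here is a rough back-of-the-envelope sketch in Python. The per-exam sizes follow the orders of magnitude quoted above (about a gigabyte for an MRI, megabytes for an X-ray); the 10 MB X-ray figure and the exam counts are illustrative assumptions, not numbers from any real archive.

```python
# Back-of-the-envelope storage estimate for an imaging archive.
# Exam counts and exact per-exam sizes are illustrative assumptions.

MB = 10**6   # decimal megabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

# Assumed average size per exam, by modality (orders of magnitude only).
AVG_EXAM_BYTES = {
    "MRI": 1 * GB,
    "XRAY": 10 * MB,
}


def archive_size_bytes(exam_counts):
    """Sum the expected bytes for a mapping of modality -> number of exams."""
    return sum(AVG_EXAM_BYTES[modality] * count
               for modality, count in exam_counts.items())


# Hypothetical archive: one million MRI exams and five million X-rays.
counts = {"MRI": 1_000_000, "XRAY": 5_000_000}
total = archive_size_bytes(counts)
print(f"{total / PB:.2f} PB")  # prints "1.05 PB"
```

Under these assumptions, a million MRI exams alone reach the petabyte range, which is why storage and compute planning cannot be an afterthought.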

For drug developers, the promise of mobilizing RWD with data acquired from other sources, including exploratory data, legacy and retrospective data, and data held by CROs, is too great to ignore. If these types of data can be centralized and leveraged, they can fuel innovative new approaches for artificial intelligence/machine learning (AI/ML) and complex analysis within early and late-stage R&D. However, this requires the ability to manage petabytes of data—representing millions of patient exams and many time points. The strategies that organizations design to harness this unwieldy data can make or break their success.

Strategies for Managing Medical Imaging in R&D

While imaging has been underleveraged in the past, this is changing within some pharma enterprises. Pharmaceutical companies are now more frequently requesting to take possession of imaging datasets from CROs and other partners. They may also have access to real-world imaging via partnerships with academic medical centers. These early adopters have discovered some foundational data management strategies that are fueling efficient use of imaging, including real-world imaging, in R&D.

One common strategy is the use of a centralized data management platform, which is crucial to unite imaging data with other types of data and to prevent data siloing. A platform with support for multimodal data is key to facilitating modern research. In the realm of RWD, the ability to unite various data types can help wring extra value. For example, real-world imaging acquired from an academic medical center is more useful when accompanied by clinical reports and image annotations.

Such a platform should have ingestion and curation capabilities that can bring in the necessary datasets efficiently, with audit trails and repeatable processes. As noted, the datasets involved in R&D are at the petabyte level, so automating and parallelizing ingestion and curation wherever possible can help the data-centralization step move faster.
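
As a minimal sketch of what parallelized ingestion with an audit trail might look like, the Python below checksums files concurrently and emits one audit record per file. The function names and the record fields are hypothetical, chosen for illustration; a production platform would add retries, access control, and durable storage for the audit log.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_file(path):
    """Return an audit record (hypothetical schema) for one ingested file."""
    return {
        "path": str(path),
        "bytes": Path(path).stat().st_size,
        "sha256": sha256_of(path),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def ingest_all(paths, max_workers=8):
    """Ingest files in parallel and return audit records in a stable order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        records = list(pool.map(ingest_file, paths))
    # Sorting by path makes the audit trail independent of input order.
    return sorted(records, key=lambda record: record["path"])
```

Recording a checksum at ingestion is one simple way to make the process repeatable: re-running the job on the same files should reproduce the same hashes, and any mismatch points to a changed or corrupted file.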

De-identification of data is of course critical, and tools for de-identification (de-id) should be deployable at the edge of a data platform whenever possible to minimize risk. Because the types of protected health information (PHI) at risk of exposure can vary widely from modality to modality, teams need a range of de-id tools. Text-based anonymization can be applied to data in the DICOM header at scale and in a repeatable way. Imaging data can also contain PHI that goes beyond header text, so pixel-based anonymization techniques (e.g., redaction of burned-in text, “defacing” of the exterior layer of the skin in brain images) are important to better ensure compliance.
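
The header-level part of this can be sketched in a few lines of Python. For illustration, a DICOM header is modeled here as a plain dictionary keyed by standard attribute keywords; a real pipeline would read headers with a DICOM library and apply a full confidentiality profile. The keyword list below is an assumption and is deliberately incomplete, and pixel-level PHI is out of scope for this sketch.

```python
# Assumed, deliberately incomplete set of header attributes treated as PHI.
PHI_KEYWORDS = frozenset({
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "AccessionNumber",
})


def deidentify_header(header, phi_keywords=PHI_KEYWORDS):
    """Return a copy of `header` with the listed PHI attributes removed.

    The input is not modified, so the same rule set can be re-applied
    and the before/after states compared during an audit.
    """
    return {key: value for key, value in header.items()
            if key not in phi_keywords}


# Hypothetical header for a single MR image.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "MRN-001",
    "Modality": "MR",
    "StudyDate": "20200401",
}
clean = deidentify_header(header)
print(sorted(clean))  # ['Modality', 'StudyDate']
```

Because the rule set is plain data, the same function can be run over every file at ingestion, which is what makes text-based de-identification repeatable at scale.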

Where multimodal data is being used, there is an additional layer of complexity in de-id strategies. Anonymization and PHI removal should be performed on the imaging data in a way that maintains the ability to link it with the other types of data pertaining to the same patient.
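
One common way to keep records linkable after identifiers are removed is to replace the patient identifier with a keyed pseudonym, so that the same patient maps to the same opaque key in every data source. The sketch below uses HMAC-SHA256 from the Python standard library; it is one possible approach, not a description of any specific platform. The field names (`patient_id`, `subject_key`) and the key value are hypothetical, and in practice the secret would live in a managed key store.

```python
import hashlib
import hmac


def pseudonymize(patient_id, secret_key):
    """Map a patient ID to a stable, opaque pseudonym using HMAC-SHA256."""
    mac = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()


def replace_id(record, secret_key):
    """Return a copy of `record` with `patient_id` swapped for a pseudonym."""
    out = dict(record)
    out["subject_key"] = pseudonymize(out.pop("patient_id"), secret_key)
    return out


# Hypothetical key; never hard-code a real secret.
SECRET_KEY = b"replace-with-a-managed-secret"

imaging = replace_id({"patient_id": "MRN-001", "modality": "MR"}, SECRET_KEY)
clinical = replace_id({"patient_id": "MRN-001", "report": "No acute findings."},
                      SECRET_KEY)

# Both sources now share the same pseudonymous key and can still be joined.
print(imaging["subject_key"] == clinical["subject_key"])  # True
```

The design choice here is that linkage depends only on the secret key: anyone without it cannot recompute the mapping from a known identifier, while every data type processed with the same key stays joinable.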

Once data is centralized in a platform, the metadata embedded within the images can be extracted. This is most efficiently done with automated tools that can index large volumes of data via a scalable method. It is also useful to have the ability to perform validation and verification on the data. The quality of imaging data from real-world datasets can vary, so the ability to automate quality control is valuable. This step also enables teams to correct issues in the data or metadata early and/or eliminate some of the datasets in the process, helping prevent problems that can arise from using poor-quality data to train ML models.
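
A minimal sketch of automated quality control on extracted metadata might look like the following. Each study's metadata is again modeled as a plain dictionary keyed by DICOM attribute keywords; the required fields, the allowed modalities, and the specific checks are assumptions chosen for illustration, and a real project would define its own rules.

```python
# Assumed rule set; each project would define its own.
REQUIRED_FIELDS = ("Modality", "StudyDate", "Rows", "Columns")
ALLOWED_MODALITIES = {"MR", "CT", "CR", "DX", "PT"}


def find_issues(meta):
    """Return a list of human-readable problems found in one metadata record."""
    issues = [f"missing {field}" for field in REQUIRED_FIELDS
              if meta.get(field) in (None, "")]
    modality = meta.get("Modality")
    if modality and modality not in ALLOWED_MODALITIES:
        issues.append(f"unexpected modality {modality!r}")
    for dim in ("Rows", "Columns"):
        value = meta.get(dim)
        if isinstance(value, int) and value <= 0:
            issues.append(f"non-positive {dim}")
    return issues


def triage(records):
    """Split records into (passed, flagged) lists of (record, issues) pairs."""
    passed, flagged = [], []
    for meta in records:
        issues = find_issues(meta)
        (flagged if issues else passed).append((meta, issues))
    return passed, flagged


# Hypothetical records: one clean, one with several problems.
good = {"Modality": "MR", "StudyDate": "20200401", "Rows": 256, "Columns": 256}
bad = {"Modality": "XX", "Rows": 0, "Columns": 256}
passed, flagged = triage([good, bad])
print(flagged[0][1])
```

Flagged records can then be corrected or excluded before any model training begins, and the list of issues doubles as a record of why a dataset was set aside.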

The ultimate goal of these data management strategies is to turn complex imaging data into analysis-ready datasets that can be used and reused over time. So, while there may be an initial lift to implement platforms and tools that can perform these functions, the automation they can bring to what were previously very manual, time-intensive procedures can yield dividends.

Imaging R&D Successes in Life Sciences

Real-world imaging data is utilized heavily in AI/ML initiatives at pioneering companies and within academia. One top-five pharma organization has used a data management platform to power the digital transformation of its R&D department. After centralizing its imaging assets alongside its legacy data, this organization had more than 50 million standardized and organized imaging studies. This data is now accessible to the many members of the enterprise’s R&D teams and can be used and reused without creating duplicate data. Research teams are now leveraging this data for complex analysis and machine learning.

In another application of imaging RWD, the University of Wisconsin Medical School studied 50,000 datasets during the early days of the COVID pandemic to create an AI model to assist in diagnosing COVID-19 via X-ray. This research united chest X-rays from multiple hospital PACS with EMR data showing PCR test results. It also used a database of chest X-rays from the NIH for control data. Researchers efficiently organized the images and created workflows for labeling and AI training. The resulting deep neural network was able to differentiate COVID-19 from other types of pneumonia “with performance exceeding that of experienced thoracic radiologists.”

As these examples illustrate, imaging data can be leveraged to address both broad organizational goals and pressing public health emergencies.

Putting Data at Researchers’ Fingertips

Data science teams consistently report that the majority of their time in machine learning projects is spent on mundane data wrangling. This is especially true when utilizing messy, non-standard imaging RWD—which, if it can be tamed, is a powerful component in R&D.

Organizations, therefore, need a framework for efficiently incorporating imaging RWD into their research workflows. This holds the potential for enabling rich pathways for drug discovery research and accelerating the development of AI applications. While imaging has previously been underleveraged due to the complexities of managing the data, modern tools are finally making it feasible to work with at scale.

For forward-thinking pharma leaders, the implications are clear: the efficiency with which an organization can handle data management is a major determining factor in the productivity and the cost of R&D. The promise that RWD holds for research breakthroughs demands thoughtful strategies for putting this data within reach for researchers as efficiently as possible.

Author Biography

Elif Sikoglu is the Senior Director of Life Sciences Solutions at Flywheel, a biomedical research data platform. She holds a Ph.D. in biomedical engineering and was previously the medical director of an imaging CRO. Elif has expertise in utilizing advanced imaging approaches in neuropsychiatric indications and significantly contributed to clinical trials with imaging-supported endpoints across multiple therapeutic areas.
