Use of Statistical Models to Uncover Fraud, Shorten Timelines and Improve Patient Safety

Sylviane de Viron, Laura Trotta, Sebastiaan Höppner, Steve Young, and Marc Buyse - CluePoints

Unsupervised statistical monitoring could have uncovered fraud at a clinical trial site around a year earlier than traditional methods did.

That was the main finding of an article recently published in Therapeutic Innovation & Regulatory Science (https://doi.org/10.1007/s43441-021-00341-5) in collaboration with Boehringer Ingelheim. By reanalyzing a clinical trial with known fraud, the study showed that the method can save time and money while, crucially, protecting data quality and integrity.

Risk-Based Monitoring

Risk-Based Quality Management (RBQM) and Risk-Based Monitoring (RBM) are shifting the clinical research landscape.

Endorsed by both the FDA and the EMA, and codified in the ICH E6 (R2) GCP update, these approaches encourage limited and more targeted use of traditional on-site monitoring practices such as frequent site visits and 100% source data verification/review (SDV/SDR), and an increasing reliance on sophisticated centralized monitoring.

This proactive approach uses statistical models to compare sites and look for outliers that could point to quality-related risks, such as unusual or unexpected data patterns.

Highlighting the possibility of issues such as fraud, inappropriate training, or poor understanding of the protocol during the study gives sponsors the opportunity to investigate and take corrective action before they might have a significant negative impact on data quality. In turn, this can help prevent costly delays in getting important new therapies to patients.

In recent years, a growing body of evidence has emerged to show that centralized monitoring can help direct precious resources to those sites in most need of attention, and drive higher-quality outcomes overall. However, the practice has yet to break through into mainstream research.

Looking Back: Detecting Fraud

Working with partners at Boehringer Ingelheim, the International Drug Development Institute (IDDI), and the Interuniversity Institute for Biostatistics and Statistical Bioinformatics, CluePoints reanalyzed the database of a large randomized clinical trial, known to have been affected by fraud.

The Second European Stroke Prevention Study (ESPS2) was conducted in the early 1990s. The international, multisite, randomized double-blind trial compared acetylsalicylic acid and/or dipyridamole to matching placebos in the prevention of stroke or death in patients with pre-existing ischemic cerebrovascular disease. More than 7,000 patients across 60 sites in 13 countries took part in the trial.

Serious inconsistencies in the case report forms (CRFs) of one site, #2013, led the trial’s steering committee to question the reliability of the data. A for-cause analysis of quality control samples, and extensive additional analyses including blood concentrations of the investigational drugs, proved that the patients had never received these drugs as mandated by the protocol.

The site’s data, comprising 438 patients, was excluded from the trial analyses, and after lengthy legal proceedings the investigator was convicted.

Our objective in the reanalysis of the trial data was to assess whether the fraud could have been detected earlier with the use of an unsupervised statistical monitoring approach. This approach, referred to in the CluePoints platform as a data quality assessment (DQA), works on the principle that data from all the sites participating in a clinical trial should, aside from the random play of chance and systemic variations, be largely similar.
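The core idea — that any one site's data should look like everyone else's — can be illustrated with a minimal sketch. This is not CluePoints' actual methodology (which applies a large battery of proprietary tests); it simply compares each site's measurements against the pool of all other sites with a two-sample z-test, using made-up site names and simulated data.

```python
import math
import random

def site_vs_rest_pvalue(site_values, rest_values):
    """Two-sample z-test (normal approximation) comparing one site's
    measurements against the pooled measurements from all other sites.
    A small p-value marks the site as atypical."""
    n1, n2 = len(site_values), len(rest_values)
    m1 = sum(site_values) / n1
    m2 = sum(rest_values) / n2
    v1 = sum((x - m1) ** 2 for x in site_values) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in rest_values) / (n2 - 1)
    z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
# Nine "honest" sites draw from the same distribution; one site is shifted
# and suspiciously uniform, mimicking fabricated data.
sites = {f"site_{i}": [random.gauss(120, 10) for _ in range(50)] for i in range(9)}
sites["site_2013"] = [random.gauss(135, 2) for _ in range(50)]

flags = {}
for name, values in sites.items():
    rest = [x for other, vals in sites.items() if other != name for x in vals]
    flags[name] = site_vs_rest_pvalue(values, rest)

print(min(flags, key=flags.get))  # the most atypical site
```

In practice a DQA runs many such tests across every variable in the database, but the principle is the same: sites whose data diverge from the collective pattern surface with small p-values.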

Analyses of ESPS2

The ESPS2 study database provided to CluePoints for this reanalysis comprised all clinical data from the CRFs and laboratory results, across all patients and sites. The project was divided into two phases. The objective of the first phase was to confirm if DQA would identify the known fraudulent site (#2013), while the second phase aimed to find out how much earlier this site would have been detected. While the fraudulent site was known to the sponsor team, CluePoints researchers remained blinded until after the analysis was complete and the findings were presented.

Figure 1. Bubble plot showing five sites with a Data Inconsistency Score (DIS) > 1.3 (P-value < 0.05). Site with known fraud is denoted B (Site #2013)

In phase one, CluePoints' advanced set of statistical tests was applied across the completed study database to identify unusual patterns across sites. This generated hundreds of p-values per site; a weighted average of these was computed and converted into an overall data inconsistency score (DIS) for each site.
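The mapping from p-values to a score can be sketched as follows. The actual weighting scheme is proprietary, so this toy version assumes uniform weights and uses a -log10 transform, which is consistent with the figure's threshold of DIS > 1.3 corresponding to p < 0.05.

```python
import math

def data_inconsistency_score(p_values, weights=None):
    """Toy DIS: weighted average of per-test p-values, converted to a
    -log10 scale so that DIS > 1.3 corresponds to p < 0.05.
    Uniform weights are assumed here for illustration only."""
    if weights is None:
        weights = [1.0] * len(p_values)
    avg_p = sum(w * p for w, p in zip(weights, p_values)) / sum(weights)
    return -math.log10(avg_p)

# A site whose tests mostly return small p-values gets a high DIS ...
print(round(data_inconsistency_score([0.001, 0.02, 0.004]), 2))  # → 2.08
# ... while unremarkable p-values map to a DIS near zero.
print(round(data_inconsistency_score([0.4, 0.6, 0.5]), 2))       # → 0.3
```

On this scale, site #2013's DIS of 4.14 would correspond to an aggregate p-value below 0.0001.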

Site #2013 was assigned a DIS of 4.14, which identified it as the second most atypical of the 60 sites.

Closer examination of site #2013's data found that the site had not reported any non-serious adverse events (only serious adverse events had been reported), and that there was atypically low variability between and within patients' laboratory results and vital signs.

In addition, the analysis revealed multiple atypical proportions, and missing values in domains such as study drug compliance and adverse events. These findings were consistent with the sponsor’s previously published conclusions.
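The "atypically low variability" signal is worth unpacking, as it is a classic marker of fabricated data. A minimal sketch, using hypothetical blood pressure readings (not ESPS2 data), compares the average visit-to-visit spread per patient between a typical site and a suspect one:

```python
import statistics

def within_patient_sd(visits_by_patient):
    """Average standard deviation of repeated measurements per patient.
    Fabricated data often shows implausibly little visit-to-visit spread."""
    sds = [statistics.stdev(visits) for visits in visits_by_patient if len(visits) > 1]
    return sum(sds) / len(sds)

# Hypothetical systolic blood pressure readings over four visits.
typical_site = [[118, 126, 131, 122], [140, 128, 135, 147]]
suspect_site = [[120, 120, 121, 120], [130, 130, 130, 131]]

print(round(within_patient_sd(typical_site), 1))  # → 6.8
print(round(within_patient_sd(suspect_site), 1))  # → 0.5
```

Real physiological measurements fluctuate; values that barely move from visit to visit are statistically improbable and flag the site for review.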

In phase two the research team carried out the same analyses, but on versions of the study database representing incrementally earlier timepoints in the execution of the study, to reproduce the effect of the study being subject to regular on-going central statistical monitoring reviews (DQAs).
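The snapshot logic can be sketched with simulated data. This is a simplification of the phase-two design: one crude test statistic stands in for the full DQA battery, and the accrual pattern and threshold (|z| > 3) are illustrative assumptions, not the study's actual parameters.

```python
import math
import random

def zscore(site_values, rest_values):
    """Two-sample z statistic comparing one site against all others
    (a stand-in for the full battery of statistical tests)."""
    n1, n2 = len(site_values), len(rest_values)
    m1, m2 = sum(site_values) / n1, sum(rest_values) / n2
    v1 = sum((x - m1) ** 2 for x in site_values) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in rest_values) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

random.seed(1)
honest = [random.gauss(120, 10) for _ in range(400)]  # pooled honest sites
fraud = [random.gauss(130, 2) for _ in range(100)]    # fabricated-looking site

# Re-run the analysis on successive cuts of the accruing database and
# record the earliest snapshot at which the site is flagged.
first_flagged = None
for pct in (25, 50, 75, 100):
    cut_fraud = fraud[: len(fraud) * pct // 100]
    cut_honest = honest[: len(honest) * pct // 100]
    if first_flagged is None and abs(zscore(cut_fraud, cut_honest)) > 3:
        first_flagged = pct

print(first_flagged)
```

The point mirrors the study's finding: when the anomaly is a systematic shift rather than random noise, even a modest fraction of the eventual data is enough to surface it.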

The phase two analyses revealed that site #2013 would have been detected as atypical when only 25% of its final subject data volume had accrued, which was in May 1991.

By contrast, the ESPS2 study team first developed suspicions and carried out a detailed statistical assessment of the site’s data 13 months later, in June 1992. At this point, approximately 75% of site #2013’s data had accrued.

This led to a for-cause audit in January 1993, and to an expert review of patient compliance through an analysis of blood samples in June 1993. It was only at this point that the fraud was confirmed, and the site’s data excluded from the trial. The exclusion of site #2013’s data did not materially affect the results of this trial.

Had CluePoints' DQA been used in this study, its findings would have enabled the sponsor to confront the investigator and ask for an explanation of the erroneous data at least a year earlier than the traditional approach allowed.

CluePoints' analysis of the ESPS2 data not only flagged site #2013 as atypical but also detected four additional outlying sites, all of which would have warranted further investigation had they been uncovered at the time.

A closer look at the data from the most outlying site (Rank 1) revealed that its patient population was atypical in comparison to the other sites: patients there were older and had more recorded medical history items. This shows that regular monitoring during the conduct of a trial can detect not only data errors but also atypical populations for which an in-depth medical review might be needed.

These results clearly show that centralized statistical monitoring can provide the sponsor with a deeper, more holistic understanding of how the treatment affects different groups, and how trials are executed at different sites.

Looking Ahead: The Future Of RBM

There is a growing consensus that centralized monitoring, using statistical techniques, is more likely to detect data anomalies than on-site monitoring or source data verification (SDV).

As this article shows, advanced statistical tests provide real opportunities to spot, and therefore report, fraud in clinical trials, the true prevalence of which is largely unknown.

Few cases have been published and no systematic reviews have been conducted – but sponsors and clinical investigators have an ethical obligation to report fraud. Doing so is, quite simply, imperative to ensuring the transparency and integrity of clinical research, and to maintaining public trust in the process.

CluePoints' experience with central statistical monitoring suggests that overt fraud is relatively rare, although other causes of data errors, including sloppiness and a lack of understanding or training, are quite common. Increasing the use of DQA in clinical trials may also improve patient safety and enhance efficacy assessment by providing valid data, because it can spot sites with atypical populations or patients for whom deeper medical review might be needed.

Protect Safety, Avoid Delays

In conclusion, our latest study shows that an unsupervised approach to central monitoring, using statistical models, can effectively detect sites with fraud or other data anomalies in clinical trials. Wider use of such methods could therefore improve patient safety and, by yielding more accurate study data, the reliability of study outcomes.

By detecting issues earlier than traditional methods and giving sponsors the opportunity to deploy quick and decisive remedial action, the approach can shorten the drug development timeline by helping to avoid unnecessary – and expensive – delays to approval and market.
