An Interview with Dr. Arne Kusserow and Robin B. Shore, Merck KGaA, Darmstadt, Germany

Dr. Arne Kusserow, Product Manager Converter, Connected Lab, Life Science, Merck KGaA, Darmstadt, Germany

Robin B. Shore, Content Development Manager, Connected Lab, Life Science, MilliporeSigma, Burlington, MA, a business of Merck KGaA, Darmstadt, Germany

Chromatography is ubiquitous in pharmaceutical labs. Can you give us a general overview of the types of chromatographic analysis pharmaceutical companies are doing, and how important these chromatographic instruments are to these companies’ efforts to research and develop new pharmaceuticals?

As you said, chromatography is ubiquitous: it’s used in many different forms and for answering numerous analytical questions in Research and Development (R&D), production, and Quality Control (QC). There are High Performance Liquid Chromatography (HPLC), Gas Chromatography (GC), Ion Chromatography (IC), Supercritical Fluid Chromatography (SFC), and Capillary Electrophoresis (CE) instruments. Each of these instrument types has one or more detectors. The number of different detector types is too large for all of them to be listed here. In many cases, a Chromatography Data System (CDS) is used to control the instruments. Many instruments may be connected to the same CDS. These systems are extremely powerful tools, but it’s hard to get data out of them.

Chromatographic instruments are crucial to almost every step in the development and production of pharmaceutical products and their ingredients. Chromatography may be the most used and most widely distributed analytical technology, and it’s by far the one with the biggest market. The data produced by chromatographic instruments are key in drug development and have many other applications in basic academic research, food and beverage, environmental testing, and hundreds of other fields. There are hundreds of thousands of chromatographs in laboratories. All of them produce a lot of valuable data, but only the smallest part of the data is currently used or even available for any usage. Most data are deleted, stuck in an instrument’s software, or lost.

The amount of data collected by the industry has grown exponentially in recent years. What are some general industry issues that have resulted from this massive growth in data generated and collected?

First, data from different devices, databases, and other sources must be normalized to a common, open, human-readable standard format such as XML. Because devices in many cases use proprietary file formats, the data are tied to the instrument’s software and can’t be read or used outside of it.
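As a rough illustration of what normalizing to an open format can look like, here is a minimal Python sketch that turns a simple "time,signal" CSV export into an XML document. The element and attribute names (measurement, trace, point), the column names, the units, and the instrument label "HPLC-01" are all assumptions made for this example; they are not taken from any vendor format or from a standardized schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, instrument: str) -> str:
    """Convert a 'time,signal' CSV export into a small XML document.

    The XML layout here is hypothetical and chosen only for illustration.
    """
    root = ET.Element("measurement", attrib={"instrument": instrument})
    # Assumed axis labels; a real export would state its own units.
    trace = ET.SubElement(root, "trace", attrib={"x": "time_min", "y": "signal_mAU"})
    for row in csv.DictReader(io.StringIO(csv_text)):
        ET.SubElement(
            trace,
            "point",
            attrib={"time": row["time"].strip(), "signal": row["signal"].strip()},
        )
    return ET.tostring(root, encoding="unicode")


# Made-up example data standing in for an instrument export.
example_csv = "time,signal\n0.00,1.2\n0.05,1.4\n0.10,7.9\n"
print(csv_to_xml(example_csv, instrument="HPLC-01"))
```

Once the data are in a plain XML document like this, any standard XML parser can read them without the instrument’s own software.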

Second, the more data you have, the better you have to structure it to make use of it. Imagine a device that produces files automatically. Such files have generic file names. How can one find a particular file among hundreds of thousands of very large files, all named generically?

Third, you need contextual data. Results data alone are not enough: you need metadata on users, samples, experiments, timestamps, and equipment to find the data and to apply Artificial Intelligence (AI) or Machine Learning (ML). You also need metadata for audit trails.
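To make the point about generic file names and metadata concrete, here is a small Python sketch of a metadata index. The record fields, file names, user names, and sample IDs are hypothetical; the idea is only that a file called run_0001.dat becomes findable once it is stored together with who acquired it, on which sample, on which instrument, and when.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str        # generic, instrument-generated file name
    user: str        # who acquired the data
    sample_id: str   # which sample was measured
    instrument: str  # which device produced the file
    acquired: str    # acquisition time as an ISO 8601 string


def find_files(index, **criteria):
    """Return the records whose metadata match every given criterion."""
    return [
        rec
        for rec in index
        if all(getattr(rec, key) == value for key, value in criteria.items())
    ]


# Made-up records for illustration only.
index = [
    FileRecord("run_0001.dat", "asmith", "S-100", "HPLC-01", "2024-01-15T09:30:00"),
    FileRecord("run_0002.dat", "bjones", "S-101", "HPLC-01", "2024-01-15T10:05:00"),
    FileRecord("run_0003.dat", "asmith", "S-100", "GC-02", "2024-01-16T14:20:00"),
]

matches = find_files(index, sample_id="S-100", instrument="HPLC-01")
print([rec.path for rec in matches])
```

In practice such an index would live in a database or data management system rather than in a Python list, but the lookup principle is the same.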

Fourth, you need a good storage and archival solution plus data orchestration to handle such large amounts of data. In laboratories in particular, we’re talking about terabytes per file. Examples of such files are high-resolution 3D microscopic reconstructions and DNA/protein sequencing, but also files relevant to advanced chromatography applications and Mass Spectrometry (MS). Unlike in other industries, in laboratory-based research the data themselves are the assets instead of physical products.

And of course, there are the more general issues that Big Data has, such as upload/download times and archival solutions.

Looking specifically at labs and chromatographic data, what current issues are hindering pharmaceutical companies from properly leveraging data to make informed decisions?

Scientific instruments in almost all cases store their data in unreadable proprietary formats. Those data can’t be used for integration into leading systems, exchange with collaborators, long-term storage, or any other secondary use. In pharma, you’d have to retain the instrument, its software, and an expert on both over the whole regulated period, which can be up to 99 years, to remain in regulatory compliance and be ready for audits. Pharma companies typically avoid this by printing out all compliance-related data on paper or manually converting it to common file formats like .pdf, .txt, or .csv. But finding specific data in piles of paper or on servers full of PDFs is a challenge, and no raw or processed data are stored in a form that can be accessed and reused. PDF files are not suitable for long-term storage. To get access to all data, you need a converter for each of the instruments and software applications you want to pull data from. This is another good reason to use an open file format like XML.

If you don’t use an open format, you can’t use those data outside of the instrument software. It makes little sense to store or archive such data; they’re not available for AI, ML, or any other application. In such a scenario, you can only integrate your instruments one by one with a Lab Information Management System (LIMS) or Electronic Lab Notebook (ELN) in close cooperation with the LIMS/ELN provider. This takes years to accomplish and costs a great deal of money, and in the end your data will still not be usable outside of the instrument/LIMS/ELN environment.

But today, many converters are commercially available that support Analytical Information Markup Language (AnIML), an XML-based open data format with additional functionalities to meet the very special demands of laboratory instruments and their data. Using AnIML with available software tools makes all laboratory instrument data FAIR: findable, accessible, interoperable, and reusable. These data can easily be visualized, organized, and shared with colleagues or third-party software.

If a pharmaceutical company wants to integrate its lab equipment for collecting and processing data in order to give itself a holistic view, what steps and technology should be implemented to accomplish this goal?

In general, there are two types of instruments, slightly different from one another. Relatively simple instruments like balances, pipettes, and pH meters produce small amounts of simple data. Some of them aren’t even controlled by software. For these instruments, you must use the existing interfaces such as RS232 and USB to establish a connection and to convert the data to AnIML.
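For a simple instrument such as a balance, the first step is usually to turn the raw text the device sends over its serial interface into a structured value. The Python sketch below parses one such line. The line format ("ST,+0012.3456 g"), the set of accepted units, and the Reading type are assumptions made for this example; real devices document their own output formats, and reading from the serial port and serializing the result to AnIML are left out here.

```python
import re
from dataclasses import dataclass

# Hypothetical line format: optional status text, a signed decimal number,
# then a unit at the end of the line. Real balances differ by vendor.
LINE_PATTERN = re.compile(r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>mg|kg|g)\s*$")


@dataclass
class Reading:
    value: float
    unit: str


def parse_balance_line(line: str) -> Reading:
    """Extract the numeric value and unit from one line of balance output."""
    match = LINE_PATTERN.search(line.strip())
    if match is None:
        raise ValueError(f"unrecognized balance output: {line!r}")
    return Reading(float(match.group("value")), match.group("unit"))


# Made-up example line, including the trailing line ending a serial device might send.
reading = parse_balance_line("ST,+0012.3456 g\r\n")
print(reading.value, reading.unit)
```

A converter would then attach metadata such as the device, user, and timestamp to the parsed reading before writing it out in the target format.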

Then there are complex instruments like chromatographic systems, mass spectrometers, Nuclear Magnetic Resonance (NMR) spectrometers, and many others. For such equipment, you need a converter that grabs data from the instrument software and converts them to AnIML, either by capturing and converting files saved on a computer or by extracting data from a database using a specific SDK. With very complex software like a CDS, it makes sense to establish bi-directional communication between your data producers (instruments) and data consumers (LIMS/ELN). In this scenario, the open communication standard SiLA2 perfectly complements the use of AnIML as a standard file format.

Looking ahead, what must the industry do to be able to successfully collect, manage, and analyze data to ensure the best results for its R&D, quality control, and environmental-testing programs in order to bring new products to market quickly, efficiently, and safely?

First, they must understand their data and the value of those data. As stated earlier, data are the main assets of labs. It makes little sense to spend a lot of money on all this expensive equipment and use only a small fraction of the data it produces. Why not use all the data and prepare your lab for applications of the future such as AI and ML? You can use those data to optimize your processes, your protocols, and the quality of your experiments and results. It’s simply an investment in data. We all know that data have enormous value, but oftentimes the value reveals itself only when you have the data and analyze them. And, therefore, you need all the things mentioned in the answer to the second question.

Second, the industry must understand that while digitalization of a lab to such an extent is not easy, it’s well worth the effort. To minimize the disruptions these changes can pose to existing IT infrastructures, a middleware layer is needed to orchestrate the flow of data between the different systems in the lab.

Learn more about Data Management Solutions: bssn-software.com
