Creating and Maintaining Validated Microbial Identification Libraries within a cGMP Quality Environment

Overview

Microbial identification technologies have become essential tools for manufacturing operations that are required to maintain environmental monitoring programs. Obtaining accurate, species-level identifications is a critical component of these programs, especially during root cause investigations. Accugenix is a leader in providing genotypic microbial identifications to these highly regulated industries because of the enhancements we have implemented in our sequencing methodologies. Additionally, we have shown that the accuracy of an identification depends not only on these methods but equally on the library against which the data are compared (1, 2). With libraries containing approximately 5700 unique bacterial, mold and yeast species, Accugenix has established the most comprehensive databases for the organisms encountered in manufacturing environments, allowing us to provide the highest quality identification results for unknown isolates (1). Equally important is ensuring that the library databases are created in, and remain in, a validated state. The Accugenix library generation and maintenance activities are designed to ensure that bacterial and fungal reference libraries contain all possible organisms that are relevant to the industries we serve. They are also intended to verify that existing library entries are classified in accordance with current literature so that tracking and trending of environmental and objectionable organisms can be performed efficiently. Incorrect library entries can lead to inaccurate and inconsistent identifications and misdirected remediation efforts. Accugenix maintains a validation plan that is in accordance with the current Good Manufacturing Practices specified in 21 CFR Parts 210 and 211. As an organization that has been inspected by the Food and Drug Administration and has hosted hundreds of client audits, Accugenix understands the high standards to which the technologies we offer must adhere.
Our customers can be confident that the data provided to them are current, reliable and accurate.

Experience Matters

Having access to the most relevant, accurate and compliant microbial libraries is crucial to obtaining correct species-level identifications for pharmaceutical, biotechnology, medical device and other regulated manufacturing operations. Accugenix recognized early on that the libraries associated with commercial systems were not sufficient. To minimize errors and allow for expansion of the reference libraries, we undertook the construction of our own validated, proprietary sequence identification libraries in 2004. The entries used to build the libraries consisted primarily of type strains, when available, as these are the organisms that define each species. In 2010, we also began performing real-time maintenance updates that incorporate taxonomic changes and newly published organisms. We are committed to performing our library maintenance while adhering to validated processes, and we release updates one to two times per month (Figure 1). We give utmost priority to accurately reviewing and updating our bacterial and fungal reference libraries, and we employ highly qualified, knowledgeable Ph.D. scientists with expertise in microbial taxonomy, phylogenetics and bioinformatics to manage this process. This team is led by our CEO, Douglas Smith, who brings over 40 years of experience as an innovator at several leading companies. During his career, he has achieved significant success in the development and commercialization of automated microbial identification systems, including both the MicroSEQ® v1.0 and gas chromatography fatty acid systems. His insight into these commercially available identification technologies and their microbial reference databases has given Accugenix a strong foundation on which to expand our sequencing platform and libraries.
The result of the combined expertise at Accugenix is a genotypic identification system that shows a six-fold increase in precision and a two-fold increase in accuracy in the identification of environmental monitoring samples (2).

Sources of Bacterial & Fungal Strains and Information

Accugenix depends on quality sources to obtain reliable taxonomic information for the species that are identified using our validated bacterial and fungal reference libraries. New organisms are described daily in peer-reviewed scientific journals such as the International Journal of Systematic and Evolutionary Microbiology (IJSEM), Systematic and Applied Microbiology, Journal of Clinical Microbiology, Applied and Environmental Microbiology, and Mycology. The most highly regarded source for information about new taxa and changes in bacterial and yeast nomenclature is IJSEM. As the international standard for nomenclature and publication of new species, IJSEM publishes articles pertaining to all phases of the systematics of bacteria and yeasts, including taxonomy, nomenclature, identification, characterization and culture preservation. IJSEM publishes the lists of valid species names twice a month. Additionally, type strains and the accompanying information are deposited by the original author in at least two different culture collections (e.g., ATCC, DSMZ). These collections also publish lists of valid species names.

Since the 1990s, authors publishing new species or changes to nomenclature have been required to deposit type strain sequences in GenBank®. GenBank is an open access, annotated collection of all publicly available nucleotide sequences, maintained by the National Center for Biotechnology Information (NCBI). GenBank accession numbers are routinely specified in the publication that describes the novel species. The author submits the sequences directly to GenBank and is the only one authorized to change the content of the file; a revision code and history are maintained. The GenBank database is revision controlled but not curated or validated. As a result, we do not use it during our routine identification process: sequence comparisons made against GenBank can match organisms with incorrect nomenclature, clonal sequences or poor-quality sequences, in addition to the originally described or type strains for each species.

Another prime source for published type strain information is the EzTaxon-e genetic sequences database. The EzTaxon-e server contains a manually annotated and curated database of 16S ribosomal RNA gene sequences for bacterial type strains with validly described species names. The sequence information is not only from entries present in GenBank but also from other relevant publications. An additional reliable source for bacterial sequence information is the Ribosomal Database Project. This database is curated and only contains annotated sequences that are correctly labeled and of sufficient accuracy. Because of the time involved in researching the accuracy of sequences, release of updates to this database is about six months behind GenBank.

Unlike yeast nomenclature, which is also controlled by IJSEM, filamentous fungi nomenclature has no centralized controlling committee. The characterization of filamentous fungi fell under the botanical codes and was historically based on observable morphological features of the multicellular structures. However, these structures can vary substantially when grown on different substrates. Different names for the teleomorph and anamorph forms (the sexual and asexual stages, respectively) of the same organism are also common, as are synonyms for each organism. To further complicate nomenclature, a type strain has not been defined for many species. Recently, there has been a collaborative effort to improve fungal nomenclature, the results of which have been published in multiple sources. Because the information is not centrally controlled, the accepted approach is to base fungal nomenclature on information currently appearing in sites such as those maintained by the International Mycological Association (MycoBank) and the Index Fungorum, and in culture collections including Centraalbureau voor Schimmelcultures (CBS).

Library Generation and Maintenance

All library generation and maintenance activities are conducted in accordance with Quality Assurance approved Standard Operating Procedures (Figure 2). A risk-based approach is used during the library entry generation process and when building and testing the completed library. The goal is to reduce the probability of an undetected error in the library to well below one out of the total number of entries. Acquiring type strains and their sequences from the multiple sources described above, each with a known, independent error rate, and cross-checking sequence quality reduces the probability that an error goes undetected. The published and deposited type strain, when available, and the type strain sequence for the organism are part of the official description of a species and are by definition correct. The current definition is used until a new definition for the species name, type strain and/or type strain sequence is published. For high-risk organisms, those that are frequently isolated from manufacturing environments, the library entry is built by purchasing the type strain from a culture collection and sequencing the organism in-house. For low-risk organisms, those not encountered often, the library entry is built by acquiring the highest quality type strain sequence that can be found in public databases. In both cases, the sequence goes through a comprehensive evaluation to ensure it is consistent with the sequence published in the literature.
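The arithmetic behind the risk-based goal can be sketched as follows. This is a minimal illustration, not the Accugenix procedure: it assumes that sources fail independently, so that an error escapes detection only when every source is wrong at once, and the function name and all error rates shown are hypothetical.

```python
from math import prod

def expected_undetected_errors(n_entries, source_error_rates):
    """Expected number of library entries whose error escapes every source.

    Assumes the sources fail independently, so the per-entry probability of
    an undetected error is bounded by the product of the individual rates.
    """
    p_undetected = prod(source_error_rates)
    return n_entries * p_undetected

# Hypothetical figures: 5700 entries checked against three independent
# sources with illustrative error rates of 4%, 2% and 1%.
expected = expected_undetected_errors(5700, [0.04, 0.02, 0.01])
print(f"{expected:.4f}")
```

Under these assumed rates the expected count stays well below one entry for the whole library, which is the shape of the goal stated above. Correlated sources, such as one database copying another, would weaken the bound.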

After building an accurate library entry for a type strain, the sequence is tested for validity against at least one sequence having an independent source. This is most often the original sequence published in IJSEM or a higher quality sequence from a more recent taxonomic study. Phylogenetic justification requires that the new species entry cluster with other species in the same genus at a genetic distance that is consistent with the genetic distance separating other closely related species in that genus (Figure 3).

  • For all purchased type strains sequenced by Accugenix, library entries are considered acceptable if their sequences are not inconsistent with the taxonomic publication that defines the species.
  • For species defined before sequence technology was used for this application, the sequence generated must be phylogenetically consistent with other related species in the same genus.
  • For all sequences acquired from public databases, library entries are considered acceptable if the sequence is referenced in the taxonomic publication that defines the species or the sequence is derived from the published type strain. In either case, the sequence generated must be phylogenetically consistent with other related species in the same genus.
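The phylogenetic consistency criterion used in these acceptance rules can be illustrated with a small sketch. The source does not state which distance measure or thresholds are applied, so the example below assumes a simple p-distance (proportion of differing positions) on pre-aligned sequences; the `tolerance` multiplier, the function names and the toy sequences are all hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def is_phylogenetically_consistent(candidate, genus_entries, tolerance=2.0):
    """Check that a candidate sits at a plausible distance from its genus.

    genus_entries maps species name -> aligned sequence (at least two entries).
    tolerance is an assumed multiplier on the largest existing
    species/species distance within the genus.
    """
    existing = [p_distance(a, b)
                for a, b in combinations(genus_entries.values(), 2)]
    nearest = min(p_distance(candidate, s) for s in genus_entries.values())
    # Must differ from every existing species, but not by more than the
    # genus typically spans.
    return 0 < nearest <= tolerance * max(existing)

# Toy aligned sequences (hypothetical, far shorter than a real 16S gene).
genus = {
    "sp_a": "ACGTACGTACGTACGTACGT",
    "sp_b": "ACGTACGTACGTACGTACGA",
    "sp_c": "ACGTACGTACGTACGAACGA",
}
print(is_phylogenetically_consistent("ACGTACGTACGTACGTACCT", genus))
print(is_phylogenetically_consistent("TTTTTTTTTTTTTTTTTTTT", genus))
```

A production check would work from a proper multiple alignment and a tree rather than raw pairwise differences; the sketch only shows the comparison of a new entry's distance against the distances already present in the genus.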

All proposed entries are independently evaluated by trained scientists. For all entries, associated Identification Reports are generated and reviewed by trained Data Review Scientists. All documentation associated with each proposed entry is approved by Quality Assurance (Figure 4). All original library entries were built in this fashion and incorporated into our original validated library release in 2005.

Since the original library release, we have run a maintenance program that incorporates the discovery and publication of novel species and the reclassification of existing species in our database. A library entry can be initiated in two ways (Figure 2). First, reviewing the sequence data for customer samples that have failed to produce species-level identifications reveals likely candidates for new library entries. The unique sequences are compared to public repositories to determine whether the organism is novel and has been recently described in the literature. The organism's validity is then verified by confirming that the species/type strain has been published in at least one peer-reviewed scientific journal. Second, we perform general library maintenance by regularly reviewing current publications. Accugenix can readily add newly published organisms or modify an existing entry because of a retraction published in a peer-reviewed journal or a change to current microbial taxonomy. Once the type strain or other relevant strain has been identified, the entry building process proceeds as previously described, generating a Quality Assurance approved library entry.

Library Release

After QA approval, the library entry is certified as ‘valid for use’ and is immediately available for generating identifications. A new library version is generated that includes all of the qualified, valid for use entries compiled since the last library version. The completed library is tested for phylogenetic consistency and performance. The phylogenetic consistency test verifies that when a library entry is tested against itself, it produces an exact match to itself and exhibits the typical species/species distance in its genus. The performance test compares the new and old libraries to ensure that the historical sample data set of known species is identified properly and to show an improved ability to identify unknown customer samples. Quality Assurance approves all documentation associated with the release of the new library version before that version is used to generate identifications for customer samples.
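The performance test described above amounts to a regression comparison between two library versions. The sketch below shows one way such a comparison could be tabulated; the data structures, sample identifiers and function name are hypothetical and are not taken from the Accugenix procedure.

```python
def compare_library_versions(known_species, old_results, new_results):
    """Summarize how a new library version performs against the old one.

    known_species: sample id -> expected species (historical data set)
    old_results / new_results: sample id -> species called by each library,
        or None when no species-level identification was obtained
    """
    # Historical samples the old library got right but the new one does not.
    regressions = [
        sample for sample, expected in known_species.items()
        if old_results.get(sample) == expected
        and new_results.get(sample) != expected
    ]
    # Previously unidentified customer samples that now get a species call.
    newly_identified = [
        sample for sample, species in new_results.items()
        if sample not in known_species
        and old_results.get(sample) is None
        and species is not None
    ]
    return {"regressions": regressions, "newly_identified": newly_identified}

# Hypothetical sample data.
known = {"H1": "Bacillus subtilis", "H2": "Staphylococcus epidermidis"}
old = {"H1": "Bacillus subtilis", "H2": "Staphylococcus epidermidis", "U1": None}
new = {"H1": "Bacillus subtilis", "H2": "Staphylococcus epidermidis",
       "U1": "Micrococcus luteus"}
report = compare_library_versions(known, old, new)
print(report)
```

An empty `regressions` list corresponds to the historical data set still being identified properly, and a non-empty `newly_identified` list corresponds to the improved ability to identify unknown samples.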

Library Validation

The Accugenix validation approach is based on our expertise in the arena of microbial identification. The validation program is designed to be in accordance with the current guidelines for equipment, processes and computerized systems. Accugenix uses an integrated team approach to validation that includes expertise from multiple departments. Accugenix’s quality systems include a validation plan that specifies the studies or tests to use, the criteria appropriate to assess outcomes, the timing of qualification activities, the responsibilities of the relevant departments, and the procedures for documenting and approving the qualification. All validation activities are documented and summarized in a report with conclusions that address the criteria in the plan. Quality Assurance reviews and approves the plan and report. Validation of the libraries is one component that assures the quality of the services offered by Accugenix. To confirm that the libraries are maintained in a state of control over the life of the maintenance process, verification activities are conducted for each release, and more extensive validation testing is performed annually.

Library validation consists of three sets of testing activities, which are designed to ensure that the entries in the libraries are accurate and can reproducibly produce correct identifications. The first two tests are designed to check the capacity of the libraries to produce identifications that are consistent with the identities of the type strains. The third test challenges each entry to make sure that it is consistent with current nomenclature and taxonomy. Test one requires that a proportion of the most frequently identified organisms be tested to verify that the correct species-level identification is obtained. The second test consists of selecting a subset of new and modified entries and processing live specimens to verify that they also produce correct species-level identifications. The third part of the validation, which is the most labor-intensive, requires that all entries be searched against public databases to verify that each entry is valid and correct. We have found that approximately 4% of the results are discrepant because of errors in the GenBank sequences from which EzTaxon-e obtained its data, such as undefined nucleotides designated as N (any base); in-house sequencing has clarified these discrepancies. The protocol, raw data, Identification Reports and Final Report are all maintained as part of the official validation package. The validations are approved by Quality Assurance management and are maintained on file for review by our customers during on-site auditing activities.
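The kind of discrepancy mentioned in the third test, where a public sequence carries undefined N positions that an in-house sequence resolves, can be separated from genuine base disagreements with a simple comparison. The sketch below assumes pre-aligned sequences of equal length and handles only the N code, not the other IUPAC ambiguity symbols; the function name and category labels are hypothetical.

```python
def classify_discrepancy(in_house, public):
    """Compare two aligned sequences, treating N in either as 'any base'.

    Returns "match", "ambiguity_only" (differences explained entirely by
    N positions) or "true_mismatch" (at least one real base disagreement).
    """
    if len(in_house) != len(public):
        raise ValueError("sequences must be aligned to the same length")
    ambiguous = 0
    mismatched = 0
    for a, b in zip(in_house.upper(), public.upper()):
        if "N" in (a, b):
            ambiguous += 1      # undefined nucleotide: cannot confirm or refute
        elif a != b:
            mismatched += 1     # two defined bases that disagree
    if mismatched:
        return "true_mismatch"
    if ambiguous:
        return "ambiguity_only"
    return "match"

print(classify_discrepancy("ACGTAC", "ACGTAC"))
print(classify_discrepancy("ACGTAC", "ACNTAC"))
print(classify_discrepancy("ACGTAC", "ACNTAA"))
```

Entries classified as `ambiguity_only` are the cases where in-house sequencing supplies the bases that the public record leaves undefined, while `true_mismatch` entries would need further investigation.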

Conclusion

Accugenix understands the importance of having a compliant quality system in place to oversee all operations. The quality system at Accugenix is purposely designed to ensure that laboratory processes, computerized systems and equipment, as well as the reference libraries that are used to generate test results, are all adequately maintained in a state of control. For reference libraries, specifically, we recognize that it is not acceptable to rely on non-curated public databases or the outdated libraries that are distributed with commercial identification systems. These commercial systems are typically not designed for identifying environmental isolates found in manufacturing facilities. Having direct knowledge of the organisms that are routinely isolated from pharmaceutical, biotechnology, medical device and other manufacturing facilities worldwide has given us an advantage in determining what organisms are necessary in our reference libraries. We are committed to maintaining the most up-to-date, relevant and compliant reference libraries available. Having experienced leadership, highly trained, dedicated personnel and a robust quality system provides the resources for accomplishing this goal and being the leader in microbial identification.

1. Bacterial Library Comparison. 2011. Accugenix Technical Note.

2. Manual reference method versus commercial automated software for data analysis and result interpretation of 16S bacterial sequences. 2011. Accugenix Technical Note and Poster presentation.
