DNA Extraction and Host Depletion Methods Significantly Impact and Potentially Bias Bacterial Detection in a Biological Fluid

ABSTRACT Untargeted sequencing of nucleic acids present in food can inform the detection of food safety and origin, as well as product tampering and mislabeling issues. The application of such technologies to food analysis may reveal valuable insights that are simply unobtainable by targeted testing, leading to the efforts of applying such technologies in the food industry. However, before these approaches can be applied, it is imperative to verify that the most appropriate methods are used at every step of the process: gathering of primary material, laboratory methods, data analysis, and interpretation. The focus of this study is on gathering the primary material, in this case, DNA. We used bovine milk as a model to (i) evaluate commercially available kits for their ability to extract nucleic acids from inoculated bovine milk, (ii) evaluate host DNA depletion methods for use with milk, and (iii) develop and evaluate a selective lysis-propidium monoazide (PMA)-based protocol for host DNA depletion in milk. Our results suggest that magnetically based nucleic acid extraction methods are best for nucleic acid isolation of bovine milk. Removal of host DNA remains a challenge for untargeted sequencing of milk, highlighting the finding that the individual matrix characteristics should always be considered in food testing. Some reported methods introduce bias against specific types of microbes, which may be particularly problematic in food safety, where the detection of Gram-negative pathogens and hygiene indicators is essential. Continuous efforts are needed to develop and validate new approaches for untargeted metagenomics in samples with large amounts of DNA from a single host. IMPORTANCE Tracking the bacterial communities present in our food has the potential to inform food safety and product origin. To do so, the entire genetic material present in a sample is extracted using chemical methods or commercially available kits and sequenced using next-generation platforms to provide a snapshot of the microbial composition. Because the genetic material of higher organisms present in food (e.g., cow in milk or beef, wheat in flour) is around 1,000 times larger than the bacterial content, challenges exist in gathering the information of interest. Additionally, specific bacterial characteristics can make them easier or harder to detect, adding another layer of complexity to this issue. In this study, we demonstrate the impact of using different methods for the ability to detect specific bacteria and highlight the need to ensure that the most appropriate methods are being used for each particular sample.

ABSTRACT Untargeted sequencing of nucleic acids present in food can inform the detection of food safety and origin, as well as product tampering and mislabeling issues. The application of such technologies to food analysis may reveal valuable insights that are simply unobtainable by targeted testing, leading to the efforts of applying such technologies in the food industry. However, before these approaches can be applied, it is imperative to verify that the most appropriate methods are used at every step of the process: gathering of primary material, laboratory methods, data analysis, and interpretation. The focus of this study is on gathering the primary material, in this case, DNA. We used bovine milk as a model to (i) evaluate commercially available kits for their ability to extract nucleic acids from inoculated bovine milk, (ii) evaluate host DNA depletion methods for use with milk, and (iii) develop and evaluate a selective lysis-propidium monoazide (PMA)-based protocol for host DNA depletion in milk. Our results suggest that magnetically based nucleic acid extraction methods are best for nucleic acid isolation of bovine milk. Removal of host DNA remains a challenge for untargeted sequencing of milk, highlighting the finding that the individual matrix characteristics should always be considered in food testing. Some reported methods introduce bias against specific types of microbes, which may be particularly problematic in food safety, where the detection of Gram-negative pathogens and hygiene indicators is essential. Continuous efforts are needed to develop and validate new approaches for untargeted metagenomics in samples with large amounts of DNA from a single host. IMPORTANCE Tracking the bacterial communities present in our food has the potential to inform food safety and product origin. To do so, the entire genetic material present in a sample is extracted using chemical methods or commercially available kits and sequenced using next-generation platforms to provide a snapshot of the microbial composition. Because the genetic material of higher organisms present in food (e.g., cow in milk or beef, wheat in flour) is around 1,000 times larger than the bacterial content, challenges exist in gathering the information of interest. Additionally, specific bacterial characteristics can make them easier or harder to detect, adding another layer of complexity to this issue. In this study, we demonstrate the impact of using different methods for the ability to detect specific bacteria and highlight the milk fractions (24). Unfortunately, these potential introductions of bias are overlooked in most investigations.
While a number of studies have investigated commercially available protocols for nucleic acid extraction of milk in terms of DNA concentration, quality, and ability to detect specific pathogens (25)(26)(27), their main objective was not the assessment of differential DNA extraction or biases in the representation of diverse bacterial populations, which would require the inclusion of mock bacterial communities of interest in a milk sample and comparison of several protocols.
Host DNA contamination is a challenge not exclusive to milk and other food samples, as mammalian DNA has been shown to dominate the number of sequencing reads in cerebrospinal fluid, skin, vaginal, and oral metagenomes in humans (28)(29)(30). To tackle the host DNA issue, enzymatic and immunomagnetic protocols aimed at decreasing host DNA contamination became commercially available and have been tested in select sample types (28,29,31). In addition, a number of "homebrew" methods have been tested to allow for successful depletion of host DNA (28,32,33). Nevertheless, these methods are not guaranteed to work with every sample type, and to the best of our knowledge, no host DNA depletion methods have been evaluated for their applicability in milk.
The goals of this study were (i) to evaluate commercially available kits for their ability to extract nucleic acids (e.g., DNA and RNA) from bovine milk, (ii) to evaluate host DNA depletion methods for use with bovine milk, and (iii) to develop and evaluate a selective lysis-propidium monoazide (PMA)-based protocol for host DNA depletion in milk. Our overarching hypothesis was that methodologies would differ with regard to the efficacy of nucleic acid extraction, and potential biases would be observed. Experiments were thus performed on raw bovine milk inoculated with mock bacterial communities, which included Gram-negative (Salmonella enterica), Gram-positive (Listeria monocytogenes), and mycobacterial (Mycobacterium smegmatis) organisms, as well as spores representing aerobic sporeformers (Bacillus wiedmannii).

RESULTS
The extraction method significantly impacts bacterial quantification through qPCR. We assessed seven commercially available DNA extraction methods for their ability to isolate DNA from milk samples inoculated with a mock bacterial community ( Fig. 1A) (for milk sample characteristics, see Table S1 in the supplemental material). Nucleic acid quantification and quality measurements (Table S2) need to be interpreted with caution due to the low overall DNA yield (,10 ng/ml for all samples), which placed readings below the linear range for most samples as measured via fluorescence with a Nanodrop. We were able to measure DNA for all kits via fluorescence with a high-sensitivity set of reagents, and all kits yielded sufficient DNA to allow for quantitative PCR (qPCR)-based quantification of the members of the bacterial mock community (i.e., Bacillus wiedmannii, Listeria monocytogenes, Mycobacterium smegmatis, and Salmonella sp.) as well as total bacterial 16S rRNA gene copies and bovine DNA.
The failure rates varied across kits (Table 1; Fig. 2), with the MagAttract mastitis kit having no failed reactions and the MagMAX CORE nucleic acid purification kit having only one failed reaction in the total-bacteria assay. From the column-based methods, the E.Z.N.A Food DNA (EZFood) kit had the largest number of failed reactions, with a total of 30 failed reactions across five assays (all except for bovine DNA); because of this, we could not include data from this method in any of our statistical models, as this would have caused nonidentifiability. The Power Soil Pro (PSoilP) kit had 5 failed reactions in the Listeria and Salmonella assays, the Power Food (PFood) kit had only one failed reaction in the Bacillus assay, and the ZymoBIOMICS DNA/RNA (ZymoDNA or ZymoRNA) minikit had no failures. Interestingly, the AllPrep PowerViral DNA/RNA (PowerViralDNA or PowerViralRNA) kit had all reactions in the first biological replicate fail for the Listeria assay. As a result, we could not compare the PowerViral method to other methods in the Listeria assay.
Of reactions that did not fail, the two magnet-based DNA extraction methods (the CORE and MagAttract mastitis kits, the two leftmost methods in Fig. 2) always yielded numerically higher log copy numbers (for all members of the mock community and for total bacterial 16S rRNA genes) than the other five kits. Bacterial copy numbers for the two magnet-based kits also had lower variability between technical replicates than column-based kits, with the E.Z.N.A Food DNA kit showing particularly large variability between technical replicates; DNA yields for this kit were also so low that no qPCR amplification was observed in the L. monocytogenes qPCR targeting rpoB.
Among the column-based extraction methods, Power Food, PowerViral, and Zymo were comparable with regard to (i) their ability to recover DNA of the members of the FIG 1 Study overview. Each panel depicts the design for each aspect evaluated, with respective numbers of biological and technical replicates. The qPCR assays were performed in duplicate for each experimental replicate. Copy numbers were calculated for bovine DNA, total 16S rRNA genes, Bacillus, Listeria, Mycobacterium, and Salmonella (A); bovine DNA, total 16S rRNA genes, Listeria, and Salmonella (C); and only bovine DNA and total 16S rRNA genes (B and D) because no inoculation was performed. mock community and (ii) variability within technical replicates ( Fig. 2; Table 1 provides detailed descriptive statistics). The Power Soil Pro kit (another column-based kit) generally provided lower bacterial DNA yields for each member of the mock community than these three kits.
We also evaluated the ability to isolate bacterial RNA for (i) two kits designed for isolation of RNA only and for (ii) three kits designed for isolation of both RNA and DNA ( Table 2). As with the DNA detection data detailed above, all kits yielded very low total nucleic acid concentrations based on spectrophotometry. Reverse transcriptase qPCR (RT-qPCR) amplification of gene targets for Salmonella sp. and L. monocytogenes as well as bovine RNA did not yield amplification from samples treated with DNase (these initial tests were performed on two technical replicates for each of the four kits). Control amplifications on nucleic acids before DNase treatment, however, yielded amplification with cycle threshold (CT) values that did not differ between the RT-qPCR and the control qPCR targeting DNA, suggesting the presence of residual DNA in the extracted nucleic acids and demonstrating that, by following the protocols described here, we were not able to successfully isolate RNA from our milk samples.
Osmotic lysis followed by PMA treatment decreases host DNA but also impacts bacterial DNA. We compared protocols that combined osmotic lysis and treatment with multiple concentrations of PMA for their ability to reduce host DNA, compared to bacterial DNA copy numbers. To assess whether osmotic lysis alone impacted copy numbers, we compared untreated control (UTC) samples to samples that underwent centrifugation and osmotic lysis with double-distilled water (ddH 2 O) but without PMA addition (0 mM PMA); this comparison revealed significantly lower bovine and bacterial DNA copy numbers in 0 mM PMA samples, suggesting significant effects of sample processing (including lysis) independent of PMA addition (Fig. 3, UTC versus 0 mM comparisons). We also observed that PMA concentration significantly impacted bovine and bacterial DNA copy numbers in a dose-dependent manner; as the PMA concentration increased, a sharp decrease was observed in bovine copy numbers, whereas a less pronounced, but still noticeable, decrease was observed in bacterial copy numbers (Fig. 3). This trend was confirmed in the model estimates, which showed a significant and negative effect of PMA in both linear models, with a greater estimate for bovine copy numbers (20.022) than for bacterial copy numbers (20.012). Treatment with 20 mM PMA was the concentration deemed optimal, as it yielded the greatest decrease in host DNA without critically compromising bacterial DNA recovery and thus was chosen for subsequent comparisons with commercially available host DNA depletion protocols.
We also selected a small subset of milk samples from each of the three biological replicates to undergo deep untargeted sequencing, including one sample each of the three biological replicates representing (i) untreated samples, (ii) samples treated with 20 mM PMA, and (iii) samples that underwent osmotic lysis but were not treated with PMA (they underwent centrifugation and selective lysis with ddH 2 O but were not treated with PMA), which were included to allow us to determine if differences observed would be due to PMA treatment or sample processing prior to PMA addition.
Sequencing had an average read depth of 51,563,707 reads per sample (range, 25,488,728 to 127,203,413 reads). Overall, less than 1% of all reads that passed quality control (QC) were determined to be of microbial origin regardless of PMA treatment (Table 3; Fig. 4), and around 3% remained unclassified.
While we observed a pronounced increase in the number of bacterial reads per 1 million sequenced reads with the PMA treatment (Fig. 5A), attempting to deplete host DNA with PMA also appeared to influence the resulting microbial profiles of samples, leading to visible differences between replicates of the same biological samples (Fig. 5B). For example, PMA treatment of sample Bio1 resulted in greater relative abundances of Cutibacterium organisms than occurred in aliquots of the same sample (Bio1) that were not treated (UTC) or underwent selective lysis with ddH 2 O but received no PMA treatment (0 mM). Similar differences between the UTC and the 0 mM and 20 mM  5B).
Host depletion methods successfully decrease host DNA but not to the extent needed to make a significant impact for use in milk sequencing studies. We compared the results of host DNA depletion methods and osmotic lysis followed by PMA treatment in three independent biological replicates. In each biological replicate, a freshly collected milk sample was homogenized and divided into five separate aliquots that were processed in parallel for each of the five methods (e.g., PMA method, Molzym [Molz] kit, NEBNext microbiome DNA enrichment [NEB] kit, QIAamp DNA microbiome [Qia] kit, and untreated control [UTC]). The means by which host DNA depletion was accomplished varied between methods; some kits depended on host DNA degradation followed by treatment with different enzymes (e.g., Qiagen and Molzym kits), while the NEB kit was based on the capture of methylated host DNA (for details on kits and methods, see Table 4; for descriptive statistics, see Table 5). Overall, host DNA removal treatments significantly decreased bovine copy numbers compared FIG 3 Effect of osmotic lysis of raw milk followed by treatment with various PMA concentrations on host and bacterial DNA counts as determined by qPCR. Boxplots represent normalized log copy numbers per milliliter of milk. UTC, untreated control sample; 0 mM, milk that underwent osmotic lysis but was not treated with PMA. Scatterplot shapes represent each of three independent biological replicates (circles, first; triangles, second; and squares, third), whereas colors represent concentrations evaluated. Asterisks represent significant differences at a P of ,0.05 in linear-model comparisons. Osmotic lysis significantly decreases log 10 copy numbers in non-PMA-treated samples (UTC versus 0 mM), and increasing PMA concentrations significantly decrease log 10 copy numbers in a dose-dependent manner.
to those of an untreated control ( Fig. 6B) (P , 0.001). However, we also observed a significant loss of bacterial copy numbers across methods ( Fig. 6A) (P , 0.001). Notably, the Molzym kit caused the largest reduction of both bovine and bacterial DNA copy numbers, to levels that likely would be insufficient as inputs for shotgun metagenomic sequencing. Treatment of raw milk with Molzym reagents specifically resulted in a pellet that was challenging to bring back to solution, which likely led to considerable nucleic acid losses and renders this method unreliable for use with milk.
As all host DNA depletion methods also led to the depletion of bacterial DNA, we chose to use the calculated log difference between bovine and bacterial DNA copy numbers as key metrics for evaluating the different host DNA depletion methods. Untreated samples showed an average of 0.77-log-higher bacterial DNA copy numbers than bovine DNA copy numbers (Fig. 6C). By comparison, the log difference after different host depletion methods ranged from 1.22 log (PMA method) to 2.09 log (Molzym kit). If treated samples underwent untargeted sequencing, this decrease in bovine copies would translate into negligible differences in terms of relative abundances of bacterial reads versus bovine reads given the approximately thousandfold difference between the bovine genome size and the average bacterial genome sizes.
Enzymatic selective lysis of inoculated milk followed by PMA treatment differentially affects detection of Gram-positive and Gram-negative bacteria through qPCR. In a final attempt to optimize a host depletion protocol that would be FIG 4 Effect of host DNA depletion with PMA on the percentage of sequencing reads assigned to the bovine genome or microbial genomes. Raw milk was collected on three separate days and divided into aliquots that underwent each of the processing methods. UTC, untreated control sample; 0 mM, milk that underwent osmotic lysis but was not treated with PMA; 20 mM, milk that underwent osmotic lysis and PMA treatment at 20 mM. applicable to raw milk, we tested a lysis protocol that included a mild protease, with the objective of more efficiently permeabilizing mammalian cells while keeping bacterial cells intact. Because we were aware of potential biases that could be introduced by adding an enzymatic lysis step prior to PMA treatment, we decided to inoculate the samples tested with Gram-positive and Gram-negative bacteria to address potential differential bacterial permeabilization by subtilisin that would result in DNA inactivation by PMA binding. We also investigated whether the light source would have an effect on the efficiency of PMA binding by processing samples in parallel and exposing PMA-treated sample duplicates to either a halogen light source or a commercial apparatus designed for use with PMA-treated samples (BLU-V; Qiagen). We prepared two biological replicates with two technical replicates for each comparison, from which duplicate qPCRs were done. Treatment with subtilisin decreased the number of culturable bacteria as assessed through CFU plate counts of milk that had been treated with lysis solution prior to PMA exposure (Fig. 7A) (P , 0.001). However, we did not observe a difference in CFU counts in milk samples treated with different enzyme concentrations. The light source did not affect copy numbers (P = 0.74); therefore, qPCR comparisons were performed  with combined data from both the halogen light source and BLU-V apparatus for Fig. 7B. As expected, we observed a decrease in bovine copy numbers in subtilisin-PMA-treated samples compared to those in the negative control that was much greater in extent than what we observed with osmotic lysis (Fig. 7B, first segment) (P = 0.04). However, we also observed a dose-dependent decrease in Gram-negative copy numbers in treated samples in the first replicate and were unable to detect Salmonella copy numbers at the higher enzyme concentration in the second replicate (Fig. 7B, third segment), while no differences were observed in Gram-positive copy numbers (Fig. 7B, second segment) (P= 0.3). This trend was confirmed by a less steep but still no-  ticeable decrease in total bacterial copy numbers compared to the decrease in Gramnegative copy numbers (Fig. 7B, fourth segment) (P= 0.001). These data led us to conclude that enzymatic treatment followed by PMA inactivation differentially affects Gram-negative bacterial copy numbers, leading to biased results if used to prepare DNA for sequencing purposes.

DISCUSSION
In this study, we comprehensively evaluated both nucleic acid extraction and host depletion protocols for bovine raw milk. The rationale of using raw milk as a model included the recent publication of several studies highlighting the variability of raw milk microbiotas (34)(35)(36), the potential influence of raw milk microbial quality on processed dairy products (37)(38)(39), and the fact that patterns of the milk microbiome represent potential "biomarkers" that can be tracked (20,21). All of these studies highlight milk as a potential candidate for using HTS as part of quality assurance and risk assessment in the food industry in the future. Our results indicate that the following magnetbased DNA extraction methods are superior for extracting DNA from milk. Although host DNA depletion methods decrease the amount of bovine DNA in a given sample, the reduction is not sufficient for effectively depleting bovine DNA in HTS studies of raw milk. We also observed potential for bias introduction by certain protocols on the overall microbial profile, as well as selective bias against Gram-negative bacteria.
Magnet-based DNA extraction seems to provide better results than columnbased methods for raw milk samples. We compared several nucleic acid extraction protocols used on bovine raw milk. While all protocols evaluated for extraction of DNA were able to successfully extract (albeit small amounts of) total DNA, we observed higher variability in replicates in some protocols than in others. The low DNA concentration from milk extracts is in agreement with those of previous reports (25,26,40). One particular aspect of this study is that we inoculated different types of bacteria that would be of interest in a dairy processing environment into raw milk and performed targeted qPCR to assess differential extraction of Gram-positive and Gram-negative bacteria. While most protocols were able to successfully extract DNA from the bacteria inoculated, magnet-based DNA extraction approaches had the best recovery and the lowest interreplicate variability, as evaluated by qPCR. It must be noted that the differences observed in the recovery of bacterial DNA are a characteristic of processing steps rather than a flaw or a sign of the inefficiency of a given protocol (i.e., while some extraction methods carry the entire sample lysate from one step to the following, some protocols require fixed or maximum volumes to be transferred to the next step due to volume limitations in the reaction tubes used). It is important to highlight that different DNA extraction methods have advantages and drawbacks, which can also vary according to the material to which they are applied. As no general gold standard method exists to date for this application, the selection of an extraction method should be based on the objectives of each particular study (41). Nevertheless, there is a growing consensus on the importance of mechanical disruption, such as bead beating for microbiome applications (42), and magnet-based extractions have been demonstrated to be particularly effective in the diagnosis of tuberculosis in the sera and plasma of patients (43) and in the extraction of algal DNA for next-generation sequencing (NGS) studies (44), highlighting magnetic methods as a common theme around diverse nucleic acid-based applications.
Extracting bacterial RNA from high-quality raw milk is challenging. The methods used here copurify both DNA and RNA or each nucleic acid separately, and based on the inoculation of bacterial loads in some cases exceeding regulatory standards for raw milk, we expected bacterial RNA to be detectable. We observed very low concentrations of extracted DNA and were unable to successfully detect bacterial RNA from our inoculated milk samples. This was a surprising finding, as literature on the milk metatranscriptome is available (19). However, the protocols previously used to successfully isolate RNA from milk involved extensive centrifugation of large sample volumes followed by several wash steps (45), as well as samples with much greater bacterial loads, including dairy products (such as cheese), which contain high levels of organisms responsible for fermentation of the products (19,21).
Current methods for host DNA depletion are not suitable for application in untargeted HTS studies of milk. Because the bovine cells are present in milk along with the microbes of interest, and due to the striking difference between bovine and bacterial genome sizes, developing efficient yet unbiased host DNA depletion methods is critical for the adoption of HTS technologies in food safety. While we observed a decrease in the number of bovine copies compared to bacterial copies as assessed through qPCR in all protocols evaluated, this decrease is not sufficient to effectively change the relative abundance of reads being assigned to the bovine genome in HTS studies. This was confirmed when we performed deep untargeted sequencing of a subset of samples that had demonstrated promising results in qPCR but still detected over 99% of reads mapping to the bovine genome independent of host DNA depletion.
Among the other host DNA depletion protocols evaluated were immunoprecipitation of methylated eukaryotic DNA and selective lysis of mammalian cells followed by DNase treatment. One reason for the lack of efficiency observed in the treatment of milk with either of these methods may be the challenges associated with extracting DNA from milk in the first place. Such protocols require large amounts of good-quality DNA, which is practically impossible to obtain from milk samples without extensive centrifugation and pellet washes, potentially proving impractical in industry settings and representing added opportunities for sample contamination or unintentional bias of the microbiota for lipid-or protein-bound microorganisms. The characteristics of milk by themselves can also pose a challenge (presence of fats, proteins, and ions) (8,9). Challenges specific to nucleic acid extraction from milk have been discussed in the literature to some extent, specifically by Metzger et al., who reported challenges in amplifying bacterial DNA from DNA extracted from milk (46).
To the best of our knowledge, this was the first study to evaluate host DNA depletion methods in a food matrix. The literature that tries to address host contamination in clinical sequencing applications for pathogen detection is limited, although improvements in the detection of malaria (47) and pathogens in infected tissue samples (48) and sputum (32) have been reported. Reports describing attempts to use host DNA depletion in cerebrospinal (30) and arthroplasty (29) fluids, as well as human saliva (28), are also available. The common theme around these studies was the various efficiencies across methods and sample types, highlighting the need for individual assessment of host depletion methods using the desired sample type.
Enzyme-based selective lysis followed by PMA treatment differentially affects Gram-negative bacteria in inoculated raw milk. Propidium monoazide has been extensively used to differentiate between live and dead bacterial cells due to its inability to penetrate intact cell membranes and has recently been indicated to be an effective method for host DNA depletion for use in HTS studies (28,49,50). We thus developed an enzyme-based method for host DNA depletion (which utilized a mild protease) combined with subsequent PMA treatment in an effort to better lyse mammalian cells while retaining intact bacterial cells.
While we detected an effect of the enzymatic treatment on total bacterial numbers as measured through plate counts that appeared to be independent of enzymatic concentration, we did not observe a decrease in Gram-positive bacterial numbers as measured by qPCR after PMA exposure. Nevertheless, in our spiking experiments, we observed an enzyme dose-dependent decrease of Gram-negative copy numbers in samples that underwent selective lysis followed by PMA exposure. These data led us to conclude that enzymatic treatment might have differentially permeabilized the membranes of Gram-negative cells, allowing PMA to bind to DNA without observable differences in overall bacterial survival. Taken together, these observations suggest that potential biases may occur against detection of Gram-negative bacteria through qPCR by this method.
It is imperative to optimize protocols for samples with different characteristics. The generalizability of this study lies in the importance of the standardization and validation of methods for each specific food matrix. It is also important to highlight the challenges associated with low bacterial biomass samples and the fact that various foods have a wide range of biomasses, from very low microbial content in raw milk to relatively high microbial content in some cultured dairy products. Efforts to standardize and recommend best practices in HTS studies, particularly pertaining to low-biomass samples (51)(52)(53), have recently begun and must be continued.
Conclusions. Our results suggest that magnet-based extraction methods are superior for bacterial nucleic acid isolation from bovine milk. Host DNA remains a challenge for untargeted sequencing of milk, highlighting that the food matrix characteristics should always be considered whenever planning HTS studies. Enzymatic-lysis-based PMA host depletion introduced dose-dependent biases against Gram-negative bacteria, suggesting that selective lysis permeabilized Gram-negative organisms to PMA, which subsequently hindered our ability to detect Gram-negative bacteria through qPCR without affecting counts of live bacteria. While it is not possible to test all methods available at any given time, we focused on kits and protocols that have been the most widely used for the extraction of nucleic acids from milk. As procedures are improved or new methods are developed, a reevaluation of available protocols would prove useful, as the development of HTS-based tools to aid and improve quality assurance and food safety programs continues to hold great promise.

MATERIALS AND METHODS
Comparison of nucleic acid extraction protocols in spiked milk. (i) Strain selection. We selected a Gram-positive bacterium (Listeria monocytogenes), a Gram-negative bacterium (Salmonella enterica), and a sporeformer (Bacillus wiedmannii) previously isolated from milk or the dairy environment to create a mock microbial community that would be inoculated into raw milk. We included an additional Gram-positive bacterium, a member of the Mycobacteriaceae family (Mycobacterium smegmatis), because of the potential public health implications and the uniqueness of the cell structure of mycobacteria. Milk samples were specifically inoculated with Salmonella enterica, Listeria monocytogenes, and Mycobacterium smegmatis vegetative cells grown to stationary phase and Bacillus wiedmannii spores. Specific strain information is available in Table 6 and in the Food Microbe Tracker database (www.foodmicrobetracker.com). The bacterial load of the inoculated milk sample was chosen to represent the largest bacterial concentration allowed for raw milk in the United States, which is 300,000 CFU per ml, in an attempt to simulate the highest legal bacterial load of incoming milk in a dairy processing plant (54).
(ii) Inoculum preparation. Bacillus wiedmannii spore suspension was prepared according to Buehler et al. (55). Briefly, the bacterial isolate was streaked from frozen culture into brain heart infusion (BHI) agar (Becton, Dickinson and Co., Sparks, MD) and incubated for 24 h at 37°C. Following incubation, a single colony was selected to inoculate a tube containing 5 ml of BHI broth, followed by incubation at 37°C for 72 h. Next, 100 ml of inoculated BHI was spread plated in duplicate on a sporulating medium, AK agar number 2 (Becton, Dickinson and Co.), which was incubated for 120 h at 37°C. Sporulation was confirmed via microscopy with a 7.5% malachite green endospore stain (JT Baker, Phillipsburg, NJ) as detailed in Gaillard et al. (56). Spores were harvested by flooding the agar surface with 10 ml of phosphate-buffered saline (PBS) (Weber Scientific, Hamilton, NJ) and scraping the bacterial culture with a cell scraper. Harvested cells were transferred to a sterile centrifuge tube and washed with 10 ml of sterile water three times by centrifugation at 10,500 rpm for 15 min and resuspension of the pellet. Following the third wash, 5 ml of sterile water and 5 ml of 100% ethanol (Decon Labs, King of Prussia, PA) were added to the tube, and the pellet was resuspended by vortexing. The bacterial pellet resuspended in 50% ethanol was incubated for 12 h at 4°C in a rotating platform to eliminate any remaining vegetative cells. After ethanol treatment, the spore suspension was washed another three times with 10 ml of sterile water as described above. The final spore suspension was kept in sterile water at 4°C until used for spiking experiments.
Mycobacterium smegmatis cells were streaked from frozen stocks into BHI agar, followed by incubation at 37°C for 48 h. A single colony was used to inoculate a 5-ml tube containing BHI broth with 1% Tween 80 (Fisher Scientific, Hampton, NH, USA), followed by incubation at 37°C for 48 h. The final inoculum was prepared by inoculating 100 ml of liquid culture onto 100 ml of prewarmed BHI broth with 1% Tween 80, followed by incubation at 37°C for 72 h. The resulting stationary-phase culture was kept at 4°C until it was used to spike milk samples.
Listeria monocytogenes and Salmonella enterica cells were streaked separately from frozen stocks into BHI agar, followed by incubation at 37°C for 24 h. For each strain, a single colony was used to inoculate a 5-ml tube containing BHI broth, followed by incubation at 37°C for 24 h. The final inoculums were prepared by inoculating 100 ml of each liquid culture onto 100 ml of prewarmed BHI broth, which was subsequently incubated at 37°C for 12 h.
At harvesting, bacteria were spiral plated on agar using an Eddy Jet 2W spiral plater (IUL Micro, Barcelona, Spain) at various dilutions to determine bacterial concentrations. Bacterial liquid cultures were kept at 4°C until bacterial enumeration (B. wiedmannii, L. monocytogenes, and Salmonella sp. were kept for 24 h, and M. smegmatis was kept for 48 h at 4°C). Inoculum volumes were calculated based on CFU.
(iii) Inoculation of raw milk. Raw milk was collected from the Cornell University Ruminant Center (CURC; Hartford, NY) bulk tank into sterile 10-oz lock tab containers (Capitol Plastics, Amsterdam, NY) and transported on ice to the Milk Quality Improvement Program laboratory in Ithaca, NY. Samples were combined into a single sterile 1,000-ml glass bottle and homogenized by inverting the container 50 times. From that bottle, one aliquot (1 ml) was used to make dilutions and determine the initial bacterial count using a standard plate count (SPC) agar (Millipore Sigma, Burlington, MA), which was incubated for 48 h at 32°C. Bacterial enumeration was performed using an automated colony counter (SphereFlash automatic colony counter, IUL Micro, Barcelona, Spain). A second aliquot (60 ml) was transported on ice to the DairyOne laboratory (DairyOne, Ithaca, NY) for determination of somatic cell counts (SCC) and milk contents (fat, protein, lactose, and total solids). The remaining milk was used for inoculation with the strains described above. To allow for interaction between inoculated bacteria and milk components and to mimic conditions similar to those when raw milk is stored in dairy silos, inoculated milk was held at 4°C for 24 h prior to use as starting samples for nucleic acid extraction comparisons.
(iv) Bacterial enumeration of inoculated milk samples. Total bacterial enumeration was performed through serial dilutions of each milk sample in PBS, which was spiral plated in duplicate on standard plate count agar as described above and incubated for 48 h at 32°C.
(v) Nucleic acid extraction. Raw (uninoculated) milk samples were processed in parallel with inoculated milk samples for all kits. Additionally, a no-template nucleic acid extraction was carried out as a negative control to assess cross-contamination and potential reagent contamination. As a positive control, a mock bacterial community was created in PBS with the same bacteria described above at the same concentration as the inoculated milk samples; this was included on each extraction plate or run as a control. Inoculated milk samples were extracted in three technical replicates in each of three independent biological replicates, which were performed on separate days with a different raw milk sample used for each biological replicate.
Kits, manufacturers, and protocol details are described in Table 2. For each of the nine protocols evaluated, samples were processed in parallel (uninoculated milk, inoculated milk triplicates, mock a NCBI accession numbers of genomes used for primer design. Primers for the detection of the total bacterial DNA target 16S rRNA gene were designed based on conserved regions of the 16S rRNA gene. bacterial community in PBS, and a negative control without a starting sample ["kit buffers only"]) according to the manufacturer's instructions. Exact versions of protocols followed can be found in a GitHub repository as supplemental material (https://github.com/ErikaGanda/MilkDNA). Extracted nucleic acids were frozen at 280°C until quantification and qPCR assays were performed.
(vi) Nucleic acid quantification. Nucleic acid quantification was performed with both a spectrophotometer (Nanodrop 2000; ThermoFisher Scientific) and a fluorescence-based method. Absorbance was measured at 280, 260, and 230 nm for DNA and RNA. Total DNA was also measured with a Quant-iT double-stranded DNA (dsDNA) high-sensitivity (HS) assay kit (ThermoFisher Scientific); fluorescence measurements were performed using a Synergy H1 plate reader (BioTek Instruments, Winooski, VT, USA) with wavelengths of 490 nm for excitation and 535 nm for emission.
(vii) qPCR primer development and assay conditions. For each bacterial strain used to inoculate milk samples, qPCR primers were designed to target the RNA polymerase subunit beta gene (rpoB) because it is a single-copy gene and allows for a more accurate comparison between bacterial numbers than 16S rRNA genes. Primers were also designed to target a conserved region of the 16S rRNA gene, and calculated 16S copy numbers were used as a proxy for total bacterial numbers. Primer details are described in Table 7. Reactions were carried out in duplicate using 2 ml of extracted DNA. The final qPCR volumes totaled 25 ml and contained 12.5 ml SYBR green master mix (ThermoFisher Scientific), 2 ml extracted DNA, 0.5 mM forward primer, 0.5 mM reverse primer, and 9.5 ml nuclease-free water. Reactions were carried out in a QuantStudio 6 instrument (ThermoFisher Scientific), with the following cycling conditions: 95°C for 10 min followed by 40 cycles of 95°C for 15 s and 60°C for 1 min and a melting curve of 95°C for 15 s, 60°C for 1 min, and 95°C for 15 s.
Bovine genome copy numbers were calculated using a commercial TaqMan assay targeting the UXT gene (ThermoFisher Scientific). The final qPCR totaled 20 ml and included 1 ml of the gene expression assay mixture, 10 ml TaqMan Fast advanced master mix (ThermoFisher Scientific), 7 ml of nuclease-free water, and 2 ml of the template. Reactions were carried out in a QuantStudio 6 instrument, with the following cycling conditions: 95°C for 20 s, followed by 40 cycles of 95°C for 1 s and 60°C for 20 s. For each of the three biological replicates, samples extracted with all kits were amplified in a single PCR plate and compared using the same standard curve.
(viii) qPCR data analysis. Amplification data were exported from QuantStudio real-time PCR software (ThermoFisher Scientific) into Excel (version 16.0.11325.20156; Microsoft Corp., Redmond, WA). Standard curves were built with serial dilutions of purified bacterial DNA that was quantified using a Nanodrop spectrophotometer (ThermoFisher), and genome equivalents in each reaction mixture were calculated as described in Brankatschk et al. (57). Standard curves with an R 2 of ,0.9 and an efficiency of ,70% were discarded, and reactions were repeated. For 2 out of 24 reaction plates, one outlier was removed based on visual inspection of deviating standard curve data points.
Copies per microliter of the DNA input were calculated for each reaction. Melt curves were visually inspected, and cycle threshold (CT) values of samples in which melt curves did not match standard samples' peaks were not used for final copy number calculations. Because different kits required various amounts of sample input and nucleic acid output (Table 2), data were normalized to allow for comparison between kit protocols. We chose to simulate copy numbers in 1 ml of milk input and 100 ml DNA output for each kit, using the following equations:  (ix) Statistical analysis. To describe the differences observed between extraction methods, linear models were fit using the lm function in R comparing log copy numbers between kits. A separate linear model was fit for each assay, and all linear models included kit/method, biological replicate (SpikeSet), and their interactions. Two observations (one from PowerFood in the Bacillus wiedmannii assay and one from CORE in the total bacterial DNA assay) had Cook's distance greater than 0.5, were flagged as outliers, and were removed from subsequent analyses. The linear model was then refitted excluding these two observations. For each assay, a two-tailed pairwise comparison of means was performed using the R package multcomp (58). Two kits were removed from these comparisons due to identifiability issues stemming from too few nonmissing observations (too many failed reactions). E.Z.N.A. Food DNA had failed reactions across all assays except the bovine assay, and PowerViral had failed reactions in all of the first biological replicates of the Listeria assay. All P values (combined across the 6 assays) were corrected to achieve an overall family-wise error rate less than or equal to 0.05 using the Bonferroni correction (59). Raw data and code are available in a GitHub repository (https://github.com/ErikaGanda/MilkDNA).
Host depletion protocols. (i) Selective osmotic lysis of host cells and host DNA depletion through PMA treatment of uninoculated raw milk. An osmotic lysis-based host DNA depletion protocol was adapted from Marotz et al. (28). Briefly, 500 ml of uninoculated milk were centrifuged at 10,000g for 8 min, whey was discarded while fat and pellet were kept in the microcentrifuge tube. Five hundred microliters of sterile double distilled water (ddH 2 O) were added to the pellet. The tube was vortexed until pellet dissolution, followed by incubation at room temperature for 5 min to allow for osmotic lysis of mammalian cells. We compared four different concentrations of propidium monoazide (PMA; catalog number 40019; Biotium, Hayward, CA): 10, 20, 40, and 50 mM. Untreated milk (milk that was not centrifuged or exposed to osmotic lysis) and milk exposed to osmotic lysis but not exposed to PMA were also included in comparisons. To account for biological variation and assess repeatability, experiments were performed in three biological replicates performed on three different days, using a different raw milk sample for each biological replicate.
After incubation at room temperature, the appropriate volume of PMA was added to each tube to achieve desired concentrations, followed by a brief mixing and incubation in the dark (in an aluminum foil-wrapped box) for 5 min on a rotating platform.
Following PMA incubation in the dark, PMA was inactivated by light exposure. Samples were placed horizontally on ice ,20 cm from a 500-W halogen light source (Woods halogen work light; Southwire Company LLC, Carrollton, GA, USA). Samples were exposed to light on a rotating platform for 5 min and frozen at 280°C until DNA extraction with a magnet-based method (MagMAX CORE nucleic acid purification kit; ThermoFisher). Quantitative PCRs and data analysis were performed as described above, with the exception that only bovine DNA and total bacterial copy numbers were quantified, as no bacteria were inoculated in the milk samples used. Linear models were fitted to assess the effect of selective lysis in non-PMAtreated samples (UTC versus treatment with 0 mM) and the effect of PMA concentration on copy numbers.
(ii) Selective enzymatic lysis of host cells and host DNA depletion through PMA treatment of inoculated raw milk. Because milk has more fat, protein, and minerals than saliva, we hypothesized that the 1:1 osmotic lysis included in the protocol adapted from Marotz et al. (28) was not optimal for lysing bovine cells in milk compared to that for lysing human cells in saliva.
We thus also evaluated a combination of a mild lysis solution followed by incubation with two concentrations of subtilisin, a protease from Bacillus licheniformis (Krackeler Scientific; catalog number 45-P5380-25MG). The lysis solution contained 7.6 g/liter sodium carbonate, 8.8 g/liter sodium bicarbonate, 2.43 g/liter disodium EDTA, and 2.71 g/liter tetrasodium EDTA. Reagents were solubilized in 1 liter of sterile double-distilled water using a stir plate with a magnetic rotating bar for 6 h, prior to pH measurement, titration to 9.5, and filter sterilization with a 0.22-mm filter (Corning disposable vacuum filter; Fisher Scientific, Waltham, MA).
Experiments were performed on two separate days with a newly collected milk sample on each day. To access potential lysis biases, we spiked milk with a Gram-positive and a Gram-negative bacterium as described above (L. monocytogenes and Salmonella sp.). Spiked milk samples were treated prior to DNA extraction as follows. Four hundred microliters of inoculated milk was added to 1.6 ml of lysis solution and incubated at room temperature for 5 min prior to the addition of subtilisin. Two concentrations of subtilisin were tested: 10 mg/ml (10Â) and 1 mg/ml (1Â) to achieve final concentrations of 0.73 and 0.073 units per reaction mixture, respectively. After addition of the enzyme, samples were incubated at 50°C for 5 min and placed on ice after incubation. Samples were then centrifuged at 10,000 Â g for 8 min. The supernatantcontaining lysis solution and enzyme were discarded, and the pellet was resuspended with 400 ml of PBS. Treated samples were exposed to PMA at 20mM as described above, photoactivated using two light sources (the halogen lamp described above or the BLU-V system from Qiagen) and stored at 280°C until DNA extraction with a magnet-based method (MagMAX CORE extraction kit; ThermoFisher).
Quantitative PCRs and data analysis were performed as described above, with the exception that bovine DNA, total bacterial copy numbers, L. monocytogenes cells, and Salmonella cells were quantified, as we hypothesized that the lysis solution could affect Gram-negative bacteria differently than Gram-positive bacteria.
(iii) Comparison of host DNA depletion methods in uninoculated raw milk. Based on initial experiments with various PMA concentrations we decided to include 20 mM in a comparison with three commercial host DNA depletion kits. Kits, manufacturers, and protocol details are described in Table 4. Samples were processed according to the manufacturer's instructions and frozen at 280°C.
For the NEB host depletion method, DNA was extracted with a magnet-based extraction procedure and quantified using a qubit fluorometer (ThermoFisher) prior to methylated host DNA capture reactions.
Final microbially enriched DNA cleanup was performed using AMPure magnetic beads (Beckman Coulter). Exact versions of protocols followed can be found in our GitHub repository (https://github.com/ ErikaGanda/MilkDNA). Three biological replicates (each with two technical replicates) were prepared using a new sample of raw milk for each biological replicate (except with the Molzym kit, which was performed only with duplicate milk samples in 1 day due to limited available reagent amounts).
Quantitative PCRs and data analysis were performed as described above, with the exception that only bovine DNA and total bacterial copy numbers were quantified, as no bacterium was inoculated in the milk samples used.
(iv) Sequencing. In addition to qPCR, we performed deep untargeted sequencing of a subset of nine samples at the Cornell Biotechnology Resource Center. Briefly, DNA quality control was performed in a fragment analyzer, and sequencing libraries were constructed using a Nextera DNA flex kit (now renamed Illumina DNA Prep; Illumina, San Diego, CA) and 2Â 150-bp paired-end sequencing was performed in an Illumina NextSeq500. Sequencing data quality control and in silico host signal removal were performed as previously described (3,4), and microbial data were processed as described in the work of Beck et al. (4). Briefly, adapter removal and quality trimming were performed with TrimGalore (60), and trimmed reads of at least 50 bp in length were classified using Kraken v0.3 (61) against a multieukaryote database of 31 common food ingredients and contaminants, including the bovine reference genome (GCF_000003205.7, Btau_5.0.1) for bioinformatic removal of the host signal. Then taxonomic profiling of samples was completed using Kraken v0.3 against microbial RefSeq genomes as described by Beck et al. (4). A study diagram is presented in Fig. 1.
Data availability. The data sets and R code are available in our GitHub repository: https://github .com/ErikaGanda/MilkDNA. Raw sequencing reads have been deposited for all 9 sequenced samples into SRA under the BioProject accession number PRJNA667736.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, TIF file, 13 MB.

ACKNOWLEDGMENTS
We thank members of the CURC for allowing us to sample, the VanDerVen Lab for providing us with the Mycobacterium smegmatis MC2-155 strain, the Mann Lab for providing bovine RNA used as a standard, and Rachel Silverman for her comments on the manuscript.
All experimental procedures were carried out at Cornell University, because no animal sampling was necessary for this study and an approved research protocol by the Cornell University Institutional Animal Care and Use Committee was not necessary.
E.G., K.L.B., N.H., B.K., and M.W. are part of the Consortium for Sequencing the Food Supply Chain. We declare that they have no conflict of interest.