Unique genotypic features of HIV-1 C gp41 membrane proximal external region variants during pregnancy relate to mother-to-child transmission via breastfeeding

Mother-to-child transmission (MTCT) through breastfeeding remains a major source of pediatric HIV-1 infection worldwide. To characterize plasma HIV-1 subtype C populations from infected mothers during pregnancy that related to subsequent breast milk transmission, an exploratory study was designed to apply next generation sequencing and a custom bioinformatics pipeline to HIV-1 gp41 extending from heptad repeat region 2 (HR2) through the membrane proximal external region (MPER) and the membrane spanning domain (MSD). MPER harbors linear and highly conserved epitopes that repeatedly elicit HIV-1 neutralizing antibodies with exceptional breadth. Viral populations during pregnancy from women who transmitted by breastfeeding, compared to those who did not, displayed greater biodiversity, more frequent amino acid polymorphisms, lower hydropathy index and greater positive charge. These viral characteristics were restricted to MPER, failed to extend into flanking HR2 or MSD regions, and were unrelated to predicted neutralization resistance. Findings provide novel parameters to evaluate an association between maternal MPER variants present during gestation and lactogenesis with subsequent transmission outcomes by breastfeeding.

Importance
HIV-1 transmission through breastfeeding accounts for 39% of MTCT and continues as a major route of pediatric infection in developing countries where access to interventions for interrupting transmission is limited. Identifying women who are likely to transmit HIV-1 during breastfeeding would focus therapies, such as broadly neutralizing HIV monoclonal antibodies (bn-HIV-Abs), during the breastfeeding period to reduce MTCT. Findings from our pilot study identify novel characteristics of gestational viral MPER quasispecies related to transmission outcomes and raise the possibility of predicting MTCT by breastfeeding based on identifying mothers with high-risk viral populations.


Introduction
Mother-to-child HIV-1 transmission (MTCT) can occur during pregnancy, delivery (perinatally) or breastfeeding and contributes substantially to global morbidity and mortality for children under 5 years of age. Rates of perinatal MTCT range from 15% to 45% in the absence of any interventions but can be reduced to less than 5% with appropriate antiretroviral treatment [1][2][3][4][5]. HIV-1 transmission through breastfeeding accounts for 39% of MTCT, and continues to be a major route of pediatric infection in developing countries [6], where access to interventions for interrupting transmission is limited [7].
Cross-sectional as well as longitudinal studies of cell-free HIV-1 find persistent mixing and synchronous evolution of viruses between plasma and breast milk in the Zambia Exclusive Breastfeeding Study (ZEBS) and other cohorts, indicating that HIV-1 quasispecies in plasma are representative of virus populations in breast milk [38,41-45], although compartmentalization of cell-associated viruses in breast milk is reported in other studies [41,46]. A phylogenetic analysis of longitudinal HIV-1 env V1-V5 sequences from plasma and breast milk of transmitting mothers suggests that the most common ancestral virus(es) in breast milk originate during the second or third trimester of pregnancy, close to the onset of lactogenesis [38]. Consequently, plasma HIV-1 variants during pregnancy might harbor genetic features related to subsequent breast milk transmission.
To examine the relationship between maternal viruses during gestation and subsequent transmission outcomes through breastfeeding, a pilot study evaluated ZEBS maternal plasma subtype C HIV-1 from the second or third trimester of pregnancy by next generation sequencing (NGS), which provides broad coverage of HIV-1 quasispecies at the population level and sensitive detection of low-frequency variants. A custom bioinformatic pipeline was developed to assess biodiversity, amino acid substitutions within linear epitopes of known bn-HIV-Abs targeting gp41 MPER, and biochemical features (hydropathy and charge) of plasma subtype C HIV-1 gp41 MPER variants, and to compare MPER with the adjacent heptad repeat region 2 (HR2) and membrane spanning domain (MSD) among mothers who transmitted or did not transmit HIV-1 through breastfeeding.

Study cohort
A nested, case-control study included a subset of eight women infected by subtype C HIV-1 enrolled in ZEBS [38][39][40]. All subjects were therapy-naive, except for a single peripartum dose of nevirapine according to the Zambian government guidelines during the enrollment period (2001-2004). Written informed consent for participation in the ZEBS study was obtained from all participants. From the larger cohort, our study included plasma samples from four women who transmitted HIV-1 during the early breastfeeding period (TM) (defined by infants who became HIV-1 DNA positive after 42 days following prior negative tests), and four infected women who did not transmit HIV-1 (NTM) [defined by infants who remained HIV-1 DNA negative through the completion of all breastfeeding for a median (quartile range, QR) of 6.5 (4.0-18.8) months] (Table 1). Maternal plasma samples were collected prospectively during the second/third trimester of pregnancy [median (QR): 80 (32-164) days before delivery] (Table 1). This genetic protocol was approved by the Institutional Review Boards of the University of Florida, the Sabin Research Institute, and Children's Hospital Los Angeles.

Sequence analysis
A bioinformatics pipeline was developed to facilitate analysis of large numbers of HIV-1 gp41 HR2-MPER-MSD sequence reads. The median (QR) number of raw reads was 56,647 (43,450) per subject. Sequences were submitted to the NCBI public access database with accession numbers pending. A quality control step filtered a median (QR) of 7.5% (5.2%−13.2%) low quality reads with ambiguous nucleotides, more than one error in either primer tag, or a length outside the mean ± 2 SD length range, leaving a median (QR) of 52,408 (37,541-71,533) quality sequences per sample. Depth of sequencing provided a median (QR) of 27 (19-36)-fold coverage of the input 2,000 HIV-1 RNA copies, with no significant difference in sequence number or fold coverage among the samples between the groups. Quality MPER sequences were extracted from the entire HR2-MPER-MSD sequences by aligning to HIV-1 HXB2 and to an HIV-1 subtype C consensus sequence generated from the HIV sequence database [50].
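The quality control and coverage calculations described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the primer-tag error check is omitted, and the mean ± 2 SD length window is assumed to be computed over all raw reads of a sample.

```python
from statistics import mean, stdev

def quality_filter(reads):
    """Keep reads with only A/C/G/T whose length lies within mean ± 2 SD.

    Assumption: the length window is computed from all raw reads in the
    sample. The primer-tag check named in the text is not modeled here.
    """
    lengths = [len(r) for r in reads]
    mu, sd = mean(lengths), stdev(lengths)  # stdev needs at least 2 reads
    low, high = mu - 2 * sd, mu + 2 * sd
    return [
        r for r in reads
        if set(r.upper()) <= set("ACGT") and low <= len(r) <= high
    ]

def fold_coverage(n_quality_reads, input_copies=2000):
    """Quality reads per input HIV-1 RNA copy (2,000 copies per the text)."""
    return n_quality_reads / input_copies
```

For example, 52,408 quality reads from 2,000 input copies gives roughly 26-fold coverage, in line with the reported median, which was presumably calculated per sample.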

Statistical analysis
Groups were compared by unpaired t test. Statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC) with P <0.05 (two sided) defined as significant. Logistic regression was used to examine the effects of predicted hydropathy or charge of HIV-1 gp41 MPER and their interactions (exposures) on transmission (outcome).
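As an illustration of the group comparison, the sketch below computes an unpaired two-sample t statistic in plain Python. The analyses in this study were run in SAS; the equal-variance (pooled) form is an assumption, because the text does not state whether a pooled or unequal-variance test was used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Assumption: equal variances in the two groups. Returns (t, df).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```

A two-sided P value follows from the t distribution with the returned degrees of freedom; with four subjects per group, as in this study, df would be 6.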

Population structure
To evaluate the complexity of viral population structure within each individual, unrooted phylogenetic trees were constructed from maternal consensus MPER sequence clusters. Overall, the analysis showed that sequences were correctly assigned to each individual with no sequence mixing among subjects. Within each subject, HIV-1 populations were organized into one to three dominant clusters with thousands of sequences per cluster (Figure 1). Dominant sequence clusters generally included a median (QR) of 47% (19%-63%) of sequences. Sequences representing 0.25% to 10% of the viral population within an individual also appeared in 0 to 4 low-frequency clusters, surrounded by swarms of clusters with less abundant variants, usually representing <0.25% of the population. The structure of viral populations based on gp41 regions was indistinguishable between TM and NTM and similar to HIV-1 populations based on gp120 V3 [49].
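The frequency classes used to describe clusters can be made concrete with a short Python sketch. The thresholds (≤0.25%, >0.25% to 10%, >10%) follow the text and the tree legend; the function name and the labels "low", "intermediate" and "high" are hypothetical and used only for illustration.

```python
def classify_clusters(cluster_sizes):
    """Map each cluster to (percent of reads, frequency class).

    cluster_sizes: dict of cluster name -> number of sequences.
    Class labels are illustrative, not the authors' terminology.
    """
    total = sum(cluster_sizes.values())
    result = {}
    for name, size in cluster_sizes.items():
        pct = 100 * size / total
        if pct > 10:
            label = "high"          # >10% of the population
        elif pct > 0.25:
            label = "intermediate"  # >0.25% to 10%
        else:
            label = "low"           # <=0.25%
        result[name] = (pct, label)
    return result
```

With 10,000 reads in total, a cluster of 4,700 reads (47%) falls in the high class, 280 reads (2.8%) in the intermediate class, and 20 reads (0.2%) in the low class.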
To determine if differences in biodiversity between TM and NTM were restricted to MPER or extended to adjacent regions in gp41, similar analyses were applied to HR2 and to MSD sequences (Figure 2). Overall, mean estimated maximum biodiversity was more than 2-fold greater in HR2 than in MPER among TM or NTM groups, reflecting in part that the HR2 region (102 nucleotides) is almost twice as long as MPER (66 nucleotides) between NTM and TM groups, although maximum biodiversity in MSD compared to MPER was reduced in the TM group (Figure 2B).

Amino acid substitutions in HIV-1 MPER
Biodiversity evaluated at the nucleotide sequence level was reflected in diversity among amino acid residues in MPER (Figure 3), as well as in HR2 and in MSD regions (Figures S2 and S3).
Overall, polymorphic substitutions with predicted resistance phenotypes were identified with variable frequency in most individuals independent of transmission outcomes.

Distinct biochemical characteristics of HIV-1 MPER populations between TM and NTM
To evaluate whether predicted amino acid substitutions might alter the biochemical features of MPER, the distribution of hydropathy or charge at the population level within TM or NTM MPER was assessed (Figure 4A). Compared with NTM, TM viral populations demonstrated a left-shift towards increased frequencies of hydrophilic MPER variants, with a median (QR) hydropathy index of −10 (−12.5 to −9.6), significantly lower than the median (QR) of −7.3 (−10.4 to −5.1) among NTM variants (P <0.0001). The difference in hydropathy index between TM and NTM was concentrated among variants that appeared with reduced frequency (≤20%) (P <0.0001), but not among high frequency variants (>20%) (P = 0.34). Low-frequency variants were uniquely identified by NGS, and not found when clonal or single genome sequences were analyzed [40].
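To make the two biochemical measures concrete, the following Python sketch scores a single MPER peptide. It rests on assumptions not stated in the text: the hydropathy index is taken as the sum of Kyte-Doolittle values over the peptide, net charge counts K and R as +1 and D and E as −1 (histidine is ignored), and the function names are hypothetical. The scale and charge rules actually used in the study may differ.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_index(peptide):
    """Sum of per-residue hydropathy; lower values are more hydrophilic."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper())

def net_charge(peptide):
    """Net charge: +1 for K/R, -1 for D/E (assumed rule)."""
    seq = peptide.upper()
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")

# Commonly cited HXB2 gp41 MPER peptide (22 residues, 66 nucleotides);
# used here only as an illustrative input.
HXB2_MPER = "ELDKWASLWNWFNITNWLWYIK"
```

Under these assumptions the HXB2 reference peptide scores approximately −7.6 for hydropathy with a net charge of 0. A substitution that replaces a hydrophobic residue with lysine or arginine lowers the hydropathy sum and raises the charge, which is the direction of the shift reported for TM populations.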

Discussion
Breast milk is essential for infant development and health, particularly in resource-limited settings [78][79][80][81]. Unfortunately, breastfeeding remains a major source of global pediatric HIV-1 infection, reflecting in part the limited parameters available to identify women at high risk for viral transmission by breastfeeding and the challenges of providing therapeutic interventions for the duration of the breastfeeding period [82][83][84][85]. HIV-1 variants that establish new infections by breastfeeding generally occur at low frequency in the transmitting viral population, are characterized by shorter and underglycosylated gp120 envelopes, and may represent escape from neutralizing antibodies targeting epitopes in both gp120 and gp41 MPER [9][10][11][12][86]. Our exploratory studies of HIV-1 variants by metagenomic approaches identified distinct features of gestational MPER populations that distinguished between women who did or did not subsequently transmit HIV-1 during breastfeeding. Transmission outcome groups in our study were well balanced in age, plasma viral load, CD4 T-cell counts and breastfeeding practices, which in combination with the depth of sequencing from each individual provided statistical sensitivity. As anticipated, virus populations in plasma during pregnancy among women who subsequently transmitted HIV-1 via breastfeeding displayed greater biodiversity. In addition, HIV-1 MPER variants with hydrophilic and positively charged amino acid residues were more frequent among TM than among NTM.
These characteristics could only be evaluated at the population level by NGS, because conventional clonal sequencing biases the population towards dominant variants. Phenotypic differences in peripheral blood viral populations over time that related to subsequent transmission were evident by the third trimester of pregnancy, about the time of lactogenesis [38]. Our current study was designed as a cross-sectional comparison of maternal virus populations during gestation; whether biochemical differences among maternal viral populations present during pregnancy persist during breastfeeding, and whether they relate to infecting cell-free or cell-associated viruses in nursing babies, are important questions for subsequent studies [87].
Positive selection for any single amino acid change was limited, as was modulation of glycan motifs across MPER. The effect of the novel amino acids in each MPER allele within an individual on sensitivity to bn-HIV-Abs, either alone or in combination, is difficult to predict with complete accuracy, may differ by subtype [86], and necessitates direct assessment of neutralization susceptibility [88]. Absence of clear bn-HIV-Ab resistance genotypic profiles during pregnancy that distinguish between TM and NTM does not rule out a subsequent role for neutralization resistance in MTCT by breast milk. Yet polymorphic amino acid positions within MPER during pregnancy frequently mapped outside motifs associated with known bn-HIV-Abs, raising the possibility that factors other than antibody selection contribute to the differences in MPER characteristics between TM and NTM. For example, MPER plays a significant role in membrane fusion, so functional assays are required to evaluate the consequences of biochemical variants of MPER for viral entry into different host cells or for crossing mucosal barriers.
HIV-1 gp41 MPER plays a critical role in HIV-1 fusion by perturbing the architecture of the bilayer envelope [89][90][91]. The distribution of hydrophobic amino acids in MPER can modulate membrane fusion [90,92]. Electrostatic interaction between the viral particle and the negatively charged lipid membrane may also play a role in viral entry [93]. The discovery of anti-MPER antibodies against HIV introduced the relatively new concept that antibody-membrane interactions contribute to effective engagement with antigens. Electrostatic and hydrophobic association of antibody with the viral membrane is reported to be essential for efficient epitope binding [94,95]. A study of 2F5 observed that the charge of amino acid residues affects ionic interactions between MPER and 2F5, particularly in core epitopes, while hydrophobic interaction between epitope residues and/or between antibody and epitope is required for stability of epitope-antibody binding [94]. A recent study by Carravilla et al. [95] demonstrated that 4E10 binding to a virus-like lipid bilayer was disrupted by deletion of the hydrophobic residues or removal of charged lipids, and was enhanced by increasing the overall negative charge. In addition, nonspecific electrostatic antibody-lipid interactions increase 4E10 affinity to Env by providing extra contact sites on the viral surface, enlarging the interacting area, and/or facilitating the insertion of the Ab in the membrane after MPER engagement, thus stabilizing the 4E10-Env complex [95]. The decrease in hydrophobicity and increase in positive charge in MPER variants from TM mothers in this study may reduce interaction between MPER and MPER-targeting antibodies, and thus favor HIV-1 transmission. Logistic regression analysis indicated an interactive effect of hydropathy and charge of HIV-1 MPER variants on breast milk transmission outcome in our study.
Similar to our study of gp41 MPER, a significant difference in hydropathy in gp120 between TM and NTM in intrauterine transmission was reported in another study [96], suggesting that intrauterine transmission is associated with maternal envelope quasispecies with altered cellular tropism or affinity for coreceptor molecules expressed on cells localized in the placenta. Together, both studies raise the possibility that antibody-independent mechanisms might contribute to transmission.
A novel aspect of our study is that differences in MPER were compared to flanking regions in gp41. While MPER regions displayed a trend toward increased maximum biodiversity, the striking biochemical characteristics of viral populations associated with MTCT by breastfeeding were restricted to MPER. Although HR2 and MSD segments that flank MPER were diverse, patterns of diversity were unrelated to transmission outcomes, perhaps reflecting HR2 interactions with HR1 or a role for MSD in anchoring gp41 in membranes [97][98][99][100][101][102]. Overall, deep sequencing coupled with an efficient bioinformatics pipeline provided unprecedented coverage of HIV-1 gp41 MPER quasispecies combined with sensitive detection of low frequency variants that can only be captured by high coverage of input viral copies. Low frequency variants within viral populations are particularly critical and clinically relevant as transmitting viruses. Our proof-of-principle studies identified detailed characteristics of viral quasispecies related to transmission outcomes months before transmission. By taking into consideration biodiversity and amino acid polymorphisms that increase antibody resistance or alter amino acid charge and hydropathy, the results raise the possibility of identifying mothers with high-risk viral populations, who might benefit from MPER-targeted bn-HIV-Ab cocktails to reduce transmission during the breastfeeding period.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Figure 1. An unrooted neighbor-joining tree for each individual was developed from the deep sequencing data set clustered at 3% genetic distance. Each branch represents a consensus sequence of HIV-1 gp41 MPER within 3% genetic distance. Symbols represent the proportion of total deep sequences in a cluster: Ο, ≤0.25%; ■, >0.25% to 10%; , >10%.

Figure 3. An amino acid residue (single letter code) that differs from the HXB2 sequence is shown in each space, with a red letter representing an amino acid residue resistant to bn-HIV-Ab(s) and a black letter depicting an amino acid with unknown effect on bn-HIV-Ab susceptibility. The K665A substitution, labeled by an *, is resistant to 2F5 but increases sensitivity to 4E10 and PGZL1. The color scheme defines the frequency of amino acid substitution, with beige representing residues in >80% of HIV-1 MPER variants; green depicting residues in >10% to 80% of HIV-1 MPER variants; and grey representing residues in <1% to 10% of HIV-1 MPER variants.

Table 1. Demographic, immune and viral characteristics of study subjects.