Discovery of α-L-arabinopyranosidases from human gut microbiome expands the diversity within glycoside hydrolase family 42

Enzymes of the glycoside hydrolase family 42 (GH42) are widespread in bacteria of the human gut microbiome and play fundamental roles in the decomposition of both milk and plant oligosaccharides. All GH42 enzymes characterized so far have β-galactosidase activity. Here, we report the existence of a GH42 subfamily that is exclusively specific for α-L-arabinopyranoside and describe the first representative of this subfamily. We found that this enzyme (BlArap42B) from a probiotic Bifidobacterium species cannot hydrolyze β-galactosides. However, BlArap42B effectively hydrolyzed paeonolide and ginsenoside Rb2, plant glycosides containing an aromatic aglycone conjugated to α-L-arabinopyranosyl-(1,6)-β-D-glucopyranoside. Paeonolide, a natural glycoside from the roots of the plant genus Paeonia, is not hydrolyzed by classical GH42 β-galactosidases. X-ray crystallography revealed a unique Trp345-X12-Trp358 sequence motif at the BlArap42B active site, as compared with a Phe-X12-His motif in classical GH42 β-galactosidases. This analysis also indicated that the C6 position of galactose is blocked by the aromatic side chains, hence allowing accommodation only of Arap lacking this carbon. Automated docking of paeonolide revealed that it can fit into the BlArap42B active site. The Glcp moiety of paeonolide stacks onto the aromatic ring of Trp252 at subsite +1 and its C4-OH is hydrogen bonded with Asp249. Moreover, the aglycone stacks against Phe421 from the neighboring monomer in the BlArap42B trimer, forming a proposed subsite +2. These results further support the notion that evolution of metabolic specialization can be tracked at the structural level in key enzymes facilitating degradation of specific glycans in an ecological niche.


The complex microbial consortium in the human gastrointestinal tract, referred to as the gut microbiota, has an increasingly recognized vital role in human health (1). The human diet is rich in a wide variety of non-digestible saccharides from plants and animals, and glycan metabolism is a pivotal factor shaping the dynamics and the evolution of the gut microbiota (2). Specific taxa gain a competitive edge in fitness in this highly competitive niche through metabolic specialization, which is reflected by their enzymatic machinery being fine-tuned for specific glycans (3). Glycans containing β-galactoside are abundant in human infant and adult nutrition (4–6), and hydrolysis of the β-galactosidic bond by β-galactosidases is a prerequisite for their utilization. Key enzymes in the hydrolysis and utilization of β-galactosides in gut microbes are assigned to glycoside hydrolase (GH) family 42 in the sequence-based classification system of the Carbohydrate Active enZymes (CAZy) database (7). Genomes of probiotic strains from the Bifidobacterium genus often encode several GH42 enzymes (8) with distinct subspecificities matching the diversity and abundance of β-galactosides. For example, B. longum subsp. infantis encodes three GH42 β-galactosidases related to the utilization of the human milk oligosaccharide lacto-N-tetraose (LNT), β-galactosides from bovine milk, and arabinogalactan, respectively (9). A similar specialization exists in Bifidobacterium breve, which encodes two GH42 β-galactosidases, one targeting LNT (10) and the other plant galactan (11). Notably, B. bifidum utilizes mucin-derived β-galactosides from the glycoprotein layer coating the epithelial colonocytes of the human host (12), which is proposed to be made possible by a key GH42 β-galactosidase (13).
The genome of B. animalis subsp. lactis Bl-04 encodes two GH42 enzymes (14). Galactooligosaccharides (GOS) were shown to upregulate gene locus balac_0848 (15), experimentally verified to encode a β(1,6)/β(1,3)-galactosidase (BlGal42A) (13). The second GH42 gene (BlArap42B), with locus tag balac_0053, was not differentially upregulated by GOS (15) and belongs to a distinct, not previously characterized clade of bifidobacterial GH42 enzymes (16). This distinct clade is here shown to form a novel GH42 subfamily present throughout the bacterial kingdom, which seems to have exclusive α-L-arabinopyranoside specificity as evaluated for the soluble intracellular BlArap42B using a panel of pNP-substrates and oligosaccharides. BlArap42B effectively hydrolyzed paeonolide (Figure 1A), a plant glycoside that contains a non-reducing end α-L-arabinopyranoside and is found in the roots of the widespread plant genus Paeonia (17). This apparent metabolic specialization could be tracked at the structural level, as the crystal structure of BlArap42B suggested that GH42 members with α-L-arabinopyranosidase activity recognize their substrates through an active site Trp345-Trp358 motif, as compared to a Phe-His motif in classical GH42 β-galactosidases.
Three-dimensional structure of BlArap42B−The three-dimensional structure of BlArap42B in its ligand-free form was determined by X-ray crystallography at 2.0 Å resolution with six chains (two trimers) in the asymmetric unit. Data collection, refinement and stereochemical statistics are summarised in Table 3. BlArap42B is a homotrimer of 701-residue subunits related by a 3-fold axis (Figure 2A). Each subunit consists of three domains. Domain A is a (β/α)₈ barrel containing the catalytic residues. Domain B is a structural domain that packs onto Domain A of the adjacent monomer. Domain C, whose role is unknown, adopts an anti-parallel β-sandwich fold. The first seven residues (Met1-Asp7) and the region Pro663-Thr672 in all six chains, as well as a variable number of residues at the C-terminus of each chain (Gly695-Asn701), were disordered and not included in the final model.
BlArap42B active-site architecture and ligand docking−Similar to structures of GH42 β-galactosidases, the active sites and the vicinity encompassing the proposed subsites +1 and +2 are formed at the interface of two adjacent monomers in the BlArap42B trimer (18). It was possible to dock an energetically preferred conformation of paeonolide, the preferred substrate for BlArap42B, into the active site with an estimated affinity of −8.8 kcal mol⁻¹ (Figure 2B). The docked paeonolide molecule makes an intramolecular hydrogen bond between the acetyl oxygen of the aglycone (Figure 1A) and C2-OH of the Glcp. The Arap ring binds at subsite −1, with a distance of 3.3 Å between the catalytic nucleophile Glu311 and the anomeric carbon and of 2.9 Å between the catalytic acid/base Glu151 and the glycosidic oxygen. The Glcp ring, bound at subsite +1, stacks onto Trp252, and its C4-OH is hydrogen bonded with Asp249. Furthermore, the aromatic ring of the aglycone and Phe421 from the neighboring monomer are almost parallel, making an aromatic stacking interaction at this position that is indicative of a subsite +2.
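To give a feel for the magnitude of the docking score, the estimated affinity can be converted into an implied dissociation constant through ΔG = RT ln(Kd). The following is a minimal sketch only: the temperature of 298.15 K and the function name are assumptions made for illustration, and docking scores are rough estimates, so the result should not be read as a measured binding constant.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def kd_from_binding_energy(delta_g_kcal_per_mol, temperature_k=298.15):
    """Dissociation constant (M) implied by a binding free energy, via dG = RT ln(Kd).

    The default temperature of 298.15 K is an assumption, not a value from the text.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# -8.8 kcal/mol is the docking score reported for paeonolide.
kd_molar = kd_from_binding_energy(-8.8)
print(f"Implied Kd: {kd_molar * 1e6:.2f} uM")
```

Under these assumptions the score corresponds to a sub-micromolar Kd, which is a statement about the docking estimate only and not about the kinetically determined Km.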
Comparison with the BlGal42A β-galactosidase structure−The structure of BlArap42B was compared to that of the other GH42 enzyme from the same organism (BlGal42A), which has specificity towards β(1,6)/β(1,3)-galactoside linkages and is one of the best characterized enzymes (both biochemically and structurally) among GH42 β-galactosidases (13). Figure 3A shows a superimposition of the structures of BlArap42B and BlGal42A in complex with galactose (PDB: 4UNI). From the structural comparison, Glu151 and Glu311 of BlArap42B were inferred to be the acid/base catalyst and the catalytic nucleophile, respectively. Although BlArap42B and BlGal42A share only 25% sequence identity, their overall structures are very similar, as reflected by the root mean square deviation (RMSD) for Cα atoms of 1.75 Å when chain A (689/695 residues) of BlGal42A is superimposed on chain A (682/701 residues) of BlArap42B.
Remarkably, three residues at the substrate binding site are either variant or differently located in space in BlArap42B as compared to BlGal42A (Figure 3A). Trp345 is a phenylalanine in BlGal42A (Phe362), and the aromatic side chain is involved in substrate recognition in subsite −1 by forming a hydrophobic platform for the C4 side of the arabino- or galactopyranoside. The change of a histidine in BlGal42A (His375) to Trp358 restricts the space in subsite −1 around the C6-O6 hydroxymethyl group of galactoside, which is not present in arabinopyranoside (Figure 3B). Moreover, Trp332 of BlGal42A makes a hydrogen bond to the C6-OH of galactose and is also proposed to mediate aromatic stacking to the substrate at subsite +1 in GH42 β-galactosidases. In contrast, the conformation of the loop containing Trp332 is different in BlArap42B and places the tryptophan at the corresponding sequence position (Trp320, not shown in Figure 3A) far from the active site. Instead, a differently located Trp252, which is not conserved in previously characterized GH42 enzymes, takes the role of stacking platform at subsite +1 in BlArap42B. Additionally, Asp249, which in the docked structure of BlArap42B forms a hydrogen bond to the Glcp C4-OH of paeonolide, is replaced by Met268 in BlGal42A.

Mutational analysis−From the structural comparison with BlGal42A, two non-conserved residues (Trp345 and Trp358) were identified in subsite −1. To examine the role of these residues in relation to substrate specificity, they were replaced by site-directed mutagenesis with those found in the β-galactosidase (Phe or His). Table 1 shows specific activities of the single (W345F and W358H) and double (W345F/W358H) mutants towards pNP-substrates, the single mutants having more than 10-fold reduced activity on pNP-α-L-Arap and the double mutant being almost inactive. W358H was slightly active on pNP-β-D-Fucp, but none of the mutants showed detectable activity on pNP-β-D-Galp or any of the other pNP-substrates. Clearly, the substrate specificity of the GH42 α-L-arabinopyranosidase was not changed to β-galactosidase by these simple mutations.
Phylogeny and active site motifs of GH42−A combined phylogenetic and active site sequence motif analysis divides GH42 into two subfamilies, the α-L-arabinopyranosidases (subfamily A) and the β-galactosidases (subfamily G) (Figure 4). BlArap42B is assigned to a distinct uncharacterized clade, group one (G1), in the phylogenetic tree. Based on sequence analysis, it is evident that the two tryptophan residues (Trp345 and Trp358) in BlArap42B are conserved in G1 and subfamily A and comprise a unique Trp-X12-Trp sequence motif at the active site, in contrast to subfamily G, which includes G2-G4. Each of the G2-G4 groups of subfamily G contains at least one characterized β-galactosidase that has a Phe-X12-His motif. Trp252 of BlArap42B in subsite +1 is conserved in G1 but not in G2-G4 (data not shown).
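The two motifs lend themselves to a simple pattern check: Trp345 and Trp358 are separated by twelve residues, so the motif spans a 14-residue window. The sketch below is illustrative only and is not the analysis used to build Figure 4. The function name and the example windows are hypothetical, and it assumes the window has already been extracted from the motif position of a sequence alignment, because a bare pattern search over a full-length sequence would also hit unrelated positions.

```python
import re

# Twelve arbitrary residues between the two motif positions.
TRP_X12_TRP = re.compile(r"W.{12}W")  # motif reported for subfamily A
PHE_X12_HIS = re.compile(r"F.{12}H")  # motif reported for classical subfamily G


def classify_motif_window(window):
    """Classify a 14-residue window taken from the aligned motif position."""
    if len(window) != 14:
        raise ValueError("expected a 14-residue window")
    if TRP_X12_TRP.fullmatch(window):
        return "A"
    if PHE_X12_HIS.fullmatch(window):
        return "G"
    return "other"  # e.g. a mixed motif such as Phe-X12-Trp


# Synthetic windows (poly-Ala spacers), not real GH42 sequences.
print(classify_motif_window("W" + "A" * 12 + "W"))  # subfamily A-like
print(classify_motif_window("F" + "A" * 12 + "H"))  # subfamily G-like
print(classify_motif_window("F" + "A" * 12 + "W"))  # mixed motif
```

A mixed window is deliberately reported as "other" rather than forced into either subfamily, since the motif alone does not settle the assignment in that case.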

BlArap42B is clearly a GH42 α-L-arabinopyranosidase, distinctly different from previously characterized GH42 enzymes. The kcat (240 s⁻¹) and Km (0.074 mM) values of BlArap42B towards its preferred substrate, paeonolide, are in the same range as those of previously characterized GH42 enzymes towards their preferred substrates (13,24).
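For comparison with other enzymes it is often convenient to express these two parameters as a single catalytic efficiency, kcat/Km. The short calculation below uses only the two values quoted above; the function name is hypothetical and the derived number is not reported in the text.

```python
def catalytic_efficiency(k_cat_per_s, k_m_millimolar):
    """Return kcat/Km in s^-1 mM^-1."""
    return k_cat_per_s / k_m_millimolar


# Values quoted for BlArap42B acting on paeonolide.
efficiency_per_mM = catalytic_efficiency(240.0, 0.074)
efficiency_per_M = efficiency_per_mM * 1e3  # convert mM^-1 to M^-1

print(f"kcat/Km = {efficiency_per_mM:.0f} s^-1 mM^-1 = {efficiency_per_M:.2e} s^-1 M^-1")
```

This gives roughly 3.2 × 10³ s⁻¹ mM⁻¹, that is, on the order of 10⁶ s⁻¹ M⁻¹.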
Structural specificity determinants in GH42 α-L-arabinopyranosidases−The residues creating the spatial and chemical environment at subsite −1 are invariant in characterized GH42 β-galactosidases (13,18,22,26,27). However, the change from a histidine residue in classical GH42 β-galactosidases to Trp358 in BlArap42B limits the space at subsite −1, which would cause a clash with C6 of galactose, in accordance with BlArap42B being unable to hydrolyze β-galactosides (Figure 3B). A similar structural change is observed between GH27 α-D-galactosidases and β-L-arabinopyranosidases, the only other similar activity described. In GH27 β-L-arabinopyranosidases, a mutagenesis study revealed that a single substitution from an aspartic acid to the slightly larger glutamic acid in the catalytic pocket around C6 of galactose, as compared to GH27 α-D-galactosidases, is critical for modulating the enzyme activity (28). However, a similar attempt to change the specificity of BlArap42B from α-L-arabinopyranosidase to β-galactosidase through the single and double mutations of Trp345 and Trp358 failed (Table 1). This may suggest that the active site of this enzyme has been optimized for α-L-arabinopyranoside through accumulation of other substitutions around subsite −1 during its molecular evolution after divergence from the β-galactosidases.
Notably, a major change is observed in the positioning of the subsite +1 stacking platform and potential recognition of hydroxyl groups between BlArap42B and BlGal42A (Figure 3A). Superimposing the docked paeonolide from BlArap42B onto BlGal42A shows that the phenyl aglycone clashes with the backbone of Trp332, which is invariant and similarly located in structurally characterized GH42 enzymes (13,18,22,26,27), and with Phe226 of the neighboring monomer in BlGal42A, which can explain its lack of activity on paeonolide (Figure 3C). The present findings indicate that GH42 α-L-arabinopyranosidases have evolved from a common scaffold to target specific α-L-arabinopyranosides present in plant oligosaccharides.
BlArap42B defines a novel GH42 α-L-arabinopyranosidase subfamily−A phylogenetic and sequence analysis approach links the α-L-arabinopyranoside specificity of BlArap42B with a unique Trp-X12-Trp sequence motif at the active site, as opposed to Phe-X12-His in classical bifidobacterial GH42 β-galactosidases (Figure 4). The Trp-X12-Trp sequence motif is present in enzymes with a GH42 catalytic domain from more than 150 different bacterial species and subspecies of various phyla, and it uncovers a novel α-L-arabinopyranosidase subfamily (subfamily A) distinct from GH42 β-galactosidases (subfamily G). As a confirmation that the subfamily division reflects specificity, we have shown that RiArap42B, a homologue and subfamily A member from the human gut bacterium Roseburia intestinalis M50/1 (29), shows the same activity profile on pNP-derived substrates, hydrolyzing pNP-α-L-Arap but not pNP-β-D-Galp (Table 1), supporting the exclusive activity of this subfamily towards α-L-arabinopyranoside. Noticeably, BgaA from Clostridium cellulovorans (30), belonging to subfamily G, contains a mixed Phe-X12-Trp sequence motif and is the only α-L-arabinopyranosidase currently assigned to GH42 in the CAZy database. C. cellulovorans BgaA, however, hydrolyzes pNP-β-D-Galp with about 10% of its activity on pNP-α-L-Arap (30), whereas BlArap42B shows <0.15% of its pNP-α-L-Arap activity on pNP-β-D-Galp.

Subspecificities within the α-L-arabinopyranosidase subfamily A−Distinct groups clearly exist within subfamily A (Figure 4), with members of different groups organized in different gene landscapes. The gene encoding BlArap42B is located adjacent to a gene encoding a putative GH30 enzyme, with no close characterized homologues based on sequence similarity, as well as in the near vicinity of a gene encoding a putative GH3 enzyme (BlGH3). The closest characterized homologue to BlGH3 is the GH3 Apy-H1 from B. longum H-1, which shows 61% sequence identity and 74% similarity and has dual α-L-arabinopyranosidase/β-galactosidase specificity, releasing α-L-arabinopyranoside from ginsenoside Rb2 found in Panax ginseng (31). Interestingly, BlArap42B is also able to release α-L-arabinopyranoside from ginsenoside Rb2 with high efficiency. This gene cluster in B. animalis subsp. lactis Bl-04 likely targets α-L-arabinopyranosides found in plants, especially those of Asian origin (17,32). Similarly, a GH3 member is encoded adjacent to the gene of RiArap42B in R. intestinalis M50/1, but it clusters with characterized xylan β(1,4)-xylosidases based on a phylogenetic analysis of this family (data not shown).
The genes surrounding the gene encoding a putative GH42 α-L-arabinopyranosidase from Bacteroides uniformis ATCC 8492 (Bu42B) of subfamily A (Figure 4) include all glycoside hydrolase-encoding genes necessary for complete utilization of xyloglucan oligosaccharides (33), suggesting that α-L-arabinopyranosides exist in xyloglucan, although this has yet to be identified. Generally, α-L-arabinopyranoside is sparsely reported in plant oligosaccharides, which is in sharp contrast to the evident appearance of BlArap42B homologues in different gene landscapes found throughout the bacterial kingdom.
GH42 α-L-arabinopyranosidases therefore may prove useful in the identification of α-L-arabinopyranosides in plant glycans, oligosaccharides and glycoconjugates.

α-L-Arabinopyranosidase activity in the human gut niche−Recently, the human gut microbe Bacteroides thetaiotaomicron was shown to be capable of completely depolymerising the highly complex pectic rhamnogalacturonan II, containing buried α-L-Arap-(1,4)-β-D-Galp, through the novel BT0983 GH2 α-L-arabinopyranosidase (34). Additionally, the human gut microbe B. breve K-110, isolated from healthy adults, can utilize ginsenoside Rb2 (35). However, the internal amino acid sequence (VIYLTDA) of the purified α-L-arabinopyranosidase from B. breve K-110 matches an AAA+ family ATPase but not any GH in the genome sequences, suggesting that the sequence was derived from a contaminating protein. Since ginsenoside Rb2 was effectively hydrolyzed by BlArap42B, B. animalis subsp. lactis Bl-04 potentially has the ability to utilize this compound. The present study reports the enzymology and structure of a GH42 α-L-arabinopyranosidase from the probiotic B. animalis subsp. lactis Bl-04, uncovering the existence of a distinct GH42 subfamily with novel specificity towards α-L-arabinopyranosides and identifying a unique associated active site sequence motif. The molecular insights together with a deep phylogenetic analysis identify key structural elements discriminating the GH42 α-L-arabinopyranosidases from the classical GH42 β-galactosidases.
The phylogenetic analysis uncovered the existence of an unexpectedly large number (>150) of potential GH42 α-L-arabinopyranosidases in bacterial genomes. It is noteworthy that the plant glycosides effectively hydrolyzed by BlArap42B, paeonolide and ginsenoside Rb2, are contained in traditional herbal medicines in East Asian countries, such as Cortex Moutan (Mu Dan Pi in Chinese and Botanpi in Japanese) and Asian Ginseng, suggesting that other edible root vegetables may also contain certain amounts of glycosides with an α-L-arabinopyranoside. Altogether, these results indicate that α-L-arabinopyranoside metabolism evolved in different ecological niches and is probably widespread in important probiotic members of the human gut microbiota.

Biochemical Characterization−The activity of wild-type BlArap42B, its mutants, and RiArap42B was assayed towards pNP-β-D-Galp, pNP-α-L-Arap, pNP-β-D-Fucp, pNP-α-L-Araf, pNP-β-D-Glcp, and pNP-β-D-Xylp (final conc. 5 mM) at 37 °C in 40 mM sodium citrate, 0.005% Triton X-100, pH 6.5 (50 µL). The reaction was started by addition of enzyme (2–2,000 nM) and stopped after 10 min with 1 M Na₂CO₃ (200 µL). The amount of released pNP was measured spectrophotometrically at 410 nm (A410) using pNP (0–2 mM) as standard. One unit of activity was defined as the amount of enzyme that released 1 µmol pNP min⁻¹. The kinetic parameters kcat and Km were determined from initial rates of pNP-α-L-Arap (0.5–7.5 mM) hydrolysis in the above buffer by non-linear regression fit of the Michaelis-Menten equation.
HPAEC-PAD experiments were carried out using a Dionex ICS-3000 HPLC system (Chromeleon software version 6.80, Dionex) with a Dionex Carbopac PA20 column. The analytes were separated using 40 mM NaOH in isocratic mode at 0.5 mL min⁻¹. Initial hydrolysis activities were measured using 20 nM BlArap42B at 30 °C in 40 mM sodium citrate (pH 6.5) towards paeonolide, ginsenoside Rb2, and quercetin 3-O-α-L-Arap (100 µM) in 19.2–65.2 min assays. The kinetic parameters kcat and Km for paeonolide and ginsenoside Rb2 (7.812–1000 µM) were determined from initial rates using 40 pM BlArap42B at 30 °C in the above buffer, sampled at 4 time points, by integrating the area of the peaks corresponding to released L-arabinose.
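The non-linear regression step can be illustrated with a small dependency-free sketch. It is not the software or procedure used in this study: the fitting strategy (for each trial Km the best Vmax has a closed-form least-squares value, and Km is then found by golden-section search), the function names, and the data are all assumptions chosen for illustration. The rates are synthetic and noise-free, generated from arbitrary parameters (Vmax = 1.0, Km = 2.0 mM) at substrate concentrations within the 0.5–7.5 mM range used for pNP-α-L-Arap.

```python
def michaelis_menten(s, v_max, k_m):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return v_max * s / (k_m + s)


def best_v_max(substrate, rates, k_m):
    """For a fixed Km, Vmax enters linearly, so its least-squares value is closed-form."""
    x = [s / (k_m + s) for s in substrate]
    return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)


def sum_squared_error(substrate, rates, k_m):
    v_max = best_v_max(substrate, rates, k_m)
    return sum((v - michaelis_menten(s, v_max, k_m)) ** 2
               for s, v in zip(substrate, rates))


def fit_michaelis_menten(substrate, rates, km_low, km_high, tol=1e-9):
    """Golden-section search over Km within [km_low, km_high]; returns (Vmax, Km)."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = km_low, km_high
    while abs(b - a) > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sum_squared_error(substrate, rates, c) < sum_squared_error(substrate, rates, d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    k_m = (a + b) / 2
    return best_v_max(substrate, rates, k_m), k_m


# Synthetic, noise-free example data (arbitrary units for the rates).
substrate_mM = [0.5, 1.0, 2.5, 5.0, 7.5]
synthetic_rates = [michaelis_menten(s, 1.0, 2.0) for s in substrate_mM]

v_max_fit, k_m_fit = fit_michaelis_menten(substrate_mM, synthetic_rates, 0.01, 50.0)
print(f"Vmax = {v_max_fit:.4f}, Km = {k_m_fit:.4f} mM")
```

Dividing the fitted Vmax by the molar enzyme concentration then gives kcat. With real, noisy data a dedicated regression routine that also reports parameter uncertainties would normally be preferred.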

Crystallization, Data Collection, and Structure Determination of BlArap42B−BlArap42B was concentrated to 12.0 mg mL⁻¹ in 10 mM MES pH 6.5, 150 mM NaCl and screened for initial crystallization conditions at 20 °C with the JCSG core I−IV screens (Qiagen). Crystals were observed in the JCSG core IV screen (1.5 M ammonium sulfate, 12% glycerol (v/v), 0.1 M Tris pH 8.5). The final crystallization condition was 12% glycerol, 1.6 M ammonium sulfate, 0.1 M MES buffer pH 6.5, obtained from optimization in 24-well VDX trays (Hampton Research) in sitting drops containing 1 µL protein stock and 1 µL reservoir at 20 °C. Glycerol (20% final conc.) was added as cryoprotectant before harvesting.
A data set to 2.0 Å resolution was obtained at the BL5A beamline, Photon Factory, Tsukuba, Japan. Processing and scaling of the data were done with HKL2000 (38). The space group was determined to be P4₁2₁2 with six molecules in the asymmetric unit. Molecular replacement was performed using Balbes (39) with a chain of the homotrimer of PDB entry 1KWG (18) as the initial search model. ARP/wARP (40) was used to partially build the structure, and refinement was completed using Coot (41) and Refmac5 (42).
The quality of the structures, including Ramachandran statistics, was verified by MolProbity (43), and PyMOL v1.8.5 (Schrödinger, LLC, New York) was used for structural analysis and rendering of molecular graphics.
Ligand preparation and docking−The paeonolide structure was obtained from the PubChem project (CID: 92043525) and energy minimised in PCModel v 9.20 (Serena Software) with the MMX force field. The energy minimised molecule was docked into the active site of BlArap42B using AutoDock Vina 1.1.2 (44) with the grid box (16 Å × 16 Å × 16 Å) centred on the scissile glycosidic bond oxygen.
The ligand structure was docked with flexible torsion angles, whereas the protein structure was fixed.
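A docking setup of this kind can be written down as an AutoDock Vina configuration file. The helper below is a hedged sketch rather than the configuration used in this work: the file names and the box centre coordinates are placeholders (the coordinates of the glycosidic oxygen are not given in the text), and only the 16 Å box edge is taken from the description above. The keys `receptor`, `ligand`, `center_x/y/z`, `size_x/y/z` and `out` are standard Vina configuration options.

```python
def vina_config(receptor, ligand, center, box_edge=16.0, out="docked.pdbqt"):
    """Build the text of an AutoDock Vina configuration file for a cubic search box."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",  # rigid protein in PDBQT format
        f"ligand = {ligand}",      # ligand PDBQT; rotatable torsions are defined in this file
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {box_edge:.1f}",
        f"size_y = {box_edge:.1f}",
        f"size_z = {box_edge:.1f}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder file names and centre; replace with the real glycosidic-oxygen position.
config_text = vina_config(
    receptor="blarap42b_trimer.pdbqt",
    ligand="paeonolide.pdbqt",
    center=(0.0, 0.0, 0.0),
)
print(config_text)
```

Because no flexible receptor residues are specified, Vina treats the protein as rigid while sampling the ligand torsions, which matches the fixed-protein, flexible-ligand setup described above.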