Mammalian Glutamyl Aminopeptidase Genes (ENPEP) and Proteins: Comparative Studies of a Major Contributor to Arterial Hypertension

Glutamyl aminopeptidase (ENPEP) is a member of the M1 family of endopeptidases, which are mammalian type II integral membrane zinc-containing endopeptidases. ENPEP is involved in the catabolic pathway of the renin-angiotensin system, forming angiotensin III, which participates in blood pressure regulation and blood vessel formation. Comparative ENPEP amino acid sequences and structures and ENPEP gene locations were examined using data from several mammalian genome projects. Mammalian ENPEP sequences shared 71-98% identity. Five N-glycosylation sites were conserved for all mammalian ENPEP proteins examined, although 9-18 sites were observed in each case. Sequence alignments, key amino acid residues and predicted secondary and tertiary structures were also studied, including transmembrane and cytoplasmic sequences and active site residues. Highest levels of human ENPEP expression were observed in the terminal ileum of the small intestine and in the kidney cortex. Mammalian ENPEP genes contained 20 coding exons. The human ENPEP gene promoter and first coding exon contained a CpG island (CpG27) and at least 6 transcription factor binding sites, whereas the 3′-UTR region contained 7 miRNA target sites, which may contribute to the regulation of ENPEP gene expression in tissues of the body. Phylogenetic analyses examined the relationships of mammalian ENPEP genes and proteins, including primate, other eutherian, marsupial and monotreme sources, using chicken ENPEP as a primordial sequence for comparative purposes.

The gene encoding ENPEP (ENPEP in humans and most mammals; Enpep in rodents) is expressed at high levels in the epithelial cells of the kidney glomerulus and proximal tubule cells. ENPEP participates in the renin-angiotensin system by converting the biologically active angiotensin II (Ang II) to angiotensin III (Ang III) through hydrolysis of the N-terminal aspartate (or glutamate), thereby removing biological activity of the Ang peptides [15,16]. In studies of blood pressure control in hypertensive rats, ENPEP is expressed in brain nuclei, where ENPEP activity generates angiotensin III, one of the major effector peptides of the brain renin-angiotensin system, causing a stimulatory effect on systemic blood pressure [7,17]. Genome-wide association studies have examined blood pressure variation and atrial fibrillation risk in human populations and identified an association with ENPEP variants [9,12,13,18]. In addition, studies of Enpep−/Enpep− knockout mice have shown that ischemia-induced angiogenesis is impaired in these mice as a result of decreased growth factor secretion and capillary vessel formation [8]. Other studies treating hypertension in animal models with inhibitors that block ENPEP activity have also supported a direct link between ENPEP and arterial hypertension [19].
Biochemical and predictive structural studies of mammalian ENPEP proteins have shown that they comprise three major domains (human ENPEP numbers quoted): an N-terminal cytoplasmic sequence (residues 1-18); a transmembrane helical sequence (residues , the signal anchor for the type II membrane protein; and an extracellular domain (residues 40-957) [1,3]. A three-dimensional protein structure has been reported for the extracellular zinc-containing endopeptidase ENPEP domain and its complexes with different ligands, which identified a calcium-binding site in the S1 pocket of ENPEP [11]. In addition, inhibitor docking studies have identified specific amino acid residues (Asp213, Asp218 and Glu215) involved in enzyme catalysis, and Thr348 as playing a key role in determining substrate and inhibitor specificity for this enzyme [20]. This paper reports the predicted gene structures and amino acid sequences for several mammalian ENPEP genes and proteins, the predicted structures for mammalian ENPEP proteins, a number of potential sites for regulating human ENPEP gene expression, and the structural, phylogenetic and evolutionary relationships of these mammalian ENPEP genes and proteins.
This procedure produced multiple BLAST 'hits' for each of the protein and nucleotide databases which were individually examined and retained in FASTA format.
BLAT analyses were subsequently undertaken for each of the predicted ENPEP amino acid sequences using the UC Santa Cruz (UCSC) Genome Browser with the default settings to obtain the predicted locations for each of the mammalian M1 peptidase genes, including predicted exon boundary locations and gene sizes (Table 1) [23]. Structures for human isoforms (splicing variants) were obtained using the AceView website to examine predicted gene and protein structures [24].
The human and other mammalian ENPEP sequences examined were 71-98% identical in alignments, suggesting that these are members of the same family of genes. The amino acid sequences for mammalian ENPEP proteins contained between 942 (pig) and 962 (mouse lemur) amino acids, with human and most other primate ENPEP sequences containing 957 amino acids (Figures 1 and 2; Table 1).
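As an illustration of how a pairwise identity figure such as the 71-98% range can be obtained from two aligned sequences, the following minimal sketch counts identical residues over the ungapped alignment columns. The source does not state which denominator was used for its identities, so the choice of ungapped columns is an assumption, and the two short sequences are hypothetical toy strings, not real ENPEP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns without a gap in either sequence
    matches = 0   # columns where both residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not taken from any ENPEP sequence).
toy_a = "MNFAEREGSK-RYCIQ"
toy_b = "MNFAEEEGSKKRYSIQ"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give slightly different values, so the convention should be stated alongside any reported identity.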

Predicted secondary and tertiary structures for mammalian ENPEP
Predicted secondary structures for mammalian ENPEP sequences were examined, particularly for the extracellular sequences (Figure 1), using the known structure reported for human ENPEP [11] (PDB: 4kx7A), with 35 α-helices and 28 β-sheet structures being observed. Of particular interest were α-helices 8, 9 and 14, which contained the active site residues for human ENPEP. The tertiary structure for human ENPEP is shown in Figure 3, which demonstrates the distinct secondary structures for the N- and C-terminal regions of the protein, with β-sheet structures predominating in the N-terminal region and α-helices predominating in the C-terminal region. These two major domains of human ENPEP were readily apparent and enclose a large cavity previously shown to contain the enzyme's active site [11]. The N-terminal domain (residues 100-545) contains the active site residues and has been recognized as a member of the peptidase M1 aminopeptidase N family, whereas the C-terminal domain (residues 617-931, recognized as an ERAP1-like domain) [31] is composed of 16 α-helices organized as 8 HEAT-like repeats (2 α-helices joined by a short loop) [32], which form a concave face towards the peptidase active site. This C-terminal ENPEP domain has also been shown to function as an intramolecular chaperone contributing to the correct folding, cell surface expression and activity of this enzyme [33].
Figure 4 shows RNA-seq gene expression profiles for human ENPEP across 53 selected tissues (or tissue segments), obtained from the public database and based on expression levels for 175 individuals [16] (data source: GTEx Analysis Release V6p, dbGaP Accession phs000424.v6.p1; http://www.gtex.org).
These data supported highest levels of gene expression for human ENPEP in the small intestine-terminal ileum and the kidney cortex, which is consistent with the enzyme's role in digestive tract and renal sodium (Na+) reabsorption and the renin-angiotensin system [18,34]. Lower levels were also observed in the uterus, spleen, breast, visceral adipose tissue and coronary artery, whereas brain ENPEP levels were very low according to this method, even though ENPEP has been shown to contribute to the renin-angiotensin system in brain nuclei [7].

Gene locations, exonic structures and regulatory sequences for mammalian ENPEP genes
Table 1 summarizes the predicted locations and exonic structures for mammalian ENPEP genes based upon BLAT interrogations of several mammalian and chicken genomes using the reported sequences for human and mouse ENPEP [1,8,35], the predicted sequences for other ENPEP enzymes and the UCSC genome browser [23]. The predicted mammalian ENPEP genes were transcribed on either the negative strand (lower primates and most nonprimate genomes) or the positive strand (higher primates, dog and opossum genomes). Figure 1 summarizes the predicted exonic start sites for human, baboon, mouse, opossum and chicken ENPEP genes, each having 20 coding exons in identical or similar positions to those predicted for the human ENPEP gene. Exon 1 encodes the largest segment for each of these genes, including the cytoplasmic N-terminus and signal anchor sequences, the first 10 β-sheet structures and four of the N-glycosylation sites for mammalian ENPEP. Figure 5 shows the predicted structure for the major human ENPEP transcript together with CpG27 and several transcription factor binding sites (TFBS), which are located at the 5′ end of the gene, consistent with potential roles in regulating the transcription of this gene and forming part of the ENPEP gene promoter. The human ENPEP transcript was 4,991 bp in length, with an extended 3′-untranslated region (UTR) containing 7 microRNA target sites.
The human ENPEP genome sequence also contained several predicted TFBS and a large CpG island (CpG27) located in the 5′-untranslated promoter region of human ENPEP on chromosome 4. CpG27 spanned 412 bp, with a C plus G count of 264 bp (a C plus G content of 64%) and a ratio of observed to expected CpG of 0.64. It is therefore likely that the CpG27 island plays a key role in regulating this gene and may contribute to the very high level of gene expression observed in the small intestine-terminal ileum and the kidney cortex [36]. At least 6 TFBS were colocated with CpG27 in the human ENPEP promoter region, which may contribute to the high expression of this gene in human kidney and intestine.
Of special interest among these identified ENPEP TFBS were the following: the chicken ovalbumin upstream promoter transcription factor II (COUP), which has been implicated in the expression of the renin gene, a key member of the renin-angiotensin system [37] that is highly expressed in kidney cells [38,39]; the ecotropic viral integration site (EVI1), which is also highly expressed in the developing kidney distal tubule and duct in Xenopus and plays a key role in its formation [40,41]; and the nuclear protein c-Myc, which plays an important role in intestinal epithelial cell proliferation [11].
It appears that the ENPEP gene promoter contains gene regulatory sequences and a large CpG island (CpG27) which may contribute to the high levels of expression observed in intestine and kidney cells. Among the microRNA binding sites observed, miR-125b has been shown to act as a tumor suppressor in breast tumorigenesis by directly targeting the ENPEP gene [10].
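The CpG27 statistics quoted earlier in this section (412 bp, a C plus G count of 264 and an observed to expected CpG ratio of 0.64) can be related to a sequence with a short calculation. The sketch below assumes the commonly used Gardiner-Garden and Frommer definition of the ratio, (number of CpG × length) / (number of C × number of G). The source does not give the individual C and G counts for CpG27, so only the C plus G fraction is checked against the reported numbers, and the toy sequence is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("sequence needs at least one C and one G")
    cpg_count = seq.count("CG")  # CG dinucleotides cannot overlap one another
    return cpg_count * len(seq) / (c_count * g_count)


# Check of the reported C plus G content for CpG27: 264 of 412 bp.
reported_gc = 264 / 412
print(f"CpG27 C+G content: {reported_gc:.2%}")

# Hypothetical toy sequence, used only to exercise the two functions.
toy_seq = "ACGTCGGCCGAT"
toy_gc = gc_fraction(toy_seq)
toy_ratio = cpg_obs_exp(toy_seq)
print(f"toy C+G fraction: {toy_gc:.3f}, toy obs/exp CpG: {toy_ratio:.2f}")
```

A ratio near or above 0.6 together with a C plus G content above 50% is the usual working threshold for calling a CpG island, which is consistent with the values reported for CpG27.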

Phylogeny and divergence of mammalian ENPEP M1 peptidase sequences
A phylogenetic tree (Figure 6) was calculated by the progressive alignment of 19 mammalian ENPEP M1 peptidase amino acid sequences with the chicken (Gallus gallus) ENPEP sequence, which was used to 'root' the tree (Table 1). The phylogram showed clustering of the ENPEP sequences into groups consistent with their evolutionary relatedness, with distinct groups for primate, other eutherian (mouse/rat, cow/pig and dog/cat), marsupial (opossum) and monotreme (platypus) ENPEP sequences, which were distinct from, and progressively related to, each other. It is apparent that the ENPEP gene existed as a distinct mammalian gene family which has evolved from a more primitive vertebrate ENPEP gene and has been retained throughout monotreme, marsupial and eutherian mammalian evolution.

Discussion
ENPEP is expressed at high levels in the epithelial cells of the kidney glomerulus and proximal tubule cells where the enzyme participates in the renin-angiotensin system:

1. Renin cleaves the substrate angiotensinogen, forming the decapeptide angiotensin I (Ang I) [42].

2. Ang I is cleaved by Angiotensin-Converting Enzyme (ACE) to produce the biologically active angiotensin II (Ang II) [43].

3. Ang II activates its receptor (AT1) that mediates key physiological functions in the kidney (systemic regulation) and brain (central regulation), including vasoconstriction, renal sodium (Na+) reabsorption and aldosterone secretion, increasing blood pressure and contributing to hypertension [44,45].

4. Ang II is converted to angiotensin III (Ang III) by ENPEP facilitating the hydrolysis of the N-terminal aspartate (or glutamate), thereby removing biological activity of the Ang peptides [15,16].
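The peptide processing steps listed above can be sketched as simple string operations. The one-letter sequence used for Ang I (DRVYIHPFHL) is the standard human decapeptide and is not quoted in this paper; the function names are hypothetical, and string slicing is only a schematic stand-in for the enzymatic reactions.

```python
# Standard human angiotensin I decapeptide (one-letter code); not from the source text.
ANG_I = "DRVYIHPFHL"


def ace_cleave(peptide: str) -> str:
    """Model ACE as a dipeptidyl carboxypeptidase: remove the C-terminal dipeptide."""
    if len(peptide) < 3:
        raise ValueError("peptide too short to lose a C-terminal dipeptide")
    return peptide[:-2]


def enpep_cleave(peptide: str) -> str:
    """Model ENPEP: remove an N-terminal aspartate (D) or glutamate (E)."""
    if not peptide or peptide[0] not in "DE":
        raise ValueError("ENPEP model requires an N-terminal Asp or Glu")
    return peptide[1:]


ang_ii = ace_cleave(ANG_I)      # octapeptide Ang II
ang_iii = enpep_cleave(ang_ii)  # heptapeptide Ang III
print(ANG_I, "->", ang_ii, "->", ang_iii)
```

The guard in `enpep_cleave` reflects the acidic N-terminal residue specificity described in the text: applying it to Ang III, which begins with arginine, raises an error rather than trimming further.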
The results of the present study indicated that mammalian ENPEP genes and encoded proteins represent a distinct gene and protein family of M1 peptidase proteins which share key conserved sequences that have been reported for other M1 peptidases previously studied [6,46,47]. Human ENPEP contains the following sites: a cytoplasmic N-terminus region (1-18); a hydrophobic transmembrane 21-residue segment, a helical signal anchor for the type II membrane protein; an extracellular protein region (residues 100-545) containing the zinc-binding endopeptidase active site (the substrate binding site (223Glu); the zinc binding site (1 zinc ion per subunit) (393His, 397His, 416Glu); the proton acceptor (394Glu); and the transition state stabilizer (497Tyr)); and the ERAP1-like C-terminal domain (residues 617-931) (Figure 1) [28], which contains a large number of N-glycosylation sites, several of which are conserved throughout mammalian evolution. ENPEP plays a role in the catabolic pathway of the renin-angiotensin system and is a major contributor to the development of clinical arterial hypertension [13,15,18,19,42,45].
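The active site numbering quoted above (393His, 394Glu, 397His and 416Glu) has the spacing His-Glu-x-x-His followed by 18 residues and a Glu, a pattern often written HEXXH-X18-E for M1 peptidases. The sketch below shows one way to locate such a pattern and report 1-based residue positions. The regular expression is an illustrative assumption rather than the method used in this study, and the toy sequence is hypothetical.

```python
import re

# Lookahead so that overlapping candidate motifs are all reported.
ZINC_MOTIF = re.compile(r"(?=HE..H.{18}E)")


def find_zinc_motifs(sequence: str) -> list[dict[str, int]]:
    """Return 1-based positions of the HEXXH-X18-E residues for every match."""
    hits = []
    for match in ZINC_MOTIF.finditer(sequence.upper()):
        start = match.start() + 1  # convert 0-based index to residue number
        hits.append({
            "first_his": start,
            "proton_acceptor_glu": start + 1,
            "second_his": start + 4,
            "third_ligand_glu": start + 23,
        })
    return hits


# Hypothetical toy sequence containing one motif starting at residue 4.
toy_sequence = "AAA" + "HEAAH" + "G" * 18 + "E" + "AA"
toy_hits = find_zinc_motifs(toy_sequence)
print(toy_hits)
```

With a motif starting at residue 393, the same offsets give 394, 397 and 416, matching the human ENPEP residue numbers listed in the text.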

Conclusion
ENPEP is encoded by a single gene among the mammalian genomes studied and is highly expressed in human small intestine-terminal ileum and kidney cortex cells. The gene usually contained 20 coding exons on the negative (lower primate and other mammalian) or positive (higher primate) strand, depending on the mammalian genome. The human ENPEP gene contained a large CpG island within the promoter region, as well as several transcription factor binding sites, which may contribute to the high level of gene expression in intestinal and kidney tissues. Alignments of mammalian ENPEP sequences demonstrated a high degree of conservation, particularly for those regions directing the catalytic functions and structural integrity of this enzyme, especially the extracellular sequences, which contain two domains: the N-terminal GluZincin Peptidase M1 (aminopeptidase N) domain (residues 100-545) and the ERAP1-like C-terminal domain (residues 617-931). Phylogenetic studies using 19 mammalian ENPEP M1 endopeptidase sequences indicated that the ENPEP gene existed as a distinct family which has apparently evolved from a more primitive vertebrate ENPEP gene and has been retained throughout monotreme, marsupial and eutherian mammalian evolution [48-53].

Figure 1. Amino acid sequence alignments for vertebrate ENPEP sequences. See Table 1 for sources of ENPEP sequences; * shows identical residues for ENPEP subunits; : similar alternate residues; . dissimilar alternate residues; N-glycosylated and potential N-glycosylated Asn sites are in red and numbered according to; human ENPEP active site residues are shown: zinc binding sites, 393His, 397His, 416Glu; proton acceptor, 394Glu; and transition state stabilizer, 497Tyr; other active site residues are shown as ^; α-helices for vertebrate ENPEP [11] are in shaded yellow and numbered in sequence from the N-terminus end; predicted β-sheets are in grey and similarly numbered in sequence from the N-terminus; turns in the 3D structure are shown; bold underlined font shows residues corresponding to known or predicted exon start sites; exon numbers refer to human ENPEP gene exons; four major domains were identified as cytoplasmic (N-terminal tail) (1-19); signal membrane anchor transmembrane (for linking ENPEP to the plasma membrane) (20-
Figure 2. N-terminal amino acid sequence alignments (A) and 5′-nucleotide gene sequence alignments (B) for mammalian ENPEP proteins and genes. A: N-terminal mammalian ENPEP amino acid sequence alignments; * shows identical residues for ENPEP subunits; : similar alternate residues; . dissimilar alternate residues; predicted cytosolic and transmembrane helical residues are shown; see Table 1.
Figure 3. Tertiary structure for human ENPEP. The structure for human ENPEP is based on the reported structure [11] and obtained using the SWISS MODEL web site based on PDB 4KX7A (http://swissmodel.expasy.org/workspace/). The rainbow color code describes the 3-D structure from the N-terminus (blue) to the C-terminus (red); α-helices and β-sheets are shown; note the separation of 2 major domains: the N-terminal M1 aminopeptidase N domain (in blue, with predominantly β-sheets) and the C-terminal ERAP1-like domain (multicolored, with predominantly α-helical structures).
Figure 5. Gene structure and major gene transcript for the human ENPEP gene. Derived from AceView (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/) [24]; shown with capped 5′- and 3′-ends for the predicted mRNA sequences; NM refers to the NCBI reference sequence; coding exons are in pink; the direction for transcription is shown as 5′ → 3′; a large CpG27 island is located at the gene promoter and the first exon; predicted transcription factor binding sites (TFBS) for human ENPEP are shown; 7 predicted miRNA target sites were identified within the extended 3′-UTR region of human ENPEP.
Figure 6. Phylogenetic tree of mammalian ENPEP amino acid sequences with the chicken ENPEP amino acid sequence. The tree is labeled with the ENPEP name and the name of the animal and is 'rooted' with the chicken (Gallus gallus) ENPEP sequence (Table 1). Note the single cluster corresponding to the ENPEP gene family. A genetic distance scale is shown. The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. Replicate values of 0.9 or more, which are highly significant, are shown, with 100 bootstrap replicates performed in each case. A proposed sequence of gene evolution events is shown arising from an ancestral bird ENPEP gene.
Table 1. Mammalian and chicken ENPEP genes and proteins.
Table 2. Predicted locations of N-glycosylation sites for mammalian ENPEP proteins. The predicted N-glycosylation sites were numbered following alignments using Clustal Omega [29] from the N-terminal end; conserved N-glycosylation sites for all mammalian ENPEP sequences examined are highlighted in yellow; individual amino acid residues were identified using standard single letter nomenclature: N-asparagine; S-serine; T-threonine etc.
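Potential N-glycosylation sites of the kind tabulated in Table 2 are commonly screened with the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The following minimal sketch applies that rule. It is an assumption that this criterion approximates the prediction used here, since the section above does not name the predictor, and the toy sequence is hypothetical.

```python
import re

# Lookahead so that overlapping sequons (for example N-N-T-S) are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence: NAS and the overlapping NNT/NTS match, NPT does not.
toy_protein = "MNASANPTQNNTSW"
toy_sites = find_sequons(toy_protein)
print(toy_sites)
```

A sequon match only marks a potential site; whether a given asparagine is actually glycosylated depends on structural context that this pattern does not capture.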