Send Orders of Reprints at Reprints@benthamscience.net

Phylogenetic Analysis of Brassica rapa MATH-Domain Proteins

The MATH (meprin and TRAF-C homology) domain is a fold of seven antiparallel β-strands involved in protein-protein interaction. Here, we report the identification and characterization of 90 MATH-domain proteins from the Brassica rapa genome. Based on sequence analysis together with MATH-domain proteins from other species, the B. rapa MATH-domain proteins can be grouped into six classes. Class-I proteins have one or several MATH domains without any other recognizable domain; Class-II proteins contain a MATH domain together with a conserved BTB (Broad Complex, Tramtrack, and Bric-a-Brac) domain; Class-III proteins belong to the MATH/Filament domain family; Class-IV proteins contain a MATH domain frequently combined with other domains; Class-V proteins have relatively long sequences but contain only one MATH domain; and Class-VI proteins are characterized by Peptidase and UBQ (ubiquitinylation) domains together with one MATH domain. As part of our study of seed development in B. rapa, six genes identified by SSH (Suppression Subtractive Hybridization) were analyzed for expression across seed developmental stages. The expression patterns suggest that Bra001786, Bra035787 and Bra036572 may be seed-development-specific genes, whereas Bra001787, Bra020541 and Bra040904 may be involved in both seed and flower organ development. This study provides the first characterization of the MATH domain proteins in B. rapa.


INTRODUCTION
Meprins are mammalian tissue-specific metalloendopeptidases of the astacin family, implicated in developmental, normal and pathological processes through the hydrolysis of a variety of proteins [1]. TRAF (TNF-receptor-associated factor) proteins were first isolated through their ability to interact with TNF receptors [2]. The MATH domain (meprin and TRAF homology domain) is found in cytosolic signaling molecules such as the TRAF proteins, which are characterized by a C-terminal region of about 180 amino acids that forms a fold of seven to eight antiparallel β-strands (the TRAF-C or MATH domain) [3][4][5]; this domain adopts a novel eight-stranded antiparallel β-sandwich structure [6]. A coiled-coil region adjacent to the MATH domain is important for oligomerization, which is essential for forming signaling complexes with TNF receptor-1 [7]. The ligand-binding surface of TRAF proteins is located in β-strands 6 and 7, and the consensus motif (P/S/A/T)x(Q/E)E is the major motif recognized by the MATH domain [8]. The MATH domain seems to be important for protein-protein interaction, and several studies of human and C. elegans MATH proteins indicate that they might have important functions in the regulation of protein processing [9]. In TRAF proteins, the N-terminal portion of the MATH domain has been shown to be necessary and sufficient for self-association (homodimerization) and receptor interaction [6,10-14].

*Address correspondence to these authors at the Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha, 410128 Hunan, China; Tel: 0086-731-84635294; Fax: 0086-731-84673765; E-mails: yingruan@hotmail.com; liucl100@126.com
# These authors contributed equally to this work.
Many MATH domain proteins found in humans [15][16][17] and other mammals [2,8,18] act as molecular adaptors, mainly in development, cell growth, differentiation and aging. For example, TRAF6 is a critical factor for dendritic cell maturation and development [19][20][21]. This type of MATH protein has recently also been found in plants (Arabidopsis, Medicago, rice), in lower eukaryotes (Trypanosoma, Plasmodium) and in lower metazoa (C. elegans) [22].
MATH-domain-containing proteins are usually found associated with a discrete set of other protein domains, including peptidase, filamin and RluA domains, the broad-complex, tramtrack and bric-a-brac (BTB) domain, the tripartite motif (TRIM), the astacin domain, and RING and zinc finger domains [9]. The BTB domain is an evolutionarily conserved domain broadly distributed in eukaryotes [23,24]. At least 76 BTB-domain proteins, belonging to 11 major families, exist in Arabidopsis [25]. Proteins carrying both BTB and MATH motifs are common in plants. For example, rice has at least 74 likely functional MATH-BTB genes and at least another 40 MATH-BTB pseudogenes [26], whereas only six members (AtBPM1 to AtBPM6) are annotated in Arabidopsis [27][28][29]. The MATH domain of BPM (BTB/POZ-MATH) proteins mediates their assembly with members of the ethylene response factor/Apetala2 (ERF/AP2) transcription factor class [30]. In addition, MATH-BTB proteins may directly interact with and target the homeobox-leucine zipper (HD-ZIP) transcription factor ATHB6 (Arabidopsis thaliana homeobox gene 6) for proteasomal degradation [31].
Brassicaceae-specific PSV-embedded proteins (BPEPs) have an N-terminal signal peptide and two tandem MATH domains and are localized in protein storage vacuoles (PSVs); at least one BPEP is tightly associated with the phytate contained in PSV globoids [32]. Additionally, six MATH domain proteins were identified by SSH in Brassica napus; four of them are involved in lipid metabolism and two are related to sugar metabolism, and all have two MATH domains [33].
Phylogenetic analysis of MATH domain proteins has proven helpful as a guide for genetic and molecular studies of this large protein family. Zapata and colleagues reported that MATH domain proteins in Brassica can be divided into the USP7 family, the MATHd-only family, the MATHd/BTB family and the MATHd/Filament family, that rice lacks the MATHd/Filament family, and that the TRAF, TRIM37 and Meprin families are found only in animals [9]. Phylogenetic and domain organization analyses can help infer putative functions; for example, the USP7 class, which contains a UBP domain on the C-terminal side of the MATH domains, has ubiquitin protease (UBP) activity [9].
Here, we identified and analyzed MATH-domain proteins from the Brassica rapa whole-genome sequence [34], together with most of those from Arabidopsis thaliana, Oryza sativa and Ostreococcus tauri [35][36][37], and some animal MATH-domain proteins of known function [9]. Our data provide a platform for future functional characterization of these genes in Brassica species.

Protein Domain Organization Analysis
The protein sequences were analyzed for domain organization using NCBI Conserved Domain (CD) searches (http://ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The low-complexity filter was turned off, and the Expect value was set at 1.0 to detect short domains or less conserved regions. Domains were also verified and named according to the SMART database (http://smart.embl-heidelberg.de/).
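The domain-calling step can be sketched in a few lines. This is a minimal Python sketch, assuming the search hits have already been parsed into simple records; the `DomainHit` record, the `domain_architecture` helper and the protein names are hypothetical and do not reproduce the NCBI CD-Search output format. It only shows how an E-value cut-off of 1.0 and N-to-C ordering turn raw hits into a domain architecture such as MATH, BTB, BACK.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DomainHit:
    """One conserved-domain hit on a protein (hypothetical record layout,
    not the actual NCBI CD-Search output format)."""
    protein: str
    domain: str
    start: int     # 1-based first residue of the hit
    end: int       # 1-based last residue of the hit
    evalue: float  # Expect value reported for the hit


def domain_architecture(hits: Iterable[DomainHit], protein: str,
                        max_evalue: float = 1.0) -> List[str]:
    """Domains of one protein ordered from N- to C-terminus.

    Hits with an E-value above max_evalue are dropped; the default of 1.0
    matches the permissive cut-off described in the text. Overlapping hits
    are kept as-is (no overlap resolution is attempted here).
    """
    kept = [h for h in hits if h.protein == protein and h.evalue <= max_evalue]
    kept.sort(key=lambda h: h.start)
    return [h.domain for h in kept]


# Hypothetical hits for two made-up proteins.
hits = [
    DomainHit("protA", "BTB", 200, 320, 1e-15),
    DomainHit("protA", "MATH", 20, 150, 1e-20),
    DomainHit("protA", "BACK", 330, 400, 0.5),
    DomainHit("protA", "UBQ", 420, 470, 3.0),  # above the cut-off, dropped
    DomainHit("protB", "MATH", 5, 130, 1e-8),
]
print(domain_architecture(hits, "protA"))  # ['MATH', 'BTB', 'BACK']
```

A permissive cut-off keeps weak hits, which is why an independent check such as the SMART comparison described above is useful.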

Phylogenetic Analysis
Multiple sequence alignments of the MATH-domain protein sequences were performed using the ClustalW program [38]. The full-length proteins were subjected to phylogenetic analysis using MEGA 5.1 [39]. The trees were constructed with the following settings: tree inference, Neighbor-Joining; included sites, pairwise deletion for the analysis of total sequences; substitution model, Poisson correction; and a bootstrap test of 1000 replicates to assess internal branch reliability [40].
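The distance step behind these settings can be illustrated with a short sketch. Under the Poisson correction model, the distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of differing amino acids; with pairwise deletion, p is computed only over the columns that are ungapped in both sequences of the pair. The function names and the toy alignment below are hypothetical, and the sketch neither reproduces MEGA's implementation nor builds the Neighbor-Joining tree; `bootstrap_replicate` shows how one of the 1000 bootstrap pseudo-alignments can be drawn by resampling alignment columns with replacement.

```python
import math
import random
from typing import Dict, Tuple


def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned
    protein sequences, using pairwise deletion: columns with a gap in
    either sequence are skipped for this pair only."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no shared ungapped sites")
    p = differences / compared  # uncorrected p-distance
    if p >= 1.0:
        return math.inf  # saturated: distance undefined
    return -math.log1p(-p)  # equals -ln(1 - p)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All pairwise Poisson distances for a name -> aligned-sequence map."""
    names = list(alignment)
    return {(x, y): poisson_distance(alignment[x], alignment[y])
            for x in names for y in names}


def bootstrap_replicate(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """One bootstrap pseudo-alignment: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


aln = {"s1": "MKTAYIAK", "s2": "MKTVYIAK", "s3": "MK-AYLAR"}  # toy alignment
print(round(poisson_distance(aln["s1"], aln["s2"]), 4))  # 0.1335
print(round(poisson_distance(aln["s1"], aln["s3"]), 4))  # 0.3365
```

In a full analysis, the distance matrix of each bootstrap replicate would be passed to the tree-building step, and branch support would be the fraction of replicate trees that recover a given clade.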

qRT-PCR Analysis
B. rapa plants were grown at 18-22°C under a 12 h light (10,000 lx)/12 h dark photoperiod. Leaves were collected from plants 20, 30 and 35 days after artificial pollination; roots, stems and flower buds were collected from blossoming plants. Total RNA was extracted from about 100 mg of collected plant tissue using Trizol Reagent (Invitrogen, USA). The RNA preparation was then treated with DNase I (Promega, USA) for 30 min at 37°C, followed by enzyme inactivation at 65°C for 5 min. First-strand cDNA was synthesized using an RT-PCR kit (RevertAid™ First Strand cDNA Synthesis Kit, Fermentas, CA), and the RT solution containing the first-strand cDNA was stored at -80°C [40]. Fluorescence-based quantitative PCR was performed using SYBR Premix Ex Taq (Takara, Japan). Primers used for the Q-PCR reactions are listed in Table 1. Q-PCR conditions were as follows: 95°C for 10 min, then 40 cycles of 95°C for 15 s, 60°C for 30 s and 72°C for 30 s (AB, USA).
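The text does not state how Ct values were converted into the relative expression levels shown in Fig. 3, nor which reference gene was used. A widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions that amplification efficiency is close to 100% for every primer pair and that a reference gene and a calibrator sample (for example, one tissue) were chosen. The function name and all Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_sample: Sequence[float],
                     ref_sample: Sequence[float],
                     target_calibrator: Sequence[float],
                     ref_calibrator: Sequence[float]) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument holds replicate Ct values; replicates are averaged.
    Assumes ~100% amplification efficiency (doubling per cycle) for both
    the target and the reference primer pairs.
    """
    d_sample = mean(target_sample) - mean(ref_sample)              # dCt, sample
    d_calibrator = mean(target_calibrator) - mean(ref_calibrator)  # dCt, calibrator
    dd_ct = d_sample - d_calibrator
    return 2.0 ** -dd_ct


# Hypothetical Ct values: target 3 cycles earlier in the sample than in the
# calibrator after normalization, i.e. about 8-fold higher expression.
print(fold_change_ddct([24.0, 24.5, 23.5], [18.0, 18.0, 18.0],
                       [27.0, 27.5, 26.5], [18.0, 18.25, 17.75]))  # 8.0
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be more appropriate than this simple form.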

Identification and Annotation of MATH-Domain Proteins from the Brassica rapa Genome
Using the BLASTp and tBLASTn bioinformatic tools, we identified 90 genes encoding MATH-domain proteins in the B. rapa genome (http://brassicadb.org/brad, Table 2), compared with 63 genes in Arabidopsis (NCBI), 36 genes in rice (NCBI), only 2 genes in the O. tauri database (http://genome.jgi-psf.org) and 16 genes from different animals (Table 2, Supplement 1). Brassica rapa has ten chromosomes, twice as many as Arabidopsis. This suggests that genes encoding MATH domain proteins were duplicated or lost during chromosome duplication. These genes are distributed across all ten B. rapa chromosomes (Table 2), although the locations of Bra029646, Bra040903 and Bra040904 remain uncertain. Chr3 and Chr9 carry many more MATH-domain genes than the other chromosomes, 16 each (Table 2).

Phylogenetic Analysis of MATH-Domain Proteins
To examine the phylogenetic relationships among B. rapa MATH-domain proteins and place them within the established classes, we extracted all MATH domain proteins from several species, including 63 proteins from A. thaliana, 36 from O. sativa and 2 from O. tauri. We also included TRAF1 to TRAF6, TRIM37, meprin A and B, SPOP and HAUSP from human; MuBM-90, MmSPOP and MmPOZ4 from mouse; SPOPL from Gallus gallus; BL2960 from Cryptococcus neoformans; and USP7 from Drosophila melanogaster. According to the latest report, proteins containing MATH domains, together with their associated domains, are grouped into 8 families, 4 of which occur in Brassica [9]. However, based on our analysis, B. rapa MATH domain proteins were divided into 6 classes and include no members of the TRAF or Meprin families (Fig. 1, Fig. 2 and Supplement 2). The last three classes fall within a single group of branches in the tree but differ markedly in domain organization.

Conserved Domains in MATH-Domain Proteins
Additionally, we analyzed the architecture of the 90 Brassica MATH domain proteins and found that they can be classified into six families based on the presence of other conserved domains combined with sequence similarity (Fig. 2). Class-I, comprising 30 MATH domain proteins, has only two tandem MATH domains, except Bra001787, whose domain organization resembles that of Class-III, with a longer sequence and more MATH domains. However, alignment showed that Bra001787 is much more similar to Class-I members (Supplement 3): it shares 74% amino acid identity with Bra040904, but only 25% and 27% identity, respectively, with the Class-III members Bra007802 and Bra034251, which also have four MATH domains. We therefore placed it in Class-I, and suggest that Bra001787 might have originated from Bra040904 through tandem duplication of its MATH domains. All 10 members of Class-II have MATH and BTB domains, and some also have a BACK domain in the C-terminus. Class-III falls into a branch in which the Arabidopsis proteins have a Filament domain at the C-terminus [9], but this domain could not be detected in either the B. rapa or the Arabidopsis proteins using NCBI-CD search or SMART. Class-IV contains 37 members and is the largest class. Besides the MATH domains, some of its proteins have a PEARLI-4 domain, and Bra003329 has a ribosomal domain in the C-terminus. Within this class, 35 proteins contain only one MATH domain. In rice, the MATH-only class is the largest family and contains about 50% of the MATH domain proteins [26]. Taken together, MATH-only proteins form the largest group. Class-V proteins are longer than the members of Classes I to IV, with the exception of Bra023279, which has only a MATH domain. Class-V and Class-VI belong to the same branch (Fig. 1) and have similar MATH domains, so they may be related, although the relationship remains uncertain. Class-VI members are also longer than those of the first four classes, and all have MATH, UBQ and Peptidase domains.
Previous research suggested that the last two classes have ubiquitin protease (UBP) activity and may be derived from a common ancestor, but that some sequences lost the ubiquitin protease domain during evolution [9].
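The domain-based part of this classification can be written as a few rules. The sketch below is a simplification, not the authors' procedure: class membership in this study also relied on the phylogenetic tree and on sequence identity (as for Bra001787), and proteins containing only MATH domains can fall into Class-I, Class-IV or Class-V. The hypothetical `diagnostic_class` function therefore returns `None` when domain content alone cannot decide. Domain labels such as `Peptidase_C19` are assumed strings, not the output of a particular tool.

```python
from typing import Optional, Sequence


def diagnostic_class(domains: Sequence[str]) -> Optional[str]:
    """Class implied by a diagnostic domain, or None if undecidable.

    `domains` is an ordered domain architecture, e.g. ["MATH", "BTB", "BACK"].
    """
    present = set(domains)
    if "MATH" not in present:
        return None   # no MATH domain detected: outside this family
    if {"UBQ", "Peptidase_C19"} <= present:
        return "VI"   # MATH + UBQ + peptidase (USP7-like)
    if "BTB" in present:
        return "II"   # MATH-BTB, optionally with a C-terminal BACK domain
    if "Filament" in present:
        return "III"  # MATHd/Filament (not detected in B. rapa itself)
    if present - {"MATH"}:
        return "IV"   # MATH plus some other domain, e.g. PEARLI-4 or ribosomal
    return None       # MATH-only: Class-I, V or many Class-IV need the tree


for arch in (["MATH", "BTB", "BACK"], ["MATH", "UBQ", "Peptidase_C19"],
             ["MATH", "PEARLI-4"], ["MATH", "MATH"]):
    print(arch, diagnostic_class(arch))
```

Note that a USP7-like protein that has lost one of its two diagnostic domains would fall through to "IV" under these rules, which is one reason the phylogenetic grouping remains necessary.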

Expression Analyses of MATH Proteins Encoding Genes in Seed Development
SSH libraries from Brassica napus seeds at 20 and 30 days of development were previously constructed [33]. Bra020541 is homologous to EST T200139, which was isolated from the 20-day-old seed SSH library. Bra020541 was weakly expressed in root, stem and leaf but highly expressed in young flower buds, and during seed development its expression decreased markedly from 20 to 35 days. Bra001786, Bra001787, Bra040904 and Bra035787 are all homologous to EST T350008, and Bra036572 to EST T350054, both from the 30-day-old seed SSH library [33]. Expression of Bra001786 and Bra036572 was detected only in seed and gradually increased during seed development. Bra001787 was more strongly expressed in leaf and flower bud than in seed, was not expressed in root or stem, and its expression in seed peaked at 30 days. Bra035787 was highly expressed in root, leaf, flower bud and seed but less in stem, and its expression increased as seed development progressed. Bra040904 was expressed in all organs examined, much more highly in flower bud and stem than in root and leaf, and its expression in seed rose to a peak at 30 days (Fig. 3). Among these genes, Bra036572 was expressed much more highly in seeds at 30 and 35 days than the others, especially Bra020541.
Based on expression during seed development, the genes show three expression patterns. Bra020541 is expressed at the early stage of seed development. Bra001786 and Bra036572 have very similar expression patterns: their expression increases as seed development progresses, peaks in seeds at 35 days, and is undetectable in root, stem and flower bud. Similarly, Bra035787 is expressed much more highly in seed than in other organs and also peaks in seeds at 35 days, suggesting that Bra001786, Bra036572 and Bra035787 are late-stage seed development genes. Bra001787 and Bra040904 are both most highly expressed in seeds at 30 days, suggesting that they may be regulatory genes at the middle stage of seed development.

Table 2 (continued)

Bra001426  AT3G11910  3339  1112  8265  32  3
Bra005900  AT5G06600  3348  1115  8460  32  3
Bra009210  AT5G06600  3303  1100  7823  31  3
Bra028727  AT5G06600  3417  1138  8289  32  2
Bra034802  AT3G11910  3348  1115  8732  32  5
Bra038685  AT3G11910  3309  1102  7657  31  1

Data from http://brassicadb.org/brad/. UN = uncertain.
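The three expression patterns described above can be stated as explicit criteria: the seed stage at which expression peaks (early at 20 days, middle at 30 days, late at 35 days after pollination) and whether expression is detectable outside the seed. The sketch below encodes those criteria; the function name, the detection threshold and treating a 25-day peak as "middle" are assumptions, and the example values are illustrative, not the measurements in Fig. 3.

```python
from typing import Dict, Tuple

# Non-seed tissues sampled in Fig. 3.
OTHER_TISSUES = ("root", "stem", "leaf", "bud")


def expression_pattern(seed: Dict[int, float], tissues: Dict[str, float],
                       detection_limit: float = 0.0) -> Tuple[str, bool]:
    """Classify a gene by the seed stage (days after pollination) of peak
    expression and flag it seed-specific when no other tissue exceeds
    the detection limit. Missing tissues count as undetected."""
    peak_day = max(seed, key=seed.get)
    if peak_day <= 20:
        stage = "early"
    elif peak_day >= 35:
        stage = "late"
    else:
        stage = "middle"  # assumption: a 25-day peak also counts as middle
    seed_specific = all(tissues.get(t, 0.0) <= detection_limit
                        for t in OTHER_TISSUES)
    return stage, seed_specific


# Illustrative values only (relative expression, arbitrary units).
rising = {20: 1.0, 25: 2.0, 30: 3.0, 35: 5.0}
print(expression_pattern(rising, {"root": 0.0, "stem": 0.0, "leaf": 0.0, "bud": 0.0}))
# ('late', True)
```

Using a detection limit rather than a strict zero makes the "undetectable" call explicit and easy to adjust to the noise floor of the assay.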

DISCUSSION
MATH domains seem to be very important for the regulation of protein processing [5,6,9-13]. Except class- and class-, all classes have additional domains, such as BTB, UBQ, BACK and GAF domains (Fig. 2). The BTB domain, also known as the POZ (Pox virus and Zinc finger) domain [41], is an evolutionarily conserved domain broadly distributed in eukaryotes [23,24]. Proteins carrying both BTB and MATH motifs are common in plants [23,24,41,42]. Arabidopsis has six such members (AtBPM1 to AtBPM6), whereas Brassica has 10 (Fig. 1) and rice more than 30. This suggests that these genes diverged and evolved after speciation. Ubiquitin domain proteins (UDPs) and ubiquitin-like (UBL) domain proteins belong to a diverse group of proteins characterized by an integral UBQ or UBL domain. Most UDPs described so far are components of the ubiquitin system, which is crucial for the degradation of most cellular proteins [43,44]. In Arabidopsis, the EVE1 protein, which contains a 52-amino-acid ubiquitin (UBQ) domain, is involved in the control of inflorescence stem formation [45], and so far seven UBQ domain proteins have been identified (Fig. 1 and Fig. 2).
Fig. (2). Domain architecture of the different classes of Brassica rapa MATH domain proteins. Domains were identified using the Conserved Domain Search service (http://ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and verified and named according to the SMART database (http://smart.embl-heidelberg.de/). The low-complexity filter was turned off, and the Expect value was set at 1.0 to detect short domains or less conserved regions. Class-I has only two tandem MATH domains, except Bra001787; Class-II has MATH and BTB domains; Class-III has a Filament domain at the C-terminus in Arabidopsis [9], but this domain could not be detected in B. rapa; Class-IV has one or two MATH domains, and some of its proteins also have a PEARLI-4 domain; Class-V proteins are longer than the others; and Class-VI members have MATH, UBQ and Peptidase domains.

Fig. (3). Expression of genes encoding MATH domain proteins involved in seed development. The expression patterns suggest that Bra001786, Bra035787 and Bra036572 may be seed-development-specific genes, whereas Bra001787, Bra020541 and Bra040904 may be involved in both seed and flower organ development. Materials: seeds, 20, 25, 30 and 35 days after artificial pollination; roots, principal root of 15-day-old seedlings; stems, young stem of 15-day-old seedlings; leaf, young leaf of 15-day-old seedlings; bud, young flower bud 0.5 cm long.

Previous reports have shown that Brassica MATH proteins fall into four classes [9], but we suggest that six classes may be more reasonable, based on phylogenetic analysis and specific domain architectures at the whole-genome level. Class-I (MATH-domain-only proteins) comprises a large number of hypothetical proteins containing multiple tandem MATH domains, except Bra003836, which has only one. Proteins containing a MATH domain and a BTB/POZ domain (Class-II) are broadly represented in eukaryotes. Many reports have shown that MATH-BTB proteins interact with CUL-3 and have E3 ubiquitin ligase activity [26,31,46,47], and MAB1 is expressed in the germ lineages and the zygote of maize [48]. Here, ten MATH-BTB domain proteins are present in B. rapa, and their conserved domain organization suggests that they might have similar functions. Comparisons of sequence similarity and synteny between B. rapa and A. thaliana MATH-domain proteins revealed recent gene duplication events (http://brassicadb.org/brad/searchSynteny.php). Bra00200, Bra006489 and Bra023700 are syntenic to AtBPM1, but the putative protein encoded by Bra023700 has 170 aa and lacks a MATH domain, suggesting that its function may have shifted during evolution; it also fails to cluster with the others. Bra020764, Bra040233 and Bra001186 are syntenic to AtBPM2; Bra017053 and Bra000147 to AtBPM3; Bra036437 to AtBPM4; and Bra006591 and Bra020142 to AtBPM5, whereas no sequence is syntenic to AtBPM6. This suggests that MATH-BTB genes were duplicated, triplicated or lost in B. rapa. Class-III has only two Brassica proteins and two Arabidopsis proteins; previous research showed that the Arabidopsis proteins have a FILAMENT domain, but it could not be detected here. Ubiquitin proteases (UBPs) are also found in plants (Arabidopsis, Oryza) and in Metazoa [9]. Except for Bra003330 and Bra003331, Class-IV proteins have only one MATH domain, and about half of them also have a PEARLI-4 domain of unknown function (Arabidopsis phospholipase-like protein). AtRTM3, which belongs to Class-IV, was the first MATH-domain protein shown to function in a plant resistance mechanism; it encodes a MATH and coiled-coil (CC) domain protein [5] and restricts the long-distance movement of plant viruses [49], and its syntenic gene Bra014574 may have a similar function. All MATH domains in Class-IV are located at the N-terminus, except in Bra030564, where the domain lies in the middle of the protein.
Interestingly, Class-VI contains seven putative proteins with a UBQ domain and a Peptidase-C19 domain, which are predicted to have ubiquitin protease (UBP) activity.
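The syntenic relationships listed above between B. rapa genes and AtBPM1 to AtBPM6 can be tallied to show the copy-number spectrum behind the duplication, triplication and loss argument. Gene IDs are copied from the text (including "Bra00200" as printed, and Bra020764 for the ID that also appears as "Brao20764"); the dictionary and helper names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Syntenic B. rapa genes for each Arabidopsis BPM gene, as listed in the text.
SYNTENY: Dict[str, List[str]] = {
    "AtBPM1": ["Bra00200", "Bra006489", "Bra023700"],  # Bra023700 lacks MATH
    "AtBPM2": ["Bra020764", "Bra040233", "Bra001186"],
    "AtBPM3": ["Bra017053", "Bra000147"],
    "AtBPM4": ["Bra036437"],
    "AtBPM5": ["Bra006591", "Bra020142"],
    "AtBPM6": [],
}

RETENTION = {3: "three copies", 2: "two copies", 1: "single copy", 0: "lost"}


def copy_numbers(synteny: Dict[str, List[str]]) -> Tuple[Dict[str, int], Counter]:
    """Copies per Arabidopsis gene, and how many genes have each copy number."""
    copies = {at_gene: len(genes) for at_gene, genes in synteny.items()}
    return copies, Counter(copies.values())


copies, spectrum = copy_numbers(SYNTENY)
for at_gene, n in copies.items():
    print(f"{at_gene}: {n} B. rapa gene(s), {RETENTION[n]}")
print("total syntenic B. rapa genes:", sum(copies.values()))  # 11
```

The tally gives 11 syntenic B. rapa genes: two AtBPM genes retained in three copies, two in two copies, one in a single copy and AtBPM6 with none. One of the 11, Bra023700, lacks a MATH domain according to the text.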
MATH domain proteins are found as cytosolic signaling molecules in animals [3]. Fewer plant MATH domain proteins have been characterized; for example, RTM3 (Class-IV) restricts the long-distance movement of plant viruses [5,49], and MATH-BTB domain proteins (Class-II) directly interact with and target the transcription factor ATHB6 for proteasomal degradation [31]. Our previous research showed that six MATH domain proteins are involved in seed development of Brassica napus [33]; here, six homologous B. rapa genes, Bra001786, Bra001787, Bra020541, Bra036572, Bra035787 and Bra040904, were identified, all of which contain only MATH domains and belong to Class-I. Bra020541 is homologous to EST T200139 from the 20-day-old seed SSH library; Bra001786, Bra001787, Bra040904 and Bra035787 are homologous to EST T350008, and Bra036572 to EST T350054, from the 35-day-old seed SSH library [33]. The expression patterns suggest that Bra001786, Bra035787 and Bra036572 may be seed-development-specific genes, whereas Bra001787, Bra020541 and Bra040904 may be involved in both seed and flower organ development, indicating that MATH domain proteins may have acquired shared and/or distinct functions during evolution.