Complete genome sequence of Desulfomicrobium baculatum type strain (XT)

Desulfomicrobium baculatum is the type species of the genus Desulfomicrobium, which is the type genus of the family Desulfomicrobiaceae. It is of phylogenetic interest because of the isolated location of the family Desulfomicrobiaceae within the order Desulfovibrionales. D. baculatum strain XT is a Gram-negative, motile, sulfate-reducing bacterium isolated from water-saturated manganese carbonate ore. It is strictly anaerobic and does not require NaCl for growth, although NaCl concentrations up to 6% (w/v) are tolerated. The metabolism is respiratory or fermentative. In the presence of sulfate, pyruvate and lactate are incompletely oxidized to acetate and CO2. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the deltaproteobacterial family Desulfomicrobiaceae, and this 3,942,657 bp long single replicon genome with its 3494 protein-coding and 72 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain XT (DSM 4028 = CCUG 34229 = VKM B-1378) is the type strain of the species Desulfomicrobium baculatum, which is the type species of the genus Desulfomicrobium. Strain XT was first described as Desulfovibrio baculatus by Rozanova and Nazina [1,2], and later transferred to the novel genus Desulfomicrobium (currently containing seven species) [3] (Figure 1) because several phenotypic traits were not consistent with the definition of the genus Desulfovibrio. In 1998 the species epithet was corrected to D. baculatum [4]. Three accompanying strains have been described in addition to strain XT: strain H.L21 (DSM 2555) was isolated from anoxic intertidal sediment at the Ems-Dollard Estuary, Netherlands (16S rRNA gene accession AJ277895) [5]; strain 5174 (DSM 17142) was isolated from a forest pond near Braunschweig, Germany (16S rRNA gene accession AJ277896) [6]; and strain 9974 (DSM 17143) was isolated as a contaminating chemotrophic bacterium from a culture of a green sulfur bacterium designated 'Chloropseudomonas ethylica' N2 [6]. These strains were tentatively affiliated with the species D. baculatum based on some phenotypic traits. Although 16S rRNA gene sequence data are now available for two strains, a definitive affiliation of strains to a species of Desulfomicrobium requires supplementary DNA-DNA hybridization experiments because of the high 16S rRNA gene sequence similarity observed among distinct species of this genus [7]. Other isolates and clones related to the species were obtained from production waters of a low-temperature biodegraded oil reservoir in Canada [8], and from wastewater from penicillin G production in China (clone B19, EU234202). Screening of environmental genomic samples and surveys reported at the NCBI BLAST server indicated no closely related phylotypes that can be linked to the species. Here we present a summary classification and a set of features for D. baculatum strain XT (Table 1), together with the description of the complete genomic sequencing and annotation.

Classification and features
Cells of D. baculatum strain XT are short rods with rounded ends of 0.6 × 1-2 µm (Figure 2). Cells stain Gram-negative, are motile by a single polar flagellum, and do not form endospores. The metabolism is strictly anaerobic and can be respiratory or fermentative [3,9]. The temperature range for growth is 2-41°C (optimum 28-37°C) and NaCl concentrations of 0-6% (w/v) are tolerated (optimum 1% w/v). Sulfate, sulfite and thiosulfate are used as electron acceptors and are reduced to H2S. Nitrate is not reduced. Simple organic compounds are incompletely oxidized to acetate [3]. Malate, fumarate and pyruvate can be fermented with succinate and acetate as end products. Carbohydrates are not fermented. Vitamins are not required for growth [3]. D. baculatum strain 9974 (DSM 17143) is also able to use ethanol as a substrate [10] and sulfur as an electron acceptor [6]. The use of ethanol as an electron donor for sulfate respiration depends on supplementing the medium with the trace elements tungstate or molybdate [10]. Sulfate uptake in symport with sodium ions has been shown in strain 9974, unlike in other freshwater sulfate reducers, which use protons [11]. Distinctive features of D. baculatum strain XT are: (i) NaCl is not required for growth [3], (ii) fermentation of fumarate and malate to succinate and acetate is preferred over utilization of these substrates as electron donors for sulfate reduction [9], (iii) sulfur is not used as an electron acceptor and (iv) molecular nitrogen can be assimilated [3]. A desulfoviridin-type dissimilatory sulfite reductase, which is a hallmark feature of the genus Desulfovibrio, is absent in strain XT; however, a sulfite reductase of the desulforubidin-type was reported for strain 9974 [12]. Cells of D. baculatum strain XT contain c- and b-type cytochromes [3].
The tetraheme cytochrome c3 of strain 9974, which is thought to play a role in sulfur reduction and the coupling of electron transfer to hydrogenases, has been analyzed in some detail using advanced biophysical methods [13-15]. Strain 9974 also contains several distinct [NiFeSe] hydrogenases that are located in different cellular compartments [16]. The crystal structure of the periplasmic [NiFeSe] hydrogenase of this strain has been determined [17] and it is proposed that the selenium ion in the active center plays a role in the oxygen-tolerant hydrogen production of this enzyme, which distinguishes it from most [NiFe] hydrogenases [18]. An active selenocysteine system for usage of the 21st amino acid has been studied in detail for D. baculatum strain 9974 [19-21]. Pyridoxal-5'-phosphate, the prosthetic group of selenocysteine synthases, is bound to a distinct lysine residue (Lys295) within the active site of the enzyme of this strain [20]. Figure 1 shows the phylogenetic neighborhood of D. baculatum strain XT in a 16S rRNA based tree. Analysis of the two 16S rRNA gene sequences in the genome of strain XT indicated that the two genes are almost identical (1 bp difference), and that both genes differed by one nucleotide from the previously published 16S rRNA sequence generated from DSM 4028 (AJ277894).
Figure 1. Phylogenetic tree showing the neighborhood of D. baculatum strain XT, inferred [22,23] from the 16S rRNA gene sequence under the maximum likelihood criterion [24]. The tree was rooted with all members from the Desulfonatronaceae, another family in the order Desulfovibrionales. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Strains with a genome-sequencing project registered in GOLD [25] are printed in blue; published genomes in bold.

Chemotaxonomy
The cellular fatty acid patterns of D. baculatum strain XT and the accompanying strains 5174, 9974 and H.L21 [26] were found to be dominated by anteiso-(ai) and iso-methyl branched unsaturated and saturated fatty acids. The most abundant fatty acid is iso-17:1 cis7 (24.2-28.6%), followed by 18:1 cis11 (6.4-12.2%), iso-15:0 (8. Branched chain, hydroxylated fatty acids are also present, 3-OH iso-15:0 (1.4-2.4%), 3-OH ai-15:0 (0.7-1.2%), and 3-OH iso-17:0 (1.2-2.2%), which may be derived from a lipopolysaccharide. The polar lipid composition of D. baculatum strain XT has not been investigated. The respiratory quinone composition of D. baculatum strain XT has also not been investigated, but the presence of MK-6 has been reported in D. macestii and D. norvegicum [7,27].
Notes to Table 1: Altitude not reported. Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [31]. If the evidence code is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genomes OnLine Database [25] and the complete genome sequence (CP001629) is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
D. baculatum strain XT (DSM 4028) was grown in DSMZ medium 63 at 30°C. DNA was isolated from 1-1.5 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) with a modified cell lysis protocol: 100 µl lysozyme and 500 µl each of achromopeptidase, lysostaphin and mutanolysin were added to the standard lysis solution, proteinase K was reduced to 160 µl, and incubation was carried out overnight at 35°C.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website. 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 4,375 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher or transposon bombing of bridging clones [32]. Gaps between contigs were closed by editing in Consed, custom primer walks or PCR amplification. A total of 731 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, all sequence types provided 37.2× coverage of the genome.
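The step of breaking large contigs into overlapping 1,000 bp pseudo-reads can be illustrated with a short sketch. This is not the JGI pipeline: the function name `make_pseudo_reads` and the 900 bp spacing between fragment starts (that is, a 100 bp overlap) are assumptions made for illustration, since the text states only the fragment length and the number of fragments.

```python
def make_pseudo_reads(contig: str, frag_len: int = 1000, step: int = 900) -> list[str]:
    """Cut a contig into overlapping fragments of length frag_len.

    frag_len follows the 1,000 bp stated in the text; step (distance between
    fragment starts) is a hypothetical value chosen for this sketch.
    """
    if step <= 0 or step > frag_len:
        raise ValueError("step must be between 1 and frag_len")
    # A contig no longer than one fragment is returned unchanged.
    if len(contig) <= frag_len:
        return [contig]
    last_start = len(contig) - frag_len
    starts = list(range(0, last_start + 1, step))
    # Add a final fragment so the end of the contig is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [contig[s:s + frag_len] for s in starts]


if __name__ == "__main__":
    toy_contig = "ACGT" * 625  # 2,500 bp toy sequence, not real data
    for i, read in enumerate(make_pseudo_reads(toy_contig)):
        print(i, len(read))
```

In a real workflow each pseudo-read would also carry per-base quality values derived from the consensus q-scores; that part is omitted here.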

Genome annotation
Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using JGI's GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [35].

Genome properties
The genome is 3,942,657 bp long and comprises one circular chromosome with a 58.7% GC content (Table 3 and Figure 3). Of the 3,565 genes predicted, 3,494 were protein-coding genes and 71 were RNA genes; 58 pseudogenes were also identified. A putative function was assigned to 74.9% of the genes, while the remaining ones were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Table 3. The distribution of genes into COGs functional categories is presented in Table 4.
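The summary figures in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the GC function assumes a plain A/C/G/T string as input, and the estimate of function-assigned genes assumes that the 74.9% refers to all 3,565 predicted genes.

```python
# Values taken from the Genome properties text.
GENOME_BP = 3_942_657
TOTAL_GENES = 3_565
PROTEIN_CODING = 3_494
RNA_GENES = 71
FUNCTION_ASSIGNED_PCT = 74.9


def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def genes_per_kb(genes: int, genome_bp: int) -> float:
    """Return the gene density in genes per 1,000 bp."""
    return genes / (genome_bp / 1000)


if __name__ == "__main__":
    # Protein-coding and RNA genes should add up to the predicted total.
    print(PROTEIN_CODING + RNA_GENES == TOTAL_GENES)
    # Approximate number of genes with a putative function (assumed base: all genes).
    print(round(TOTAL_GENES * FUNCTION_ASSIGNED_PCT / 100))
    print(genes_per_kb(TOTAL_GENES, GENOME_BP))
```

Applied to the deposited sequence (CP001629), `gc_content` would be expected to give a value close to the reported 58.7%, although ambiguous bases and rounding may cause small differences.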