Sorting out the mix in microbial genomics

The relatively small number of microbial genomes completed in the past two months (Table 1) includes, however, representatives of two new bacterial phyla, Dictyoglomi and Nitrospirae. To highlight the first genome sequences from these poorly studied taxa, they have been placed in a new section at the top of Table 1.

So far, little is known about either Dictyoglomus thermophilum or Thermodesulfovibrio yellowstonii. Both are Gram-negative thermophilic heterotrophs with an extremely low (29 mol%) G+C content of their chromosomal DNA (Saiki et al., 1985; Henry et al., 1994). Dictyoglomus thermophilum is an obligate anaerobe that grows optimally at 70-73°C. It was isolated from the Tsuetate hot spring in Japan (Saiki et al., 1985) and has served as a source of three extremely heat-stable amylases (Horinouchi et al., 1988). Thermodesulfovibrio yellowstonii was isolated from a thermal vent in Yellowstone Lake in Wyoming. It contains c-type cytochromes, grows optimally at 65°C on lactate, pyruvate, or formate plus acetate, and can use sulfate, thiosulfate and sulfite as terminal electron acceptors (Henry et al., 1994). Analysis of these genomes should provide a window into the physiology and evolutionary relationships of these new bacterial lineages.
Completion of these two genomes is a major step towards the goal of having at least one complete genome sequence from representatives of all major prokaryotic groups. Indeed, we now have at least one completely sequenced genome for 18 of the 24 bacterial phyla listed in the taxonomic outline in the second volume of Bergey's Manual (Garrity et al., 2004; available at http://www.bergeys.org/outlines.html). Representatives of five more bacterial phyla are at various stages of genome sequencing: Chrysiogenes arsenatis (phylum Chrysiogenetes), Denitrovibrio acetiphilus (Deferribacteres), Fibrobacter succinogenes (Fibrobacteres), Thermodesulfobacterium commune (Thermodesulfobacteria) and Thermomicrobium roseum (Thermomicrobia, which may be considered a class in the phylum Chloroflexi). Only one of those 24 phyla (Gemmatimonadetes) is not yet the subject of any publicly announced genome sequencing project. This is obvious progress in the coverage of microbial diversity compared with the state of genome sequencing just two years ago (Galperin, 2006). However, this nice and clear picture of bacterial taxonomy and the corresponding genome sequencing efforts is complicated by several factors. First of all, high-rank bacterial taxonomy is still in a state of flux: new candidate phyla are being identified, and new genome sequencing projects are being planned to characterize their representatives. There are ongoing sequencing projects for Lentisphaera araneosa and Thermanaerovibrio acidaminovorans, cultured representatives, respectively, of the recently recognized phyla Lentisphaerae (Cho et al., 2004) and Synergistetes (Aminanaerobia) (Hongoh et al., 2007; Jumas-Bilak et al., 2007). In addition, genomic sequencing is being performed on candidate phyla that were initially inferred solely from the clustering of 16S rRNA sequences. Examples include a nearly complete genome from the candidate phylum TM7, which still has no cultivated members (Marcy et al., 2007), and two recently sequenced genomes from representatives of the candidate phylum Termite group 1 (TG1), one of which, "Elusimicrobium minutum", has since been successfully cultivated. Sometimes genomic data reveal distant similarities between two or more phyla, which results in their unification into a group (e.g. Bacteroidetes/Chlorobi, Fibrobacteres/Acidobacteria) or a superphylum, e.g. Chlamydiae/Verrucomicrobia/Planctomycetes/Lentisphaerae (Wagner and Horn, 2006; Hou et al., 2008). Besides, certain validly described bacterial groups still lack any sequence information (Yarza et al., 2008). Finally, there are several alternative classifications of bacteria that made their way into the taxonomic literature but, for a variety of reasons, failed to gain acceptance in the community (Gupta, 1998; 2000; Cavalier-Smith, 2002; 2006). Another such example is the recent transfer, already mentioned earlier (Galperin, 2008), of the Mollicutes from the phylum Firmicutes into a new phylum, Tenericutes, in the latest edition of Bergey's Manual (Ludwig et al., 2008). The phylogenetic trees that served as the rationale for that move show numerous inconsistencies and hardly justify the decision to create this new phylum.
It must be noted that back in 1992, Sneath and Brenner stated 'There is no such thing as an official classification' (see http://www.bacterio.cict.fr/Sneath-Brenner.html). This point was recently reiterated by J.P. Euzéby, whose List of Prokaryotic names with Standing in Nomenclature (http://www.bacterio.cict.fr/) includes an up-to-date listing of commonly recognized prokaryotic phyla (http://www.bacterio.cict.fr/classifphyla.html). For a quick look at the current state of microbial genome sequencing, the easiest tool might be NCBI's Tax Tree (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html), which lists both completed and ongoing genome sequencing projects. However, for those interested in the emerging microbial diversity, the best source of information is probably the 'greengenes' website (http://
optimum at 63°C. Although C. proteolyticus was initially described as a Gram-negative bacterium and was therefore suggested to belong to a deep bacterial lineage, potentially at the phylum level (Rainey and Stackebrandt, 1993), analysis of its 16S rRNA revealed that it is related to Thermoanaerobacter sp. It is currently assigned to the family Thermodesulfobiaceae (Mori et al., 2003) within the order Thermoanaerobacterales, and its genome is the first to be sequenced from that family.
Phenylobacterium zucineum is an α-proteobacterium recently isolated from a human erythroleukemia cell line (Zhang et al., 2007). All close relatives of P. zucineum are free-living environmental organisms, and its 4.4 Mb genome is much larger than that of any intracellular parasite or symbiont characterized so far. Indeed, the genome sequence (Luo et al., 2008) revealed similarities with the genome of Caulobacter crescentus. However, fragments of P. zucineum genomic DNA were found among the EST libraries from breast cancer and lymphatic cell lines, suggesting that this organism might survive in proliferative tissues.
Acidithiobacillus ferrooxidans (previously known as Thiobacillus ferrooxidans; Kelly and Wood, 2000) is an obligately acidophilic chemolithoautotrophic γ-proteobacterium and a popular model organism for studying bacterial membrane energetics at acidic pH (see Ferguson and Ingledew, 2008 for a recent review). It gains energy by oxidizing ferrous iron and is able to grow in the pH range from 1.3 to 4.0, using CO2 as its sole source of carbon. Acidithiobacillus ferrooxidans is a major component of the microbial consortia used in bio-mining to extract copper, zinc and other metals from low-grade ores. With the recent increase in the price of gold, A. ferrooxidans-based microbial consortia are increasingly used to improve the recovery of gold from arsenopyrite ores. Despite the importance of this organism (or maybe because of it), sequencing of the A. ferrooxidans genome had a long and convoluted history. The first (incomplete, or 'gapped') genome sequence of the type strain A. ferrooxidans ATCC 23270 was produced at Integrated Genomics in 1999. It consisted of 1353 contigs covering 2611 kb and coding for 2712 proteins, and was estimated to lack ~100 kb (Selkov et al., 2000). This sequence was used for an analysis of amino acid metabolism in A. ferrooxidans, which allowed an almost complete reconstruction of its metabolic pathways, leaving just 10 unassigned (missing) enzymes. Despite the authors' initial intent to demonstrate that 'gapped' microbial genomes were almost as good as complete ones (Selkov et al., 2000), this paper actually succeeded in proving the opposite: a meaningful and unequivocal analysis is only possible with a complete genome sequence. Furthermore, only small pieces of the genome were submitted to GenBank, which prevented others from analysing it.
Shortly thereafter, sequencing of the A. ferrooxidans genome was undertaken at The Institute for Genomic Research (TIGR). The resulting incomplete genome sequence of 3081 kb was made publicly available in 2001 as RefSeq entry NC_002923 and was subsequently used for a variety of genome analyses (e.g. Valdés et al., 2003; Quatrini et al., 2007). Over the next two years, this sequence was updated more than a dozen times, and it was finally withdrawn at the end of 2003. Since 2006, a complete genome sequence of 2982 kb coding for 3217 predicted proteins has been available on the TIGR website, but it had not been submitted to GenBank. Finally, a recent joint paper by Chilean and TIGR scientists (Valdés et al., 2008) reported a detailed analysis of this genome and announced its availability to the public.
Meanwhile, JGI scientists have released a 2885 kb genome sequence of another strain of A. ferrooxidans, which encodes 2826 proteins. This strain, A. ferrooxidans ATCC 53993, was isolated from mine water of the Alaverda copper deposit in Armenia and was initially assigned the name Leptospirillum ferrooxidans (Balashova et al., 1974; Hippe, 2000). Although its relation to the type strain ATCC 23270 is not known at this time, their 16S rRNA sequences are 100% identical. Thus, after 10 years of struggling with unfinished genome sequences, the public now has access to two complete genomes of A. ferrooxidans. This should allow further analyses of the properties of this remarkable organism and stimulate its use in energy research and bio-mining.
The list of completely sequenced spirochaete genomes has grown to include the genomes of Borrelia duttonii and Borrelia recurrentis (Lescot et al., 2008). Both organisms are important human pathogens that cause relapsing fevers. Borrelia duttonii is transmitted by the tick Ornithodoros moubata and is found primarily in East Africa. Borrelia recurrentis is transmitted by the human body louse Pediculus humanus and is found around the world. The sequenced strain B. duttonii Ly was isolated from a 2-year-old girl with tick-borne relapsing fever in Tanzania, whereas B. recurrentis strain A1 was isolated from an adult patient with louse-borne relapsing fever in Ethiopia.
Klebsiella pneumoniae ssp. pneumoniae is a well-known human pathogen, and the first genome of its clinical isolate MGH 78578 was sequenced more than two years ago. A very interesting paper from JCVI scientists now reports the genome sequence of an environmental N2-fixing strain of K. pneumoniae (Fouts et al., 2008). Such strains are commonly found as endophytes that colonize the tissues of rice, maize, sugarcane, banana and various grasses and improve the growth of their host plants by supplying them with ammonia. The sequenced strain K. pneumoniae 342 was isolated from maize and was later shown to colonize wheat and alfalfa sprouts. Comparative analysis of the two strains provides interesting clues to adaptation to the endophytic lifestyle, as well as to the evolution of pathogenicity in K. pneumoniae.
The list of organisms with recently sequenced genomes also includes the marine γ-proteobacteria Alteromonas macleodii and Vibrio fischeri, the δ-proteobacteria Anaeromyxobacter sp. and Geobacter bemidjiensis, five new strains of Salmonella enterica ssp. enterica that include four new serovars (Thomson et al., 2008), Streptococcus equi ssp. zooepidemicus (also known as Streptococcus zooepidemicus), the cause of an epidemic of acute nephritis in Brazil (Beres et al., 2008), and the well-studied Helicobacter pylori strain G27 (Table 1).