Mycobacterium tuberculosis Transcriptome Profiling in Mice with Genetically Different Susceptibility to Tuberculosis

Whole transcriptome profiling is now almost routinely used in various fields of biology, including microbiology. In vivo transcriptome studies usually provide relevant information about the biological processes in the organism and thus are indispensable for the formulation of hypotheses, testing, and correcting. In this study, we describe the results of genome-wide transcriptional profiling of the major human bacterial pathogen M. tuberculosis during its persistence in lungs. Two mouse strains differing in their susceptibility to tuberculosis were used for experimental infection with M. tuberculosis. Mycobacterial transcriptomes obtained from the infected tissues of the mice at two different time points were analyzed by deep sequencing and compared. It was hypothesized that the changes in the M. tuberculosis transcriptome may attest to the activation of the metabolism of lipids and amino acids, transition to anaerobic respiration, and increased expression of the factors modulating the immune response. A total of 209 genes were determined whose expression increased with disease progression in both host strains (commonly upregulated genes, CUG). Among them, the genes related to the functional categories of lipid metabolism, cell wall, and cell processes are of great interest. It was assumed that the products of these genes are involved in M. tuberculosis adaptation to the host immune system defense, thus being potential targets for drug development.


INTRODUCTION
Genome-wide expression studies generate enormous amounts of data and have a virtually boundless potential for application. For instance, the data obtained in such experiments can be used to assess the effectiveness of antibiotics or to study changes in the bacterial metabolism during the course of an infection. Among the infections caused by pathogenic bacteria, tuberculosis ranks first in terms of mortality rate and is responsible for 1.5 million deaths annually. It is not surprising that the first study of the transcriptome of its causative agent, Mycobacterium tuberculosis, was conducted within a year after the publication of the whole-genome sequencing data for this bacterium [1,2]. Five years later, a large number of studies describing the results of using microarrays for the transcriptome analysis of mycobacteria under various conditions were published [3,4]. nevertheless, although microarrays and massively parallel sequencing have been widely used for microbiological studies and adapted to the study of whole-genome expression in mycobacteria, the majority of such studies have been carried out in vitro. Meanwhile, it is rather difficult to carry out an analysis of gene expression in mycobacteria during the progression of tuberculosis in vivo, which is of the greatest research interest [5,6]. the results of a study of gene expression in M. tuberculosis during experimental infections in two strains of mice differing in susceptibility to tuberculosis are presented in this work. Since the genotypic differences of animals directly affect gene expression of the pathogen, an increase in the expression of a number of bacterial genes under more unfavorable conditions (in the organism of a host resistant to the infection) attests to the fact that these genes significantly contribute to the adaptation of M. tuberculosis to the host's defense mechanisms.

EXPERIMENTAL PROCEDURES
Experimental infection and RNA extraction I/StSnegYcit (I/St) and c57BL/6Ycit (B6) mice were kept under standard conditions in the vivarium of the central research Institute of tuberculosis, russian Academy of Medical Science, in accordance with Order of the uSSr Ministry of Health № 755 dated August 12,1977, and the nIH Office of Laboratory Animal Welfare Assurance № A5502-11. the mice had ad libitum access to food and water. All the experimental procedures were approved by the Bioethics committee of the central research Institute of tuberculosis.
Female mice of both strains aged 2.5-3.0 months were infected with the virulent M. tuberculosis strain H37rv using an inhalation exposure system (Glas-col, terre Haute, uSA) at a dose of 100-200 cFu/mouse. the infected animals were euthanized 4 and 6 weeks after their infection. their lung tissues were immediately used to extract rnA. total rnA was extracted using the SV total rnA Isolation System (Promega, uSA). the rnA samples were then treated with DnAse I (MBI Fermentas, Lithuania) to remove trace amounts of DnA.

Synthesis of cDNA
cDnA was synthesized using the template-switching effect (clontech, uSA) according to the procedure described in [7]. cDnA was synthesized using PowerScript II reverse transcriptase (clontech, uSA) following the manufacturer's protocol. the oligonucleotide primers Br (5'AAGcAGtGGtAtc AAcGcAGAGtAc(n)9) and SMArt (5'AAGcAGtGGtAtcAAcGcAGAG-tAcGcrGrGrG) were added into 2 µg of total rnA from each sample in 11 µl of a buffer solution. the resulting mixture was incubated for 2 min at 70°С and placed on ice for 10 min. the mixture kept in the ice bath was supplemented with 11 µl of a solution containing 4 µl of a 5× reverse transcriptase buffer, 2 µl of a solution of 100 mM DDt, 2 µl of a solution of 10 mM of each dntP, and 1 µl (200 Au) of Pow-erScript reverse transcriptase (clontech, uSA). the control reaction (rt-) with no reverse transcriptase added was conducted simultaneously with the reverse transcriptase reaction (rt+). the reaction mixtures rt+ and rt-were incubated at 37°С for 10 min and for an additional 40 min at 42°С. 30 cycles of Pcr (95°c 20 s; 64°c, 20 s; 72°c, 2 min) with the 5S primer (5'GtGGtAtcAAcGcAGAGt) were used to amplify the synthesized cDnA. the amplified cDnA was puri-fied using the QIAquick Pcr Purification kit (Qiagen, uSA).
Coincidence cloning coincidence cloning was performed according to [8]. the genomic DnA of M. tuberculosis H37rv and the total cDnA samples (i.e., cDnA synthesized using total rnA as a template) were digested with the restriction endonucleases rsaI and AluI (MBI Fermentas, Lithuania). the suppression adapters were ligated to the resulting fragments of the genomic DnA and cDnA (adapter structures were given in [7]). the mixture containing 100 ng of the sample of genomic DnA with adapters and 100 ng of the corresponding sample of cDnA fragments with adapters in 2 µl of the hybridization buffer HB (50 mM HePeS, pH 8.3; 0.5 M nacl; 0.02 mM eDtA, pH 8.0) was incubated at 99°c for 5 min (denaturation) and subsequently at 68°c for 18 h (re-naturation). next, the reaction mixture was supplemented with 100 µl of the hybridization buffer HB preliminarily heated to 68°c. 1 µl of the resulting mixture was used as a template in a two-step Pcr. the first Pcr step was performed in a reaction volume of 25 µl containing 10 pmol of the outer primer t7. After the mixture was preincubated at 95°c for 5 min, 20 amplification cycles were performed (94°c, 30 s; 66°c, 30 s; 72°c, 90 s). the second amplification step was carried out using inner primers (94°c, 30 s; 68°c, 30 s; 72°c, 90 s; 25 cycles) [7]. the tenfold diluted amplicon obtained in step 1 was used as a template for the second amplification step. the amplicon obtained in step 2 was purified using the QIAquick Pcr Purification kit (Qiagen, uSA) and used for massively parallel sequencing.

Massively parallel sequencing
Prior to sequencing, equal amounts of each of the coincidence cloning products (500 ng of each amplicon) were combined to obtain a single sample. Massively parallel pyrosequencing of the sample was performed on the GS FLX genetic instrument (454 roche, Germany). the resulting sequences (reads) were tested for the presence of adapter sequences. the reads with truncated, incorrect, or missing adapter sequences were eliminated from further analysis. the remaining sequences (190, 031 reads) were divided into three groups (libraries) depending on the nucleotide sequence of their adapters; СС6(SuS) represented transcriptomes of M. tuberculosis from the lung tissues of I/St mice on week 6 post-infection; (СС4(reS) and СС6(reS) represented transcriptomes of M. tuberculosis from lung tissues of B6 mice on weeks 4 and 6 post-infection, respectively. A file in FAStA format, which contained nucleotide sequences from all three groups, was used for further analysis.
the authors can provide the FAStA files upon request.
Sequence mapping and statistical analysis the resulting sequences were mapped onto the genomic sequence of M. tuberculosis H37rv (assembly GenBank AL123456.2) using the blastn tool from the ncBI BLASt+ software package with the parameters set as follows: -perc identity 95 and -evalue 0.01. the sequences with at least 95% identity to the genome sequence of M. tuberculosis H37rv for the entire length were eventually selected. All the sequences shorter than 40 nucleotides were eliminated from processing. the sequences mapped onto the M. tuberculosis genome in a non-unique manner (at two sites and more) were also eliminated from further analysis. the reads mapped onto the intergenic sequences were not eliminated from further analysis, since it would have changed the library size and caused errors during the subsequent statistical analysis. next, the number of reads in each library was determined for each gene and intergenic region, and the comparative analysis of the abundance of cDnA fragments corresponding to the bacterial genes and intergenic regions was conducted using the Audic-claverie algorithm [9]. the differences in the expression of the genes (intergenic regions) were considered to be significant if the number of reads mapped onto the gene (intergenic region) in at least one of the two libraries being compared was no less than 20 and p < 0.05.

RESULTS AND DISCUSSION
In order to reveal the features of the expression profile of M. tuberculosis that correlate with infection progression, comparative quantitative and qualitative analyses of the sequences transcribed in infected mice, genetically susceptible (inefficient immune response), and re-sistant (efficient response) to these bacteria were conducted at different stages of the infectious process.
We have compared the transcriptomes of M. tuberculosis H37rv in infected mice of two strains, I/StSn-egYcit (I/St) and c57BL/6Ycit (B6). these mouse strains had been thoroughly described earlier [10]. In B6 mice, resistance to M. tuberculosis is higher as compared to I/St mice, which manifests itself in the less aggressive course of the infectious process in B6 mice and the longer survival of the infected animals.
Female mice of both strains were euthanized 4 and 6 weeks after they had been aerogenically infected with M. tuberculosis bacteria to isolate the total rnA from their lungs. the total rnA samples from the lung tissues in I/St and B6 mice were used to synthesize cDnA subsequently enriched in bacterial cDnA fragments via coincidence cloning [8]. A total of three sequence libraries representing M. tuberculosis transcriptomes were obtained from the tissues of I/St mice on week 6 after the infection (СС6(SuS)) and the tissues of B6 mice on weeks 4 and 6 after they had been infected (СС4(reS) and СС6(reS), respectively).
the nucleotide sequences of the cDnA fragments of these libraries were determined using 454 massively parallel pyrosequencing. the general scheme of the experiment is shown in Figure; the general characteristics of the analyzed libraries are listed in Table 1. the nucleotide sequences of 190, 031 cDnA fragments were identified. Among them, 73, 410 sequences were from the cc4(reS) library; 75, 655 were from the cc6(SuS) library; and 40, 966, from cc6(reS). the resulting sequences were mapped onto the genomic sequence of M. tuberculosis H37rv (Assembly GenBank AL123456.2) using the blastn command line tool from the ncBI BLASt+ software package.
Noncoding RNAs the recent studies employing the methods of massively parallel sequencing have demonstrated that the bacterial transcriptome is much more complex than it was thought to be. the noncoding rnAs have been shown to be an important part of the transcriptome and to regulate various cell functions (replication, energy metabolism, and regulation of the expression of virulence factors in a number of pathogenic bacteria) [11].
We have searched for transcripts from the loci lying in the intergenic regions, since such localization attests to the fact that these transcripts have the potential to belong to the class of noncoding rnAs. Based on data on the structural organization of the genome, we have identified 3,069 intergenic regions. the analysis of pyrosequencing data showed that the transcripts of 164 (5.3%), 221 (7.2%), and 356 (12.3%) intergenic regions were presented in the cc4(reS), cc6(SuS), and cc6(reS) samples, respectively. . It should be mentioned that transcripts from 27 (0.9%) of the intergenic loci were present in each of the three samples, while transcripts from 2,490 (81.1%) loci have not been observed.
the experimental data on the expression of intergenic regions were compared with the available literature and database information on the localization of small rnA genes.
A significant expression of sequences from a number of intergenic regions (in particular, from IGr3987 and IGr0629 in sample cc6(SuS) and from IGr1186 in sample cc6(reS)) was detected. the experimental results confirmed the presence of expression from the intergenic regions IGr3987, IGr0629, and IG1136, which correlates with the data obtained by Arnvig et al. [12] (according to the denotations used in [12], the intergenic regions IGr2975, IGr0479, and IGr0858, respectively). We believe that the detected expression of intergenic sequences can attest to the potential localization of small rnA genes in these loci. With allowance for their differential expression, it could be an indication of the compensatory response of a pathogen to external factors.

Genes whose expression increases with infection progression
We compared the transcriptomes of M. tuberculosis during the infection progression in the genetically resistant to tuberculosis mouse strain (СС6(reS) and cc4(reS)) and at a single time point in the genetically different mouse strains (СС6(reS) and cc6(SuS)). the comparison was aimed at searching for genes whose expression increases with infection progression (i.e., in B6 mice on week 6 after the infection) as compared to the other conditions. the comparison of СС6(reS) and cc4(reS) revealed 226 genes whose expression increased with infection progression in the tissues of B6 mice. the comparison of СС6(reS) and cc6(SuS) revealed 253 genes with increased expression in the cc6(reS) sample. the comparison of cc6(reS) and cc4(reS) revealed only 17 genes whose expression in cc6(reS) was higher than that in cc4(reS), while the expression of 44 genes in cc6(reS) was higher as compared to cc6(SuS). these results presumably demonstrate that the first comparison characterizes the dynamics of the changes in pathogen gene expression with time within a single micro-environment. In the latter case, the differences between two different micro-environments are revealed, which affects a larger number of genes whose expression increases in cc6(reS).
the genes whose expression in the cc6(reS) sample is increased only as compared to cc4(reS) mostly fall into the following categories: cell wall and cell processes, intermediary metabolism and respiration, and lipid metabolism. the protein products of 12 out of 17 genes were previously detected in the fraction containing the cellular membrane and/or cell wall, where they predominantly play roles in transport and protection. For instance, the embA gene encodes indolylacetylinositol arabinosyltransferase embA, which participates in arabinan synthesis. Mutations in this gene cause resistance to ethambutol. the Rv3273 gene encodes carbonate dehydratase, which is involved in sulfate transport (tubercuList). An analysis using the KeGG Pathways (http://www.genome.jp/kegg/pathway.html) and tB-cYc (http://tbcyc.tbdb.org/) databases has revealed no metabolic pathways that would be activated at the later stages of infection progression. this fact can either result from random fluctuations in pathogen gene expression, its response to random changes in the properties of the micro-environment, or it can be an indication of small, but significant changes in the functional activity of M. tuberculosis at different time points. the comparison of the cc6(reS) and cc6(SuS) samples has revealed that cc6(reS) contains a larger number of genes whose expression is higher in this sample as compared to that in cc6(SuS). the increased energy metabolism manifested itself as increased expression of the genes encoding three nADH-dehydrogenase subunits (nuoH, nuoI, nuoL); a higher activity of the tricarboxylic acid cycle (acn); and as increased expression of the Rv1916 gene. Rv1916 is the second component of the aceA (icl2) gene, which is divided in the genome of M. tuberculosis H37rv into two individually expressed modules, Rv1915 and Rv1916 (aceAa and aceAb). the other important differences include the increased expression of the genes whose products are responsible for the metabolism and catabolism of lipids and amino acids (lipV, lipF, Rv2531c), as well as the enzymes that participate in DnA repair (recO, recB). this is a rather predictable pattern, since the microenvironment of the resistant host is a hostile habitat, which explains the demand for higher activity from the repair systems. the increased expression of lipolytic enzymes (lipF, lipV, plcA), enzymes of the tricarboxylic acid cycle, and aceAb may attest to the fact that a greater amount of lipids is used as a source of energy and carbon.
We focused on searching for M. tuberculosis genes whose increased expression does not depend on the genetic features of the host organism. these genes form a certain basic set, whose genes are responsible for the universal compensatory response of the pathogen to adverse environmental conditions. these genes are referred to as commonly upregulated genes (cuG): a total of 209 genes whose expression was increased in both comparisons ( Table 2). According to the results of transposon mutagenesis, 44 genes of M. tuberculosis H37rv are essential [13]. rv3569c, rv3537, and rv3563 were earlier shown to be essential for survival of M. tuberculosis in mouse macrophages (tubercuList, http://tuberculist.epfl.ch).
We have grouped the GuG genes into functional categories (tubercuList) and compared their distribution to that of all the M. tuberculosis genes. these distributions show a general similarity, with the exception of the genes that belong to the lipid metabolism category, which can further demonstrate their importance for the processes of bacterial adaptation. two categories (conserved hypotheticals (59 genes) and unknown (2 genes) contained slightly less than a third of all the genes. Despite the absence of any known functions, the genes in this category are potential therapeutic targets, since their low degree of homology with the genes of other microorganisms means that they are typical of mycobacteria (in particular, of M. tuberculosis); thus, they presumably determine their virulence properties.
the high expression level of the genes of various nutrient uptake and accumulation systems (e.g., phosphate (pstS1) and iron (irtA, mbtC, mbtE, mbtF)) attests to the fact that mycobacteria exist under conditions of insufficient nutrient supply. Phosphorus deficiency is also indicated by the increased expression of the senX3 gene, the sensor component of the two-component regulatory system senX3/regX3, which regulates the so-called stringent response under conditions of phosphorus deficiency. the transition to the use of lipids as the main source of energy and carbon is indicated by the expression of lipid metabolism genes (fadD, fadE, lipU, lipJ). Another feature of the genes that belong to the cuG set consists in increased expression of genes whose products are somehow associated with amino acid metabolism (aspC, hisB, thrB, thrS). the reason behind this phenomenon remains unclear, since stimulation of enzyme expression can be caused by both the absence of the required amino acids (and, hence, the demand for their synthesis) and by their presence (and the possibility to use them by bacteria).
the transition to anaerobic nitrate respiration, which is typical of latent infection, is attested by the increased expression of the narH and narK3 genes [14]. We have also classified the atpF and atpH genes as cuG genes, although decreased expression of these genes during infection progression is reported, since the energy demand of a pathogen decreases as it acquires the state of latent infection [15,16].
the function of Pe/PPe proteins is not clear. they are considered to be required to ensure antigenic variability in mycobacteria [17]. nevertheless, the expression levels of the Rv0152c and Rv0355c genes in the cc6(reS) sample are high, and their expression has also been detected in cc4(reS) and cc6(SuS). the Rv3135 gene belongs to the group of essential genes in M. tuberculosis H37rv. All these observations suggest that Pe/PPe genes have some other functions in addition to ensuring antigenic variability.
Finally, the secA2gene is worth mentioning. this gene encodes translocase SecA2, the component of the M. tuberculosis Sec transport system that enables the secretion of superoxide dismutase SodA and catalase KatG. A live vaccine based on the ΔsecA2 mutant of M. tuberculosis has demonstrated high efficiency and safety in tests on animals [18].
A microarray-based comparative analysis of the expression profiles of 17 members of the M. tuberculosis complex in activated and nonactivated murine macrophages has recently been conducted [19]. As a result, 280 genes (168 with universally increased and 112 genes with universally decreased expression) were identified whose changes in expression were independent of the strain and the activation status of a macrophage. We compared the cuG genes with 168 genes from [19] characterized as having universally increased expression and selected eight genes (Rv0140, Rv0145, atsA, Rv2466c, fadD26, ilvC, Rv3067, and kstD) that were featured in both lists. Such an insignificant coincidence can be attributed to the fact that a) the infection of cultured macrophages is a relatively simplified model as compared to the complex relationships between the host cells and the pathogen during the infectious process in the host organism; b) the data obtained by a microarray analysis of expression can significantly differ from those obtained by massively parallel sequencing. In particular, Ward et al. [20] reported a divergence of the results obtained using these two methods. nevertheless, the results obtained by us and Homolka et al. [19] are not widely divergent. A functional analysis of the genes with increased expression has demonstrated that they are associated with intracellular stress factors, such as hypoxia and reactive oxygen and nitrogen species, cell wall remodeling, and fatty acid metabolism. the genes associated with iron deficiency (genes in the mtbA-F cluster) and those involved in the biosynthesis of valine and isoleucine (ilvB-ilvN-ilvC) and phthiocerol dimycocerosates (PDIM) of the cell wall (ppsA-D) can be mentioned as an example.

Products of the CUG family genes as potential therapeutic targets
Six genes which could potentially be used as therapeutic targets (or have already been proposed as such) were selected after a review of the available literature and databases. the products of these genes (hisB, aspC, PPE50, Rv1026, ilvC, and Rv1186c) have been mentioned as attractive therapeutic targets, since the disturbances in their functional activity has the maximum destabilizing effect on the metabolism of M. tuberculosis. thus, for example, aspartate aminotransferase Aspc encoded by the aspC (Rv0337c) gene was identified in the metabolic network of a mycobacterial cell as an enzyme whose inactivation affects a large number of other M. tuberculosis proteins, thus efficiently disintegrating a large number of biochemical cycles [21]. the protein products of two other genes, Rv1186c and PPe50 (Rv3135), have been included in the list of the most promising therapeutic targets, which was compiled based on data on the expression, participation in various metabolic pathways, and structural homology with other bacterial and human proteins [22]. the Rv1026, hisB (Rv1601), and Rv3001c genes are believed [23] to encode products suitable for designing specific inhibitors. the protein encoded by the Rv1601 (hisB) gene is independently considered to be a promising therapeutic target [24]. It should be mentioned that the expression level of the Rv1026 gene encoding pyrophosphatase is increased in the macrophages and lungs of infected mice [25,26], as well as under conditions of inhibited translation in mycobacteria [27]. It has recently been demonstrated that polyphosphate deficiency due to the hydrolytic activity of rv1026 can change the fatty acid content in the cell wall of M. smegmatis, thus affecting sliding motility and biofilm formation [28]. Hence, it can be expected that the products of the remaining cuG genes could be used to design therapeuticals or for diagnosing tuberculosis.

CONCLUSIONS
Infectious diseases caused by intracellular pathogenic bacteria pose a serious medical problem. the progression of the infection depends not only on host defense mechanisms, but also on the specific expression of bacterial genes. changes in the gene expression in response to various reactions of the host immune system are necessary for the survival and reproduction of pathogenic bacteria. the investigation of the changes in the transcription profile of M. tuberculosis in response to various stimuli and external factors allows one to describe the adaptative mechanisms required for an efficient infection of a host organism by a bacterium.
the investigation of the transcription profiles of M. tuberculosis under various conditions made it possible to reveal the gene set (cuG) whose expression increases with infection progression and is independent of the genetic features of the host organism. the expression of genes from this core set can be regarded as the universal response of mycobacteria to various environmental stress factors. Further accumulation and analysis of M. tuberculosis gene expression data will considerably simplify the development of efficient approaches to the diagnosis and treatment of tuberculosis.