Variations in genome-wide RNAi screens: lessons from influenza research

Genome-wide RNA interference (RNAi) screening is an emerging and powerful technique for genetic screens. Depending on the screening strategy, it takes the form of either an arrayed RNAi screen or a pooled RNAi screen/selection. To date, several genome-wide RNAi screens have been successfully performed to identify host factors essential for influenza virus replication. However, the host factors identified by different research groups are not always consistent. Taking influenza virus screens as an example, we found that a number of screening parameters may directly or indirectly influence the primary hits identified by the screens. This review highlights the differences among the published genome-wide screening approaches and offers recommendations for performing a good pooled shRNA screen/selection.


Introduction
RNA interference (RNAi) is a revolutionary technique, applicable in mammalian systems, for studying the biological function of a particular gene by silencing its expression. Manipulating the expression of a gene by RNAi provides insight into the genetic networks related to that gene. Thus far, the main approaches for RNAi screening fall into two formats: the arrayed screen and the pooled selection/screen. In an arrayed screen, each RNAi reagent is assigned to a unique well in a microplate, and a variety of cell-based assays are performed in the microplate format. After the assay, wells whose cells show significant changes can be identified and the corresponding RNAi reagents can be traced by using a database. In a pooled selection/screen, all of the RNAi reagents are pooled together and added randomly to cells. There are two strategies for pooled selection: positive selection, which only detects surviving cells and does not require an untreated control, and negative selection, which includes an untreated control for comparison to allow the detection of RNAi reagents that make the cells resistant or sensitive to the selective reagent. The significant RNAi reagents are subsequently deconvoluted by using next generation sequencing (NGS) or a barcode microarray.
Influenza virus causes annual epidemics and recurring pandemics that potentially threaten public health and the global economy. Influenza A viruses (IAVs) are enveloped RNA viruses with single-stranded, negative-sense viral RNAs encoding 11 viral proteins [1]. Two of the viral proteins, neuraminidase (NA) and matrix protein 2 (M2), are the targets of the currently used antiviral drugs. However, the high error rate of the viral RNA-dependent RNA polymerase (RdRp) leads to rapid changes in these two proteins and the generation of drug-resistant influenza viruses. It has been widely recognized that the replication of influenza virus relies on host factors and cellular machinery to complete its life cycle. Accordingly, identification of the host factors involved in viral replication is of interest to understand the mechanisms of the virus replication cycle more comprehensively and to find new targets for the development of antiviral compounds. Several genome-wide RNAi screens (summarized in Table 1) have been conducted for influenza virus to identify host factors required for influenza virus replication and provide robust information regarding the screening method. Here, we review  [3] performed a genome-wide arrayed RNAi screen (17,877 genes; Dharmacon) in human U2OS cells and identified 133 host factors required for influenza virus replication as evidenced by the alteration of HA expression on the cell surface. In this study, the candidate genes were selected based on the criterion that the gene was knocked down by at least 2 unique siRNAs to minimize the probability of off-target effects. Three interferon-inducible transmembrane proteins (IFITM1, IFITM2 and IFITM3) were identified as antiviral restriction factors during virus infection. Shapira et al. [4] used a pre-selected gene list (1,745 genes) based on integrated yeast two-hybrid and microarray data, which were obtained from influenza-related studies, to identify cellular modifiers of influenza virus replication in human HBEC cells. The authors identified 616 host factors that were proposed to be involved in modulation of virus replication and/or interferon production.
König et al. [5] and Karlas et al. [6] also utilized a modified genome-wide RNAi screening approach by using an arrayed siRNA library (Qiagen). König et al. generated a recombinant influenza virus carrying a Renilla luciferase gene for the primary screen, and identified 295 host factors out of the 19,628 genes in mammalian cells. Among the 295 candidate genes, 219 host factors were further confirmed to be essential for efficient replication of the wild-type influenza virus (A/WSN/33) in human A549 cells. Karlas et al., by contrast, took a two-step strategy to survey host factors involved in influenza virus replication. In the first step, A549 cells were transfected with siRNA and then infected with wild-type influenza virus (A/WSN/33). In the second step, the virus supernatants were collected and used to challenge HEK293T cells carrying an influenza virus-driven luciferase reporter for the primary screen. After screening of an arrayed RNAi library (22,843 genes), a total of 287 genes were identified as primary hits. The secondary screen was performed by using A/WSN/33 and A/Hamburg/04/2009 independently, and 168 out of the 287 primary hits were validated to modulate influenza virus replication. Notably, 72 of them were common host factors affecting the replication of both influenza virus strains.

Genome-wide pooled RNAi screens applied to identify host factors required for influenza virus replication
Rather than performing the arrayed RNAi screens (dsRNA or siRNA) mentioned above, we established a genome-wide pooled RNAi screen (lentiviral shRNA expression system) based on a positive selection strategy to identify host factors required for influenza virus replication [7]. Under positive (survival) selection, cells in which gene knockdown confers resistance to influenza virus-induced cell death are selected and enriched in the population during pooled RNAi selection. The increase of the resistant cells in the population makes it possible to identify the shRNAs they carry. In brief, A549 cells were infected with the pooled lentiviruses (multiplicity of infection; MOI = 0.3) generated from a genome-wide RNAi library from The RNAi Consortium (TRC) [8], and subsequently challenged with influenza virus (A/WSN/33) at a cytotoxic dose. The surviving A549 cells were collected and subjected to deep sequencing to identify the embedded shRNA(s) that silenced host genes and protected cells from influenza virus-induced cell death. Ideally, host genes identified by this positive selection should be the essential factors supporting influenza virus replication. Out of the 16,368 genes screened, a total of 110 host genes, each targeted by at least two unique shRNAs, were identified as our primary hits. The expression levels of these hits in A549 cells were further confirmed by EST or microarray analyses. We next carried out a high content image-based screen as a secondary screen and found that at least 38 candidate genes were involved in the early stages of the virus life cycle. Among them, E3 ubiquitin ligase Itch was proven to be an essential host factor for influenza virus "uncoating" [9]. Thus, we concluded that the genome-wide pooled RNAi screen via positive selection is a useful RNAi screen method for exploring host factors of lytic viruses.
Simultaneously, Tran et al. [9] used a similar screening approach, with a higher influenza virus dose and a shorter infection time, to explore specific host genes required for influenza virus-induced cell death. A genome-wide pooled RNAi library (Thermo Scientific Open Biosystems; also known as an RNAi library from the Hannon and Elledge Lab) targeting 21,415 host genes was screened and a total of 138 host genes were identified to be involved in influenza virus (A/NY/55/2004) replication. The authors validated these hits and identified APRIL, TWE-PRIL and USP47 as essential host factors for supporting influenza viral replication.

Factors that lead to successful identification of specific host factors by RNAi screens
Only a limited number of overlapping host factors were identified among these RNAi screens, suggesting that the parameters used in the different RNAi screening protocols likely affect the screening results. Relevant parameters may include: (i) characteristics of influenza virus (virus strain, virus quality, viral subtype, prototype or recombinant virus) and host cell line (cell type, cell quality, genetic profile); (ii) features of the RNAi library screened, particularly the knockdown efficiency of the RNAi resources, and the quality of the RNAi library; (iii) screening method/time points for analysis; and (iv) hit selection criteria and evaluation of proper controls, e.g., at least two siRNAs or shRNAs targeting the same gene, and the z-score of the RNAi screen. Screens that used similar screening resources or approaches showed a higher degree of concordance.  (Table 3). It is also noteworthy that both groups analyzed the results at similar time points (2-3 days) after influenza virus infection. On the other hand, our screen strategy was based on cell survival and analyzed the hits at a later time point (4 weeks) after influenza virus infection [7]. Under such stringent selection conditions, some host factors targeted by effective shRNAs, such as factors for cell survival or cell proliferation, are likely completely excluded from our primary hits after long-term selection; as a result, our hit list overlaps less with those identified by other RNAi screens. Nonetheless, the hits identified are more likely to represent better drug targets than those identified by the arrayed siRNA screens (see explanation below). Thus, both the arrayed and the pooled RNAi library screening procedures are able to identify specific host factors for viral replication. (A list of all candidate genes from each screen and overlapping genes from respective screens is available upon request).
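The degree of concordance between screens can be quantified directly from the hit lists. The following is a minimal sketch, not the analysis used in any of the cited studies: the function name `pairwise_overlap`, the screen labels and the gene symbols are hypothetical placeholders, and the Jaccard index is simply one convenient overlap measure chosen here for illustration.

```python
from itertools import combinations


def pairwise_overlap(hit_lists):
    """Return {(screen_a, screen_b): (shared_genes, jaccard_index)} for every pair."""
    results = {}
    for (name_a, hits_a), (name_b, hits_b) in combinations(sorted(hit_lists.items()), 2):
        shared = hits_a & hits_b
        union = hits_a | hits_b
        # Jaccard index: shared hits divided by all distinct hits of the two screens.
        jaccard = len(shared) / len(union) if union else 0.0
        results[(name_a, name_b)] = (shared, jaccard)
    return results


# Hypothetical placeholder hit lists; these are NOT data from any published screen.
example_hits = {
    "screen_A": {"GENE1", "GENE2", "GENE3", "GENE4"},
    "screen_B": {"GENE3", "GENE4", "GENE5"},
    "screen_C": {"GENE4", "GENE6"},
}

overlaps = pairwise_overlap(example_hits)
for (screen_a, screen_b), (shared, jaccard) in overlaps.items():
    print(screen_a, screen_b, sorted(shared), round(jaccard, 2))
```

In practice, gene identifiers from different libraries would first need to be mapped to a common nomenclature before such a comparison is meaningful.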

Differences between arrayed RNAi screening and pooled RNAi screen/selection
So far, most of the reported studies have used an arrayed siRNA library. An assay with a quantitative readout, such as a reporter assay or detection of viral protein expression, is a common strategy for identification of host factors using the arrayed RNAi screens. After silencing by the arrayed siRNA, the reduction of reporter activity (or the decrease in viral protein expression) reveals the host factor(s) essential for supporting influenza virus replication.
Although several virus replication-related host factors have been successfully identified by the genome-wide arrayed RNAi screens, the high cost of the siRNA library and the requirement for a robotic liquid handling system for high throughput screening are two major drawbacks of this approach. In addition, gene silencing by siRNA transfection usually lasts for less than 1 week, which limits the use of siRNA-based screens if long-term knockdown is needed. The lentivirus-based arrayed RNAi screen or pooled RNAi screen can overcome the short-term effect of siRNA to discover host factors crucial for influenza virus replication. However, for a genome-wide arrayed shRNA screen, the much higher cost (compared to an siRNA library) and the requirement for a robotic liquid handling system are again major concerns. The pooled RNAi reagent, by contrast, costs much less, and the screen/selection can be conducted in most laboratories.
We performed a genome-wide pooled RNAi screen and identified host factors using a survival-dependent screening strategy [7]. Influenza virus infection induces apoptosis in certain cell types [15][16][17], giving the theoretical basis for our positive selection strategy for the pooled RNAi screen. Taking advantage of inhibition of the lytic cycle of influenza virus by RNAi, we selected the surviving cells with specific genes knocked down by RNAi and identified the essential host factors required for influenza virus replication. In theory, survival rate could also be used as the phenotypic readout in an arrayed siRNA screen; however, the window of opportunity for measurement is narrow, making it difficult to determine hits. By contrast, the genome-integration property of lentivirus-based shRNA expression maintains host gene knockdown over a long period of time; thus, there is plenty of time to select cells refractory to the cytopathic effect of influenza infection. As a consequence, the hits identified by pooled RNAi screen may be different from those obtained by arrayed siRNA screen.
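As an illustration of how a quantitative arrayed readout can be turned into hit calls, the sketch below scores each well of a plate with a robust z-score (median and median absolute deviation). This is an assumption-laden example rather than the scoring used in the screens reviewed here: the robust variant is chosen for illustration in place of a plain z-score, and the well names, readout values and the cutoff of -2 are all hypothetical.

```python
from statistics import median


def robust_z_scores(readouts):
    """Map each well to (value - plate median) / (1.4826 * MAD)."""
    values = list(readouts.values())
    plate_median = median(values)
    mad = median([abs(v - plate_median) for v in values])
    scale = 1.4826 * mad  # scales the MAD to be comparable to a standard deviation
    if scale == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined for this plate")
    return {well: (v - plate_median) / scale for well, v in readouts.items()}


# Hypothetical reporter readouts (arbitrary units) for one row of a plate.
plate = {"A1": 100, "A2": 98, "A3": 102, "A4": 101, "A5": 99, "A6": 40}

z = robust_z_scores(plate)
cutoff = -2.0  # hypothetical threshold for reduced reporter activity
hits = sorted(well for well, score in z.items() if score <= cutoff)
print(hits)
```

Each well flagged in this way would then be traced back to its siRNA through the plate map, as described for arrayed screens.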
Points to consider when evaluating the pooled screen versus the arrayed screen include: (i) the negative factors for influenza replication, as well as anti-apoptotic factors for cell signaling, cannot be selected by pooled RNAi screen/selection due to accelerated removal of virus-infected cells, while both positive and negative factors can be identified by arrayed siRNA screen; (ii) host factors that are required for influenza replication but are also essential for cell survival or cell proliferation are excluded from the list of hits during long-term gene silencing; accordingly, the hits (genes) identified by pooled selection are potentially more suitable as drug targets because they are required for influenza replication without affecting host cell viability; (iii) conversely, the lentivirus-based RNAi screen provides an advantage in identification of host factors/proteins with a long half-life, which might not be identified by the siRNA transient transfection approach; (iv) cellular apoptotic factors would be easily selected during positive selection since knockdown of these apoptotic genes will make cells more resistant to IAV-induced apoptosis; some of the selected cellular apoptotic factors may not be directly involved in influenza replication. In conclusion, performing screens using different approaches, such as various detection methods/parameters (e.g., RNA replication versus whole viral life cycle) and RNAi reagents (e.g., siRNA versus shRNA, or different siRNA libraries), should increase the chance of discovering essential host factors. Therefore, the gene lists from different screens should be able to complement each other.
The optimal concentration of siRNA used for gene silencing varies (1-30 nM) from study to study, and is highly dependent on the cell type and its target. Transfecting siRNAs at improper concentrations may result in unwanted off-target effects or incomplete gene silencing during arrayed siRNA screens, which may produce misleading results. It may be difficult to determine an optimal siRNA concentration for performing an arrayed siRNA screen. By contrast, pooled shRNA screens take advantage of lentiviral transduction at low MOI (usually between 0.1-0.3) to deliver a single copy of shRNA into cells. This low MOI remarkably reduces the probability of off-target effects. However, heterogeneity of functional siRNAs may be generated by improper processing of an shRNA hairpin, thus causing off-target effects. To minimize the off-target effects in both cases, the recommended selection criterion is that a targeted gene be knocked down by at least 2 unique siRNAs (or shRNAs) that target different regions within the same gene transcript, since the probability of off-target effects triggered by 2 independent siRNAs (or shRNAs) to the same target gene is extremely low. Unfortunately, the knockdown efficiency of individual siRNAs in a given arrayed smart pool RNAi library is usually not known, although the manufacturer guarantees that at least one of the provided siRNAs has good knockdown efficiency of 70% or more. Such knockdown information is less informative for researchers who intend to minimize the off-target effects by using two good individual siRNAs. By contrast, more than 40% of the shRNAs from the TRC RNAi library have been validated, and the knockdown information of these shRNAs would provide a great advantage for analysis of primary hits.
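The "at least 2 unique siRNAs (or shRNAs) per gene" criterion is straightforward to apply once the enriched reagents are known. The sketch below is a minimal illustration under stated assumptions: the function name `genes_with_multiple_reagents`, the reagent identifiers and the reagent-to-gene annotation are hypothetical, and the code does not check that the reagents target different regions of the transcript, which would require the library annotation.

```python
from collections import defaultdict


def genes_with_multiple_reagents(enriched_reagents, reagent_to_gene, min_reagents=2):
    """Return {gene: sorted unique reagents} for genes hit by >= min_reagents reagents."""
    reagents_per_gene = defaultdict(set)
    for reagent in enriched_reagents:
        gene = reagent_to_gene.get(reagent)
        if gene is not None:  # skip reagents missing from the annotation
            reagents_per_gene[gene].add(reagent)
    return {
        gene: sorted(reagents)
        for gene, reagents in reagents_per_gene.items()
        if len(reagents) >= min_reagents
    }


# Hypothetical library annotation and enriched reagents (placeholders only).
annotation = {
    "sh1": "GENE_A", "sh2": "GENE_A",
    "sh3": "GENE_B",
    "sh4": "GENE_C", "sh5": "GENE_C",
}
enriched = ["sh1", "sh2", "sh3", "sh5", "sh1"]  # sh1 recovered twice, counted once

primary_hits = genes_with_multiple_reagents(enriched, annotation)
print(primary_hits)
```

Here only GENE_A qualifies, because GENE_B and GENE_C are each supported by a single reagent; the duplicate recovery of sh1 does not count as independent evidence.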

Other factors involved in genome-wide pooled shRNA screen/selection
Genome-wide pooled shRNA screens are technically complicated, and numerous factors directly or indirectly affect the efficiency and accuracy of the screening results. Several experimental details have to be taken into consideration when performing a large-scale pooled RNAi screen. First, each individual shRNA (shRNA-expressing lentivirus) should be represented more than 250 times in the transduced cell population to guarantee that the effective shRNA(s) is not lost in subsequent experimental procedures and to compensate for the low expression level resulting from random integration sites of the shRNA expression cassette [18]. Second, an MOI of shRNA-expressing lentivirus of 0.1-0.3 is recommended to ensure that most of the surviving cells under puromycin selection receive only one copy of shRNA (one lentivirus) [18]. Reducing the MOI further is not recommended, as it only slightly reduces the probability of a cell being infected by two or more viruses, but significantly increases the number of cells needed for transducing the pooled shRNA lentiviruses (our unpublished data).
Third, when performing a pooled RNAi screen, the use of polybrene (hexadimethrine bromide) during lentivirus transduction should be avoided. Polybrene is a common polycation that increases the infectivity of lentivirus 3-5 fold (our unpublished data). The cationic polymer enhances virus adsorption and transduction by neutralizing the charge repulsion between the viral envelope and the cellular membrane [19]. Moreover, polybrene has the potential to facilitate virus aggregation, which increases the possibility of multiple virus infection during the pooled RNAi screen. By using lentiviruses carrying either EGFP or mCherry, we mimicked a pooled screen and showed that the proportions of EGFP(+)/mCherry(+) cells in the presence or absence of polybrene were 13.6% and 8.5%, respectively (Figure 1, MOI = 0.3). Assuming that the probability of cells infected by multiple lentiviruses expressing EGFP or mCherry alone is, at most, half the value of EGFP/mCherry, the probability of multiple infections in the whole population would be equal to or less than 27.2% and 17% in the presence or absence of polybrene, respectively. It should also be noted that the theoretical value (calculated by the Poisson distribution equation) of multiple virus infections at an MOI of 0.3 is 14.3%, which is very close to the actual proportion of multiple virus infection (17%) in a pooled lentivirus infection without polybrene. This result demonstrates that the use of polybrene augments the proportion of multiple virus infection by at least 1.5 fold, suggesting that the use of polybrene in the pooled RNAi screen may cause multiple virus infection and thus increase the noise of hits.
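The Poisson estimate and the dual-colour bound quoted above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the number of integrated viruses per cell is taken to be Poisson-distributed with mean equal to the MOI, the 14.3% figure is interpreted as the fraction of transduced cells that carry two or more viruses, and the library size of 80,000 shRNAs used for the cell-number estimate is a hypothetical value, as are all function names.

```python
import math


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k viruses at the given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_transduced(moi):
    """Fraction of cells receiving at least one virus."""
    return 1.0 - poisson_pmf(0, moi)


def multiple_among_transduced(moi):
    """Fraction of transduced cells that received two or more viruses."""
    p_multiple = 1.0 - poisson_pmf(0, moi) - poisson_pmf(1, moi)
    return p_multiple / fraction_transduced(moi)


def dual_colour_upper_bound(double_positive_fraction):
    """Upper bound on multiple infection from EGFP(+)/mCherry(+) cells.

    Follows the text's assumption that same-colour co-infections (EGFP alone
    or mCherry alone) each contribute at most half the double-positive value.
    """
    return double_positive_fraction + 2 * (double_positive_fraction / 2)


def cells_to_transduce(n_shrnas, cells_per_shrna, moi):
    """Cells to expose to virus so each shRNA is represented cells_per_shrna times."""
    return math.ceil(n_shrnas * cells_per_shrna / fraction_transduced(moi))


print(f"Poisson, MOI 0.3: {multiple_among_transduced(0.3):.1%}")
print(f"with polybrene:    <= {dual_colour_upper_bound(0.136):.1%}")
print(f"without polybrene: <= {dual_colour_upper_bound(0.085):.1%}")

# Hypothetical library of 80,000 shRNAs at 250 cells per shRNA.
for moi in (0.3, 0.1):
    print(moi, cells_to_transduce(80_000, 250, moi))
```

Under these assumptions the Poisson value at MOI 0.3 comes out at about 14.3%, and the bounds are 27.2% and 17%, matching the figures in the text. The last loop illustrates the trade-off described above: lowering the MOI raises the number of cells that must be handled to keep the same representation.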
Unlike the arrayed RNAi screen, the potential hits (the content of shRNAs) selected/enriched by a pooled RNAi screen must be further identified by other methods, such as barcode microarray or next generation sequencing (NGS). The barcode microarray-based method employs PCR amplification of shRNA template sequence pools, labeling with a fluorophore, and hybridization to complementary DNA microarray chip(s) from an experimental group as well as a reference group. After hybridization, the fluorescent signal intensity reflects the abundance of cells expressing a certain shRNA under test conditions as compared to the reference. However, the uneven amplification resulting from self-annealing of the hairpin sequences of shRNA may cause detection bias during microarray analysis. The external barcode tags help to optimize hybridization conditions for each probe and avoid the technical bias of PCR amplification caused by the secondary structures of the shRNA template sequence. Artifacts caused by cross-hybridization, also known as unspecific probe-target interaction, are the major concern of barcode microarray hybridization in identification of target shRNAs from a genome-wide pooled RNAi screen. NGS, also called deep sequencing, provides a new approach for the analysis of the hits selected by pooled RNAi screens. In brief, the experimental procedure of NGS consists of: (i) isolation of genomic DNA; (ii) PCR amplification of shRNA; (iii) restriction enzyme digestion and gel purification of the PCR products; and (iv) ligation of PCR products to an adaptor for subsequent NGS analysis. In general, maintaining the fidelity of the shRNA population during preparation of PCR fragments for NGS is important for subsequent hit determination. Our recommendations for preparing shRNA-containing PCR fragments for NGS are as follows: (i) Prepare appropriate amounts of genomic DNA without loss of shRNA complexity. Low amounts of genomic DNA may not cover the whole population of shRNA in a typical genome-wide-scale negative selection experiment; (ii) Avoid creating conditions that generate sequencing bias during PCR amplification, such as over-amplification and two rounds of PCR. Keeping the PCR within the exponential phase is recommended, so that heteroduplexes, which are derived from annealing of two shRNA sequences at the later stage of PCR, are not generated; (iii) Prepare PCR fragments with a half-arm of the shRNA cassette for NGS, if possible, because full-length shRNA may impede sequencing; (iv) Templates should contain a 5' end phosphate and a 3' end hydroxyl group for efficient adaptor addition for NGS; (v) Purify restricted PCR fragments by gel electrophoresis for deep sequencing. A protocol for genome-wide pooled RNAi screen can be found at: http://rnai.genmed.sinica.edu.tw/file/protocol/PooledScreen_SequencingProtocol.pdf.
Because it directly measures the amount of each shRNA sequence, the NGS approach may hold greater promise for accuracy than the barcode microarray-based method for deconvolution of genome-wide pooled RNAi screening output. There are at least four advantages of using the NGS method in place of the barcode microarray hybridization. First, NGS technology is a cost-effective approach because it measures the presence of large quantities of distinct shRNA sequences in a short time. Second, NGS greatly improves the sensitivity and efficacy of detection by offering a digital readout of even very small amounts of shRNA species. Third, the NGS method provides a greater detection range and better resolution of measurements, which enables clear discrimination of true hits from background noise. Finally, NGS is a more flexible approach for the identification of hits from a genome-wide RNAi screen, as it can be easily applied to any high-complexity shRNA library without having to generate new barcode information and link it to individual shRNA sequences.
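The "digital readout" offered by NGS amounts to counting reads per shRNA and comparing their relative abundance between the selected and reference populations. The sketch below illustrates the idea only and is not the pipeline used in the studies reviewed here: it assumes exact matching of pre-trimmed reads to library sequences, normalizes to counts per million, and adds a pseudocount before taking the log2 ratio. The 6-nt sequences, shRNA identifiers and function names are hypothetical.

```python
import math
from collections import Counter


def count_shrna_reads(reads, library):
    """Count reads that exactly match a library sequence; returns {shrna_id: count}."""
    matched = Counter(read for read in reads if read in library)
    return {library[seq]: n for seq, n in matched.items()}


def counts_per_million(counts, shrna_ids):
    """Normalize raw counts to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads matched the library")
    return {i: 1e6 * counts.get(i, 0) / total for i in shrna_ids}


def log2_enrichment(selected, reference, shrna_ids, pseudocount=1.0):
    """log2 ratio of normalized abundance, selected population versus reference."""
    sel = counts_per_million(selected, shrna_ids)
    ref = counts_per_million(reference, shrna_ids)
    return {i: math.log2((sel[i] + pseudocount) / (ref[i] + pseudocount))
            for i in shrna_ids}


# Hypothetical toy library and reads (real half-hairpin sequences are longer).
library = {"ACGTAC": "sh1", "TTGACA": "sh2", "GGCATC": "sh3"}
ids = sorted(library.values())
reference_reads = ["ACGTAC"] * 2 + ["TTGACA"] * 2 + ["GGCATC"] * 2
selected_reads = ["ACGTAC"] * 5 + ["TTGACA"] + ["NNNNNN"]  # last read is unmatched

enrichment = log2_enrichment(
    count_shrna_reads(selected_reads, library),
    count_shrna_reads(reference_reads, library),
    ids,
)
for shrna_id in sorted(enrichment, key=enrichment.get, reverse=True):
    print(shrna_id, round(enrichment[shrna_id], 2))
```

In a positive selection, shRNAs with high positive values (here sh1) are the candidates enriched among surviving cells. A real analysis would also need to tolerate sequencing errors and account for replicate variability.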