Perspectives on Systematic Analyses of Gene Function in Arabidopsis thaliana: New Tools, Topics and Trends

Since the sequencing of the nuclear genome of Arabidopsis thaliana ten years ago, various large-scale analyses of gene function have been performed in this model species. In particular, the availability of collections of lines harbouring random T-DNA or transposon insertions, which include mutants for almost all of the ~27,000 A. thaliana genes, has been crucial for the success of forward and reverse genetic approaches. In the foreseeable future, genome-wide phenotypic data from mutant analyses will become available for Arabidopsis, and will stimulate a flood of novel in-depth gene-function analyses. In this review, we consider the present status of resources and concepts for systematic studies of gene function in A. thaliana. Current perspectives on the utility of loss-of-function and gain-of-function mutants will be discussed in light of the genetic and functional redundancy of many A. thaliana genes.


INTRODUCTION
Because of its small size, short life cycle and compact genome, Arabidopsis thaliana (hereafter Arabidopsis) has been developed into a model system for plants. Since its nuclear genome was completely sequenced 10 years ago [1], the quest to decipher the functions of all its genes has been a major strand in Arabidopsis research. Forward (from-phenotype-to-gene) and reverse (from-gene-to-phenotype) genetics have been the main workhorses en route to this goal. Even before the Arabidopsis Genome Initiative got underway, numerous forward genetic screens had identified mutants with abnormal phenotypes, which led to the identification and functional characterization of many genes for specific plant functions. Identification of the mutated genes is usually the most labour-intensive step in classical forward genetics. With increasing knowledge of the sequences of Arabidopsis genes and their transcripts, reverse genetic screens have become possible, which complement the classical approach by permitting direct analysis of mutants for specific genes, selected for their potential impact on a biological function of interest. According to the Annual Report 2010 of The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project (http://www.arabidopsis.org/portals/masc/masc_docs/masc_reports.jsp), the experimentally verified functions of just over 9000 of the ~27,000 Arabidopsis genes had been annotated as of March 2010.
In silico approaches can assess the functions of Arabidopsis genes whose homologs have been characterized in other species, or which code for proteins with conserved domains, and gene functions can be predicted by correlating data derived from functional genomics analyses, such as large-scale mRNA, protein or metabolite profiling [2][3][4]. However, to decipher the biological function(s) of every gene in Arabidopsis will require much further effort, and many projects have been initiated to create the resources required for this formidable task [5]. Technological advances in recent years have led to the assembly of large collections of Arabidopsis mutants bearing molecularly mapped insertions. Besides total inactivation of the gene function, some of the newly developed mutagenesis techniques allow one to alter the dosage or function of gene products, making refined analyses of gene-function relationships possible. Several databases are now available that permit researchers to scan the Arabidopsis genome for mutations in genes of interest and obtain the corresponding mutant lines from public stock centres.
Elucidating the function of the entire complement of Arabidopsis genes requires ways to systematically assess a wide range of phenotypes, which can be applied in a high-throughput manner to large numbers of mutants. Saturated collections of loss-of-function and gain-of-function lines will also be very valuable. A further challenge arises from the recognition that the vast majority of gene knock-outs in Arabidopsis do not give rise to obvious phenotypes, and might require functional characterisation by -omics-type analyses or the simultaneous inactivation/down-regulation of more than one gene. In this review, we summarize the state of the art in systematic gene-function analyses and highlight tools and trends in the application of systematic forward and reverse genetics to Arabidopsis.

TOOLS FOR THE ANALYSIS OF GENE FUNCTIONS
Several methods have been developed that enable one to change the amount or the nature of gene products, either by altering the original gene through the introduction of mutations or by introducing transgenes. Since the Arabidopsis genome sequence was published, most efforts have focused on generating loss-of-function lines for reverse genetics. More recently, methods for overexpressing genes have also gained in importance [6,7].

Loss-of-Function Mutants (Knock-Out Approach)
The most straightforward approach to investigating the function of a gene is to characterize the phenotypic changes associated with its total inactivation [8]. Loss of gene function can be achieved by introducing point mutations or short insertions/deletions, typically by means of chemical or physical mutagenesis, or by disrupting genes by inserting larger DNA sequences, such as T-DNAs or transposable elements.

Point Mutation and Short Insertions/Deletions
Point mutations can be introduced with high frequency by mutagenization with chemicals such as ethyl methanesulfonate (EMS), but only a small fraction of such mutations actually lead to total loss of gene function, e.g. when they generate a premature stop codon or alter functionally critical amino acids [9]. Similarly, short insertions/deletions usually eliminate gene function only if the coding sequence is affected and a frameshift results. Point mutations and short insertions/deletions have the disadvantage that the location of the mutation is random and cannot be predetermined. Therefore, in order to have a reasonable probability of finding loss-of-function mutations for a given gene, large numbers of mutants must be generated. To remove the unwanted background mutations, several rounds of backcrossing to the wild type are usually required.
The identification of individual, randomly induced point mutations or short insertions/deletions within the genome of a mutant is a laborious process involving map-based cloning [10], although recent advances in high-throughput sequencing technology [10,11] might make resequencing of the entire mutant genome the method of choice in future. Because Arabidopsis reference genome sequences (such as that of the Col-0 ecotype) are already available, next-generation sequencing of mutant genomes will be time- and cost-effective. Another approach to detecting small changes within the genome is an adaptation of microarray technology, which detects single nucleotide polymorphisms (SNPs) by hybridizing genomic DNA to oligonucleotides that represent the entire genome and are spotted on chips [12]. With continuous advances in sequencing technologies, genome-based methods can replace the conventional marker-based genotyping approach. Nevertheless, because EMS-mutagenized plants contain many additional point mutations, it may be advisable first to narrow down the mutation to a chromosome arm with map-based approaches and conventional techniques.
For reverse genetic approaches employing EMS-induced mutations, the "Targeting Induced Local Lesions IN Genomes" (TILLING) method has been developed [13]. Here, the endonuclease CEL I identifies mismatches in heteroduplexes between wild-type and mutant DNA sequences after PCR-based amplification of genome sequences of interest [14]. For systematic reverse genetics screens, each target coding sequence must be covered by PCR amplicons [15]. By September 2010, 9550 TILLING lines in which the mutation has been assigned to a specific gene had been donated to public stock centres (Table 1).
Most point mutations do not cause complete gene knockout, but rather a decrease in gene expression (see also Section 2.2) or a change in the function of the gene product due to alterations in its sequence. Amino acid exchanges can lead to hypomorphic alleles displaying a range of phenotypes, allowing the analysis of important domains of the protein.
Such variants can be pinpointed by searching the TILLING database (Table 2). The recent release of the genome sequences of 80 different Arabidopsis ecotypes (with more to come when the 1001 Genomes project is completed in 2011) makes this an important tool for understanding natural variation of sequences and correlating them with evolutionary processes [16,17] (Table 2).

Insertional Mutagenesis
Insertional mutagenesis has two major advantages: the mutations are labelled by the inserted fragment of known sequence ('tag') and insertions within the coding region have a high probability of eliminating the gene function. In Arabidopsis, T-DNA or transposons (mainly the Activator (Ac)/Dissociation (Ds) system) are usually employed as insertional mutagens [18]. In an international effort involving many laboratories using both systems, a large number of insertion mutants has been generated (Table 1), which now provide nearly genome-wide coverage: about 96% of all Arabidopsis genes according to the Annual Report 2010 of The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project (http://www.arabidopsis.org/portals/masc/masc_docs/masc_reports.jsp). The regions flanking the insertions (flanking sequence tags, FSTs) in over 325,000 lines have been sequenced by several laboratories and mapped to the Arabidopsis reference genome to identify the precise insertion sites. These lines have been indexed and deposited in public stock centres, and are accessible for Internet-based searches (Table 2).
Because stock centres usually distribute T-DNA or transposon lines as populations that segregate for the mutation, a genotyping step is necessary to identify homozygous plants for the analysis of phenotypes. Recently, populations of homozygous SALK lines were generated, which can be used directly for phenotypic screens [19]. To date (September 2010) these homozygous lines represent 18,318 genes, with about 9000 genes covered by two alleles. However, a certain proportion of hemizygous lines escaped systematic genotyping, so that a fraction of the lines, perhaps as large as 20%, might still not be homozygous (our unpublished results). Moreover, because T-DNA lines often contain more than one insertion, the unambiguous assignment of phenotypes to mutations requires either that at least two insertion alleles for each gene be analysed or that the mutation be complemented with the wild-type gene.

Table 1 (partial)
[131]
amiRNA (CSHL-based 2010 project); amiRNA; 17,699 clones; see Table 2
Seattle Arabidopsis TILLING Project; EMS; Col-0 erecta; 9,550 (in progress); see Table 2
gain-of-function
CRES-T; chimeric repressors; 395 (in progress); -; [132]
Chromatin charting; luciferase; Col-0; 277 (in progress); in progress; [42,133]
Collections with fewer than 1000 lines are not listed unless specifically referenced in the main text. Numbers of publicly available lines have been obtained from the corresponding websites/references, the stock centres NASC, ABRC and RIKEN Bioresource Center, and http://signal.salk.edu/Source/AtTOME_Data_Source.html (as of September 2010). T-DNAAT, T-DNA activation-tagged; En/SpmAT, En/Spm activation-tagged.

Mutants with Reduced Gene Expression (Knock-Down)
The utility of loss-of-function lines is limited when mutations result in lethality, or in cases of genetic redundancy, when more than one copy of the gene exists as a result of tandem or segmental duplications or the evolution of multigene families. To overcome these drawbacks, transgene-mediated gene silencing can be used to reduce but not completely abolish ("knock-down") the expression of gene(s) of interest. Such "knock-down" lines allow one simultaneously to down-regulate several, sequence-related genes and thus avoid the complications associated with genetic redundancy. Because knock-down lines are not null mutants and show a diverse pattern of phenotypes due to differences in the levels of activity remaining, the more subtle phenotypes induced by the partial or conditional inactivation of essential genes can also be studied.
Silencing is normally achieved by post-transcriptional down-regulation of transcript accumulation via small RNAs that act in a sequence-specific manner by base pairing to complementary mRNA molecules. Various strategies for small RNA-based gene silencing have been developed [20]. The first to be developed was the antisense approach, in which part of a gene was expressed in reverse orientation, usually under the control of the Cauliflower Mosaic Virus (CaMV) 35S promoter [21]. Later the RNA interference (RNAi) approach was introduced. This uses a binary hairpin RNA vector into which gene-specific tags (GSTs) are cloned [22]. Transformation of Arabidopsis with RNAi constructs is being systematically conducted by the AGRIKOLA (Arabidopsis Genomic RNAi Knock-Out Line Analysis) Consortium to create a collection of silenced Arabidopsis lines, which are available through the Nottingham Arabidopsis Stock Centre (NASC) [23] (Tables 1 and 2). A newer approach involves the use of artificial microRNAs (amiRNAs) [20,24,25]. The systematic design of amiRNAs has led to the generation of 17,699 clones for use in Arabidopsis (as of June 2010), providing an important tool for future functional genomics studies (Table 1). The Chimeric Repressor Silencing Technology (CRES-T) represents a novel method for creating loss-of-function mutations for the analysis of redundant plant transcription factors [26]. With CRES-T, a transcription activator of interest can be fused to a peptide or protein that converts it into a transcriptional repressor, which dominantly suppresses the expression of its target genes even in the presence of the redundant transcription factor. The phenotypic information obtained from the application of CRES-T in Arabidopsis is stored in the Web-based interface FioreDB [27] (Tables 1 and 2).

Gain-of-Function Lines
Gain-of-function lines can be used to dissect the functions of genes that are genetically or functionally redundant, i.e. which are members of gene families or are tandemly or segmentally duplicated, or whose function can be compensated for by alternative regulatory pathways [28]. The phenotypes caused by gain-of-function mutations segregate as dominant traits [8,29] and individual members of a gene family can produce a gain-of-function mutant phenotype that cannot be modified by other members of the family [30,31].
Often the mutant phenotypes induced by loss-of-function and gain-of-function approaches are complementary to each other. Because of the dominant nature of the gain-of-function phenotypes, there is no need for an additional genotyping step to identify lines homozygous for the transgene.
Overexpression of a gene driven by a strong promoter can result in gain-of-function phenotypes, especially when expression is also ectopic. This strategy requires the use of full-length cDNA clones. The RIKEN Arabidopsis full-length (RAFL) collection now makes large numbers of such cDNA clones available. The RAFL collection was generated by inserting full-length cDNAs in the correct orientation between the CaMV 35S promoter and the NOS terminator and verifying their sequences [32]. About 10,000 different RAFL cDNA clones were then pooled and transformed into Arabidopsis to generate so-called 'FOX' (Full-length cDNA Over-eXpressing gene) lines [33]. The FOX hunting system is useful for systematic phenotypic analysis of the function of each inserted gene. Visible phenotypes were induced in about 9% of transformed lines and details of these have been collected in a database (see Section 3.2 and Table 2).
In the activation tagging approach, DNA sequences carrying enhancers (e.g. four tandemly arranged copies of the CaMV 35S enhancer) are inserted randomly into the genome [34]. Large numbers of activation-tagging lines have been produced, and the phenotypes observed could be attributed to specific genetic events by sequencing the regions flanking insertions and mapping these to the reference genome [28,29,34-39]. The distances between insertions and the genes affected were found to vary from 0.4 to 8.2 kb. Enhancers can control the expression of genes on both sides of the insertion and, indeed, in some cases more than one gene is affected [29,40]. In addition to dominant gain-of-function phenotypes, knock-outs are also observed in these lines when an insertion is located within a coding region. Between 0.1 and 1% of the activation tagging lines showed visible phenotypes [28,29,35-38]. Recently, a new activation-tagging method has been developed which uses pEnLOX lines containing four tandemly repeated CaMV 35S transcriptional enhancers flanked by two loxP sites, and pCre lines that contain the CRE gene [41]. By crossing pEnLOX lines to pCre lines, the CaMV 35S enhancers can be deleted, causing reversion to the wild-type phenotype, and allowing for speedier confirmation of gene-function relationships.
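The assignment of an activation-tagging insertion to candidate genes on both sides can be illustrated with a minimal sketch. It assumes a simplified, hypothetical gene table (names and coordinates are invented for illustration) and uses the upper bound of the reported 0.4 to 8.2 kb range as a default search window; real annotation pipelines would also take strand and chromosome into account.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str   # hypothetical identifier, not a real locus
    start: int  # first base of the gene (bp)
    end: int    # last base of the gene (bp)


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from an insertion to a gene; 0 if the insertion lies inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def candidate_genes(pos: int, genes: List[Gene], max_dist: int = 8200) -> List[Tuple[int, str]]:
    """Return (distance, name) pairs for genes within max_dist bp, nearest first.

    Genes on either side of the insertion are reported, because an enhancer
    can act in both directions. A distance of 0 marks an insertion inside a
    gene, which may cause a knock-out rather than activation.
    """
    hits = [(distance_to_gene(pos, g), g.name) for g in genes]
    return sorted((d, n) for d, n in hits if d <= max_dist)


# Invented coordinates, for illustration only
genes = [Gene("geneA", 1000, 3000), Gene("geneB", 6000, 9000), Gene("geneC", 30000, 32000)]
print(candidate_genes(4500, genes))
```

In this toy example the insertion sits between geneA and geneB, so both are returned as candidates, whereas geneC lies outside the window.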
The Chromatin Charting Project has set itself the task of producing over 15,000 Ds-tagged lines of Arabidopsis, each of which contains within the Ds-tagging cassette a CaMV 35S::luciferase gene (which allows monitoring of position effects) and a lac operator repeat (which allows the tagged loci to be visualized by expressing fusions of fluorescent proteins with LacI within the nuclei of living plants). The selected lines will be used to evaluate transgene position effects and how these are affected by developmental or externally applied cues; since position effects on gene expression arise from alterations in chromatin architecture, this approach provides insights into global epigenetic control [42,43] (Table 1).

SYSTEMATIC FORWARD GENETIC SCREENS
Forward genetic screens have traditionally been used to identify genes involved in a specific biological process of interest. A range of mutant collections have been analysed, including chemically or physically mutagenized populations (which have the advantage that it is, in principle, possible to attain saturation, i.e. to generate multiple mutant alleles for each gene) and tagged mutants, which allow the mutated gene to be identified in a relatively straightforward manner. Screens for a wide variety of phenotypes have been carried out in Arabidopsis, and only a selection can be discussed here.

Generic Screens
The RIKEN Activation Tagged Lines, comprising about 50,000 independently transformed Arabidopsis lines [28,40] (see Table 1), have been screened for visually discernible phenotypes. In all, 1262 lines exhibited visible phenotypes (e.g. abnormalities in morphology, growth rate, plant coloration, flowering time or fertility) and sequence information is available for 1172 mutant loci (Table 2). Of four dominant mutants that showed hyponastic leaves, downward-pointing flowers and decreased apical dominance, two were caused by activation of the gene ASYMMETRIC LEAVES2 (AS2) and two by activation of ASL1/LBD36 [28].
The FOX lines (see Section 2.3, Table 1) were also screened for abnormal morphologies, fertility and leaf coloration. A total of 1487 candidate morphological mutants were found among 15,547 transformants, and of 115 pale green T1 mutants, 59 lines displayed the mutant phenotypes in more than 50% of the T2 progeny, suggesting that the phenotypes were caused by gain-of-function mutations (see Section 2.3). Two leaf coloration mutants were further characterized and one mutant was found to result from upregulation of AtPDH1 (Arabidopsis prokaryotic DEVH box helicase1); knock-out of this same gene resulted in an albino phenotype [33].

Screens for Developmental Phenotypes
The SeedGenes Project [44][45][46][47] was designed to identify EMBRYO DEFECTIVE (EMB) genes required for embryo or seed development, and the December 2007 release of its database SeedGenes (Table 2) includes information on 358 genes required for seed development and 605 mutant alleles with known disruptions in these genes. EMB genes encode proteins involved in basic cellular processes, such as DNA replication, RNA processing and protein synthesis, as well as chloroplast proteins, indicating that a functional chloroplast is needed for normal embryo development in Arabidopsis.
Furthermore, over 500 seedling-lethal mutants were identified in a set of 38,000 insertion mutants, of which 54 were molecularly characterized and 22 were described in detail [48]. Many of the mutants displayed altered pigmentation and affected genes encoding proteins predicted to reside in the chloroplast.
To isolate mutants defective in early steps of meiotic recombination, a specialized two-step genetic screen employing 55,000 T-DNA insertion lines has been carried out [49]. In the first step, all lines were screened for fertility defects on the basis of reduced silique elongation. In the second step, differential interference contrast microscopy was used to analyse male meiotic products in developing pollen mother cells. Four genes involved in the repair of meiotic DNA double-strand breaks were identified, together with five genes necessary for the formation of meiotic DNA double-strand breaks [49].

Screens for Genes Important for Male or Female Gametophyte Development and Reproduction
The life cycle of plants alternates between a haploid gametophyte and a diploid sporophyte phase. Mutations that eliminate or disrupt the function of the haploid pollen (male gametophyte) or of the embryo sac (female gametophyte) can only be maintained in the heterozygous state, because such mutations cannot be transmitted through the defective gamete (reviewed in [50][51][52]). In addition, mutations in sporophytically expressed genes might result in male or female sterility [53,54].
Visual inspection of siliques for 50% or more desiccated ovules, as well as screens for aberrant transmission of an antibiotic resistance marker gene, led to the identification of gametophytic mutants [55][56][57][58][59][60][61]. In Arabidopsis, because of the widely used "floral dipping" transformation protocol, the primary target of T-DNA integration is the embryo sac. Therefore, it can be difficult to recover T-DNA insertional knock-out lines in which female gametophyte function is severely compromised [56]. To overcome this problem, transposon insertion lines have been used. Ds transposable elements are mobilized during the sporophytic stage of a plant's life cycle, and most Ds transposon insertion lines offer further advantages, such as single-copy insertions that do not generate DNA rearrangements or truncations [62]. A total of 24,000 Ds insertion lines containing either a Ds gene trap or an enhancer trap element were screened, and 130 Arabidopsis mutants with defects in female gametophyte development and function were identified [63]. The inheritance of the Ds element could be followed by means of the linked antibiotic resistance marker. Given the dominant mode of inheritance of the resistance phenotype, a segregation ratio of 3:1 (resistant to sensitive) is expected in the progeny of a heterozygous Ds insertion line. Aberrant segregation ratios lower than 2:1 are often the result of a gametophytic defect. For 1.38% of the screened lines a segregation ratio of 1.5:1 or less was found, and a wide variety of mutant phenotypes were observed, such as defects in different stages of embryo sac development and in processes such as pollen tube guidance, fertilization or early embryo development. Besides female gametophytic mutants, 109 lines showing predominant defects in male gametophyte transmission were also identified in this collection [64].
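The segregation logic used in such screens can be sketched in a few lines. The example below is illustrative only: it computes the resistant:sensitive ratio for one line, a chi-square statistic against the 3:1 Mendelian expectation (one degree of freedom, for which 3.841 is the standard 5% critical value), and applies the 1.5:1 cutoff mentioned above. The seedling counts are invented. As a point of reference, if the insertion completely blocks transmission through one gametophyte, the expected ratio drops to 1:1.

```python
def segregation_ratio(resistant: int, sensitive: int) -> float:
    """Resistant:sensitive ratio, expressed as a single number (3.0 means 3:1)."""
    if sensitive == 0:
        return float("inf")
    return resistant / sensitive


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic (1 degree of freedom) against a 3:1 expectation."""
    total = resistant + sensitive
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)


def is_candidate_gametophytic(resistant: int, sensitive: int,
                              ratio_cutoff: float = 1.5) -> bool:
    """Flag a line whose ratio is at or below the cutoff (1.5:1 in the screen above)."""
    return segregation_ratio(resistant, sensitive) <= ratio_cutoff


# Invented counts for two hypothetical lines
for r, s in [(152, 48), (104, 96)]:
    print(r, s,
          round(segregation_ratio(r, s), 2),
          round(chi_square_3_to_1(r, s), 2),
          is_candidate_gametophytic(r, s))
```

A statistic above 3.841 indicates a significant departure from 3:1 at the 5% level; the ratio cutoff is the more stringent, screen-specific criterion.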

Screens for Altered Metabolism or Stress Tolerance
In a screen of the TAMARA gain-of-function collection [37] for mutants with enhanced levels of phenolic compounds, up-regulation of an R2R3-MYB transcription factor was found to be responsible for the phenotype [28,37,65]. Genes controlling proanthocyanidin biosynthesis have also been identified in both the TAMARA and Saskatoon collections (Table 2) [39].
A small subset of the FOX lines has also been assessed for mutants affecting stress-inducible transcription factors. Among 43 cDNAs tested in a transgenic plant library, the authors identified salt-tolerant lines, all of which overexpressed the same bZIP-type transcription factor, AtbZIP60 [66].
The COS (controlled cDNA overexpression) system is based on a cDNA expression library that is driven by an inducible promoter. Sets of 20,000 to 40,000 transgenic seeds/seedlings were screened in three different ways, namely for ABA insensitivity, salt tolerance and for activation of a stress-responsive alcohol dehydrogenase-luciferase reporter system [67,68]. Twenty-seven cDNAs conferring dominant, inducible stress-tolerance phenotypes were identified, and recloning of several of these cDNAs confirmed the observed phenotypes [67].

Chemical Genetic Screens
Treating mutant collections with chemical compounds is a widely applied way to elucidate gene functions and signal transduction pathways [69]. Limitations of this approach include limited uptake of the compound by the plant or limited transport across membranes, and the possibility that modifications such as acetylation might inactivate the compound. A prerequisite for these so-called "chemical genetic screens" is the availability of a library of chemicals. The chemicals are usually used to screen for changes in a specific physiological parameter. For example, for strigolactones, which play a role in the germination of seeds of parasitic plants such as Striga, a collection of related small molecules (cotylimides) that act as genetic suppressors of strigolactone levels was identified in this way [70]. These cotylimides were used in a suppressor screen with T-DNA mutants, and light-signaling genes were identified as positive regulators of strigolactone levels [70]. Another example is the compound pyrabactin. Pyrabactin was isolated as a synthetic seed germination inhibitor that mimics abscisic acid (ABA) in a highly selective way, thereby activating the ABA pathway. Arabidopsis EMS mutants were screened with pyrabactin, and a new component of the ABA pathway, the PYRABACTIN RESISTANCE 1 (PYR1) gene, was identified. It codes for a START protein, which was characterized as the long-elusive ABA receptor [71].
Besides EMS or insertion mutants, the many Arabidopsis natural accessions are a rich resource for chemical genetic screens. When Arabidopsis natural accessions were subjected to a library of more than 10,000 compounds, twelve accession-specific molecules were identified [72]. The natural resistance to hypostatin, an inhibitor of cell expansion, was identified in this way and the responsible gene, HYR1, was characterized as encoding a UDP glycosyltransferase.
Chemical compounds that inhibit a specific protein function result in a phenotype similar to that of the corresponding loss-of-function mutants. Conversely, plants treated with a compound that activates a protein function resemble gain-of-function mutants. Varying the concentration of the chemical compound can have effects similar to an allelic series of mutations. If a chemical is able to interfere with the function of all members of a protein family, it can substitute for higher-order mutants. The ability to perturb protein functions only in the presence of a chemical, and therefore for a limited time, allows the study of essential genes that would display a lethal phenotype if permanently mutated. Nevertheless, only a limited number of chemical compounds with a defined function are currently available for addressing biological questions. Databases such as ChemMine [73] or PubChem (http://pubchem.ncbi.nlm.nih.gov/) are helpful in this respect.

Non-Invasive Screens
Ideally, high-throughput screens should not damage the plants investigated so that they can be used directly for the next steps in forward genetics. Besides visual screens for altered growth or coloration phenotypes, other non-invasive screening procedures have been employed. For instance, systematic measurements of chlorophyll fluorescence have been used to determine the efficiency of photosynthetic electron flow. To this end, image analysis [74,75] and a semiautomated pulse-amplitude modulated fluorometer device [76] have been developed. Chlorophyll fluorometry has also been used for the quantitative assessment of drought survival [77]. The luciferase and green fluorescent protein (GFP; and its derivatives) reporter genes allow one to monitor changes in the expression of genes by measuring luminescence and fluorescence, respectively, and these have been utilized to identify mutants with deregulated gene expression or altered subcellular localisation of gene products (e.g. [78,79]). The luciferase screen in particular, with its very short response time, can be employed to test for transient induction. This system has been applied in stress- or ABA-related mutant screens [80,81], as well as in circadian rhythm-related mutant screens [82][83][84].
Some of the lines generated within the SAIL project were transformed with a vector that carries the LAT52:GUS reporter gene, a visible cell-autonomous marker for pollen grains and tubes [44]. This population was generated in the qrt1-2 mutant background; this mutant maintains the four male meiotic products in a tetrad [85] and is very useful for non-invasive screens during male gametophyte development. Mutant lines can be classified as either homozygous or hemizygous for their T-DNA insertion by staining their pollen grains. In this way, the hapless (hap) mutations were isolated, which impair the development or function of haploid gametophytes [86].
The collection of GAL4 enhancer trap lines harbors a GAL4-responsive GFP gene as a marker to tag specific cell types and to reveal developmental transitions [87]. The lines can be screened microscopically or even viewed as images (http://www.plantsci.cam.ac.uk/Haseloff/geneControl/catalogFrame.html) and are very useful for identifying enhancers for a specific cell type. The screen has been successfully applied and resulted in the isolation of Dof elements, which are involved in the regulation of guard cell gene expression [88]. In a further step, some well-characterized lines with specific expression patterns can be used as starter lines for EMS mutagenesis. This approach was applied to dissect the mechanisms that specify the different populations of pericycle cells. Thus, a pericycle-specific enhancer trap line was mutagenized and several mutants exhibiting qualitative or quantitative alterations of the GFP expression pattern were isolated [89].

SYSTEMATIC REVERSE GENETICS SCREENS
Efficient systematic reverse genetics requires the availability of gene-indexed mutant populations, and the use of robust, high-throughput assays for phenotypic characterizations. Studies of this type in Arabidopsis have benefitted from previous experience with the model organisms Caenorhabditis elegans [90] and Saccharomyces cerevisiae [91]. In particular, T-DNA mutant collections have been widely used for reverse genetics, as indicated by the more than 2000 cumulative citations in the literature [9]. The majority of genes in the Arabidopsis genome are members of gene families [1], so that redundant gene functions often have to be assessed by generating double or higher-order mutants from T-DNA lines.
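The effort involved in generating double or higher-order mutants can be made concrete with a short worked calculation. The sketch below assumes the simplest textbook case, which is not taken from the studies cited here: two homozygous single mutants are crossed, the loci are unlinked, segregation is Mendelian and all genotypes are fully viable. Under these assumptions the fraction of F2 plants homozygous at all n loci is (1/4)^n, and the number of F2 plants needed to find at least one such plant with a given probability follows from 1 - (1 - p)^N.

```python
import math
from fractions import Fraction


def f2_fraction_multiple_homozygous(n_loci: int) -> Fraction:
    """Expected F2 fraction homozygous mutant at all n unlinked loci: (1/4)^n."""
    return Fraction(1, 4) ** n_loci


def f2_plants_needed(n_loci: int, confidence: float = 0.95) -> int:
    """Smallest F2 population giving at least `confidence` probability of
    containing one or more plants homozygous at all n loci."""
    p = float(f2_fraction_multiple_homozygous(n_loci))
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for n in (2, 3, 4):
    print(n, f2_fraction_multiple_homozygous(n), f2_plants_needed(n))
```

Linkage between loci, gametophytic defects or reduced viability of the multiple mutant would change these numbers, which is why the function should be read as a planning aid rather than a prediction.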
In the course of systematic reverse genetic screens, lines from different collections (see Table 1) are usually combined, in order to achieve maximum coverage of the target genes. In addition, for genes that are not covered by existing mutant collections, partial loss-of-function lines are often generated by the knock-down approach (see Section 2.2).

Small- and Medium-Scale Screens
Numerous small- and medium-scale reverse genetic screens have been performed which have addressed specific biological functions or multigene families. Only a small selection can be discussed here (Table 3). One of the first systematic screens with insertion-tagged populations was performed more than 20 years ago. A collection of 8,000 Arabidopsis plants carrying 48,000 insertions of the maize transposable element En-1 was screened for knock-out alleles of genes involved in flavonoid biosynthesis, utilizing a PCR-based screening protocol [92]. Another early example involved the systematic analysis of nuclear genes for photosystem I subunits, and used T-DNA or transposon insertion lines, together with knock-down lines [93,94].
For reverse genetic analysis of the nine members of the 1-aminocyclopropane-1-carboxylate synthase (ACS) gene family, seven T-DNA mutants and two amiRNA lines were employed [95] (Table 3). A larger screen was conducted for the subtilisin-like serine proteases (subtilases), a family that comprises 56 members in Arabidopsis. Here, 144 T-DNA insertion lines for 55 subtilase genes were identified. With the exception of SDD1, none of the lines displayed an obvious visible phenotype in the homozygous state [96]. This result highlights the need to use robust, high-throughput phenotypic screening methods to assign a phenotype, i.e. a biological function, to each gene, even in medium-scale screens.
In some cases, where genes with a known overall function were characterised by reverse genetics, an in-depth description could be provided for every member of the set. This was achieved, for instance, for all genes coding for photosystem I subunits (see above; reviewed in: [93]). The opposite case is represented by the so-called "guilt-by-association" approach, which emerged recently and is based on the co-expression of genes with related functions.
For instance, it was shown that nuclear genes for chloroplast proteins involved in photosynthesis or plastid gene expression indeed exhibit a high level of co-expression at the transcript level [99]. This finding was then exploited to systematically characterize by reverse genetics genes of unknown function that exhibited photosynthesis gene-like transcriptional profiles, and led to the identification of PGRL1, a central component of cyclic electron flow around photosystem I (PSI) [100]. A gene for a plastidic bile-acid transporter was also isolated based on a co-expression approach [101]. Moreover, a set of genes, which are specifically expressed in pollen tubes in response to their growth in the pistil but not expressed during other stages of pollen or plant development, was identified. For 33 pollen tube-expressed genes the respective mutants were analyzed in a reverse genetics approach. The mutants were derived from a subset of the SAIL collection that was generated in the quartet (qrt) background containing a LAT52 promoter driven GUS reporter gene [44]. Indeed, seven out of 33 investigated genes were critical for pollen tube growth [102]. Other examples are reviewed in [103]. However, the "guilt-by-association" approach often involves screening large numbers of genes, and very few mutants exhibit the "desired" phenotype.
An example of chemical genetic screening in combination with reverse genetics involves Arabidopsis lines overexpressing 91 P450 cytochrome oxidases. In the course of this screen, two P450 cytochrome oxidases that hydroxylate the therapeutic compound 8-methoxypsoralen were identified [104]. This demonstrates the potential of chemical genetic screening in combination with reverse genetics to assign specific functions to individual members of a large protein family.

Large-Scale Screens
In the course of the Chloroplast 2010 Project, mutations in nuclear genes that were computationally predicted to encode chloroplast-targeted proteins were subjected to a diverse set of phenotypic screens [105,106]. During this study a total of 2733 mutants were found to be homozygous for the inspected T-DNA insertion, and 85 phenotypic characters were evaluated in each. The phenotypic tests involved quantitative measurements of metabolites, such as fatty acid methyl esters in leaves or amino acid profiles in seeds, and quantification of photosynthetic parameters by a chlorophyll fluorescence assay [106]. Data are available on the project's website (see Table 2). Among the confirmed mutants obtained by the Chloroplast 2010 Project are mutants for At1g10310 (previously described as a pterin aldehyde reductase involved in folate metabolism) that were found to have an altered fatty acid desaturation phenotype. Single insertions in the gene Acyl Carrier Protein4 (ACP4) were responsible for defects in growth, fatty acid profiles and chlorophyll fluorescence, previously associated with the knock-down of the Acyl Carrier Protein gene family. However, the case studies of the Chloroplast 2010 Project also stressed the difficulty of securely establishing genotype-phenotype relationships. It became clear that, as in traditional reverse genetic screens, it is essential to confirm the genotype in advance and to exclude the possibility that a secondary mutation causes the phenotype. A second independent collection of 3246 Ds/Spm- or T-DNA-tagged homozygous lines for 1369 nuclear-encoded chloroplast proteins has been initiated [107]. Phenotypes were monitored in 3-week-old seedlings grown on agar plates and images are presented on the Chloroplast Function Database (http://rarge.psc.riken.jp/chloroplast/).

The Unimutant Collection: Not Quite the Ultimate Tool?
The isolation of two homozygous insert lines for every gene in the Arabidopsis genome would make it possible to systematically test the phenotypic consequences of gene loss under a wide variety of conditions by forward genetics [19]. The SALK unimutant collection includes 18,318 genes, but so far only about 9000 of these are covered by two alleles. Once the unimutant collection is complete, it can be screened on several large-scale platforms for high-throughput assays (e.g. [108-111]). The major advantage of screens based on the unimutant collection is obvious: exhaustive screens can be carried out with high efficiency, because loss-of-function lines for all Arabidopsis genes can be tested relatively quickly. But loss-of-function lines have their limitations. Firstly, they provide only the first step in the analysis of essential genes (see Section 2.2). Secondly, the function of many genes will be very difficult to assess because of their apparent phenotypic silence in many assays. Thirdly, genetic redundancy will continue to hamper attempts to assign functions to members of gene families (see Section 5.2).

From the Unimutant to the Uni-Multimutant Collection: A Closer Approach to the Ultimate Tool
How can the problem of genetic redundancy be solved for thousands of Arabidopsis genes? In the simplest case, when two unlinked genes code for the same protein, the systematic generation of double mutants will make such genes manageable. In fact, in the course of the German Plant Genome Initiative (GABI), one ongoing project is dedicated to the systematic generation of double mutants for segmentally duplicated genes. Segmental duplications account for the duplication of around 2900 genes in the Arabidopsis genome and can encompass chromosomal segments with up to about 300 genes which have been duplicated as blocks [112,113]. In the GABI-DUPLO project, corresponding double mutants are being generated for 750 such gene pairs by crossing of T-DNA-based single mutants, selfing and PCR-based genotyping, representing a valuable complement to the unimutant collection (https://www.gabi-kat.de/duplo.html).
Closely linked gene duplications, including tandem duplications as well as gene families containing more than two closely related genes, can only be handled by gene silencing approaches, and here the AGRIKOLA collection or amiRNA lines (see Section 2.2) promise to meet this need.
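The contrast between unlinked and closely linked gene pairs can be made concrete with a short calculation. The following Python sketch is an illustration only and is not taken from the GABI-DUPLO protocol. It assumes simple Mendelian segregation without selection against any genotype, equal recombination in male and female meiosis, and an F1 plant obtained by crossing two homozygous single mutants (so the two mutations are in repulsion). Under these assumptions a double-mutant gamete arises with frequency r/2, where r is the recombination fraction between the two loci, and the expected fraction of double homozygous F2 plants is (r/2)^2. The function names and the 95% confidence level are choices made for this example.

```python
import math


def double_homozygote_fraction(r: float) -> float:
    """Expected fraction of double homozygous mutants among F2 plants.

    r is the recombination fraction between the two loci
    (0.5 for unlinked genes, close to 0 for tandem duplicates).
    Assumes the F1 carries the two mutations in repulsion.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    # A gamete carries both mutations only after recombination: frequency r/2.
    # A double homozygote needs such a gamete from both parents.
    return (r / 2.0) ** 2


def f2_plants_to_genotype(r: float, confidence: float = 0.95) -> int:
    """Smallest number of F2 plants to genotype so that at least one
    double homozygote is found with the given probability."""
    p = double_homozygote_fraction(r)
    if p == 0.0:
        raise ValueError("no double homozygotes are expected without recombination")
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


print(double_homozygote_fraction(0.5))  # unlinked genes: 0.0625, i.e. 1 in 16
print(f2_plants_to_genotype(0.5))       # 47 plants
print(f2_plants_to_genotype(0.01))      # closely linked genes: more than 100,000 plants
```

For unlinked genes a few dozen F2 plants suffice, whereas for loci separated by a recombination fraction of 0.01 the same calculation calls for more than 100,000 plants, which illustrates why closely linked duplicates are better addressed by gene silencing.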

Targeted Mutagenesis
Site-directed mutagenesis remains a significant technical challenge in higher plants. Many attempts have been made to establish targeted mutagenesis in Arabidopsis, but no routine approach is available yet. Recently, zinc-finger nucleases (ZFNs) were used for targeted gene inactivation in Arabidopsis [114]. In this study it was shown that ZFNs engineered to act on two different genes and driven by a heat-shock promoter induced mutations in their target genes at frequencies of 3% and 2.6%, respectively.
The use of transcription activator-like effectors (TALEs) represents another approach with great potential. TALEs are injected into plant cells via the type III secretion pathway found in many plant pathogenic Xanthomonas spp. Once inside, they may contribute to disease or trigger resistance by binding to DNA and turning on TALE-specific host genes. Recent advances in understanding of the mechanistic basis of specific DNA binding by TALEs [115] should permit engineering of TALEs for arbitrary gene activation or as translational fusions for directing other proteins, such as negative regulators, methylases or nucleases, to specific DNA targets (reviewed in [116]).

Cataloguing of Mutant Phenotypes
Because very many research groups generate data from forward and reverse genetics experiments, a central depository for documentation of mutant phenotypes is highly desirable. To this end, The Arabidopsis Information Resource (TAIR) recently began to assign a phenotype to each mutant allele and made pictorial records of the mutant phenotypes available online (Table 2; [117]). Curators at TAIR have been capturing phenotypic information from the Arabidopsis literature since 2005. As of July 2009 the TAIR database contained 7352 distinct free-text phenotypes associated with 11,381 distinct genotypes derived from almost 1500 publications.
Phenotypic databases already exist for individual mutant collections. These include, for instance, the Riken Arabidopsis Phenome Information Database (RAPID) (Table 2), which is a searchable database describing morphological phenotypes of ~4,000 Ds transposon mutant lines [118]. Phenotypic descriptions of these lines have been classified into eight primary categories (such as seedlings, leaves, stems, flowers and siliques) and fifty secondary categories. Images of individual plants are also available and can be searched by the line number or the phenotypic categories. Similarly, the Bioassay and Phenotype Database (BAP DB) (Table 2) is a database for exploring gene functions based on available phenotypes and for screening data from mutant, transgenic and wild-type organisms. It currently contains some project-specific phenotypic data for mutants under various abiotic stresses. BAP DB also allows users to upload their own assay and phenotypic data.

The Future of Systematic Phenotypic Screens
Assuming that one needs to screen a minimum of two homozygous individuals for two different mutant alleles for each of the approximately 27,000 genes in Arabidopsis, a total of 108,000 individual plants would have to be planted to analyse the completed unimutant collection (see Section 5.1). When grown in standard trays, a total greenhouse area of 420 m² would be needed, a requirement that becomes feasible if plants are screened consecutively and not in parallel. A space-saving alternative is offered by screens that can be performed on seedlings using either media plates or liquid assays in media-containing multi-well plates, to which different compounds to generate abiotic or biotic stresses can be easily added. Moreover, because the unimutant collection would need to be complemented with double mutants for segmentally duplicated genes (like the GABI-DUPLO lines) and RNAi lines for very similar members of multigene families (like AGRIKOLA or amiRNA lines), the numbers listed above might have to be increased by 25% to screen a uni-multimutant collection. Exhaustive characterisation of such populations can only be achieved by collaborations involving many laboratories, each covering a specific set of phenotypic screens according to their specific expertise. Such screens would in many cases be repetitions of already established phenotypic assays but, because of the large number of plants to be considered, automated, innovative and non-invasive screens will have to be developed. Such screens will combine the capture of 'classical' visible phenotypes (such as growth rate, plant and root architecture and flowering time), specialized non-invasive assays (such as the ones to measure photosynthesis) and 'chemical' phenotyping (such as the profiling of metabolites, proteins and transcripts). In addition, the response of plants to different environmental conditions will have to be assessed.
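The scale of such a screen follows from simple arithmetic, which the following Python sketch reproduces. The gene number, the number of alleles and individuals, the greenhouse area and the 25% increase are the figures given above. The area per plant is derived from them, and the assumption that greenhouse area scales linearly with plant number is made here for illustration only. All variable and function names are invented for this example.

```python
GENES = 27_000                # approximate number of Arabidopsis genes
ALLELES_PER_GENE = 2          # independent mutant alleles per gene
PLANTS_PER_ALLELE = 2         # homozygous individuals screened per allele
AREA_UNIMUTANT_M2 = 420.0     # greenhouse area quoted for the unimutant screen
MULTIMUTANT_INCREASE = 0.25   # extra lines (double mutants, RNAi/amiRNA lines)


def plants_needed(genes: int, alleles: int, plants: int) -> int:
    """Number of individual plants for one pass through the collection."""
    return genes * alleles * plants


unimutant_plants = plants_needed(GENES, ALLELES_PER_GENE, PLANTS_PER_ALLELE)
multimutant_plants = round(unimutant_plants * (1 + MULTIMUTANT_INCREASE))

# Derived value: area available per plant, in cm^2 (1 m^2 = 10,000 cm^2).
area_per_plant_cm2 = AREA_UNIMUTANT_M2 * 10_000 / unimutant_plants

# Assumption: area grows in proportion to the number of plants.
multimutant_area_m2 = AREA_UNIMUTANT_M2 * (1 + MULTIMUTANT_INCREASE)

print(unimutant_plants)               # 108000
print(multimutant_plants)             # 135000
print(round(area_per_plant_cm2, 1))   # 38.9
print(multimutant_area_m2)            # 525.0
```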
Finally, data collection and processing have to be streamlined to allow efficient and reliable assignment of phenotypes to individual plants. This will have to include the capability of following and documenting the fate (and phenotype) of each individual plant from sowing to disposal, preferably in an automated way. New resources for the computational analysis of images and data sets are important. In addition, standardized storage of data that allows systematic interpretation of the phenotyping results is necessary. Ontology terms will have to be employed to describe the phenotypes, in the same way that Gene Ontology is used to describe biological processes, molecular functions and cellular components.
In spite of these many challenges, the prediction that genome-wide phenotypic data will soon become available for Arabidopsis does not appear to be all that far-fetched. Its realisation will no doubt stimulate a wealth of in-depth gene-function analyses.
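One way to picture such record keeping is a per-plant data structure that links each observation to a controlled-vocabulary term. The following Python sketch is purely illustrative and does not describe any existing database. The class and field names, the line name and the term identifier "TERM:0000001" are hypothetical placeholders, not entries from a real ontology. Only the locus code At1g10310 is taken from the text above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    day: date
    term_id: str   # placeholder for an ontology term identifier
    value: float
    unit: str


@dataclass
class PlantRecord:
    plant_id: str
    gene: str                        # locus code, e.g. "At1g10310"
    allele: str                      # name of the insertion line (hypothetical here)
    sown: date
    disposed: Optional[date] = None  # set once the plant is discarded
    observations: List[Observation] = field(default_factory=list)

    def record(self, day: date, term_id: str, value: float, unit: str) -> None:
        """Attach one measurement to this plant, rejecting impossible dates."""
        if day < self.sown:
            raise ValueError("observation predates sowing")
        if self.disposed is not None and day > self.disposed:
            raise ValueError("observation postdates disposal")
        self.observations.append(Observation(day, term_id, value, unit))

    def values_for(self, term_id: str) -> List[float]:
        """All values recorded for one term, in order of entry."""
        return [o.value for o in self.observations if o.term_id == term_id]


plant = PlantRecord("P0001", "At1g10310", "example-line-1", sown=date(2010, 3, 1))
plant.record(date(2010, 3, 15), "TERM:0000001", 3.2, "cm")
print(plant.values_for("TERM:0000001"))  # [3.2]
```

Storing a term identifier rather than free text is what would allow results from different laboratories to be compared systematically.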