Plasmodium falciparum antigenic variation. Mapping mosaic var gene sequences onto a network of shared, highly polymorphic sequence blocks

Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, encoded by an extremely diverse gene family called var. Understanding of the genetic organization of var genes is hampered by sequence mosaicism that results from a long history of non-homologous recombination. Here we have used software designed to analyse social networks to visualize the relationships between large collections of short var sequence tags sampled from clinical parasite isolates. In this approach, two sequences are connected if they share one or more highly polymorphic sequence blocks. The results show that the majority of analysed sequences, including several var-like sequences from the chimpanzee parasite Plasmodium reichenowi, can be either directly or indirectly linked together in a single unbroken network. However, the network is highly structured and contains putative subgroups of recombining sequences. The major subgroup contains the group A var genes, which were previously described and proposed to be genetically distinct. Another subgroup contains sequences found to be associated with rosetting, a parasite virulence phenotype. The mosaic structure of the sequences and their division into subgroups may reflect the conflicting problems of maximizing antigenic diversity and minimizing epitope sharing between variants while maintaining their host cell binding functions.
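The linking rule described above (two sequences are connected if they share one or more polymorphic sequence blocks) can be illustrated with a minimal Python sketch. It assumes that each sequence has already been reduced to a set of block strings; the sequence names, block labels and function names are hypothetical and are not taken from the study, which used social network software rather than this code.

```python
from collections import defaultdict
from itertools import combinations


def build_block_sharing_network(blocks_by_seq):
    """Map each sequence to the set of sequences sharing >= 1 block with it."""
    # Inverted index: block -> sequences that carry it.
    seqs_by_block = defaultdict(set)
    for seq_id, blocks in blocks_by_seq.items():
        for block in blocks:
            seqs_by_block[block].add(seq_id)

    neighbours = {seq_id: set() for seq_id in blocks_by_seq}
    for members in seqs_by_block.values():
        # Every pair of sequences carrying the same block is linked.
        for a, b in combinations(sorted(members), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def connected_components(neighbours):
    """Group sequences that are directly or indirectly linked."""
    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Toy input (hypothetical): seqA-seqB share BLOCK2, seqB-seqC share BLOCK3.
toy = {
    "seqA": {"BLOCK1", "BLOCK2"},
    "seqB": {"BLOCK2", "BLOCK3"},
    "seqC": {"BLOCK3"},
    "seqD": {"BLOCK9"},
}
network = build_block_sharing_network(toy)
components = connected_components(network)
```

In this toy example seqA and seqC share no block directly but fall in the same component through seqB, which is the sense in which sequences can be "indirectly linked" in a single network.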

Matching of var sequences from the 3D7, HB3 and IT4 genomes with 14aa PSPBs from block-sharing groups 1-7 defined within the 14aa PSPB network (Figure 4C). Gene classifications are from Kraemer et al. 2007. Block-sharing groups 5 and 6 are combined because one of the sequences from HB3 carried PSPBs from both these block-sharing groups. Note the tendency of this combined block-sharing group to carry PSPBs that match central var genes (group BC or C).

Figure S4. Multiple alignment of sequences associated with parasite rosetting.
Sequences were selected for alignment if they were expressed in the seven rosetting parasite isolates (see Figure 6 B and D) and fell in sub-groups that showed evidence for an association with parasite rosetting. The sub-groups were cys/PoLV group 2 sequences in block-sharing group 1 ("1/2"), cys/PoLV group 6 sequences in block-sharing group 1 ("1/6") and cys/PoLV group 2 sequences in block-sharing group 2 ("2/2").

Figure S5. Comparisons of var gene expression in different clinical parasite isolates.
Expression levels of each gene were assessed by sequencing multiple clones from a library of RT-PCR amplified DBLα sequences from RNA (see Table S1). The percentage representation of each sequence was determined within each isolate, and the mean percentage was then determined across all 21 parasite isolates. The area of each node is proportional to this mean percentage representation. Sequences obtained by colony picking from clones generated from cDNA (A and C) were compared with sequences obtained by amplifying genomic DNA (B and D). Each graph is coloured using either block-sharing groups (A and B) or cys/PoLV groups (C and D).

Normark et al. 2007. A recent study in Uganda (Normark et al. 2007) suggests that it may be possible to search for functional motifs directly using a specifically designed motif detection algorithm. Three sets of related motifs, H1, H2 and H3, were identified that showed an association with rosetting. A) Positions of sequences within the Kilifi network that contained the identified motifs at any position. The degenerate motifs comprise the following specific motifs used in our search. H1: RYSANI, FSKNI; H2: TCAAKV, TCDATM, TCGATM, TCGATV, TCKAEV; H3: DDKVQK, DKVEKG, EDKVQK, HDAVEK, KDAVQK, KDAVQN, KDDVEK, KDEVKE, NDEVWK. The mapping of the sequences onto the network suggests that these motifs are associated with non-overlapping groups of sequences that have some correspondence to those identified in Kilifi. B) Positions of sequences that, as groups, exhibited a significant correlation between expression levels and parasite rosetting frequency in our study (see text): "1/2" = block-sharing group 1, cys/PoLV group 2; "2/2" = block-sharing group 2, cys/PoLV group 2; "1/6" = block-sharing group 1, cys/PoLV group 6. C and D) Comparison of the expression levels of sequences in parasites exhibiting low (C) and high (D) levels of rosetting, as described in Figure 6 but coloured according to the Normark groups.

Figure S7. Exploration of geographical structuring of var sequences.
A) Barry et al. (2007) have recently reported evidence for geographical structuring of var sequences in Papua New Guinea (PNG) using a collection of sequences sampled from 30 parasites in 1999. These sequences are broadly distributed over the world network. This may be because the intra-genomic diversity of var sequences sampled within each genome is maintained throughout the world. One approach to exploring geographical structuring with the network is to examine the composition of block-sharing groups. B) shows an example: the 7 block-sharing groups containing 20 or more sequences that were generated using a PSPB length of 20 amino acids. C) Considering only the sequences sampled in these 7 groups, the number of sequences falling in each block-sharing group is expressed as a percentage of the total number of sequences sampled from each continent. There appears to be an excess of Asian sequences in block-sharing groups 2 and 4 and a deficiency in block-sharing group 1, and an excess of sequences from PNG in block-sharing group 3 (all Fisher's exact test, 2-sided P<0.001). Note that the location of the 7 main block-sharing groups differs from that of the main groups found in the Kilifi network using 20 aa PSPBs (Figure 4I), which tended to be located within or next to the small lobe. The difference may be attributed to the geographic heterogeneity of the sequences in the world network, their bias towards sequences amplified from genomic DNA and the absence of group A reference sequences.

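The composition analysis described for Figure S7 (percentage of each continent's sequences falling in each block-sharing group, followed by a two-sided Fisher's exact test) can be sketched in Python as follows. The counts are invented toy values, the function names are hypothetical, and the 2x2 layout (one region versus all others, one group versus all others) is an assumption, since the legend does not state how the tables were constructed.

```python
from math import comb


def group_percentages(counts):
    """Percentage of each region's sequences that fall in each group."""
    result = {}
    for region, by_group in counts.items():
        total = sum(by_group.values())
        result[region] = {g: 100.0 * n / total for g, n in by_group.items()}
    return result


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables as or less likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Toy counts (hypothetical): region -> {block-sharing group: number of sequences}.
toy_counts = {
    "Africa": {1: 30, 2: 10},
    "Asia": {1: 5, 2: 15},
}
percentages = group_percentages(toy_counts)

# Assumed table layout: rows = Asia vs other regions, columns = group 2 vs other groups.
p_value = fisher_exact_two_sided(15, 5, 10, 30)
```

The tolerance factor in the comparison guards against floating-point ties between equally probable tables; a statistics library would normally be used in place of this hand-written test.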
Folder S1.zip. Zipped file containing three-dimensional networks. See Materials and Methods for more details.
Folder S2.zip. Perl script for classifying sequences by cys/PoLV group and identifying block-sharing group 1 and 2-like sequences (i.e. those that share 14 amino acid PSPBs with block-sharing groups 1 and 2). This script is a modified version of one we described previously and also classifies DBLα tag sequences into cys/PoLV groups (see Bull et al. 2007, Mol. Biochem. Parasitol (54) 98-102 for details).
Folder S3.zip. Pajek project file for the Kilifi network. This contains data on the sequences in Dataset S1.
Table S1. Clinical parasite isolates used in the study.
Table S2. Sequences used in the world network.
Table S3. Sharing of 14aa PSPB in the world network.