Genetic structure of Miscanthus sinensis and Miscanthus sacchariflorus in Japan indicates a gradient of bidirectional but asymmetric introgression

Summary Using high-density genetic markers, gene flow is identified from diploid Miscanthus sinensis to tetraploid M. sacchariflorus in Japan, in contrast to genetic isolation between these species in China.

run for one iteration of K = 1 through 10. The burnin consisted of 10,000 MCMC reps, followed by 50,000 MCMC reps after the burnin. Other parameters were left at defaults. Data were processed in a diploid format; Wang et al. (2013) demonstrated that introgression in polyploids can be quantified using RAD-seq data without knowledge of allele copy number.
To determine the origins of ornamental and US naturalized collections, a second set of Structure runs was performed that included the above individuals, plus 81 ornamental cultivars and 42 naturalized individuals from the USA. For this set, USEPOPINFO and PFROMPOPFLAGONLY were set to TRUE, MIGPRIOR was set to 0.05 (to reflect the large amount of admixture between populations) and GENSBACK was set to 4 (with the expectation that RAD-seq SNPs would be sensitive enough to detect BC 3 individuals). The POPFLAG column was set to 1 for native individuals and 0 for ornamental and naturalized individuals (Fig.  1B). Eight native populations were delimited in the POPINFO column using population assignments made with Discriminant Analysis of Principal Components (DAPC;Jombart et al., 2010), which were similar to the population assignments made with Structure at K=8 (Fig. 1A).
The validity of our Structure results for detecting introgression between species was confirmed using two independent methods: principal component analysis (PCA) of the same 1513 individuals that were evaluated by Structure, and Structure analysis of 1100 simulated hybrid and backcross individuals and 100 simulated individuals representing the common ancestor of M. sinensis and M. sacchariflorus. For PCA, we filtered our SNP dataset, only retaining 3490 SNPs with less than 5% missing data; filtering prevented individuals with high missing data rates from appearing as hybrid individuals due to missing data imputation. The glPca function from adegenet (Jombart and Ahmed, 2011) with scale=TRUE was then used to perform PCA. A linear model, with the value on the first principal component axis as the dependent variable, and the Structure Q value representing proportion M. sacchariflorus ancestry as the independent value, was fitted using R 3.1.1 (Supplementary Figure 2A).
In order to simulate hybrid and ancestral individuals for the second method to validate Structure results, "pure" individuals were first selected from the dataset, with the criteria that they had to have a Structure Q value greater than 0.95 for the M. sacchariflorus cluster (36 individuals from Japan and nine from China) or the sum of the three Japanese M. sinensis clusters (752 individuals from Japan). The SNP set was reduced to the 19,124 loci that did not have missing data in all pure M. sacchariflorus individuals. Missing data were imputed as the median genotype within the two sets of pure individuals, which were then sampled randomly to be used as parents to generate simulated hybrids, including 100 individuals each of F1 and BC1 through BC5 (1100 individuals total), with separate backcross classes depending on which species was the recurrent parent. In order to simulate individuals from an ancestral population from which M. sinensis and M. sacchariflorus diverged, the major (most common) allele was identified at each locus for the 45 pure M. sacchariflorus and 1105 pure M. sinensis (including 752 from Japan and 353 from mainland Asia). At the 16,096 loci at which M. sacchariflorus and M. sinensis had the same major allele, all 100 simulated ancestral individuals were homozygous for the major allele. At the 3028 loci at which M. sinensis and M. sacchariflorus had different major alleles, genotypes were simulated assuming that the two alleles were at equal frequency in the ancestral population. All simulated individuals had missing data inserted randomly at a rate of 22% to match the missing data rate in the "pure" populations. An input file was then generated for Structure, including genotypes of 1150 pure M. sinensis and M. sacchariflorus and 1200 simulated individuals at 19,124 loci. Five structure runs were performed at K=8 using a burnin of 10,000 MCMC reps followed by 50,000 MCMC reps after burnin, under default conditions as used in the analysis of native Miscanthus. Results are presented in Supplementary Figure 2B and Table S1.

Spatial Principal Components Analysis
Spatial principal components analysis (sPCA), implemented in the R package adegenet (Jombart et al., 2008) was used to identify spatial patterns in genetic variation of M. sinensis across the major islands of Japan using RAD-seq SNPs. This technique transforms molecular marker data into eigenvectors similarly to principal component analysis, but maximizes the product of spatial autocorrelation and variance of each eigenvector rather than maximizing the variance alone. Latitude and longitude of each collection site were rounded to two decimal digits to lump individuals into 205 collection sites, which included 198 M. sinensis accessions (654 individuals, excluding two accessions with no collection site data, two accessions that were identified in Structure analysis as having non-Japanese origin, and one individual with >20% ancestry from M. sacchariflorus according to Structure) from the Japan dense-sampling set and 128 individuals of Japanese origin from the region-wide set. Allele frequencies were estimated for each collection site. Out of the 20,704 RAD-seq SNPs, 19,148 were variable within this dataset. All SNPs that were variable in M. sinensis with missing data at four or fewer collection sites were retained for sPCA analysis (5,359 SNPs). A Gabriel graph was generated to represent geographic connections between collection sites. The number of eigenvectors to interpret was chosen based upon examination of a screeplot of eigenvalues (Fig. S3). The lag vector of each interpreted eigenvector was plotted against latitude and longitude using the s.image function from the R package ade4 (Chessel et al., 2004) with the argument span=0.4 to control the degree of smoothing.

Flow cytometry
Approximately 1 cm 2 of leaf tissue from Miscanthus samples was co-chopped with Sorghum bicolor 'Nr481' (IPK Gatersleben, Germany) as an internal standard for 3-5 min in extraction buffer on ice. Chopped samples were mixed with 10 ml extraction buffer and filtered through a 50 µm mesh, then centrifuged at 2100 rpm for 20 min. Nuclei were then stained with propidium iodide as previously described (Rayburn et al., 2009) and run on an LSR II Flow Cytometry Analyzer (BD Biosciences, San Jose, California, USA) at the Roy J. Carver Biotechnology Center at the University of Illinois. DNA content was determined based on median G1 peak area, assuming 1.74 pg/2C for sorghum.