Covariation of gene frequencies in a stepping-stone lattice of populations

For a one- or two-dimensional lattice of finite length consisting of populations, each of which has the same population size, the classical stepping-stone model has been used to approximate the patterns of variation at neutral loci in geographic regions. In the pioneering papers by Maruyama (1970a, 1970b, 1971) the changes of gene frequency at a locus subject to neutral mutation between two alleles, migration, and random genetic drift were modeled by a vector autoregression model. Maruyama was able to use the spectrum of the migration matrix, but to do this he had to introduce approximations in which there was either extra mutation in the terminal populations, or extra migration from the subterminal population into the terminal population. In this paper a similar vector autoregression model is used, but it proves possible to obtain the eigenvalues and eigenvectors of the migration matrix without those approximations. Approximate formulas for the variances and covariances of gene frequencies in different populations are obtained, and checked by numerical iteration of the exact covariances of the vector autoregression model.

Lattices of finite size are of greater practical interest as models of real populations. Malécot (1948Malécot ( , 1950 pioneered them, using a model of a torus or circle of a finite number of populations. Maruyama (1970aMaruyama ( , 1970bMaruyama ( , 1971 was the first to consider stepping stone models for ordinary one-and two-dimensional linear or rectangular lattices of finite numbers of populations. He gave expressions for the variances and covariances of gene frequencies for arbitrary pairs of populations when there was a two-allele neutral mutation model in which the population had reached its equilibrium distribution. To do so he used an approximate model which had some lack of realism in the treatment of migration into the terminal populations of the lattice. He considered different variations in the way these terminal populations were modeled. In the present paper I will treat a more exact model with a more realistic pattern of migration into the terminal populations. The results are similar, but not identical, to Maruyama's results. Patterson et al. (2006) have pointed out the importance of using the eigenvectors and eigenvalues of the variation in gene frequencies in principal components analysis of data from different populations, an approach which goes back to Menozzi et al. (1978) and is reviewed by Cavalli-Sforza and Feldman (2003). An important issue is the interpretation of any significant patterns of geographic differentiation that are found. They do not necessarily indicate historical events such as waves of migration. Novembré and Stephens (2008) have pointed out that the eigenvectors of a lattice model of migration are startlingly similar to principal components found for gene frequency patterns in geographic studies of genetic variation. When one of these principal components is seen, this makes it less obvious that it must arise from an historical invasion event, as the stepping stone models do not include historical invasions. They pointed out that migration matrices such as the ones Maruyama used fall into the class of Toeplitz matrices (Gray, 2006) for which the eigenvectors and eigenvalues can readily be computed. The more exact model which we use here has migration matrices that are not precisely Toeplitz matrices. It turns out that their eigenvalues have similarities with those of Maruyama's matrices, and their eigenvectors are readily found.
The immediate purpose of obtaining expressions for the covariances will be to approximate the joint distribution of gene frequencies in the populations by a multivariate normal distribution, for which the expectations and covariances are all that we need to determine the distribution. The expectations are easy to obtain; the covariances do still require one approximation but, when checked against an exact numerical solution of the equations, seem closer to the correct values than Maruyama's approximations are. The joint distribution of gene frequencies will not actually be normal, but for cases with small departures of gene frequencies from their expectations it will come close to being multivariate normal. Elsewhere I hope to discuss the development of an approximate maximum likelihood inference of the parameters of the model using this approximation. The formulas developed here may also be useful to others working with finite stepping-stone models who wish to have a closer approximation of the covariances of gene frequencies than has hitherto been available.

The model
The finite stepping-stone model is a linear lattice of n 1 populations (if in one dimension) or a rectangular lattics of n 1 × n 2 populations (if in two dimensions). Analogous models can be erected in higher numbers of dimensions. Each population has the same size, N individuals.
The model has discrete, nonoverlapping generations. In each generation an infinite number of offspring are produced in each population. From these a random N are chosen to survive to be the adults of the next generation. The population sizes thus remain constant.
Each individual among the offspring is diploid, and has two parents. Migration enters the picture in the locations of the parents. For most of the populations in a one-dimensional lattice, there is a probability 1 − m that a particular individual drawn to be a parent comes from the same population. In a one-dimensional lattice there is a probability 1 2 m that this parent comes from the population to the left, and a probability 1 2 m that it comes from the population to the right. The exception is when the offspring population is the first or the last in the lattice. Then there is a probability 1 2 m that it comes from the adjacent population, and the rest of the time, 1 − 1 2 m of the time, it comes from the same population.
If the lattice is a two-dimensional one, the same process is imagined to occur in both dimensions, completely independently. Thus for a population that is not on a boundary of the lattice, a parent for the (i, j) population may have come from that population, or from any of the 8 populations surrounding that one, as shown in Figure  These seemingly complicated patterns are really just the consequence of having 1 2 m come from each neighboring population in one dimension, with the two-dimensional pattern being independent movement in the two dimensions. Figure 1 shows the migration rates in the present model in one-and two-dimensional lattices. This is slightly different from the migration pattern usually used in two-dimensional stepping-stone models. In most previous work, in two dimensions the migration can only come from one of the two populations immediately adjacent in one dimension, or immediately adjacent in the other dimension. Migrants could only come from populations (i − 1, j), (i + 1, j), (1, j − 1), or (i, j + 1). The present scheme, in which migration occurs or not in either dimension, independently, leads to greater mathematical tractability. It is worth noting that such a scheme is also implicit in Maruyama's papers.
The model follows the gene frequency of one allele at a locus, in the presence of migration, mutation, and genetic drift. The model of mutation has two alleles with mutation back and forth between them. If the mutation rate from A to a is μ, and the mutation rate from a to A is ν, the equilibrium gene frequency is p̄ = ν/(μ + ν). Mutation in such a one-locus model acts as if it were a form of migration. If we set an additional rate of migration into each population of m ∞ = μ + ν, and have these immigrant copies of the gene be drawn from a pool in which the gene frequency of A is p, this will be indistinguishable from a model that has migration plus mutation between two alleles. In the present model, parents are chosen according to the migration model, with the two parents of an individual independently drawn. Each parent contributes an allele to the offspring. Mutation occurs (or does not) for each copy of the gene. Each population thus has a pool of newborn offspring, whose genetic composition is characterized by the gene frequency of the A allele (which I will call p). Among these offspring, the model makes the presence or absence of the A allele at each copy from that pool independent of the other copies, so that we do not need to concern ourselves with the diploid genotype frequencies in the pool.
Genetic drift occurs by sampling N diploid individuals from the offspring pool, without replacement. As the presence of the A allele in each copy in the diploid individuals is independent, this has the same effect as drawing 2N times from a pool which has gene frequency p.
We can take the gene frequencies in the local populations and arrange them in a column vector, whose length is the number of populations. For the two-dimensional case this involves taking the gene frequencies in the rows of the array of populations, forming each into a column vector, and stacking them on top of each other, so that the first two entries in the vector are the gene frequencies for populations (1, 1) and (1, 2). Let the populations now be numbered in the order in which they appear in this vector.
For both one-and two-dimensional cases we will see below that we can use the migration pattern to create a migration matrix M whose elements m ij are the probability that a copy of the gene found in the newborn offspring pool for population i came from a parent which was in population j, we can then write for population i the gene frequency in the next generation: where ε i is the change due to genetic drift. The random variable p i ′ is a binomial proportion after 2N trials with a probability of success equal to the sum of the first two terms on the right-hand side of this equation. The expectation of ε i is thus zero.
This set of equations can be put into matrix form as where the vector 1 is a column vector of 1's and p (t) is the vector of gene frequencies in the populations in the adult stage of generation t. This equation is true for any pattern of recurrent migration, not just for stepping-stone lattices.
Note that, except for the error not being multivariate normally distributed, these equations are essentially a vector autoregression (VAR) model, beloved of econometricians. In this case they are (nearly) a VAR(1) model, as each generation depends only on the preceding one. In VAR models, the objective is to infer the regression coefficients from a series of observations. Here most of the regression structure is known, it is also known that the errors ε i do not covary, and the objective is to predict the variances and covariances of the model at equilibrium.
This model is close to the models used by Maruyama (1970aMaruyama ( , 1970bMaruyama ( , 1971) but is not exactly the same. For the terminal population he considered two different models. One (his "absorbing boundary") had migration rate m/2 into the terminal population from the subterminal population, and also an extra inflow of m/2 into from a pool at the equilibrium frequency. This allowed a fraction 1 − m of the terminal population to be nonmigrants, which made all the diagonal elements of the migration matrix equal. The second (his "reflecting boundary") (Maruyama 1970a(Maruyama , 1970b had migration at rate m from the subterminal population into the terminal population, rather than m/2. Again the migration matrix had all its diagonal elements equal. We will not adopt either of these measures, hoping to be able to cope with a migration matrix that does not have all of its diagonal values equal. It is also worth noting that Maruyama's variable p is not our p, but is the deviation of the gene frequency from its equilibrium value.

Transformation to uncorrelated variables
In the cases we consider, the matrix M will be symmetric. For all cases of symmetric migration among populations (not just for stepping-stone lattices of populations) it will be possible to write it in terms of its eigenvalues and eigenvectors as where the matrix U of eigenvectors is orthonormal, so that its transpose U T is also its inverse. The diagonal elements of the matrix of eigenvalues Λ will be real numbers. Substituting this form of M into equation (2) we get where 1 is the column vector whose entries are all 1. This equation can be premultiplied by U T to get since U T is the inverse of U so that their product is the identity matrix and thus disappears from the matrix products.
We now consider a new vector of random variables and write the vector U T p1 as q, which is a constant vector and not a random variable. We notice that since the expectations of the random variables ε i (t) are zero, we can write the equation for x (t) as The vectors of random variables η (t) will have expectation zero.
The first eigenvalue λ 1 will be, for all the migration matrices we consider, 1, and its associated eigenvector will have all elements equal. The other eigenvectors will be orthogonal to this. It follows that q will have its first element nonzero, and all other elements zero.

Expectations
If we now take the expectations of the terms in equation (7), we can readily use equation (7), to show for each element of the vector x that in the long run that the process will approach a stationary distribution. In that distribution, the expectations the elements of x i (t) are each equal to the expectations of the corresponding elements of q. This is asymptotically true for t very large, when the effect of the initial frequencies of the p i (0) have been lost. Going back to the elements of the vectors p (t) in this limiting distribution, these all have equal expectations, Going back to the elements of the vectors p (t) in this limiting distribution, these all have equal expectations, which are simply p, the mutational equilibrium gene frequency. The equation for the expectations of the x (t) in this limiting distribution, which will be called x̄( t) , is then We can subtract this equation termwise from equation (7) and obtain an equation for the deviation of the x (t) from their expectations. If we call these deviations y (t) , this is then It is easy to see that the expectations of the y (t) are all zero. For this reason, the equation for the y (t) is simpler than the equation for the x (t) , and we now use it to investigate the covariances.

Approximating the stochastic process
If we write, from equation (9), the expression for one of the elements of the vector y (t + 1) , it is If the elements of η (t) were all independent of each other, it would be easy to show that the y i (t) would be independent random variables. Actually, the η i (t) are not, strictly speaking, independent. The vector η (t) is a linear transformation of a set of independent binomial variates (each element of ε (t) being the difference between a binomial frequency and its expectation). However the nonindependence is subtle and, for our purposes, unimportant. The objective of this paper is to derive the covariances of the gene frequencies, for use in a statistical inference when approximating the distribution of the gene frequencies as multivariate normal. It will be sufficient to be able to show that the η i (t) are approximately uncorrelated.
To do this another approximation will also be made. The ε i (t) have variances that arise in the binomial sampling of the gene frequencies in going from the infinite number of newborn At this point we do not know the values of the quantities σ i 2 . They depend on the very thing we want to compute, the variances and covariances of the gene frequencies p i . There are likely to be differences between the variances σ i 2 , with terminal populations having a somewhat higher variance of gene frequency than interior populations.
In the limiting case when migration rates are large, the gene frequencies in all populations will be very similar. In that limit the variances σ i 2 will be equal. The approximation made here will be to assume that we are near this limit, and that for the purposes of further calculation we can assume that all of the σ i 2 are equal to the With that approximation, we can show that the random variables η i (t) are not correlated. We have seen that they are the elements of the vector U T ε (t) . We know that the ε i (t) have zero covariances, as they are independent random variables. If we use the approximation that the ε i (t) have equal variances σ 2 , then we can write the covariance matrix of the η i (t) as As this is diagonal, the covariances of different elements of η are thus, to close approximation, zero.

Covariances of gene frequencies
With the covariances of the η i (t) now determined to (approximately) be zero, the random variables for the different i in equation (9) are now uncorrelated. They all have expectation zero. Its variance is easily determined by noting that the sum on the right-hand side is of two uncorrelated variables, whose covariance is therefore zero, so that: Once this random process has reached stationarity, the variance of y i in all generations is equal and, with our approximation of σ i 2 by σ 2 , we can then solve for the variance of y i as . (14) The remaining covariances are all zero.
The covariances of the x i are the same as those of the y i , since these vectors differ by a constant vector. To obtain the covariances of the gene frequencies p i , we use the fact that the gene frequencies are a linear transformation of the y i . From equation (6), The covariances of p will then be From equation (14) we can now write the covariances of the p i as Theor Popul Biol. Author manuscript; available in PMC 2016 June 23.
Since the diagonal matrix in this expression can also be written in terms of the square of the diagonal matrix of eigenvalues, it turns out that this covariance matrix can also be written in terms of the inverse of a simple function of the migration matrix: Of course, the diagonal elements of the covariance matrices are all assumed to be equal to σ 2 . Although the above expressions compute σ 2 in terms of itself, we will find a way to untangle that.
We now know that the expectations of the p i are all p, and for the (approximated) process we have the covariances in terms of the eigenvalues and eigenvectors of M. Now all we need is to find those. The derivation since (1) has been general for any system of recurring migration whose migration rates among populations are symmetric, provided that the migration is strong enough to allow us to assume that the variances σ i 2 are nearly equal. Now we need to use the particular pattern of migration on a finite lattice to obtain the eigenvalues and eigenvectors of the migration matrix.

The spectrum of the migration matrix
The matrix in one dimension is For the one-dimensional case we will use n in place of n 1 for the number of populations, to reduce typographical stress.
Feller (1957, section XVI.3, pp. 389-390) treats a random walk with two reflecting boundaries, whose transition matrix is the above matrix for the case when m = 1. He uses a partial fractions method; as he notes, this is equivalent to a spectral decomposition of the matrix. His quantity s r is the reciprocal of the eigenvalue λ r + 1 . Thus the eigenvalues for m = 1 are and if for m = 1 we call the matrix R, we have for general m M = (1 − m)I + mR (23) so that the eigenvalues of M are the corresponding linear combinations which turn out to be The eigenvectors of M will be the same as those of R. In the Appendix it is shown that if these are scaled so as to be orthonormal the right eigenvectors can be written as where the quantity Δ i is 1 if i = 1 and 2 otherwise. Maruyama found a similar quantity necessary in the expressions for his eigenvectors.
By comparison, Maruyama's eigenvalues were His eigenvectors could also be written in terms of cosines, but were different from the expressions for our matrix.
That there is a connection to Maruyama's matrix is made clearer if we consider a "shift matrix" S which has the diagonal above the main diagonal filled with 1s, and the rest of the elements zero:

Solving for the covariances
Using equation (19) together with equations (26) and (25) we can now write the covariance of populations in the one-dimensional case as For the two-dimensional case we note that the migration matrix M is the result of independent migration in the two dimensions; it turns out that where the superscripts indicate the two dimensions and ⊗ is the Kronecker product of matrices. It is well-known for Kronecker products that if Λ (1) and Λ (2) are the diagonal matrices of the eigenvalues of (respectively) M (1) and M (2) , then the diagonal matrix of eigenvalues of M is the Kronecker product Λ (1) ⊗ Λ (2) , which is the diagonal matrix whose diagonal elements are all n 1 × n 2 products of one of the eigenvalues of M (1) and one of the eigenvalues of M (2) . Similarly, the n 1 n 2 × n 1 n 2 matrix U of eigenvectors of M is the Kronecker product of the n 1 × n 1 matrix U (1) of eigenvectors of M (1) and the n 2 × n 2 matrix U (2) of eigenvectors of M (2) . If for the two-dimensional case we denote the gene frequency of the population at position (i, j) as p ij , the result is that Higher numbers of dimensions can be accommodated in an exactly analogous way. The generalization of the two-dimensional case to having different migration rates m 1 and m 2 in the two dimensions is straightforward.

Solving for σ 2
There is still the vexing matter of the variance σ 2 . One method of inferring it is to start with σ 2 = 0 and use equations (29)  However if in each iteration we instead make a weighted average of two quantities, one the mean of the new σ i 2 and the other the previous value of σ 2 , with their weights 4Nm and 1, this seems always to converge rapidly. Let us call this the F1 approximation.
An alternative approach that seems reasonable is, for a given pair of populations, i and j, to This will be referred to as the F2 approximation.
We will see that with 4Nm moderately large, it will make little difference which of these two methods of inferring σ 2 we use. The F2 method is more tedious computationally, requiring as it does multiple iterative estimations of σ 2 .

Alternative equations
The covariances between populations can be arranged in a square matrix Using equation (2) we can show that at equilibrium C will satisfy where Q is the covariance matrix of ε, which will be diagonal. This is a linear equation in the c ij , though a big one: for example, for a 30 × 30 lattice the matrix C will be 900 × 900.
This equation is general to any pattern of recurring migration.
It is worth noting that the linear equations in the c ij can be rewritten in a more conventional form if we convert the covariance matrices C and Q into vectors by stacking their columns into stacks s(C) and s(Q) so that This is a very large set of equations -for a 30 × 30 lattice of populations there will be 810,000 equations in as many unknowns. It is unlikely to be a practical numerical way of solving for the c ij , but use of a transformation analogous to equation (6) allows us to solve for the covariances of p in a way exactly analogous to our derivation above. The approximation of the σ i 2 is the same in this set of equations as in the previous forms.

Longer-range migration
The migration matrix allows only a single step of migration in each dimension. One straightforward way to model multiple- This is true for any matrix R of one-step movements. The eigenvectors of a matrix polynomial like this are well-known to be the eigenvectors of R, which are the eigenvectors of the one-step migration matrix, given in equation (25). The eigenvalues of M in the case of a lattice can be written in terms of the eigenvalues in equation (26) -the ith one is If we replace λ i in this expression with the ith eigenvalue of the migration matrix that has m = 1, we get from equation (26) for the ith eigenvalue of the multistep migration matrix For the gene frequency covariances, this can be substituted into equations (29) or (31) in place of the eigenvalue expressions there. The eigenvalues in equation (26) and those in equation (43) become asymptotically the same as m becomes small, as one would hope they would.
For two dimensions, tractability requires that the movement in the two dimensions be an independent random walk in each dimension. The number of steps in the x direction is a random Poisson variable, and so is the number of steps in the y direction, and these are independent. This will also be true if the total number of steps is a Poisson variate, and each step is independently chosen to be in the x direction or in the y direction, by independent Bernoulli variables (coin tosses). The eigenvalues are then all possible products of two quantities, each computed by equation (43).

Diffusion limit
It is well-known that if we consider a series of cases with increasing population sizes N, and decreasing values of the strengths of the deterministic evolutionary forces (here m ∞ and m), such that the products Nm ∞ and Nm are constant, and if we also observe the gene frequencies on a time scale in which one unit of time is N generations, that the stochastic process of gene frequency change approaches a diffusion process. This is of considerable interest, as the approximation is usually very close.
Taking these limits, terms in m ∞ 2 and m 2 , and all of their higher powers drop out of the equations. The result for the covariances is that equation (29) In the case of multiple-step migration, as we take N larger, m becomes small, and the occurrence of more than one step of migration becomes vanishingly rare, so it is easy to show that the diffusion limit is this same expression. The analogous expressions for the twodimensional lattice are easily obtained and involve two terms in 4Nm. Fleming and Su (1974) have derived another approximation to the finite stepping-stone models, by approximating the space as continuous. We will not give their formulas here, but below we will compare them to our approximations. I have showed elsewhere (Felsenstein, 1975) that models of finite populations of organisms migrating in continuous geographical spaces encounter some difficulties in maintaining a Poisson random field distribution.

Fleming and Su's approximation
Fleming and Su's equations cannot be seen as exact for a organisms randomly distributed in a continuous geographical space. We must instead regard them as an approximation to the stepping-stone model.
To do this I have taken their interval (−ℓ, ℓ) and divided it into n equal intervals, with the n populations of the stepping-stone model being regarded as located at the midpoints of those intervals. This is to some extent an arbitrary choice.

Numerical comparisons
The approximation involved in using a common σ 2 for all of the variances is expected to be a good one if 4Nm ≫ 1, but it is worth doing numerical checks. Table 1 shows numerical comparisons. The exact covariances for the model are calculated using equation (33), with the variances in the diagonal matrix Q obtained from equation (11). This equation was iterated many times until it converged.
These values, exact under our model, were compared with the approximations in equation (29). They were also compared with Maruyama's approximations in his 1970a paper, Table  4, and also with with the continuous approximation of Fleming and Su (1974). Table 1 shows the results for n = 10, m ∞ = 0.001, m = 0.1, N = 25, and p̄ = 0.2, the case that Maruyama considered. The exact value is denoted by E, our approximations by F1 and F2, Maruyama's approximation by M, and Fleming and Su's approximation by FS. Owing to the symmetry of the case, populations 6 through 10 have the same variances as populations 5 through 1 so they are not shown.

Numerical comparisons
The approximation involved in using a common σ 2 for all of the variances is expected to be a good one if 4Nm ≫ 1, but it is worth doing numerical checks. Table 1 shows numerical comparisons. The exact covariances for the model are calculated using equation (33), with the variances in the diagonal matrix Q obtained from equation (11). This equation was iterated many times until it converged.
These values, exact under our model, were compared with the approximations in equation (29). They were also compared with Maruyama's approximations in his 1970a paper, Table  4, and also with with the continuous approximation of Fleming and Su (1974). Table 1 shows the results for n = 10, m ∞ = 0.001, m = 0.1, N = 25, and p̄ = 0.2, the case that While generally similar, then two approximations differ noticeably, especially for the terminal population. In Maruyama's approximation, there is an extra injection of mutation into the terminal populations. This damps its departure from the mutational equilibrium which is caused by genetic drift. Both approximations are considerably lower than the true variance, the expected variance in the F1 approximation developed in this paper being 18% low in the terminal population and 26% low in the central population. The F2 approximation is slightly worse in the populations near the ends, and slightly better in the central region, but perhaps not enough to make it worthwhile in view of its higher computational burden. The Fleming-Su approximation is worse. It does agree with the other values by showing higher variance in the terminal populations. Table 1 has 4Nm = 10 and 4Nm ∞ = 0.1. When 4Nm = 100, as in Table 2, the F1 approximation is much better, being 0.56% high in the terminal populations, and 0.3% low in the center populations. The F2 approximation is again lower than the F1 approximation in the end region and higher than it in the central region; in most populations it is farther from the exact value than is the F1 approximation.
In longer one-dimensional stepping stone models, as in Table 3, we can see that the exact variances in the terminal populations are higher than in the center, but that this rapidly declines and becomes relatively constant as we move toward the center of the lattice. The F1 approximation tracks this quite closely, being about 3.6% too high in the terminal populations and 0.5% too low in the center. The F2 approximation is in general closer to the exact value, and through most of the central region is nearly exact. The Fleming-Su approximation does even less well than before. We can also compare the correlations between the gene frequencies of pairs of populations. Table 4 shows these for the same case as Table 1 above, where N = 25. For this case Maruyama also calculated the correlations from his reflecting boundary approximation, in Table 4 of his 1970b paper. Both of our approximations are very close, and his quite a bit farther away. The (1,9) element of this table has the greatest departure from the exact value for both our and Maruyama's approximation, and his is 8.24% high while ours is 0.67% low.
(There is a typographical error in the Theoretical number in the (9,10) element of his table). The F2 approximation is slightly better than the F1 approximation for pairs of populations near the ends of the region, and slightly better for all other pairs. Table 5 shows the correlations for the case when N = 250. It can be shown that the F1 approximation will not depend on N, so our predictions are the same as in Table 4. In fact, the expressions for predicted correlations using our approximation also do not depend on p̄ or on m ∞ either -they depend only on the geometry of the populations and on m. Compared to these predictions, the exact values are a bit more different in Table 5, being up to 5% low (in the case of the (10,1) element), but more typically being about 1% low. The F2 approximations do depend on N, but for the correlations they are so close to the values of the F1 approximation that there seems little point going to the extra effort of computing them.

Usefulness
The present solution is approximate, but less so that previous efforts. It is intended for use in a likelihood-based (or Bayesian-based) inference of migration and/or mutation parameters, scaled as a fraction of the local effective population size. However the approximation is specific to rectangular lattices which have reached equilibrium between mutation, migration, and genetic drift. Both aspects, the rectangularity and the equilibrium, may be questioned for natural populations, the equilibrium being a particularly severe challenge for human populations. In continents such as Europe, we need assurance that there has been enough time since historical population movements such as those associated with the spread of agriculture. Given perhaps only 7000 years since the establishment of agriculture there, that is a stringent requirement.
The availability of approximations based on processes in a continuum of finite length, instead of the stepping stone lattice, presents another challenge. We have compared our formulas for the variances with Fleming and Su's (1974) approximation, and for these cases the continuous approximation was a worse approximation. Barton and Wilson (1995) have developed approximations for coalescent times for an approximate model of a continuum of finite length. The papers by Barton et al. (2002Barton et al. ( , 2010) make similar approximations that allow for extinction and recolonization, which have not been discussed here. These papers use, respectively, a model of a two-dimensional infinite continuum, and a model of a finite torus. Wilkins and Wakeley (2002) and Wilkins (2004) use a different approximate model to develop other approximations to coalescence times in a one-or two-dimensional finite continuum. Neither of these approximations has been developed into an approximation for variances and covariances of local gene frequencies, although it does not seem difficult to do so. Barton and Wilson (1995) approximate these covariances by using Malécot's (1951) formulae for the one-and two-dimensional infinite continuum approximation.
If the stepping-stone model is regarded as the truth, and these continuum models are intended as approximations to it, the present formulas may help evaluate the accuracy of the continuum approximations. But to the extent that both the stepping-stone models and the continuum models are regarded as approximations to the more subtle structure of real populations, we would have to model those, perhaps in individual-based simulations, to evaluate whether either approach is viable.
we find that the elements of the eigenvector can be written as The sine factor does not depend on j, so it is a scaling of the ith eigenvector. These eigenvectors each need to be rescaled so that the eigenvectors are orthonormal. If they are written as where Δ i , is 1 for i = 1 and 2 otherwise, it can be shown that this is the appropriate scaling. Diagram of the migration pattern in a one-and two-dimensional stepping stone model. In each case the recipient population is shown together will all populations contributing migrants to it, with the fraction of the recipient population coming from each of these populations shown. This is shown for an interior population and a terminal population for the one-dimensial stepping stone model, and for an interior population, a side population, and a corner population for a two-dimensional stepping stone model. Comparison of exact solution (E) for the variances of population gene frequencies in a 10-stone stepping-stone model for n = 10, m ∞ = 0.001, m = 0.1, N = 25, and p̄ = 0.2 with the approximations of this paper (F1 and F2), Maruyama's reflecting-boundary approximation (M), and Fleming and Su's (1974) approximation. Approximation F1 uses the value of σ 2 averaged over all populations. Approximation F2 uses a value of σ 2 calculated for that population.
As the migration pattern is symmetrical, only the values for populations 1-5 are shown.  Comparison of exact solution (E) for the variances of population gene frequencies in a 10-stone stepping-stone model for n = 10, m ∞ = 0.001, m = 0.1, N = 250, and p̄ = 0.2 with the approximation of this paper (F).
Approximation F1 uses the value of σ 2 averaged over all populations. Approximation F2 uses a value of σ 2 calculated for that population. FS is the Fleming-Su approximation. As the migration pattern is symmetric, only the values for populations 1-5 are shown.  Table 3 Comparison of exact solution (E) for the variances of population gene frequencies in a 100-stone steppingstone model for n = 100, m ∞ = 0.001, m = 0.1, N = 250, and p̄ = 0.2 with the approximations of this paper (F1 and F2). Approximation F1 uses the value of σ 2 averaged over all populations. Approximation F2 uses a value of σ 2 calculated for that population. As the migration pattern is symmetric, only values for the first half of the lattice are shown.  Exact solution and two approximations for the correlations of population gene frequencies for n = 10, m ∞ = 0.001, m = 0.1, N = 25, and p̄ = 0.2. Within each nondiagonal cell, the uppermost value is the exact solution, the next two values our F1 and F2 approximations, and the lower value is Maruyama's reflecting-boundary approximation, from Table 4 of his 1970b paper.  Table 5 Comparison of exact solution (E) for the correlations of population gene frequencies in a 10-stone stepping-stone model for n = 10, m ∞ = 0.001, m = 0.1, N = 250, and p̄ = 0.2 with the approximation of this paper. The 10 × 10 lower triangle contains three numbers, the uppermost one being the exact correlation and the lower two our approximations F1 and F2. Approximation F1 is calculated with one value of σ 2 for all populations; approximation F2 is calculated using variances from approximations of σ 2 local to the two populations involved, and an average of these values is used to compute the covariance between the two populations.