Local and relaxed clocks, the best of both worlds

Time-resolved phylogenetic methods use information about the time of sample collection to estimate the rate of evolution. Originally, the models used to estimate evolutionary rates were quite simple, assuming that all lineages evolve at the same rate, an assumption commonly known as the molecular clock. Richer and more complex models have since been introduced to capture substitution rate variation among lineages. Two well-known model extensions are the local clock, wherein all lineages in a clade share a common substitution rate, and the uncorrelated relaxed clock, wherein the substitution rate on each lineage is independent of other lineages while being constrained to fit some parametric distribution. We introduce a further model extension, called the flexible local clock (FLC), which provides a flexible framework to combine relaxed clock models with local clock models. We evaluate the flexible local clock on simulated and real datasets and show that it provides substantially improved fit to an influenza dataset. An implementation of the model is available for download from https://www.github.com/4ment/flc.


Phylogenetic methods provide a powerful framework for reconstructing the evolutionary history of viruses, bacteria, and other organisms. Correctly estimating the rate at which mutations accumulate in a lineage is essential for phylogenetic analysis, as the accuracy of inferred rates can heavily affect other aspects of the analysis. Classic approaches to inferring the substitution rate of a group of organisms rely on the existence of a so-called "molecular clock". The molecular clock hypothesis states that mutations accumulate at an approximately steady rate over time, implying that the genetic distance between two organisms is proportional to the time since they last shared a common ancestor. The molecular clock hypothesis was first proposed almost 50 years ago by Emile Zuckerkandl and Linus Pauling (Zuckerkandl and Pauling, 1965), who suggested that the substitution rate was effectively constant over time. This very restrictive model of evolution has been implemented as a "strict clock" model in phylogenetic inference software, but the rates of evolution in many organisms appear to change over time, and the strict clock cannot capture this phenomenon.
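
Under a strict clock, the expected distance from the root to a tip sampled at time t is r(t - t_root), so with heterochronous samples the rate can be read off as a slope. The sketch below illustrates this with a simple least-squares root-to-tip regression on hypothetical, noise-free distances. It is a common exploratory check, not the Bayesian inference used in this manuscript, and the function name root_to_tip_regression is our own.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Under a strict clock, distance = rate * (date - root_date), so the slope
    estimates the rate and the x-intercept estimates the root date.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    sxx = sum((t - mean_t) ** 2 for t in dates)
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate  # date at which the fitted distance is zero
    return rate, root_date


# Hypothetical, noise-free data: 2e-3 substitutions/site/year, root in 1980.
rate_true, root_true = 2e-3, 1980.0
dates = [1990.0, 1995.0, 2000.0, 2005.0, 2010.0]
distances = [rate_true * (t - root_true) for t in dates]
rate, root_date = root_to_tip_regression(dates, distances)
print(f"rate={rate:.4g}, root={root_date:.1f}")
```

With real data the points scatter around the line, and a poor fit is one informal hint that a single strict rate is inadequate.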

In recent years, richer models have been developed to capture the complexity of the evolutionary process. Sanderson (2002) and Thorne et al. (1998) proposed modeling rate heterogeneity among lineages with auto-correlated clock models, using penalized likelihood and Bayesian inference, respectively. In these parameter-rich models, the rate of each lineage is assumed to be correlated with that of its parent lineage. The auto-correlation assumption could be justified by considering that the substitution rate is influenced by heritable mechanisms such as metabolic rate or generation time. However, there is no guarantee that rates evolve in an auto-correlated manner, especially when the timescale under study is relatively short (Drummond et al., 2006). An alternative approach is to assume that substitution rates on adjacent branches are independent draws from an underlying parametric distribution. Drummond commonly used as they are available in the widely used BEAST package (Drummond and Rambaut, 2007; clock models due to their ability to relax the constant rate assumption. Local clock models are an alternative to relaxed clocks, in which the substitution rate is assumed to be constant within a clade but can differ between clades (Yoder and Yang, 2000; Yang and Yoder, 2003). The number and locations of these local clocks can be inferred from the data using the random local clock model (Drummond and Suchard, 2010), or local clocks can be assigned by the user based on prior information.
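
To make the three families of models concrete, the following sketch assigns branch rates on a small hypothetical tree under an uncorrelated lognormal relaxed clock, a simple auto-correlated clock, and a local clock. It is illustrative only: the tree, the function names, and the mean-preserving lognormal parameterization are assumptions. The auto-correlated variant ignores branch durations, unlike the model of Thorne et al. (1998), and the local clock here includes the stem branch above each clade root, which is also an assumption.

```python
import math
import random

# Hypothetical tree (((A,B)X,C)Y,(D,E)Z)R stored as child -> parent.
# Each key identifies the branch leading to that node; the root R has no branch.
PARENT = {
    "A": "X", "B": "X", "X": "Y", "C": "Y",
    "Y": "R", "D": "Z", "E": "Z", "Z": "R",
}


def lognormal_with_mean(mean, sigma, rng):
    """Draw from a lognormal whose arithmetic mean is `mean` (sigma > 0)."""
    return rng.lognormvariate(math.log(mean) - sigma ** 2 / 2, sigma)


def uncorrelated_lognormal_rates(parent, mean_rate, sigma, rng):
    """Uncorrelated relaxed clock: every branch rate is an i.i.d. draw."""
    return {node: lognormal_with_mean(mean_rate, sigma, rng) for node in parent}


def autocorrelated_rates(parent, root_rate, sigma, rng):
    """Auto-correlated clock: each branch rate has its parent's rate as mean.
    Simplification: the spread does not scale with branch duration."""
    rates = {}

    def rate_of(node):
        if node not in parent:  # the root has no branch above it
            return root_rate
        if node not in rates:
            rates[node] = lognormal_with_mean(rate_of(parent[node]), sigma, rng)
        return rates[node]

    for node in parent:
        rate_of(node)
    return rates


def local_clock_rates(parent, clade_rates, background_rate):
    """Local clock: a branch takes the rate of the closest clade root at or
    above it; branches outside every listed clade use the background rate."""
    def rate_of(node):
        while node in parent:
            if node in clade_rates:
                return clade_rates[node]
            node = parent[node]
        return background_rate

    return {node: rate_of(node) for node in parent}


rng = random.Random(1)
print(uncorrelated_lognormal_rates(PARENT, 1e-3, 0.5, rng))
print(autocorrelated_rates(PARENT, 1e-3, 0.2, rng))
print(local_clock_rates(PARENT, {"X": 3e-3}, 1e-3))
```

The three functions differ only in how a branch's rate depends on other branches: not at all, on its parent, or on membership in a clade.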

In this manuscript we introduce a hybrid model that integrates features of both the local and the relaxed clock models. In the model each local clock can be specified either as a strict clock (as in the

Herein, we propose to relax the constraint that lineages within a local clock evolve at exactly the same rate by replacing this implicit strict clock with a relaxed clock. We applied the FLC model to two data sets of heterochronous viral nucleotide sequences. The first data

As in the original study (Drummond and Suchard, 2010), every model shows an increase in substitution rate in sequences sampled after 1990 (Figure 2).
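
The idea of the FLC, in which each local clock carries its own clock type, can be sketched as a rate assignment. In this hypothetical sketch a clade is either strict (one shared rate) or relaxed (independent lognormal draws around a clade-specific mean). The data structure, the names, and the choice of a lognormal are our assumptions for illustration; they do not reproduce the parameterization or priors of the BEAST 2 implementation.

```python
import math
import random

# Hypothetical tree (((A,B)X,C)Y,(D,E)Z)R stored as child -> parent.
PARENT = {
    "A": "X", "B": "X", "X": "Y", "C": "Y",
    "Y": "R", "D": "Z", "E": "Z", "Z": "R",
}


def flexible_local_clock_rates(parent, clocks, background, rng):
    """Assign branch rates when each local clock has its own clock type.

    clocks maps a clade root to ("strict", rate) or ("relaxed", mean, sigma);
    `background` is the spec for branches outside every listed clade.
    A branch uses the spec of the closest listed clade root at or above it.
    """
    def spec_of(node):
        while node in parent:
            if node in clocks:
                return clocks[node]
            node = parent[node]
        return background

    rates = {}
    for node in parent:
        spec = spec_of(node)
        if spec[0] == "strict":
            rates[node] = spec[1]  # every branch in this clock shares one rate
        elif spec[0] == "relaxed":
            _, mean, sigma = spec  # independent lognormal draw per branch
            rates[node] = rng.lognormvariate(math.log(mean) - sigma ** 2 / 2, sigma)
        else:
            raise ValueError(f"unknown clock type: {spec[0]!r}")
    return rates


# Relaxed local clock on clade X, strict local clock on clade Z, strict background.
rates = flexible_local_clock_rates(
    PARENT,
    clocks={"X": ("relaxed", 3e-3, 0.4), "Z": ("strict", 2e-3)},
    background=("strict", 1e-3),
    rng=random.Random(42),
)
print(rates)
```

Setting every clock to "strict" recovers a plain local clock, and a single "relaxed" background with no clades recovers an uncorrelated relaxed clock, which is the sense in which the two models are combined.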

The marginal likelihood estimates ( Monte Carlo error and caution should be exercised in order to avoid overinterpreting small differences.
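
Comparing models by marginal likelihood amounts to a log Bayes factor, which is the difference between two log marginal likelihood estimates. Because each estimate carries Monte Carlo error, a small difference may not be meaningful. The sketch below computes a log Bayes factor together with a rough standard error. The independence assumption, the hypothetical numbers, and the two-standard-error threshold are illustrative choices, not the procedure used in this study.

```python
import math


def log_bayes_factor(log_ml_1, se_1, log_ml_2, se_2):
    """Log Bayes factor of model 1 over model 2, with a rough standard error
    that assumes the two Monte Carlo estimates are independent."""
    log_bf = log_ml_1 - log_ml_2
    se = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return log_bf, se


def exceeds_monte_carlo_error(log_bf, se, z=2.0):
    """True when |log BF| lies more than `z` standard errors from zero."""
    return abs(log_bf) > z * se


# Hypothetical (log marginal likelihood, standard error) pairs; not paper results.
comparisons = [
    ((-5000.0, 0.8), (-5003.0, 0.6)),
    ((-5000.0, 0.8), (-5000.5, 0.6)),
]
for ml_1, ml_2 in comparisons:
    log_bf, se = log_bayes_factor(*ml_1, *ml_2)
    print(f"log BF = {log_bf:.1f} +/- {se:.1f}, "
          f"clear of MC error: {exceeds_monte_carlo_error(log_bf, se)}")
```

In the second hypothetical comparison the difference of 0.5 log units is smaller than its own uncertainty, which is the kind of small difference the text warns against overinterpreting.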

The 95% highest posterior density (HPD) of the standard deviation of the lognormal distribution assigned

For example, the assignment of a relaxed clock with a two-parameter distribution to a single branch would over-parameterize the model. An interesting direction for further research would be to develop an algorithm that automatically selects the clock type for each local clock.
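
A 95% HPD interval is the shortest interval containing 95% of the posterior mass. For MCMC output it is commonly approximated by the narrowest window spanning 95% of the sorted samples, as in this sketch. The trace here is simulated and hypothetical, the function name hpd_interval is our own, and the method assumes a unimodal posterior.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the samples.

    Meaningful for unimodal posteriors; multimodal traces need other methods.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    k = max(1, min(n, round(mass * n)))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical MCMC trace of a lognormal standard deviation (positive, skewed).
rng = random.Random(7)
trace = [rng.expovariate(4.0) for _ in range(10_000)]
print(hpd_interval(trace))
```

Because the simulated trace is right-skewed, its HPD interval starts near zero, whereas an equal-tailed 95% interval would exclude the lowest 2.5% of the samples.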

The FLC model is implemented in the BEAST 2 package as a plugin and is available from