A comprehensive climate history of the last 800 thousand years

8 A detailed and accurate reconstruction of past climate is essential in understanding the drivers 9 that have shaped species, including our own, and their habitats. However, spatially-detailed climate 10 reconstructions that continuously cover the Quaternary do not yet exist, mainly because no paleocli11 mate model can reconstruct regional-scale dynamics over geological time scales. Here we develop a 12 new approach, the Global Climate Model Emulator (GCMET), which reconstructs the climate of the 13 last 800 thousand years with unprecedented spatial detail. GCMET captures the temporal dynamics 14 of glacial-interglacial climates as an Earth System Model of Intermediate Complexity would whilst 15 resolving the local dynamics with the accuracy of a Global Climate Model. It provides a new, unique 16 resource to explore the climate of the Quaternary, which we use to investigate the long-term stability 17 of major habitat types. We identify a number of stable pockets of habitat that have remained un18 changed over the last 800 thousand years, acting as potential long-term evolutionary refugia. Thus, 19 the highly detailed, comprehensive overview of climatic changes through time delivered by GCMET 20 provides the needed resolution to quantify the role of long term habitat fragmentation in an ecological 21 and anthropological context. 22 Current patterns of diversification within and between species, such as our own [1], and the struc23 turing of whole ecosystems can only be studied in the context of past climatic changes that have shaped 24 them through time [2]. A detailed understanding of such processes has become an urgent necessity in 25 order to predict responses to global change. However, whilst predictions of climate change and their 26 impacts over the next few tens or hundreds of years are based on comprehensive Global Climate Models 27 (GCMs) that resolve processes at high temporal and spatial resolution, such as those used in the latest 28 IPCC Assessment Report [3], reconstructions back in time are challenging as they have to span a much 29 longer period. GCMs can provide snapshots for a specific time or short transients in the order of a few 30 thousands of years, whilst periods of tens or hundreds of thousands of years can only be covered with 31 Earth System Models of Intermediate Complexity (EMICs) [4, 5], at the cost of low spatial resolution 32 and a simplified representation of the climate system [6]. Neither of those two types of models is in33 tentionally designed for paleo-ecology or species evolution, disciplines that require appropriate temporal 34 scales of up to hundreds of thousands of years and spatial scales down to tens of kilometres. 35 Here, we fill this gap for a long-term reconstruction of climate that resolves regional-scale dynamics 36 by reconstructing the last 800 thousand years (ka) at an unprecedented spatial resolution of approximately 37

us to efficiently describe the behaviour over the longer, millennial, time scales. In turns, this implies that 118 the glacial-interglacial climate of the Middle and Late Pleistocene responded in a consistent manner to 119 orbital forcings and CO 2 . It will interesting in the future to test whether this approximation holds for the 120 Early Pleistocene with its faster ice age cyclicity of 41 ka; for this endeavour, we currently lack enough 121 of estimates for CO 2 before the Mid-Pleistocene Transition, but GCMET is fully capable of covering the 122 appropriate time periods if enough estimates become available. For the moment, we can offer a detailed, 123 coherent reconstruction of the past 800 thousand years, which allowed us to pinpoint long-term potential 124 refugia that have been characterised by the same habitat, and we expect that this will open up new ways 125 to study the impact of past climate in a number of disciplines such as ecology and anthropology. 126

127
The global climate model emulator GCMET 128 The multiple linear regression model of GCMET-LO GCMET is derived from 72 available HadCM3 129 snapshot simulations [10, 11] (https://www.paleo.bristol.ac.uk/ummodel/data/tdwza/standard_ 130 html/tdwza.html, last accessed on 05 Oct 2018). It is a linear regression model for each individual 131 model grid box with the following independent variables: atmospheric CO 2 concentrations as a major 132 greenhouse gas, and eccentricity, obliquity, and precession as orbital parameters [14]. The sine function 133 has been applied to the precession parameter which is expressed as longitude of the perihelion (in de-134 grees) to make it a continuous function (was in degrees). Atmospheric CO 2 concentrations are the same 135 as in the respective HadCM3 time slice simulation, e.g., 280 ppmv for 0 ka before present (BP). The 136 available 72 HadCM3 simulations cover the last 120,000 years in 2,000-year intervals from 120,000 to 137 24,000 ka BP and in 1,000-year intervals from 22,000 to present-day. 138 The dependent variables are temperature T , precipitation P, or specific humidity Q. All indepen-139 dent variables, i.e., the predictors, are applied as normalised forcings. Thus, the resulting regression 140 coefficients, or β coefficients, can be compared across different climate variables, i.e., temperature and 141 precipitation, and across each other (Extended Data Figures 5-7). 142 Variations of a climate variable X based on a multiple linear regression model for the deviations from 143 the mean, i.e., the anomalies X , such that X = X + X with X being the mean of X. The equation for X 144 then is: 145 X (x, y,t) = β CO 2 (x, y)CO 2 (t) + β ε (x, y)ε (t) + β e (x, y)e (t) + β Ω (x, y)Ω (t) ( T , precipitation P, and specific humidity Q into anomalies, i.e., the X on the left hand side of Eq. 1 is: We also consider changes in surface type, i.e., ocean, land, and ice. For example, around the coast-159 lines, land can turn into ocean due to rising sea levels and vice versa, or the expanding ice sheets turn 160 land into ice. Both, precipitation and temperature respond to different surface type in a different way.

161
Therefore, each of the surface types (ocean, land, and ice) yields a distinct linear regression model.

162
For the improved precipitation model (as mentioned in the main text) we used temperature T and 163 specific humidity Q as independent variables This leads to precipitation predictions with a lower root mean square error over land (it is also ex-165 plained below and shown in Extended Data Figure 1). For the predicition of the climate before 120 ka 166 BP this means that we first need to reconstruct T and Q, and then we can use the β coefficients for T and 167 Q to reconstruct P.  ponent of the hydrological cycle, as compared to temperature, precipitation is much less constrained by 208 external forcings than temperature. Therefore, the linear regression model has less predictive skill for 209 precipitation than for temperature. However, it turns out that the predictive skill for precipitation can 210 be improved by using temperature and specific humidity as predictors instead of the orbital parameters 211 and CO 2 . By doing so the RMSEs can be substantially reduces, especially over land (Extended Data 212 Figure 1).

213
The regression coefficients To get an idea of how reliable our estimate for predictors are, we calculate 214 the p-values for each of the predictors, i.e., the beta coefficients. Here, the p-value tests the null hypoth-215 esis whether the coefficient is equal to zero, which means that the specific predictor has no effect. If the 216 p-value is below a certain threshold-in our case below the 5% significance level: p < 0.05-the null 217 hypothesis can be rejected. That means that the specific predictor is a meaningful addition to our linear In these regions, we set the β coefficients to zero and the 221 associated forcing has no effect.

222
Increasing to high resolution in GCMET Using nine high-resolution HadAM3H simulations, which 223 cover the deglaciation since 21 ka BP (21, 18, 15, 12, 10, 8, 6, 3, and 0 ka BP), we are able to increase the 224 spatial resolution from 3°, which is the spatial resolution of GCMET after the linear regression step (and 225 the same as the coarse resolution of the original HadCM3 snapshots), to ca. 1°. We do so by calculating 226 the difference between equivalent coarse-and high-resolution snapshots. For example, the difference at 227 10 ka BP is ∆ 10 ka BP = HadAM3H 10 ka BP − HadCM3 10 ka BP . We choose to interpolate the differences 228 linearly according to their CO 2 levels, e.g., 231 ppm at 10 ka BP, because any statistical model with more 229 than one variable would require more snapshots to adequately predict the differences. Thus, we simply 230 assume that the differences between a coarse-and high-resolution climate can be explained as a function 231 of the CO 2 forcing, i.e., ∆ 10 ka BP = ∆ 231 ppm . Now, for any period in the past, e.g., 300 ka BP, we add 232 the high-resolution difference, i.e., the ∆, which corresponds to the respective CO 2 level, to the coarse-233 resolution reconstruction. Note that the downscaling approach captures the regional-scale dynamics of 234 the GCM in this step, which change over time. This is in contrast to commonly used "delta approach" 235 for downscaling.              Figure 8: R 2 values as estimator for the goodness of the model (higher is better) using the training data, and root mean square errors (RMSE) as estimators of the goodness of fit (lower is better) using the test data. Shown are the R 2 and RMSEs for mean annual temperature, precipitation, and the alternative model for precipitation-based on temperature and specific humidity.  Figure 9: Time series of external parameters: CO 2 and orbital parameters for the last 2 million years. The continuous CO 2 record is from the EPICA Dome C ice core in Antarctica [62]. The point-wise CO 2 record is based on boron isotopes from planktonic foraminifera [63]. The orbital parameters are numerical solutions for the Earth's orbit and rotation in terms of eccentricity, precession, and obliquity [64]. Extended Data Figure 10: A receiver operating characteristic curve for the random forest classifier of the WWF 14 major habitats. The upper left corner represents a perfect prediction of an ecosystem, while the diagonal line represents a prediction made by random guessing. The closer the ROC curve is to the perfection point (0,1) the better the random forest classification is.