Gene regulatory effects of a large chromosomal inversion in highland maize

Chromosomal inversions play an important role in local adaptation. Inversions can capture multiple locally adaptive functional variants in a linked block by repressing recombination. However, this recombination suppression makes it difficult to identify the genetic mechanisms underlying an inversion’s role in adaptation. In this study, we used large-scale transcriptomic data to dissect the functional importance of a 13 Mb inversion locus (Inv4m) found almost exclusively in highland populations of maize (Zea mays ssp. mays). Inv4m was introgressed into highland maize from the wild relative Zea mays ssp. mexicana, also present in the highlands of Mexico, and is thought to be important for the adaptation of these populations to cultivation in highland environments. However, the specific genetic variants and traits that underlie this adaptation are not known. We created two families segregating for the standard and inverted haplotypes of Inv4m in a common genetic background and measured gene expression effects associated with the inversion across 9 tissues in two experimental conditions. With these data, we quantified both the global transcriptomic effects of the highland Inv4m haplotype, and the local cis-regulatory variation present within the locus. We found diverse physiological effects of Inv4m across the 9 tissues, including a strong effect on the expression of genes involved in photosynthesis and chloroplast physiology. Although we could not confidently identify the causal alleles within Inv4m, this research accelerates progress towards understanding this inversion and will guide future research on these important genomic features.

parviglumus, hereafter parviglumus approximately 9000 years ago [20,21]. Since domestication, populations 48 of maize have been moved into high altitude environments, and landraces collected today show considerable 49 local adaptation to their home elevation in a range of traits [22]. Interestingly, population genetic scans for 50 loci associated with adaptation to elevation gradients have identified several loci common in highland 51 landraces that have been introgressed from a different subspecies of teosinte, Zea mays ssp. mexicana 52 (hereafter mexicana), which occurs in highland environments [23]. One of these introgressed regions, located 53 at approximately 171.7 to 185.9 Mb of chromosome 4, is a chromosomal inversion known as Inv4m [24,25]. 54 Inv4m is observed in high altitude locations in Mexico, and is associated with a three day acceleration of 55 flowering time, the largest effect flowering QTL yet found [26]. Inv4m also overlaps with a quantitative trait 56 locus (QTL) found in a previous study of leaf pigmentation and macrohairs in teosinte, which are thought to 57 be adaptive in highland environments [27]. 58 We first comprehensively characterized the population-genetic context of the Inv4m locus and its 59 association with key agronomic traits using dense whole-genome genotyping data of thousands of Mexican 60 landraces. We found that Inv4m is more closely associated with altitude in the center of maize diversity in

Results
Inv4m is associated with local adaptation in agronomic traits 72 To confirm the population genetic signature of local adaptation at Inv4m, we determined the Inv4m 73 genotype of 4845 maize plants from the SeeD-maize GWAS panel using published unimputed 74 genotype-by-sequencing data. Of these lines, 707 were homozygous for the minor haplotype, and 351 were 75 heterozygous for the locus. Of the 585 plants carrying at least one allele of the minor haplotype and with 76 complete with geographic information, all but 7 were from Mexico, with the majority collected from the 77 central highlands (Fig 1A). Therefore, to assess evidence of local adaptation, we assessed the association of 78 Inv4m genotype with elevation among the 1757 Mexican plants. In Mexico, 1186 and 381 plants were 79 homozygous for the alternate haplotype, and 190 were heterozygous, a distribution that significantly differs 80 from Hardy-Weinberg expectations (D=-252.0, p = 1.91e-197). Genotypes at Inv4m were strongly associated 81 with elevation, as previously reported [26] Fig 1B. The highland haplotype at Inv4m had much lower genetic 82 diversity across the locus; however diversity measures rebounded immediately outside of the published 83 boundaries of the inversion. Plants carrying the highland haplotype did show a slightly lower proportion of 84 segregating sites (θ) across chromosome 4 ( Fig 1C). Diversity estimates were relatively constant across the 85 "High" haplotype at the Inv4m locus, with little evidence of large-scale introgression of lowland alleles; only 86 5% of markers had segregating variation in common (MAF > 0.05) in both haplotypes at the locus, and 87 some of these may have been caused by cryptic paralogous loci which is common in GBS data [28]. The 88 lower diversity estimates are unlikely to be caused entirely by mapping biases against this divergent 89 haplotype. Of the 9201 GBS markers within the locus, ∼ 80% were successfully genotyped in > 5% of the 90 highland individuals, and of these markers, 14% were segregating in the highlands and 33% were segregating 91 in the lowlands. Among these markers, rates of missing genotypes were similar between the two alleles. If all 92 of the un-scored markers were actually present and variable in the highland individuals, there would still be 93 fewer segregating positions among highland individuals than lowland individuals. Romero-Navarro et al [26] identified Inv4m as a large-effect QTL in a multi-environment trial of landrace 96 hybrids grown in multiple field sites in Mexico. Gates et al [29] analyzed five additional agronomic traits 97 from some of these field trials and found evidence for effects of the Inv4m locus on several traits. However, 98 these studies did not explicitly show effect sizes for Inv4m across trials or traits, and none of the individual 99 SNP markers in these studies was perfectly associated with our Inv4m genotype. Therefore, we re-analyzed 100 the phenotype dataset focusing specifically on estimating the effect of Inv4m, and how this effect changed Geographic locations for each of the 1757 Mexican plants genotyped by GBS, colored by their imputed genotypes at Inv4m. B. Association of Inv4m and elevation. Each point shows the mean frequency of the "High" allele at the Inv4m locus among plants from landraces collected in each 100m bin. The ribbon shows a loess fit (± 2SE) to the logit-transformed frequencies weighted by the number of landraces in each elevation bin. Bins with fewer than 10 landraces were excluded (those with elevation >2700m). C and D. Diversity estimates π and θ for 371 sampled plants homozygous for the "Low" haplotype and 371 plants homozygous for the "High" haplotype at Inv4m, along chromosome 4 between 162 and 182 Mb. The boundaries of Inv4m from [24] are denoted by vertical lines.
across the elevations of the trials.

102
The highland haplotype of Inv4m was significantly associated with Days-to-Anthesis, the Anthesis-Silking 103 Interval (ASI), Grain Weight-per-hectare, Bare cob weight, and Field Weight, but in each case, the effect size 104 changed across elevations in the direction consistent with local adaptation: earlier flowering, reduced ASI, 105 and increased yield components in highland trials, while the opposite in lowland trials (Fig 2). The highland 106 haplotype was weakly associated with greater plant height, but the relationship was not significant (p=0.11 107 for the main effect). The relationship between Inv4m genotype and these traits was not simply an indirect 108 effect of the change in flowering time; each relationship remained qualitatively the same even after with the elevation of origin, and the first eigenvector of genetic variation across the remainder of the genome 113 (excluding chromosome 4) is also strongly correlated with elevation. Therefore, these results are also 114 consistent with a polygenic basis for the divergence of each of these traits along elevational gradients.

115
Functionally validating the association of traits with Inv4m therefore requires experimentally breaking the 116 association between Inv4m and the rest of the genome through experimental crosses.  Association of Inv4m genotype with agronomic traits depends on trial elevation We modeled each trait as a function of Inv4m genotype, trial elevation, and tester line, with controls for main effects and responses to elevation of the genomic background. Black lines and ribbons show estimates of the effect of the highland allele of Inv4m as a function of trial elevation ± 2SE, based on conditional F-tests at the REML solutions of the random effect variance components. Green lines show estimates of the Inv4m effect in a model that additionally included effects of Days-to-Anthesis on the focal trait within each trial.

118
Isolating Inv4m in common genetic background 119 We generated two sets of segregating families for the Inv4m locus by crossing the reference line B73 which 120 carries a lowland haplotype of Inv4m to two highland landrace donors (called PT and Mi21). The progeny 121 were backcrossed to B73 for 5 generations and then selfed to generate a population of plants that were largely 122 homozygous B73 across the genome, but segregated for the lowland highland haplotypes at the Inv4m locus. 123 We used 81 plants that were homozygous for either the High or Low haplotypes from these families to 124 measure the effect of Inv4m on more than 300,000 gene expression traits across 9 tissues (Table 1) and two 125 environments (warm: 32C/22C and cold: 22C/11C). an 18Mb region surrounding Inv4m (Fig 3A). The PT family also segregates for a large paracentric region of 138 residual donor alleles on chromosome 5 and a small region on chromosome 2, and the Mi21 family segregates 139 for a large paracentric residual region on chromosome 3 ( Figure S2). Beyond these large contiguous blocks, 140 we identified another 821 and 52 genes in the PT Mi21 families, respectively, that harbored high-confidence 141 SNPs in the RNAseq data, yet were not contiguous with any of the large residual introgression regions. It is 142 unlikely that there was sufficient recombination in the BC 5 populations to generate these independent blocks; 143 rather these genes likely have moved genomic coordinates in the landraces relative to B73, and actually 144 reside inside one of the large introgression regions [30,31]. However, none of these genes had genotypes that 145 were perfectly correlated with genotypes at the Inv4m locus, so we excluded them all from further analysis. 146 In the segregating families, there were a total of 7,236 genes from PT, and 4,095 genes from Mi21, 355 of 147 which were within Inv4m. Of all genes from the landrace donor which were highly correlated with the Inv4m 148 8/33 genotype (ρ > 0.5), 14% and 29% resided inside Inv4m in the PT and Mi21 families respectively. Therefore, 149 many associations of traits with the Inv4m haplotype that we observe in one population may be caused by 150 functional variants in regions flaking the inversion, rather than functional variants inside the inversion itself. 151 However, based on our GWAS results, only variants actually inside the inversion (or in the breakpoint 152 regions) are likely to be causal for the locus's effect.

153
The proportion of shared SNPs between PT and Mi21 families within Inv4m was significantly higher than 154 in the flanking introgressed regions (Fig 3B,  likely caused by functional variation inside Inv4m rather than in residual landrace DNA present in each 160 population (Fig 3). We also quantified the correlation between genetic and expression divergence between

162
Effects of Inv4m on genome-wide gene expression 163 Of the 432 tissue samples collected (nine tissues × two temperature treatments × two segregating families × 164 two Inv4m arrangements × three biological replicates × two experimental replicates), we excluded 53 165 samples with fewer than 100,000 reads, leaving a total of 379 samples. We detected a total of 23,428 unique 166 genes present where at least a third of the samples had 10 or more counts in at least one of the 167 tissue:treatment combinations (with an average of 17,016 genes per tissue) for a total of more than 306,000 168 gene expression traits. We visualized the overall transcriptome variation among samples using 169 multi-dimensional scaling ( Figure S3). Samples clustered predominantly by tissue, with an additional slight 170 separation by temperature treatment, but no visible separation by family or genotype at the Inv4m locus.

171
This was expected because only ∼ 7% of the genomes differed among samples. Among tissues, all leaf 172 samples except the S2 leaf base formed one major cluster, the two root samples formed a second cluster, and 173 the stem and SAM and S2 leaf base tissues formed separate individual clusters.

174
To assess the effects of highland alleles within Inv4m on global gene expression and plant development 175 and physiology, we focused on genes with expression associated with Inv4m genotype that resided in "clean" 176 genomic regions, so that the local genotype of each gene could not affect the association results. Overall, we 177 identified 11,842 unique genes associated with genotype at Inv4m in the PT families and 12,482 genes in the 178 9/33 Mi21 families, both using a 5% local false sign rate (lfsr ) threshold for significance ( Table 1). The number of 179 associated genes varied by tissue and temperature treatment with a range of 1,932-5,753 and 1,018-6,000  Inv4m-regulated genes were distributed across the genome, with no visible clustering by chromosome.

199
Identifying candidate causal alleles within Inv4m

200
To identify candidate genes within Inv4m, we measured the Inv4m genotype effect (i.e. cis effect) on each of 201 the annotated genes located inside the Inv4m locus, and the correlation between the expression of these  Table 3. Gene ontology (GO) terms remaining after final filtering. A universal enrichment analysis was conducted on each tissue and temperature and directional (up-regulated, down-regulated, or both) combination for Inv4m-regulated genes. Terms were then ranked by enrichment score and grouped by a semantic similarity score of higher than 0.5. The top term in each semantic similarity group was then selected. Zm00001d051879, Zm00001d052180, Zm00001d052229. These genes are reported in Table S2, with associated 217 description, GO term(s), and previous GWAS trait associations.  (Table S3). These expression effects were detected in the V1 and V3 primary root, S2 leaf tip, and 222 stem and SAM tissues. developmental transitions as well the development of pigmentation and macrohairs in maize [33]. The 226 expression of the pre-miRNA 172c was too low to assess differential expression in all but the leaf sheath 227 tissue, and was not associated with Inv4m genotype in that tissue. However, of 31 genes that are predicted 228 13/33 targets of miR172, 9 were consistently regulated by Inv4m in both populations in one-or-more tissues. Of these, six were down-regulated by the High haplotype of Inv4m. These genes and their associated 230 descriptions are found in Table S4. 231 De novo transcriptome assembly to search for novel genes 232 Finally, we used the RNAseqreads that did not map to the B73 genome to search for genes that may be 233 present in the High haplotype of Inv4m but not in the Low haplotype, and thus may have been missed by 234 our analysis. We assembled all un-mapped reads from samples carrying the PT or Mi21 haplotype at Inv4m 235 using Trinity, and searched for de-novo transcripts with evidence for expression only in the samples carrying 236 the High haplotype. We found 772 candidate transcripts. However, all were very lowly expressed. Only five 237 had estimated transcript counts of at least 10 summed across all PT or Mi21 samples, and none had 238 estimated transcript counts of at least 10 in two or more samples per family. Therefore, we found no evidence 239 that the High haplotype of Inv4m carries high-expressed genes that are not present in the Low haplotype. inspected Inv4m effects on more than 300,000 traits and employed statistical methods that leveraged the 248 replication of the effects across conditions [34] to broadly characterize the effects of this locus. Finally, we 249 inspected cis-regulatory variation on genes within the inversion to identify candidate genes responsible for 250 the Inv4m effects. variants more precisely than bi-parental mapping populations because they use diverse association panels 255 that encompass a large number of ancestral recombination events [35]. However, GWAS approaches suffer 256 14/33 from confounds due to population structure and non-equal relatedness leading to high frequencies of false 257 positives [36]. QTL mapping populations, on the other hand, are limited in resolution due to the requirement 258 for new recombination events between candidate loci.

259
Multi-parent populations like NAM [37] or MAGIC [38,39] capture advantages of both methods, 260 approaching the resolution of GWAS without the confounds of population structure. Our segregating families 261 shared a common parent (B73), and thus are similar to a two-population NAM panel. NAM-QTL mapping 262 can have high precision when the donor parents share the same alleles at a causal locus and do not share 263 alleles at other loci. In our case, the two donor parents PT and Mi21 likely share many alleles other than 264 Inv4m due to shared highland ancestry throughout the genome, so families from both parents will overlap at 265 other causal alleles linked to Inv4m. However, our results show that the genetic similarity between PT and 266 Mi21 is much higher inside Inv4m than in the remaining ∼50Mb of shared residual landrace genome between 267 the two donors. Therefore, while we cannot unambiguously attribute shared phenotypic effects across the 268 two donors to Inv4m, the majority of effects of Inv4m that we identified are likely caused by functional 269 variation within this locus itself, rather than due to other shared alleles between the two donors.

270
Our GWAS study of diverse maize landraces confirmed earlier results [26,29] that highland haplotypes of 271 Inv4m are strongly associated with agronomically important phenotypes including flowering time, the 272 Anthesis-Silking interval, and several measures of yield. In each case, the effect of the highland haplotype 273 was beneficial when grown in highland environments, but detrimental when grown in lowland environments, 274 consistent with this single locus causing antagonistic pleiotropy [40] across elevation environments. This may 275 explain the strong evidence for selection at this locus: its divergence in allele frequencies across elevations 276 and the evidence that this locus introgressed into highland maize from the wild relative mexicana [23]. That 277 the locus appears to independently control both flowering and yield traits is consistent with the idea that 278 Inv4m contains multiple important variants that each contribute to phenotypic differences between lowland 279 and highland maize [26,29]. Because they are inherited as a single unit, inversions are thought to contribute 280 to the prevalence of local adaptation despite high gene flow [41][42][43] by linking these multiple variants into a 281 single supergene [44]. above is that each trait has a polygenic basis, with small-effect loci distributed throughout the genome, each 289 of which has subtle allele-frequency differences across elevations. We conducted a preliminary analysis of the 290 PT and Mi21 segregating families in high elevation fields, but we did not have sufficient replication to detect 291 any effect. We are following up with experiments in highland field conditions to test the effects of the 292 inversion on gene expression and fitness.  Ultimately, identifying specific functional loci within Inv4m will require experimental mutagenesis or 302 other genetic perturbations within the locus. However, we aimed to begin to characterize the diversity of 303 functional variants in this locus using gene expression analysis. We used gene expression in four distinct ways 304 to dissect the functional variation captured by Inv4m: 1) by analyzing genome-wide gene expression 305 responses to Inv4m to mine for phenotypic effects across >300,000 traits; 2) by analyzing local cis-regulatory 306 effects of the locus on the 355 genes within Inv4m; 3) by analyzing the co-expression between Inv4m genes 307 and the rest of the genome; and 4) by assemblying un-mapped transcripts into de-novo gene models. The 308 first analysis provided an exceptionally detailed phenotypic dissection of the total effects of the Inv4m, and 309 showed that there are likely many distinct components to the cellular and physiological effects of Inv4m. The 310 second analysis provided an estimate of the density of functional variants within the Inv4m locus: we 311 detected likely cis-regulatory variation affecting 155 genes. The third analysis showed that many of these 312 cis-regulatory variants may have functional consequences beyond the immediate genes they regulate. The 313 fourth analysis did not give any compelling results, but may be useful in other systems.

314
By studying gene expression effects of Inv4m in two relevant environmental contexts (hot and cool 315 temperatures) and across nine distinct tissues, we aimed to maximize our ability to discover developmental 316 and phenotypic effects of the locus. It is certainly possible that we missed important phenotypic effects of 317 Inv4m by sampling only tissues on young plants -effects on pathways specific to reproductive tissues were 318 16/33 likely missed. However, we selected the nine tissues based on the published maize gene expression atlas [45] 319 so as to capture as much variation as possible in expression profiles, given the experimental constraints on 320 how large we could let the plants grow in our growth chambers.

321
Overall, our expression results show that Inv4m affects many disparate biological processes in young 322 maize tissues. The strongest Gene Ontology enrichment signals among Inv4m-regulated genes were in terms 323 related to mRNA and protein processing around the nucleus (nuclear transport and import, and the 324 pre-ribosome). We also found evidence of effects on epigenetic regulation, cell-cycle processes, metabolism, 325 and development. None of these results provide clear explanations for the effects of Inv4m on flowering time 326 and yield. However both flowering and yield are highly complex traits that are affected by many aspects of 327 development, physiology, and stress responses, and so the mechanistic links among these traits may not be 328 obvious [37]. We looked more specifically at a priori candidate genes for flowering and yield traits both 329 inside and outside Inv4m and found possible effects on several of these genes, but no strong enrichment of 330 Inv4m effects on either class.

331
Among the genes within Inv4m, nearly 70% of those expressed high enough to measure showed evidence 332 of cis-regulatory variation among alleles. While some of these genes may share regulatory elements, its likely 333 that the majority of these genes are affected by independent genetic variants. This suggests that the two Inv4m candidates among all Inv4m-correlated genes is roughly similar to the relative sizes of Inv4m to the 341 whole chromosome 4 introgression in each population. This suggests that introgressing any region from PT 342 or Mi21 into B73 will cause diverse effects on gene expression, and that Inv4m is not exceptional in the 343 magnitude of these perturbations.

344
Together, these results imply that the Inv4m locus has many effects on corn development and physiology, 345 and therefore it's contribution to local adaptation is complex and not simply a change to major flowering or 346 yield-related genes. The incorporation of ∼ 13Mb of the genome of mexicana likely brought with it a large 347 number of functional variants that have both positive and negative effects on many molecular traits, most of 348 which are not macroscopically visible, but may still impact performance in different conditions. 349

17/33
Conclusions 350 This study represents a broad characterization of an adaptive chromosomal inversion. Our results give insight 351 into the role of this inversion in adapting to high altitude environments. GWAS results show that Inv4m is 352 associated with faster flowering and higher yield in highland common gardens. The molecular roles of genes 353 within the inversion are summarised by the phenotypic effects of Inv4m-regulated genes ( Table 2) and 354 enriched GO terms (Table 3), and the candidate gene set within Inv4m (Table S2). Fine-mapping in this 355 region is required to further dissect the functional role of loci within Inv4m, but will have additional 356 challenges due to suppressed recombination between heterokaryotypes. Novel genomic technologies, such as a 357 CRISPR/CAS system [46] that can reverse the orientation of the High Inv4m haplotype could be used to

372
To calculate the association of Inv4m with elevation, we divided landraces into 100m bins, calculated the 373 haplotype frequency of the minor haplotype in each bin, and fit a loess curve to the log-transformed 374 haplotype frequencies, weighted by the number of landraces in each bin. Based on this analysis, we labeled 375 the minor haplotype in Mexico "High", and the major haplotype "Low". We used the R function HWExact 376 from the HardyWeinberg R package to test genotype counts against the Hardy-Weinberg expectation. We 377 calculated diversity statistics π and θ separately for plants homozygous for the "High" or "Low" haplotypes 378

18/33
at Inv4m across all of chromosome 4 in 500 marker windows using TASSEL 5 [48]. Because many more plants were homozygous for the "Low" haplotype, we randomly sampled 371 landraces to calculate the 380 diversity statistics to make the sample sizes equal. Primers were designed to amplify the fragment of DNA carrying the diagnostic SNP (Forward: minutes of final extension step at 72 • C, followed by a 4 • C hold. Amplified DNA was then digested with the 409 Hinf 1 enzyme for 1 hour at 37 • C, and the resulting product was run out on a 1% agarose gel for genotyping. 410 Two of the resulting BC 5 individuals identified to be heterozygous for Inv4m were self-pollinated per 411 population to produce BC 5 S 1 families segregating for Inv4m.   Two seeds were planted in each pot, one in the center and one near the corner, and a total of nine tissues 424 were sampled from the two plants when they reached specific developmental stages. These nine tissues were 425 selected to maximize the diversity of gene expression profiles based on the transcription atlas of [45]. Corner 426 plants were removed from the pot and sampled when they reached the V1 stage, while center plants were 427 sampled when they reached the V3 stage. Two tissue types were sampled from the V1 stage, and 7 tissue 428 types were sampled from the V3 stage (Table 1). Sampling occurred between 2 and 4 hours after simulated 429 sunrise. Plant tissue was placed in 2 ml centrifuge tubes, immediately flash frozen in liquid nitrogen, and 430 stored at -70 • C.

431
Inv4m Genotyping and RNA sequencing 432 We used the same DNA extraction and CAPs genotyping methods as previously described to genotype the 433 seedlings for the Inv4m haplotype. We harvested and successfully genotyped 364 individual plants from v.0.11.5 [51]. Adapter sequences, low quality reads (q<20), and sequences less than 25 bp were removed 449 using Trimmomatic v.0.36 [52].

450
Effects of genotype at Inv4m on seedling emergence 451 The effect of Inv4m on seedling emergence were analyzed using the following random slope and intercept 452 model for each donor and temperature treatment separately: 453 y ijk ∼ µ + β 1 G i + u ijk + e ijk y ijk is the emergence time for individual plant k in experimental replicate j in Inv4m genotype i. µ is the 454 model intercept, β 1 G i is the effect of Inv4m genotype, u = u ijk is a random effects term for the experimental 455 replicate, and e ijkl is residual error. Variance components, coefficients and standard errors were estimated by 456 REML using the R function lmer [53], and p-values were calculated using conditional F-tests [54]. the genome by calling variants in the expressed regions. Paired reads that passed filtering were aligned to the 461 B73 reference genome version 4 [55] using hisat2 [56], and variant loci were called using GATKv3 [57,58].  Within these introgressed regions, we used R/QTL [59] to assign genotype probabilities across the Inv4m locus 475 for each plant, allowing error.prob = 0.2. Finally, we observed that several genes outside these 5 476 introgressed regions each of which exhibited >= 2 SNPs relative to the reference. We hypothesized that 477 these genes may have a different chromosome location in the landraces relative to B73, and actually reside 478 inside one of the 5 introgressed regions. We therefore assigned their genotype to the most common genotype 479 among these variant loci.

481
To quantify gene expression, we ran kallisto v.0.42.3 [60] separately on each sample using the B73 482 AGPv4.36 transcript models downloaded from the maize genome database [55]. We limited to only one 483 bootstrap replicate, and then summed the transcript counts for each gene. Genes were retained for analysis 484 when at least a third of the samples had 10 or more reads. Gene counts were normalized using the weighted 485 trimmed mean of M-values (TMM) with the calcNormFactors function in edgeR [61]. Normalization using 486 TMM reduces bias of very highly and lowly expressed genes. The voom function [62] in the limma 487 package [63] was used to convert normalized reads to log2-counts per million (log2CPM), estimate a We divided genes into three groups to estimate the effects of Inv4m or other introgressed landrace alleles, 495 based on whether each gene resided in a "clean" genomic region with only B73's allele present in the families, 496 inside the Inv4m locus itself, or if it resided within one of the genomic blocks containing any of the residual 497 landrace genome outside of Inv4m. Each group of genes served a different purpose in the analysis of Inv4m. 498 Genes in the "clean" region were used to assess the effects of Inv4m on global gene expression and indirectly 499 assess the effects on development and physiology more broadly. Genes inside the Inv4m locus were scanned 500 for candidate alleles underlying Inv4m's effects. Genes in the residual introgression blocks were used as 501 controls to assess the similarity of PT and Mi21 alleles in other genomic loci, as well as compare effect size 502 and expression correlation with Inv4m.

503
For genes that resided in "clean" genomic regions with only B73's allele present in the segregating families (approximately 89.8% of genes expressed in both donors), we estimated the effect of the Inv4m locus separately in each Inv4m donor, temperature treatment, and tissue using the linear model: where y ij is a normalized, batch-corrected log2CPM value for a single gene in a single sample, µ is the 504 intercept for that gene in the particular population, environment and tissue, β is the corresponding effect of 505 the landrace Inv4m haplotype, and e ij is the model residual, which is assumed to be independent of all other 506 residuals and have variance proportion to φ ij , the empirical weight factor calculated by voom. We fit this 507 model to the whole set of "clean" genes using the lmFit function from limma, and extracted β and its 508 standard error (|β|/t). We leveraged the correlations in effect sizes across tissues and environments to 509 improve effect size estimates and identify a union set of genes regulated by Inv4m by combining results 510 across tissues and environments using the mash method [34] implemented in the mashr R package. mash was 511 run separately for the two segregating families. 512 We also fit a separate model to test for interactions between Inv4m and the temperature environment, separately for each Inv4m donor and tissue: This model adds β 2 , the main effect of temperature environment on expression, and β 3 , the interaction 513 between Inv4m and temperature. However it is less flexible than the first model because the residual 514 23/33 variance σ 2 is constrained to be equal for the two temperature environments. This model was also fit to each 515 gene using lmFit, and the estimate of β 3 and its standard error were extracted. We again used mashr to 516 identify the union set of genes affected by this interaction.

517
For genes residing inside the Inv4m locus, we fit the same two statistical models with the lmFit function 518 (both sets of genes were analyzed jointly to leverage the empirical bayes shrinkage of standard errors).

519
However, we did not include these genes in the multiple adaptive shrinkage analysis.

520
For genes residing outside Inv4m, but within one of the genomic blocks containing residual landrace DNA in both donors, we fit a slightly different statistical model: where cis is the local genotype of the gene, and β is the associated effect. For genes in residual genomic 521 blocks on chromosome 4, the cis genotypes were highly correlated with the Inv4m, so some of the cis effect 522 may have been caused by Inv4m, but these effects were difficult to separate statistically. However for genes 523 on other chromosomes, the two genotypes were largely uncorrelated.

524
Sequence divergence, and expression correlation between genes within the High 525 Inv4m haplotype and the residual genomic regions 526 We estimated the sequence divergence between the two landrace donors and B73 for each gene within the 527 chromosome 4 introgression containing Inv4m present in both families. For each gene, we calculated the 528 genetic similarity of the PT and Mi21 alleles relative to B73 by counting the number of shared SNPs divided 529 by total number of observed SNPs within each gene window, using only the highly filtered SNP set described 530 above. We calculated the correlation in effect sizes between donors for each tissue and temperature 531 combination. T-tests were used to determine whether genetic similarity and expression correlation were 532 higher within Inv4m relative to the shared introgressed region.

533
Gene ontology enrichment 534 Genes were assigned to gene ontology (GO) categories for functional annotation using an updated ontology 535 annotation [64] which we expanded to include all ancestral terms for each gene with the buildGOmap function 536 of the R package clusterProfiler [65]. Genes in "clean" genomic regions which responded to the High Inv4m 537 haplotype in the same direction in both donor populations were classified as Inv4m-regulated and tested as 538

24/33
foreground genes in a GO enrichment analysis. Genes that were expressed in each tissue and temperature combination in both donor populations, but were not Inv4m-regulated were included in the set of background 540 genes. We calculated the enrichment of each GO term using the enricher function in the clusterProfiler R 541 package. We selected GO terms with a false discovery rate less than 1% after a Benjamini-Hochberg multiple 542 test correction. We then calculated the percent of genes in each GO terms that were Inv4m-regulated, and 543 ranked all GO terms by their maximum enrichment across conditions. We then selected the highest enriched 544 GO term among each set of GO terms that had a semantic-similarity >0.5 as the representative term set.

545
Candidate gene pathway assessment among Inv4m-regulated genes 546 We inspected two additional candidate gene sets: genes known to regulate flowering time in maize [32], and 547 genes regulated by the microRNA miR172. Inv4m has previously been associated with flowering [26]. 548 miR172 is a highly conserved micro-RNA across the plant kingdom that regulates development and flowering 549 time and one miR172 gene, miR172c is located inside Inv4m. For miR172 targets, we found the mature 550 sequence for zma-miR172c: "AGAAUCUUGAUGAUGCUGCA" from miRBase http://www.mirbase.org, 551 and used this as a query of the Plant Small RNA Target Analysis Server (psRNATarget, 552 http://plantgrn.noble.org/psRNATarget), and collected all predicted target genes. We also used 553 TAPIR's pre-computed target genes for zma-miR172a-b-c-d. These two categories of genes were inspected by 554 hand for evidence of regulation by Inv4m.

556
We used the intersection of two separate methods to identify candidate adaptive genes within Inv4m. First, 557 we quantified the proportion of conditions (tissue:temperature combinations) that each gene within Inv4m 558 was differentially expressed according to Inv4m genotype. To be considered differentially expressed, the gene 559 needed to be differentially expressed in both Inv4m donor populations and in the same direction. Genes 560 where at least one donor was not expressed were removed from this analysis per condition. As a 561 complimentary approach, we calculated the correlation in expression between each Inv4m gene and all the 562 genes in "clean" genomic regions that we determined to be Inv4m-regulated above. We then quantified the 563 proportion of these correlations that were significant and in the same direction in both donor populations for 564 each gene. For this analysis, we used the lm function in R to implement a linear model with the 565 Inv4m-regulated gene expression as the response variable, and the Inv4m gene's expression and Inv4m 566 genotype as predictors. 567

25/33
De-novo assembly of novel genes 568 We collected all un-mapped reads from the samples homozygous for the high Inv4m haplotype, and used 569 Trinity v2.4 [66] to assemble un-annotated transcripts using default settings. We then used Kallisto to 570 quantify the expression of each of these novel transcripts using the un-mapped reads from each RNAseq 571 sample. To search for candidate "novel" genes in the highland haplotype, we filtered for Trinity genes that 572 had zero estimated counts in any of the samples that were homozygous for the B73 haplotype of Inv4m, but 573 had non-zero estimated counts in at least 2 samples homozygous for the PT or Mi21 haplotype in each 574 segregating donor family (to exclude genes that may reside in either PT or Mi21). 575

579
S3 Figure. Multidimensional scaling plot of normalized and batch corrected gene expression (log2CPM) of 580 the 500 highest expressed genes across our dataset using the plotMDS function. The plot represents a 581 two-dimensional log2 fold change distance between each sample colored in three different ways to display how 582 each factor effects the structure of the data. A) tissue B) Inv4m genotype and C) Temperature. 583 S1