Investigating the phenotypic consistency and genetic architecture of noncognitive skills.

Noncognitive skills have been demonstrated to predict a range of socioeconomic outcomes including educational attainment and employment. There is however little evidence of how consistent and reliable noncognitive skills are over time or across different measures. Using data from a UK cohort, we show that some key indicators of noncognitive skills are inconsistent, that phenotypic relationships between them are generally weak, and that they associate with educational and labour market outcomes inconsistently and less strongly than cognitive skills. Genomewide analyses reveal that noncognitive skills exhibit low heritability and no clear shared genetic architecture with cognitive skills or outcomes. Our results implicate a high noise to signal ratio and suggest caution in the use of noncognitive measures as reliable indicators of underlying traits. Some noncognitive skills, particularly behavioural difficulties, may provide malleable targets for interventional educational programmes, but many current measurements may be too imprecise and inconsistent to be reliably used.


Noncognitive skills have long been argued as an important mechanism through which a range of outcomes are differentiated between individuals [1][2][3][4][5][6][7] . They include characteristics such as persistence, motivation and temperament and are intended to capture a complementary set of skills to cognitive measures such as intelligence and abstract reasoning 1 . However, the definition of noncognitive skills as "personality traits, goals, character, motivations, and preferences" 8 is broad to the point of vagueness; there is no standard scientific definition of noncognitive skills and studies have used a wide range of measures (Box 1). This disparate measurement may reflect that previous studies have used samples or measures of convenience 9 . Modest correlations between different noncognitive skills have been observed for grit and conscientiousness 10 ; academic effort and academic problems 11 ; and internalising and externalising behaviours 12 . However, the bulk of reported correlations between noncognitive skills have been weak 10,11 , questioning any notion that they are one dimensional constructs 13 . 
Box 1: A non-exhaustive list of noncognitive skills
agreeableness 14; effort 14; locus of control 15; risk aversion 6
ambition 16; emotional stability 9; metacognitive strategies 4; self-control 6
attention 17; emotional intelligence 14; motivation 18; self-discipline 19
behavioural problems 7; empathy 8; neuroticism 7; self-esteem 5
charm 5; externalising 20; openness 8; self-efficacy 8
clumsiness 7; extraversion 7; optimism 14; self-perceptions 4
communication 4; executive function 21; organisation 14; self-regulation 9
confidence 4; friendliness 4; patience 6; sociability 18
consistency 1; goals 18; perseverance 1; social skills 17
conscientiousness 10; greed 16; personality 18; temperament 6
curiosity 18; gregariousness 22; preferences 18; tenacity 19
creativity 12; grit 10; procrastination 14; time management 23
delay of gratification 9; humility 8; reliability 19; trustworthiness 19
dependability 1; impulsivity 24; responsibility 25; work habits 26
docility 5; leadership 3; resilience/coping 8
A large body of research suggests that people with higher measurements of noncognitive skills have better education, health and labour market outcomes 2,3,5,8,11,14,18,25,27 . For example, Borghans and colleagues report that personality explains 16% of the variance in achievement scores in a US sample 27 . This has led some to argue that noncognitive skills are as important as cognitive skills: "For many dimensions of behaviour… noncognitive ability is as important, if not more important, than cognitive ability" 5 . However, the evidence supporting this is diverse, inconsistent and inconclusive. A recent systematic review demonstrated small and heterogeneous effects of noncognitive skills on a range of outcomes including academic achievement, with evidence of substantial publication bias (Table 1) 9 . There is a lack of robust causal evidence and interventions have generally had only weak and short-lived effects 4 .
Noncognitive skills have been argued as a valuable target for interventions to improve outcomes because they are more malleable than cognitive skills 3,8,28,29 . Malleability relates to causation; the way that intervention can lead to change. But the extent to which the instruments to detect noncognitive skills are accurately measuring a genuine underlying trait is unknown and it is possible that the perceived malleability of noncognitive skills reflects measurement error or inconsistency. To be ideal candidates for policy interventions, measures of noncognitive skills must reliably pick up a consistent signal of an underlying skill, as displayed by high phenotypic correlations. It has been argued that reliable measures of noncognitive skills are available 8,30 , though this has been questioned by a lack of consistent longitudinal evidence 15,31 . Consistency has varied widely between studies and skills; for example, test-retest reliability (calculated as the correlation coefficient between the same measure at two occasions) has been estimated at 0.12 32 and 0.46 to 0.61 33 for risk aversion, and 0.49 for locus of control. These values are far lower than those reported for cognitive skills (c.f. 0.52 for digit span to 0.82 for reading) 32 .
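As an illustration of the test-retest reliability described above (the correlation coefficient between the same measure at two occasions), the following minimal sketch computes a Pearson correlation in Python. The scores are hypothetical and the function name `pearson_r` is our own; this is not the code used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five children at two measurement occasions.
scores_occasion_1 = [10, 12, 9, 15, 11]
scores_occasion_2 = [11, 13, 8, 14, 12]

test_retest = pearson_r(scores_occasion_1, scores_occasion_2)
print(round(test_retest, 2))
```

A value near 1 would indicate that children keep their rank order between occasions; values such as the 0.12 reported for risk aversion would indicate that they largely do not.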
The extent to which noncognitive skill measures detect signal over noise can be investigated with genome-wide data. Most human traits are at least partly heritable 34 , but there has been relatively little research into the genetics of noncognitive skills compared to cognitive skills 35 . Personality types are the most studied of the noncognitive skills with heritability estimated at between 0.4 and 0.5 37,38 , longitudinal variability 36 , and evidence of a genetic contribution to their stability over time 39 .
Heritability estimates for other noncognitive skills range from 0.36 for alienation to 0.83 for academic effort 11 ; 0.31 to 0.56 for aspects of openness 40 ; 0.18 to 0.49 for aspects of conscientiousness 41 ; and 0.40 for enjoyment and self-perceived ability 42 . Sibling correlations are lower for noncognitive (0.09 to 0.46) than cognitive (0.50 to 0.62) skills 43,44 , supporting a smaller genetic component, but the extent of genetic correlation between different noncognitive skills has not been widely investigated 45 .
In this study, we contribute to the literature with a comprehensive analysis of the phenotypic and genotypic structure of noncognitive skills. We investigate two related research questions: 1) Are noncognitive skills phenotypically and genotypically correlated over time and across different skills? and 2) Do noncognitive skills associate with socioeconomic outcomes?

Results
How consistent are noncognitive skills over time?
With the exception of the SDQ scale (r=0.18 to 0.73), most noncognitive measures were inconsistent over time (Figure 1). This suggests that most skills are highly variable (inconsistent) or poorly measured (unreliable). Correlations amongst the Big 5 personality types were low except between the intellect/imagination and agreeableness subscales (r=0.46). While the phenotypic correlations for repeat measures of noncognitive skills were low, they were generally positive where distinguishable from zero and so do not appear to capture conflicting information at different measurement occasions. The weak temporal phenotypic correlations of noncognitive measures contrast sharply with those for cognitive measures of IQ (r=0.60) and measures of educational attainment (r>0.78 for compulsory education). The high correlation of the education measures will have been partly socially organised by the streaming of children in schools.
Phenotypic correlations between different measures of noncognitive skills are mostly very low (Figure 1). This suggests that these different measures are capturing empirically distinct phenomena. The SDQ scale is the only measure that correlates consistently with other noncognitive skills at r>0.2. This between-trait correlation is strongest for social skills (r=0.24 to 0.49), communication at age 10 (r=0.29 to 0.56) and empathy (r=0.17 to 0.38). Between-skill phenotypic correlations are almost exclusively positive, suggesting that where patterns are observed, children who score highly on one noncognitive skill are also more likely to score highly on another. All noncognitive skills except for the teacher reported measures of SDQ, locus of control at age 8 and the intellect/imagination personality type correlate weakly with cognitive skills (r<0.25 for IQ at age 8 and r<0.20 for IQ at age 15). The highest phenotypic correlation is between IQ and the intellect/imagination personality type (r=0.33 for IQ at age 8 and r=0.35 for IQ at age 15).

Do noncognitive skills associate with education and labour market outcomes?
Phenotypic correlations with educational attainment were low for most noncognitive skills. Correlations were highest for the SDQ scale (r=0.13 to 0.42) and the teachers' assessments were more strongly associated than the contemporaneous parent reports. For example, the correlation between educational attainment at age 11 and teacher reported SDQ at age 10 was r=0.40 while the correlation with parent reported SDQ at age 10 was r=0.29. Correlations with educational outcomes were modest for social skills (r=0.13 to 0.27), communication at age 10 (r=0.18 to 0.34), locus of control (r=0.26 to 0.36 at age 8; r=0.19 to 0.28 at age 16), and the agreeableness and intellect/imagination subscales of the Big 5 (r=0.16 to 0.28 and r=0.27 to 0.39 respectively). These contrasted strongly with the higher correlations between cognitive ability and educational attainment (r=0.43 to 0.73). Phenotypic correlations for noncognitive skills with employment and NEET were very low (r≤0.08 for employment; r≤0.07 for NEET) but were slightly higher between some measures and income (r=-0.01 to 0.19). Correlations between labour market outcomes and cognitive skills were also very low (r≤0.09). Many noncognitive skills were weakly negatively correlated with non-response at ages 18 and 24 (r=0.05 to 0.15). Only the extraversion subscale of the Big 5 was positively correlated with non-response (r=0.06).
To further investigate the potential impact of skills we ran a series of regressions of age 16 educational attainment on noncognitive and cognitive skills (Figure 2). Each skill was standardised and analysed independently, controlling for sex and month of birth. A one standard deviation (SD) increase in noncognitive skills was associated with changes in age 16 attainment ranging from a 0.04 SD decrease to a 0.41 SD increase. There was considerable heterogeneity in estimates between skills and measurement occasions for all noncognitive measures except the SDQ, which was about as consistent as IQ. By comparison, a one SD increase in cognitive skills was consistently associated with a 0.5 SD increase. These patterns were similar for educational attainment at all ages (Supplementary Figure 1).
Table 2). The SDQ scale is the only noncognitive measure for which there was nonzero heritability at multiple occasions (h²=0.18 to 0.23). Results are broadly comparable when using the internalising and externalising subscales of the SDQ (Supplementary Figure 2). Non-zero heritability is observed for communication at age 18 months (h²=0.17), self-esteem at age 18 (h²=0.25), locus of control at age 8 (h²=0.21), and the 'intellect/imagination' subscale of personality type (h²=0.21). Imprecision of the heritability point estimates means that non-zero heritabilities cannot be ruled out, but the upper bounds are estimated below 0.30 for most noncognitive skills. In contrast, the heritability of cognitive skills is far higher (h²=0.43 and 0.47 at ages 8 and 15 respectively). Educational outcomes are highly heritable (h²>0.4) but there is little evidence of heritability for labour market outcomes. Heritability of questionnaire non-response was estimated higher and with less uncertainty than the noncognitive measures (h²=0.34 and h²=0.21 at ages 18 and 24 respectively).
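The per-skill regressions of age 16 attainment reported in this section (each skill standardised and analysed independently, controlling for sex and month of birth) can be sketched as follows. This is a minimal illustration using ordinary least squares on hypothetical data; the helper names `standardise` and `ols` and all values are assumptions, and the study's actual estimation software is not stated here.

```python
def standardise(values):
    """Rescale values to mean 0 and sample SD 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical data for eight children (not ALSPAC values).
skill = [3, 5, 4, 8, 6, 2, 7, 5]
sex = [0, 1, 0, 1, 1, 0, 1, 0]
month_of_birth = [1, 4, 7, 10, 2, 5, 8, 11]
attainment_16 = [40, 55, 48, 70, 60, 35, 66, 50]

coefficients = ols([standardise(skill), sex, month_of_birth],
                   standardise(attainment_16))
# SD change in attainment per one-SD increase in the skill, net of covariates.
beta_skill = coefficients[1]
print(round(beta_skill, 2))
```

Standardising both the skill and the outcome is what allows coefficients to be read as "SD change per SD increase" and compared across skills measured on different scales.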

Figure 3: Heritability of skills and outcomes
Do noncognitive skills have a shared genetic architecture?
There was very limited evidence for genetic correlations across noncognitive skills (Figure 1, above the diagonal), though estimation precision is low for most skills. Genetic correlations within traits over time are only observed for the parent reported SDQ measures (rG=0.62 to 1.00) and cognitive skills, which have near-unity genetic overlap (rG=0.97). Genetic correlations between different noncognitive measures are only observed between the parent, but not teacher, reported SDQ measures and communication at age 10 (rG=0.68 to 0.91). The SDQ measures, communication at age 10, self-esteem at age 18, and the agreeableness and intellect/imagination subscales of the Big 5 personality types had strong genetic correlations with educational attainment. These were higher for the teacher reported SDQ at age 7 than for any of the parent reported SDQ measures or the teacher reported SDQ at age 16. Genetic correlations between cognitive measures and educational attainment are all estimated near unity (rG>0.96). There was little evidence of genetic correlations between labour market outcomes and any of the noncognitive or cognitive skills.

Discussion
Our results question the notion that noncognitive skills are consistent, reliable or highly predictive of outcomes 8 , and provide two key contributions to the literature. First, except for the SDQ scale, measures of noncognitive skills were weakly correlated (phenotypically and genotypically) over time. This contrasts with previous research that has demonstrated temporal stability of noncognitive skills 15,28 , and strongly cautions that many measures of noncognitive skills fail to capture a consistent underlying trait. Correlations of noncognitive skills across different measures were also weak. The SDQ scale was the only measure to consistently correlate with other noncognitive skills, namely social skills, communication and empathy. These results conform to a previous study that found low phenotypic correlations between different noncognitive skills 11 . This low between-trait consistency could reflect that different noncognitive measures capture fundamentally different underlying traits, but the lack of within-trait consistency over time suggests that this is unlikely. Our results also question previous suggestions that cognitive skills reflect noncognitive skills 3 as we observed only weak correlations between noncognitive and cognitive measures. By contrast, the cognitive measures of IQ were far more strongly correlated over the same period. There was very limited evidence of genetic correlations within and between noncognitive skills. The only measure for which we observed consistent genetic architecture was the parent reported SDQ scale. This could reflect the influence of shared parent-offspring genetics on reporting, or parents' genetics influencing offspring noncognitive skills indirectly through dynastic effects 46 ; there would be no such shared teacher-student genetics. Previous twin studies have found strong genetic correlations between noncognitive and cognitive skills 40,41 , but our results did not support this.
The only noncognitive measures that were genetically correlated with cognitive measures were the teacher reported SDQ scale, locus of control, and the agreeableness and intellect subscales of the Big 5. The differences between these results and previous studies may have arisen due to several reasons. First, the noncognitive skill measures in ALSPAC may have been of lower quality than those used in previous studies. However, most of the measures we used have been widely validated and are consistent with those used in previous studies so this is unlikely to account for all the differences. Second, we used longitudinal measures that spanned the whole of childhood whereas previous studies used cross sectional measures or single repeat measurements. However, there was no age at which all noncognitive skills appeared more consistently measured and this is therefore also unlikely to account for all differences. Third, it is possible that the mothers who reported on many of the noncognitive skills were providing biased responses in cases where their child performed poorly. Again, while possible this is unlikely to have accounted for all the differences between this and previous studies. Fourth, it is possible that the ALSPAC cohort are fundamentally different to previously analysed study populations. Finally, it is possible that results from previous studies have been due to chance or selectively reported. While we reported every noncognitive skill in ALSPAC, many previous studies have reported only one or a small number of noncognitive skills.
Second, associations between noncognitive skills and socioeconomic outcomes were generally weak, contradicting findings from previous studies 3,8,18 and supporting a recent systematic review that highlighted the uncertainty around the impact of these skills on socioeconomic outcomes 9 . That we were able to use multiple measurements of noncognitive skills at different ages and a range of outcomes strengthens our findings. Behavioural problems as captured by the SDQ scale, social skills, and locus of control were the only noncognitive measures to phenotypically associate with educational outcomes strongly and consistently, findings that both support and contradict previous studies 47,48 . Neither noncognitive nor cognitive skills associated strongly with labour market outcomes which, while in contrast to previous studies 3,8 , may reflect that our study participants had only recently entered the labour market. It has been previously claimed that personality is more important than cognitive ability for predicting socioeconomic outcomes such as education 49 , but our results do not support this: the effect sizes of noncognitive skills were at most around half that of cognitive skills. This greater contribution to educational attainment of cognitive than noncognitive skills may be due to several reasons. First, cognitive skills may fundamentally be a more important driver of educational differences between children than noncognitive skills. UK educational attainment is heavily assessed by performance in tests and exams (which are highly reflective of cognitive test environments). Second, cognitive tests have a long history of development and are therefore likely to be far more accurate and consistent than noncognitive measures. Third, cognitive skill assessment is likely to be more objective than noncognitive skill assessments, which can be context dependent.
Finally, our cognitive measures were based upon direct assessment of the study children while the noncognitive measures were based upon parental reports. Further work using self-reported noncognitive skills is required to examine the full impact of this. Associations were stronger between educational attainment and the teacher than the parent reported measures of the SDQ scale, suggesting that teachers may more accurately identify education-related factors such as problematic behaviour than parents 50 . Many noncognitive and cognitive skills were weakly negatively correlated with non-response phenotypically, suggesting that individuals who score low on these skills are more likely to drop out of studies. Cognitive ability and attainment were strongly negatively correlated genotypically with non-response, adding to the growing body of evidence that non-response is genetically patterned 51,52 . This may have important implications for participant representativeness and generalisability in cohort studies. Each of the SDQ scale, communication, locus of control (age 8), and the agreeableness and intellect/imagination subscales of the Big 5 personality types were strongly genetically correlated with educational attainment. Genetic correlations with educational attainment were higher for teacher than parent reported measures of the SDQ scale. This may reflect that teachers can more accurately identify and report problem behaviours than parents, or that there are different aspects of child behaviour and performance that they focus on. Many of the genetic correlations we observed were imprecise due to the low heritability of noncognitive skills, but our estimates suggested an upper bound heritability of 0.3 for most skills. The SDQ scale was the only noncognitive measure for which we consistently estimated non-zero heritability (~0.15), far lower than the estimated heritabilities for cognitive measures (~0.45) and educational attainment (~0.50).
Our results suggest that while many of these measures of noncognitive skills would be poor intervention targets for policies to improve educational or labour market outcomes, interventions to improve behavioural problems as captured by the SDQ scale may have merit. Most of the noncognitive measures we used were poorly correlated over time, suggesting a poor signal to noise ratio characterised by measurement error. The SDQ scale uses responses to a large battery of questions, which may lead to proportionately smaller measurement error compared to the signal of the underlying trait that it captures. If this is the case, then future research requires more detailed measures of noncognitive skills rather than the opportunistic single item measures that have often been used. The lack of consistency observed for other skills may also have reflected other mechanisms including i) genuine temporal intra-individual variation amongst study participants in the expression of these skills; ii) differences over time (e.g. due to schooling); or iii) different responders (e.g. parent vs. teacher). Regardless of the mechanism though, the inconsistency suggests that many measures are likely too noisy and variable for intervention as these measures are imprecise indicators of underlying traits. It is of course possible that there are latent underlying noncognitive skills which are not accurately captured by the measures we used, but our findings suggest that noncognitive skills may be more heterogeneous than is widely appreciated.
This study has several limitations. First, it is possible that measurement error was unusually high in the noncognitive measures used in the ALSPAC study. However, the measures used in ALSPAC have been widely used across the literature and this has not previously been identified as a problem 12,53 . Furthermore, measurement error would need to have been high across all measures used from birth to age 18 so is unlikely to explain our results. Future studies into test-retest reliability of noncognitive skills based on different longitudinal samples could help resolve these questions. Second, many of the genetic correlations were estimated with extremely low precision, often being constrained at the values of -1 or 1 (see Supplementary Table 2), and should therefore be interpreted with caution. This may be due to the low estimated heritability of the noncognitive skills; low heritability implies a small contribution of variants (either in the number of variants associated or the strength of associations) and therefore lower power to detect genetic correlations between these skills. It may also be due to our fairly low sample sizes (n=2,545 to 8,868) and the resulting limited power to detect univariate and bivariate genetic associations. Future studies conducted on larger samples are required to more accurately estimate the heritability of, and genetic correlations between, noncognitive skills and other phenotypes. Third, our genetic associations may have been biased by uneven linkage disequilibrium, residual population structure, or assortative mating 46,54 . We controlled for the first twenty principal components to account for population structure; however, this may not have accounted for all differences 55 . While assortative mating is thought to be low for noncognitive traits 38 , there is evidence of assortment on psychiatric traits such as ADHD 56 .
This may have biased our estimates, but previous work demonstrates that this should inflate rather than deflate genetic associations where assortment is positive 57 . It is possible that assortment on noncognitive skills may be negative and future work is required to determine this. Previous work using ALSPAC has demonstrated that the heritability of cognitive variables and educational outcomes is estimated higher than in other samples, and we may therefore expect that our heritability estimates for noncognitive skills are unlikely to be underestimates.
In conclusion, many noncognitive skills are poorly correlated over time and associate weakly with education and labour market outcomes. The SDQ scale is an exception, providing a noncognitive measure that is reliable and associates with educational outcomes albeit less strongly than cognitive skills. While the variability of many noncognitive skills over time questions their suitability as interventional targets 8 , interventions based on behavioural problems in childhood may have utility for improving educational outcomes.

Study sample
Participants were children from the Avon Longitudinal Study of Parents and Children (ALSPAC). Pregnant women resident in Avon, UK with expected dates of delivery 1st April 1991 to 31st December 1992 were invited to take part in the study. The initial number of pregnancies enrolled was 14,541. When the oldest children were approximately 7 years of age, an attempt was made to bolster the initial sample with eligible cases who had failed to join the study originally. This additional recruitment resulted in a total sample of 15,247 pregnancies, resulting in 14,899 children who were alive at one year of age. From this sample genetic data is available for 7,988 after quality control and removal of related individuals (see Genetic data below). For full details of the cohort profile and study design see 58,59 . The study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool at http://www.bristol.ac.uk/alspac/researchers/our-data/. The ALSPAC cohort is largely representative of the UK population when compared with 1991 Census data; there is under representation of some ethnic minorities, single parent families, and those living in rented accommodation 58 . Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. We used the largest available samples in each of our analyses to increase precision of estimates, regardless of whether a child has data on other noncognitive skills. Sample sizes range from 2,545 to 8,868 (Supplementary Table 1).

Genetic data
DNA of the ALSPAC children was extracted from blood, cell line and mouthwash samples, then genotyped using reference panels and subjected to standard quality control approaches. ALSPAC children were genotyped using the Illumina HumanHap550 quad chip genotyping platforms by 23andMe subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8). Population stratification was assessed by multidimensional scaling analysis and compared with Hapmap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed. SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed. Cryptic relatedness was measured as proportion of identity by descent (IBD > 0.1). Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,115 subjects and 500,527 SNPs passed these quality control filters.
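The SNP-level filters applied to the children's genotypes (minor allele frequency, call rate and Hardy-Weinberg equilibrium) can be expressed as a simple predicate. The sketch below is illustrative only: the `SnpStats` record, the function name and the example SNP identifiers are hypothetical, and the per-SNP statistics are assumed to have been computed beforehand by the genotyping QC software.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Summary statistics for one SNP (hypothetical record layout)."""
    snp_id: str
    maf: float        # minor allele frequency
    call_rate: float  # proportion of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test P value


def passes_child_qc(snp, min_maf=0.01, min_call_rate=0.95, min_hwe_p=5e-7):
    """Keep a SNP unless MAF < 1%, call rate < 95% or HWE P < 5E-7."""
    return (
        snp.maf >= min_maf
        and snp.call_rate >= min_call_rate
        and snp.hwe_p >= min_hwe_p
    )


# Hypothetical SNPs illustrating each exclusion criterion.
snps = [
    SnpStats("snp_common_ok", maf=0.20, call_rate=0.99, hwe_p=0.40),
    SnpStats("snp_rare", maf=0.005, call_rate=0.99, hwe_p=0.40),
    SnpStats("snp_low_call_rate", maf=0.20, call_rate=0.90, hwe_p=0.40),
    SnpStats("snp_hwe_violation", maf=0.20, call_rate=0.99, hwe_p=1e-9),
]

retained = [s.snp_id for s in snps if passes_child_qc(s)]
print(retained)
```

The thresholds are exposed as keyword arguments because the mothers' data used different cut-offs (5% missingness and a Hardy-Weinberg P value of 1.0e-06).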
ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG) and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control measures on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally, SNPs with a minor allele frequency of less than 1% were removed. Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity by state pairwise distances using the four HapMap populations as a reference, and then excluded. Cryptic relatedness was assessed using an IBD estimate of more than 0.125, which is expected to correspond to roughly 12.5% alleles shared IBD or a relatedness at the first cousin level. Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,048 subjects and 526,688 SNPs passed these quality control filters.
We combined 477,482 SNP genotypes in common between the sample of mothers and sample of children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed) and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects containing 6,305 duos and 465,740 SNPs (112 were removed during liftover and 234 were out of HWE after combination). We estimated haplotypes using ShapeIT (v2.r644), which utilises relatedness during phasing. The phased haplotypes were then imputed to the Haplotype Reference Consortium (HRCr1.1, 2016) panel of approximately 31,000 phased whole genomes. The HRC panel was phased using ShapeIT v2, and the imputation was performed using the Michigan imputation server. This gave 8,237 eligible children and 8,196 eligible mothers with available genotype data after exclusion of related subjects using cryptic relatedness measures described previously. Principal components were generated by extracting unrelated individuals (IBS < 0.05) and independent SNPs with long range LD regions removed, and then calculating using the `--pca` command in plink1.90.
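The SNP and subject counts reported for the combined mother-child dataset can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are ours, and treating the combined subject pool as the sum of the children and mothers passing quality control is an assumption that happens to reproduce the reported total.

```python
# SNP counts reported for the combined dataset.
common_snps = 477_482
removed_missingness = 11_396  # genotype missingness above 1%
removed_liftover = 112
removed_hwe = 234             # out of HWE after combination

remaining_snps = (
    common_snps - removed_missingness - removed_liftover - removed_hwe
)

# Subject counts: children and mothers passing their respective QC filters
# (assumed here to be pooled), less subjects with potential ID mismatches.
children_passing_qc = 9_115
mothers_passing_qc = 9_048
removed_id_mismatch = 321

remaining_subjects = (
    children_passing_qc + mothers_passing_qc - removed_id_mismatch
)

print(remaining_snps, remaining_subjects)
```

Both totals agree with the 465,740 SNPs and 17,842 subjects stated above.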

Noncognitive skills
We used all noncognitive skills outlined in Box 1 that are available in ALSPAC, except for attention which was omitted due to low frequency of events (<5%).

SDQ
The Strengths and Difficulties Questionnaire (SDQ) is a scale used to assess child emotional and behavioural difficulties 60 and is one of the most widely used questionnaires for evaluating psychological well-being amongst children. It consists of 25 items that cover common areas of emotional and behavioural difficulties (emotional symptoms; conduct problems; hyperactivity/inattention; peer relationship problems; and prosocial behaviour). Responses to each question are on a three-point scale: not true, somewhat true and certainly true, coded to scores of 0, 1 and 2 respectively. The individual subscales of the SDQ have relatively low prevalence in ALSPAC so we used total SDQ score, which is defined as the count of problems on the first four scales. To ensure that our results are not being driven by differences in the internalising (emotional symptoms and peer relationship problems) or externalising (conduct problems and hyperactivity/inattention symptoms) subscales we also ran sensitivity analyses on these separate subscales. All SDQ scores are reverse coded so that high values refer to fewer problems.
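A minimal sketch of the total SDQ score described above is given here. It assumes five items per subscale (25 items across five subscales), that each item has already been scored 0 to 2 in the direction of greater difficulty, and that reverse coding is done by subtracting from the maximum possible total; the exact reverse-coding rule is not stated in the text, so that step and all names are assumptions.

```python
# The four difficulty subscales that contribute to the total score.
DIFFICULTY_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")
ITEMS_PER_SUBSCALE = 5  # assumption: 25 items spread evenly over 5 subscales
MAX_ITEM_SCORE = 2


def total_difficulties(responses):
    """Sum item scores over the first four subscales (prosocial excluded)."""
    total = 0
    for name in DIFFICULTY_SUBSCALES:
        items = responses[name]
        if len(items) != ITEMS_PER_SUBSCALE:
            raise ValueError(f"{name}: expected {ITEMS_PER_SUBSCALE} items")
        if any(item not in (0, 1, 2) for item in items):
            raise ValueError(f"{name}: item scores must be 0, 1 or 2")
        total += sum(items)
    return total


def reverse_coded(total):
    """Reverse code so that higher values mean fewer problems (assumed rule)."""
    max_total = len(DIFFICULTY_SUBSCALES) * ITEMS_PER_SUBSCALE * MAX_ITEM_SCORE
    return max_total - total


# Hypothetical responses for one child.
child = {
    "emotional": [1, 0, 2, 0, 1],
    "conduct": [0, 0, 1, 0, 0],
    "hyperactivity": [2, 1, 1, 0, 2],
    "peer": [0, 1, 0, 0, 0],
    "prosocial": [2, 2, 1, 2, 2],  # not part of the total score
}

raw_total = total_difficulties(child)
print(raw_total, reverse_coded(raw_total))
```

The internalising and externalising sensitivity analyses would sum the emotional and peer subscales, and the conduct and hyperactivity subscales, respectively.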

Denver
The Denver Developmental Screening Test 61 was used to identify developmental problems in young children at ages 6, 18, 30 and 42 months. ALSPAC mothers were asked to report their child's development in response to 42 questions across four different categories: social and communication skills, fine motor skills, hearing and speech, and gross motor skills. Responses to questions were 'often', 'once or twice' and 'not yet started', and were coded with the values of 2, 1 and 0 respectively. Prorated scores combining all four scales were used to boost sample size, with missing values assigned the mean score of that child's responses, provided that three or fewer items had missing scores.
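The prorating rule above can be illustrated with a short sketch. It assumes that the limit of three missing items applies across the full set of items and that a child exceeding it receives no score. The text does not specify either detail, and the function name is hypothetical.

```python
def prorated_total(responses, max_missing=3):
    """Sum item scores, replacing each missing item (None) with the mean of
    that child's observed items. Returns None if too many items are missing."""
    observed = [r for r in responses if r is not None]
    n_missing = len(responses) - len(observed)
    if n_missing > max_missing or not observed:
        return None
    mean_observed = sum(observed) / len(observed)
    return sum(observed) + n_missing * mean_observed


# Hypothetical child: 42 items, two of them missing.
child = [2] * 30 + [1] * 10 + [None] * 2
print(prorated_total(child))  # 73.5  (70 observed + 2 x 1.75)

# Four missing items exceed the limit, so no score is assigned.
print(prorated_total([2] * 38 + [None] * 4))  # None
```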

Social skills
Social skills at age 13 were determined using a battery of 10 questions reported by the study mother, such as being "easy to chat with, even if it isn't on a topic that specially interests her". Responses were reported on a five-point scale indicating the mother's perception of how well her teenager's social skills compared to her perception of the average teenager, and then summed to provide a total overall social skills score.

Communication
Communication at 6 months was calculated from mother-reported responses to a battery of eight questions asking about the development of their child's communication skills, such as "S/he turns towards someone when they are speaking". At age 1 communication was derived using the MacArthur Infant Communication questionnaire 62 as the sum of mother-reported responses to 82 questions across four domains of communication: understanding; vocabulary; non-verbal communication; and social development. At 18 months communication was calculated using mother-reported responses to a battery of 14 questions asking about the development of their child's communication skills. At age 3 communication was calculated from mother-reported responses to a battery of 123 questions forming a vocabulary score. At age 10 communication was calculated from mother-reported responses as the sum of five domains of communication from a total battery of 39 questions. Responses were coded as 2, 1 and 0 reflecting the level of communication and understanding of communication that the child demonstrated. A final communication score was created as the sum of these responses.

Self-esteem
Self-esteem at age eight was measured using self-report responses to the 12-item shortened form of Harter's Self Perception Profile for Children 63 comprising the global self-worth and scholastic competence subscales. Self-esteem at age 18 was measured using self-report responses to 10 questions of the Bachman revision of the Rosenberg Self-Esteem Scale 64,65 . At each age responses are summed to give an overall score. Responses are coded on a Likert scale with the value 0 assigned to the most negative statements and the value 4 assigned to the most positive statement, so that high scores correspond to high self-esteem.

Persistence
Persistence at age 6 months was measured as a weighted score from mother-reported responses to seven questions relating to child temperament. At age 2 persistence was measured as a weighted score from nine mother-reported responses. Questions included items such as "He perseveres for many minutes when working on a new skill (rolling over, picking up object, etc)" and "He loses interest in a new toy or game within an hour" and were recorded on a five-point Likert scale. At age 7 persistence was recorded by ALSPAC interview testers as the study child's persistence when completing the word reading and decoding session. In this session children were shown a series of pictures accompanied by four words starting with the same letter as the picture and were asked to point to the correct word. Testers indicated whether they thought the child was persistent with tasks during the session. Response options were coded into persistent (combining persistent and sometimes persistent) and non-persistent. On all measures higher scores correspond to higher levels of persistence.

Locus of control
Locus of control, the strength of connection between actions and consequences, was measured at ages 8 and 16. At age 8 it was measured using responses to 12 questions from the shortened version of the Nowicki-Strickland Internal-External (NSIE) scale for preschool and primary children (the non-cartoon format Preschool and Primary Nowicki-Strickland Internal-External scale) 66 . At age 16 locus of control was measured using the 12-item Nowicki-Strickland Locus of Control Scale 67 . These scales use questions such as "do you feel that wishing can make good things happen?" and were reverse coded so that high values denote higher locus of control.

Empathy
Empathy was measured at age seven using mother-reported responses to five questions about the child's attitudes towards sharing and caring, with responses on a four-point Likert scale. Responses were coded and summed so that a higher score corresponds to higher levels of empathy.

Impulsivity
Impulsivity was measured during two sessions at the age 8 direct assessment using a behaviour checklist. Testers rated whether the children demonstrated restlessness, impulsivity, fleeting attention, and a lack of persistence. Responses were coded to 0 (behaviour not characteristic of the child), 1 (behaviour somewhat characteristic of the child), or 2 (behaviour characteristic of the child). Responses were summed and the mean value of the two sessions was used for each child. At age 11 the children were asked a battery of 10 questions designed to capture impulsive behaviour such as "have you spent all of your money as soon as you got it?". Responses were binary and were summed to give a total impulsivity score.
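The age 8 impulsivity score described above (ratings summed within each session, then averaged across the two sessions) can be sketched as below. The sketch assumes both sessions were completed. Handling of a missing session is not described in the text, and the function names are hypothetical.

```python
def session_score(ratings):
    """Sum the four tester ratings (each coded 0, 1 or 2) from one session.

    Order assumed here: restlessness, impulsivity, fleeting attention,
    lack of persistence.
    """
    if len(ratings) != 4 or any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("expected four ratings coded 0, 1 or 2")
    return sum(ratings)


def impulsivity_age8(session_one, session_two):
    """Mean of the two session sums."""
    return (session_score(session_one) + session_score(session_two)) / 2


# Hypothetical child rated in both sessions.
print(impulsivity_age8([1, 2, 0, 1], [2, 2, 1, 1]))  # 5.0
```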

Personality
Personality was measured at age 13 using the five-factor model of personality. Five broad and independent dimensions of personality, the "Big Five" (extraversion, neuroticism, agreeableness, conscientiousness, and intellect), explain a major portion of judged interindividual difference in personality. They are treated as dimensions, with individuals varying continuously along them and most people falling between the extremes. They were measured using self-report responses to 50 items of the International Personality Item Pool in which participants indicate the extent to which statements describe their personality on a five-point scale 68 .

Cognitive skills and outcomes

IQ
Intelligence was measured during the direct assessments at ages eight and 15 using the short form Wechsler Intelligence Scale for Children (WISC) from verbal, performance, and digit span tests and the Wechsler Abbreviated Scale of Intelligence (WASI) from vocabulary and matrix reasoning tests respectively. These assessments were administered by members of the ALSPAC psychology team overseen by an expert in psychometric testing. The short form tests have high reliability 69 and the ALSPAC measures utilise subtests with reliability ranging from 0.70 to 0.96. Raw scores were recalculated to be comparable to those that would have been obtained had the full test been administered and then age-scaled to give a total overall score combined from the performance and verbal subscales.

Educational attainment
We used four measures of educational attainment. The first three were average fine-graded point scores from three end of 'Key Stage' assessments during compulsory education at ages 11, 14 and 16. As the final stage of compulsory education when the ALSPAC cohort were in school, performance in the age 16 exams impacted further education and employment opportunities. Point scores were used as a richer measure of attainment than categorical level bandings during compulsory education. The fourth measure was a ranking of grades attained in post-compulsory A-levels at age 18, which are a requirement for progressing to university education. We used a measure of the three highest A-level grades grouped into ordered categories (see 70 for a detailed description). At the time the cohort were studying, A-levels were non-compulsory and therefore all participants who did not continue into further education had a coded value of zero. All measures were obtained through data linkage to the UK National Pupil Database (NPD), which represents the most accurate record of educational attainment available in the UK. We extracted the age 11, 14 and 16 scores from the NPD Key Stage 4 (age 16) database as this provided the largest sample size at earlier ages. Age 18 grades were extracted from the NPD Key Stage 5 database (for further information see https://www.gov.uk/government/collections/national-pupil-database).

Employment
At age 23 participants were asked to report whether they were in full time paid employment of more than 30 hours per week, with responses coded as binary.

NEET
Because some participants may not be employed because they are still in full-time education or training, we used a measure of not in education, employment or training (NEET) at age 23, with responses coded as binary. The use of a NEET measure ensures that employment results are not biased by participation of the cohort in education or training.

Non-response
We also include a binary measure of questionnaire non-response at ages 18 and 24 to allow us to investigate correlations between noncognitive skills and cohort participation.

Statistical analysis
To determine the consistency of noncognitive skills across measures and over time we estimated phenotypic correlations between each measurement-pair (45 measurements [34 noncognitive skills; 2 cognitive skills; 7 socioeconomic outcomes; 2 non-response measures]; 990 unique measurement-pairs). Heritability of each occasion-specific noncognitive skill was estimated using genomic-relatedness-based restricted maximum likelihood (GREML) in the software package GCTA (see 71 for a detailed description of this method). GCTA uses measured SNP level variation across all SNPs (see Genetic data) to estimate the genetic similarity between each pair of unrelated individuals in the sample. Univariate analyses are specified as:
$$y = X\beta + g + \epsilon$$
where $y$ is the inverse normally rank transformed, sex and age of measurement standardised measure of phenotype, $X$ is a series of covariates (with coefficients $\beta$) indicating the first 20 principal components of inferred population structure to control for systematic differences in allele frequencies due to ancestral differences between different subpopulations (population stratification), $g$ is a normally distributed random effect with variance $\sigma_g^2$ denoting the contribution of SNPs, and $\epsilon$ is residual error with variance $\sigma_\epsilon^2$. Heritability is then defined as the proportion of total phenotypic variance (genetic variance plus residual variance) explained by common genetic variation:
$$h^2_{SNP} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_\epsilon^2}$$
Non-zero heritability estimates indicate that genetically similar pairs are more phenotypically similar than genetically dissimilar pairs. To estimate the extent to which noncognitive traits share underlying genetic architecture we estimate genetic correlations between each phenotype-pair. Genetic correlations provide an estimate of the extent to which the same genetic variants associate with two phenotypes, that is, the overlap of genetic associations between two phenotypes.
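Three of the quantities in the paragraph above can be illustrated with a short sketch: the number of unique measurement-pairs, a rank-based inverse normal transformation, and heritability as the ratio of genetic variance to genetic plus residual variance. The Blom offset (3/8) and average ranks for ties are assumptions, since the text does not state which variant of the transformation was used. The variance components in the example are made-up numbers rather than GCTA output, and the sex and age standardisation step is not covered.

```python
from math import comb
from statistics import NormalDist


def n_unique_pairs(n_measures):
    """Number of unordered pairs among n measures."""
    return comb(n_measures, 2)


def inverse_normal_rank_transform(values, offset=0.375):
    """Map values to normal quantiles via their ranks.

    Uses (rank - offset) / (n - 2 * offset + 1); offset=3/8 is the Blom
    variant (an assumption here). Ties receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
            for r in ranks]


def snp_heritability(var_g, var_e):
    """Genetic variance as a proportion of genetic plus residual variance."""
    return var_g / (var_g + var_e)


print(n_unique_pairs(45))            # 990
print(snp_heritability(0.2, 0.8))    # 0.2
z = inverse_normal_rank_transform([10, 30, 20])
print(z[0] < z[2] < z[1])            # True
```

The pair count reproduces the 990 unique measurement-pairs reported for the 45 measurements.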
Genetic correlations are estimated as:
$$r_g = \frac{cov_g(A, B)}{\sqrt{var_g(A) \cdot var_g(B)}}$$
where $r_g$ is the genetic correlation between phenotypes $A$ and $B$, $var_g(A)$ is the genetic variance of phenotype $A$ and $cov_g(A, B)$ is the genetic covariance between phenotypes $A$ and $B$. All analyses are adjusted for false discovery rate using the Benjamini-Hochberg procedure 72 and include the first 20 principal components of population structure.
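As an illustration of the two calculations named above, the sketch below computes a genetic correlation as the genetic covariance divided by the square root of the product of the two genetic variances, and applies the Benjamini-Hochberg step-up rule to a list of p-values. The false discovery rate threshold of 0.05 is an assumption, since the text does not report the level used. The input numbers are invented for the example, and the function names are hypothetical.

```python
from math import sqrt


def genetic_correlation(cov_g_ab, var_g_a, var_g_b):
    """Genetic covariance scaled by the two genetic standard deviations."""
    return cov_g_ab / sqrt(var_g_a * var_g_b)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha and
    reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[index] = True
    return rejected


print(round(genetic_correlation(0.1, 0.2, 0.2), 3))    # 0.5
print(benjamini_hochberg([0.030, 0.001, 0.200, 0.012]))
# [True, True, False, True]
```

In the example, the thresholds for the four sorted p-values are 0.0125, 0.025, 0.0375 and 0.05, so the three smallest p-values are rejected and the largest is not.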