Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis

Using the ImmunoChip custom genotyping array, we analysed 14,498 multiple sclerosis subjects and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (p-value < 1.0 × 10⁻⁴). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 multiple sclerosis subjects and 26,703 healthy controls. In these 80,094 individuals of European ancestry we identified 48 new susceptibility variants (p-value < 5.0 × 10⁻⁸), three of which were found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants in 103 discrete loci outside of the Major Histocompatibility Complex. With high-resolution Bayesian fine-mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalogue of multiple sclerosis risk variants and illustrates the value of fine-mapping in the resolution of GWAS signals.


Basic content
ImmunoChip is an Illumina Infinium HD custom array designed to enable fine mapping of established autoimmune loci and deep replication of autoimmune GWAS results. [1] In total 207728 variants were submitted for inclusion on the chip, of which 196524 passed manufacturing quality control and were ultimately included on the chip (192402 autosomal, 1595 X-linked, 1735 Y-linked, 791 pseudoautosomal and one mitochondrial). In its final form the chip included 189 non-MHC fine mapping regions, two on the X chromosome and the other 187 on the autosomes. Three of the autosomal regions included fewer than 10 usable markers, leaving 184 deeply interrogated non-MHC fine mapping regions on the chip.
The multiple sclerosis contribution to the ImmunoChip was based on our meta-analysis of the published GWAS, [2-5] their related follow-up efforts [6][7][8][9] and a preliminary analysis of the GWAS we performed in collaboration with WTCCC2. [10] Based on these data we nominated a total of 26 regions for fine mapping (including the MHC); 15 of these were uniquely nominated by multiple sclerosis, while the other 11 were also nominated by at least one other autoimmune disease group in the ImmunoChip consortium.
We also nominated 5887 SNPs for replication: 4794 identified from our meta-analysis of published GWAS [11] (including all 206 SNPs with a p value < 10⁻⁵ in that analysis) and a minimally overlapping set of 1130 identified from our preliminary analysis of the IMSGC and WTCCC2 GWAS [10] (including 643 independent signals and proxies for these where available). In total, 2212 of these were eventually included on the chip, of which 1917 passed QC, including 1012 of the 4794 originally nominated from the meta-analysis [11] and 935 of the 1130 originally nominated from the preliminary analysis of the IMSGC and WTCCC2 GWAS. [10] Some 481 of these are part of fine mapping intervals, leaving 1436 outside these intervals: 776 from the meta-analysis [11] and 671 from the latest GWAS [10] (only 11 markers were nominated by both). Linkage disequilibrium (LD) pruning of these sets to generate independent signals left 567 from the meta-analysis [11] and 317 from the GWAS [10] (with just one marker in common).
In addition a further 2109 SNPs of local interest (wild cards) were also nominated, 659 of these were eventually included on the chip and 394 passed QC. Some 110 of these were included in fine mapping regions. The remainder (284) included several regions of extensive LD, so after LD pruning there were just 52 independent signals. The multiple sclerosis related SNPs selected for inclusion on the ImmunoChip were picked at an early stage in the analysis of the main GWAS (and the meta-analysis of previous GWAS) with the result that not all of the 57 regions eventually reported in the GWAS have been included as fine mapped regions on ImmunoChip.
Eleven of the 935 SNPs nominated from the 2011 GWAS [10] are included amongst the 48 novel associated SNPs listed in Table 1 of the main text. A further 20 of the SNPs in Table 1 have proxies amongst the 935 (13 of which pass the p < 1 × 10⁻⁴ threshold in the discovery analysis). Overlap also exists with the results from the previous meta-analysis of GWAS, [11] with 5 of the SNPs from Table 1 having proxies amongst the 1012 SNPs nominated from that study. In total >50% (26/48) of the associations listed in Table 1 would have been captured by follow-up study of just these MS-specified sources. Genomewide significant evidence for association was seen in multiple sclerosis in 25 of the 26 fine mapping regions nominated by the IMSGC and WTCCC2 (one region containing two independent signals of association). The only nominated region that failed to show such evidence was one of the three fine mapping regions on the ImmunoChip where there were ultimately fewer than 10 markers passing QC. In total 59 of the 97 associations reaching genomewide significance in our ImmunoChip data lie in fine mapping regions (two regions each contain two signals of association and one region contains three such signals). Substantial enrichment for genuine association was also seen amongst the deep replication SNPs nominated by IMSGC and WTCCC2, as is demonstrated by the marked genomic inflation apparent in the QQ plots for these SNPs (those passing QC and lying outside the fine mapping regions) (see Supplementary Figure 1 and Supplementary Figure 2). The multiple sclerosis specified content on the ImmunoChip included proxies for all but 14 of the 48 novel associations reaching genome-wide significance.

Previously published primary associations
In our 2011 GWAS [10] we listed 57 primary associations: 23 that had been identified in earlier GWAS (and their related follow-up efforts) and 34 that were novel (29 reaching genomewide significance and 5 that just missed this threshold; all 5 have subsequently been confirmed with genomewide significance [12]). We listed a further 3 different SNPs in our meta-analysis of previously published GWAS, [11] however two of these are now known to be tagging already reported signals (rs170934 and rs6718520) and the final SNP (rs2150702) has no good proxies included on the ImmunoChip. Of the 57 primary SNPs from the 2011 GWAS, [10] 51 were included on the ImmunoChip and good proxies for the other six were also included (see Supplementary Table 3). For every SNP the previously associated allele was again over-represented in ImmunoChip cases, and for all but 2 SNPs (rs2028597 and rs802734) the difference was at least nominally significant (one-sided p < 1.0 × 10⁻¹). Association with rs9657904 (r² = 0.271 and D' = 1 with rs2028597) in CBLB was originally reported in the Sardinian GWAS. [13] There has been modest replication in other populations, such as the continental Italian population, [14] but never as strong. We conclude that the lack of signal in the ImmunoChip data set may be due to both an LD-specific effect in the Sardinian population and a lack of adequate tagging. The SNP rs802734 has a D' of 1.0 with the nearby SNP rs9482848 (less than 2 kb away), which is modestly associated (p = 1.0 × 10⁻³), indicating that rs802734 is probably only modestly tagging the functionally relevant variant at this site.
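The combination of D' = 1 with a low r² reported above is what one expects when a rarer allele always travels on the background of a more common one. The following is a minimal sketch of the standard two-locus LD calculations; the haplotype frequencies in the example are purely illustrative assumptions and are not estimates for any SNP in this study.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    p_ab : frequency of the A-B haplotype
    """
    d = p_ab - p_a * p_b
    # The maximum |D| attainable depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Illustrative only: allele A (10%) is always found on the same
# haplotype as allele B (30%), so only three haplotypes exist.
d_prime, r2 = ld_stats(p_a=0.1, p_b=0.3, p_ab=0.1)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

Under these assumed frequencies D' is 1 while r² stays well below 0.3, so one SNP can be a poor tag for the other despite complete D'.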

Previously published secondary associations
In our original GWAS [10] conditional analysis based on the primary signals suggested the existence of 7 additional secondary associations; 4 of these replicated and were therefore reported: one (rs12048904) close to rs11581062 on Chr1, another (rs7090512) close to rs3118470 on Chr10 and two signals (rs4285028, rs4308217) close to rs9282641 on Chr3.

Chromosome 1p21 (in the region of VCAM1, EXTL2 and SLC30A7)
In the original GWAS [10] the most associated (lead) SNP in this region was rs11581062. Conditioning on this lead SNP revealed association with a second SNP, rs12048904; both of these signals replicated and gave genomewide significant association in the combined analysis (after conditioning). [10] There is very little LD between these two SNPs and conditioning on both showed that there was no residual signal in the region beyond these. [10] Unfortunately rs12048904 was not included on the ImmunoChip; however, a near perfect proxy, rs12027668, was included. In the ImmunoChip data the evidence for association in the region was less pronounced (regression to the mean) and in these data the most associated (lead) SNP was found to be rs7552544. This SNP is in modest LD with both rs11581062 and rs12027668 (the proxy for rs12048904) and in both cases the risk alleles are positively correlated, indicating that association at rs7552544 is driven by both signals. Conditioning the ImmunoChip data on rs7552544 we found no residual signal at the most strongly correlated SNP, rs12027668 (p = 9.1 × 10⁻¹), but nominally significant evidence of association at rs11581062 (p = 1.2 × 10⁻²) was apparent, as would be expected. Based on the ImmunoChip data alone one would likely have concluded that association in this region was modest and driven by a single variant (either rs7552544 or a variant in LD); however, by chance these signals were stronger in the GWAS data set, [10] where it was clear that association in the region is driven by two independent signals, both in modest LD with rs7552544. In short, in the ImmunoChip data it was not possible to resolve the modest apparent signal into the two independent signals underlying it. Supplementary Table 4 summarizes the association results for these SNPs and the pattern of LD between them.

Chromosome 10p15 (in the region of IL2RA)
Our 2007 GWAS [2] and related follow-up efforts [6] identified rs2104286 as the most strongly associated SNP in this region. Although this SNP was not included in our most recent GWAS, [10] the association was still identified, with the most associated (lead) SNP in the latest GWAS being rs3118470. Conditioning on this lead SNP revealed association with a second SNP, rs7090512 (again both of these signals replicated and gave genomewide significant association in the combined analysis, after conditioning). [10] However, the substantial LD between these two SNPs meant that the effect on risk was best related to haplotypes of these two SNPs rather than to individual alleles (see supplementary file from the original GWAS [10]). ImmunoChip included rs2104286 and rs7090512 but not rs3118470 (which failed QC). A reasonable proxy for rs3118470 was included on the ImmunoChip (rs4147359). Of note, all these SNPs are in LD with each other. The most associated SNP in the ImmunoChip (rs2104286) seems to account for the whole signal in this region, with no evidence for any association left at rs4147359 (the proxy for rs3118470) or rs7090512 after conditioning. Supplementary Table 5 summarizes the association results for these SNPs and the pattern of LD between them.

Chromosome 3q13 (in the region of CD86 and SLC15A2)
This is the most complex of the regions containing second signals identified in our most recent GWAS. [10] Conditioning on the most strongly associated SNP from the region (rs9282641) not only revealed a second associated SNP (rs4285028) but after conditioning on both of these a third signal was also apparent (rs4308217). All three signals replicated and were genomewide significant in the combined analysis; however, as with the IL2RA region, the LD between two of the SNPs (rs9282641, the primary signal, and rs4308217, the tertiary signal) meant that risk was best assigned to haplotypes of these two SNPs rather than individual alleles. Of these three SNPs only rs9282641 was included on ImmunoChip; however, rs12695416 (a proxy for rs4285028) and rs2255214 (a proxy for rs4308217) were also included. In the ImmunoChip data the most strongly associated SNP was rs1920296 (p = 6.8 × 10⁻¹⁵), which is a near perfect proxy for rs12695416 (r² = 0.961, D' = 1.0) and is thus unsurprisingly in LD with the secondary signal from the GWAS (rs4285028; r² = 0.373, D' = 0.80), and not with either of the other two SNPs identified in the GWAS (r² < 0.04 for both); what had been the secondary signal in the GWAS is actually the strongest signal in the ImmunoChip data set. After conditioning on rs1920296, the most strongly associated SNP in the ImmunoChip data is rs2255214 (a proxy for the GWAS tertiary SNP rs4308217), although the original GWAS primary SNP (rs9282641) still yielded significant evidence for association. Conditioning on both rs1920296 and rs2255214 confirms some evidence for a residual signal from rs9282641. The analysis of this region is limited as it was not selected for fine mapping. The residual evidence for a haplotype effect suggests that an untyped variant from the region might still be responsible for the primary and tertiary signals identified in the GWAS. Supplementary Table 6 summarizes the association results for these SNPs and the pattern of LD between them.

Materials

Samples
All cases included in this study were diagnosed by neurologists familiar with multiple sclerosis in accordance with recognised diagnostic criteria that employ a combination of clinical and laboratory-based para-clinical information. [15][16][17] The clinical characteristics of the patients are typical and vary only modestly between groups according to local interests and ascertainment strategies as outlined below. Disease severity was measured using the Expanded Disability Status Score (EDSS) [18] and its dependent derivative the Multiple Sclerosis Severity Score (MSSS), [19] while clinical features related to course, relapse and progression were defined in accordance with consensus criteria. [16][20][21][22] For cases analysed in the discovery phase, data regarding age at examination (AAE), age at onset (AAO), EDSS, MSSS and clinical course (primary progressive or bout onset) were available from 12327 (81%), 10795 (71%), 9631 (64%), 7934 (50%) and 12274 (81%) individuals respectively. See Supplementary Table 9 for clinical and demographic features of the cases.
As anticipated the index cases from the trio families were generally younger than patients in the case collections (reflecting the requirement for both parents to also be able to donate a DNA sample).

Local ascertainment
The ascertainment procedures for cases and controls from each population are described below; the non-European individuals genotyped as part of this project will be reported and described elsewhere. All cases and controls involved in this study gave valid informed consent in accordance with approval from the relevant local ethical committees or institutional review boards (IRBs).

Australia and New Zealand
All cases were self-identified volunteers recruited at centres located in Adelaide, Brisbane, Gold Coast, Hobart, Melbourne, Newcastle, Perth and Sydney and in New Zealand. All were confirmed by Neurologists.

Belgium
Samples were collected under coordination of the Neurology Department of the University Hospitals Leuven amongst out-patients and hospitalized patients with definite MS attending the Neurology Department of the University Hospitals Leuven or the "National MS Center" in Melsbroek (cases) and spouses of patients attending the Neurology Department of the University Hospitals Leuven (controls). Both centres are located 28 km apart in the centre of Belgium and recruited mainly amongst patients from the northern Dutch-speaking part of Belgium. Participation rate of patients attending these clinics is virtually 100%. At both centres, the majority of patients are followed-up longitudinally by neurologists specialized in MS with at least yearly visits. Approximately 40% of patients are being treated with an immunomodulatory therapy.

Denmark
Patients were recruited between 1996 and 2009 by neurologists at multiple sclerosis centers from across the whole of Denmark, although the majority of patients originate from the Copenhagen area. This clinic-based approach means that the proportion of patients with relapsing-remitting multiple sclerosis (RRMS) is higher than is seen in the general population. Controls comprise staff members (10%) and healthy donor controls from the University Hospital Rigshospitalet in Copenhagen.

Finland
Cases were recruited from seven centres (Helsinki University Central Hospital, Tampere University Hospital, Kuopio University Hospital, Oulu University Hospital, Seinäjoki Central Hospital, Satakunta Central Hospital and Rovaniemi Central Hospital) and thus come from many different regions of Finland. All were identified in hospital clinics by experienced neurologists. Some of the families were recruited before 1998 as part of an effort to collect either multiplex families (at least two cases in a family) or trio/nuclear families (an affected individual with both parents or, where both parents were not available, with one parent and siblings), and the rest were recruited between 2000 and 2006 as trio/nuclear families by neurologists of the MGEN consortium. Data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) subset of the national FINRISK 2007 Study were used as controls. [23] All individuals range in age from 25 to 74 years and were collected from five large study areas in Finland.

France
The French MS Genetics group (REFGENSEP) has been prospectively collecting samples with twenty-three centers covering France. Some patients volunteered in response to advertising campaigns through patient associations. Two annual meetings are organized for the national network REFGENSEP physicians in order to review procedures and diagnosis guidelines. Clinical information, informed consent and a blood sample were collected under the supervision of a trained physician. Control samples have been collected by the Etablissement Français du Sang (EFS) and come from all over France.

Germany
Three German centers contributed cases and controls: a) Hamburg Samples were collected from patients regularly seen in the MS outpatient clinic and day hospital of the Institute for Neuroimmunology and Clinical MS Research. The vast majority of these cases were derived from the local, i.e. Metropolitan region Hamburg, capture area. Hence, patient origin is largely from Northern Germany with the vast majority being of Caucasian descent. Control samples were also largely derived from this area, but since they were Institute and University employees, their geographic origin is more diverse, i.e. mostly from entire Germany, than the patients. b) Munich Samples from the TU in Munich can be stratified in three cohorts of cases and one cohort of controls. The first cohort comprises patients with multiple sclerosis from central Germany. The second cohort was recruited from across multiple sites in Germany and includes individuals with multiple sclerosis being treated with interferon-beta for at least 6 months. The third cohort of patients with multiple sclerosis was recruited in South-Eastern Germany. A cohort of controls was contributed from the population based study KORA (Cooperative Health Research in the Augsburg Region). c) University Mainz Samples were collected from outpatient clinics. Patients originate from different regions in Germany with the vast majority being of self-reported Caucasian descent. Collection was performed in Outpatient Clinics. Control samples were derived from university staff members of Caucasian descent.

Italy
Two Italian centers contributed cases and controls: a) Piedmont Patients were collected from continental Italy (excluding Sardinia) as part of the PROGEMUS (PROgnostic GEnetic factors in MUltiple Sclerosis) project, 87% of cases were collected in North-West Italy (Novara, Torino, Milano, Pavia) and 12% in Central Italy (Rome). These patients were all recruited from hospital based clinics; mean participation rate was approximately 60% (range 20%-90%).
Controls included Italian individuals (medical students, university and hospital staff, blood donors) matched for regional origin with the MS patients.

Sweden
The Swedish cases and controls were recruited from four different investigations. Two concern the role of genetic and environmental risk factors in the development of multiple sclerosis: 1) the Epidemiological Investigation of Multiple Sclerosis (EIMS), [24] a nationwide population-based case-control study of incident multiple sclerosis cases (1098 cases and 1134 controls); and 2) the Genetic Environment study in Multiple Sclerosis (GEMS), a nationwide population-based case-control study of prevalent cases of MS (1554 cases and 950 controls). The third investigation is a nationwide study of natalizumab treatment in multiple sclerosis patients (IMSE) (963 cases). [25] The fourth is a collection of samples obtained during routine neurological diagnostic work-up at Karolinska University Hospital and Danderyd Hospital, Stockholm, Sweden (455 MS patients). [26] In addition, samples from 721 blood donors from the Stockholm area with Scandinavian ancestry are included. [27] There was some overlap of patients between studies. In total 2839 independent cases and 2800 independent controls were included in the current investigation.

United Kingdom (UK)
The majority of UK cases were collected through a national recruitment project ("the genetic analysis of multiple sclerosis") coordinated by the Department of Clinical Neurosciences at the University of Cambridge and involving additional recruitment centres based in UK cities -Aberdeen, Birmingham, Bristol, Cardiff, Exeter, Hull, Ipswich, Leicester, London, Manchester, Newcastle, Norwich, Nottingham, Oxford, Peterborough, Preston, Plymouth, Poole, Rotherham, Sheffield, Southampton and Stoke. These cases were supplemented by additional cases recruited as part of local natural history studies in the South West of England 28 and South Wales, 29 together with samples recruited as part of the Northern Isles Multiple Sclerosis (NIMS) study 30 and samples from the UK multiple sclerosis tissue bank (http://www.ukmstissuebank.imperial.ac.uk/). For a proportion of cases DNA was also collected from both parents to establish trio families (an affected individual and both parents); ultimately genotypes passing quality control in all three individuals were available from 621 trio families. UK controls were obtained from three sources: the National Blood Service (anonymised DNA samples stored in the UK blood transfusions services repository that was originally established to support the activities of the Wellcome Trust Case Control Consortium), the 1958 Birth Cohort (DNA samples obtained from subjects involved with the National Child Development Study, an epidemiological survey based on all individuals born in England, Wales and Scotland during one week in 1958, see www.b58cgene.sgul.ac.uk/followup.php) and the NIMS study (we included these controls to more accurately match the cases included from the Scottish Isles).

United States of America (USA)
Four centers in the USA contributed cases and controls: a) Brigham & Women's Hospital (BWH), Boston MA Study participants were recruited through the Partners MS Center in Boston, MA. All samples were collected at the MS Center and processed on site to extract DNA. Healthy control subjects were contributed by the BWH PhenoGenetic project, a collection of subjects 18-50 years of age living in the Boston metropolitan area who are self-reported to be healthy. b) University of Miami (UM) Study participants were recruited using multiple ascertainment approaches. However, the majority of participants were enrolled through the University of Miami Health System's designated MS Center of Excellence. Additional participants were recruited via MS community outreach events and support group meetings. Data on the clinical characteristics and clinical history of MS cases were obtained via medical chart review by a board-certified neurologist. Control participants included unrelated spouses and friends of affected individuals in addition to unaffected individuals in the general population. These control subjects were recruited via the same ascertainment mechanisms. c) University of California San Francisco (UCSF). Study participants were recruited from the UCSF MS clinic and from other collaborating sites across the United States using common inclusion and exclusion criteria. Phlebotomy was performed at the individual's preferred clinic, and blood samples were shipped to the UCSF laboratory by overnight courier. The dataset studied here is comprised of two groups, multicase families, in which at least one first-degree relative of the affected proband also had clinically definite MS, and sporadic cases, in which the affected individual reported no known history of MS in any family member. MS phenotypes were confirmed by systematic chart review. All known ancestors were of European descent. 
Controls were also of European ancestry and consist primarily of spouses and friends of MS patients who reported no known history of chronic diseases, including in first-degree relatives. d) Vanderbilt University (Nashville) All samples were obtained from the Vanderbilt DNA Biorepository (BioVU). This repository connects a de-identified version of Vanderbilt's electronic medical records to DNA samples extracted from waste blood obtained from the phlebotomy labs at Vanderbilt. Thus all samples were obtained through the Vanderbilt Hospitals and Clinics and represent a primary catchment area of Middle Tennessee, U.S.A. Cases were defined using algorithms focused on ICD-9 billing codes, prescribed MS treatments, and keywords located in the text. A manual review of 50 cases indicates a positive predictive value of 98%. Controls were defined as having no ICD-9 code of 340, no mention of multiple sclerosis, no use of medications commonly prescribed for multiple sclerosis, and no mention of any other autoimmune disease.

DNA
Standard methods were used to extract DNA and each contributing centre quantified and normalised their samples locally. Cases and controls from France and the USA were genotyped at local centres as were half of the German (KORA) controls. Samples from Italy and Sardinia, and one third of the samples from Sweden, were genotyped at the Miami University, John P. Hussman Institute for Human Genomics. All other samples (approximately two thirds of the total) were genotyped at the Wellcome Trust Sanger Institute (WTSI). Sample quality control and renormalisation was performed at the respective centres prior to genotyping.

Genotyping
We genotyped a total of 45885 samples, with typing being performed in Boston (1824), France (1028), Germany (997), Miami (5383), Virginia (5416), Vanderbilt (3817) and the Wellcome Trust Sanger Institute (WTSI, 27420). All typing was performed according to the manufacturer's standard specifications. Raw data from all sites were transferred to the WTSI and called in three batches using OptiCall (http://www.sanger.ac.uk/resources/software/opticall/). [31] The Wellcome Trust Case Control Consortium common control samples from the 1958 birth cohort (6894), the UK National blood transfusion service (3057) and the HapMap project (48) were called in one batch (these control data were used by other groups working with the ImmunoChip on other diseases). All other multiple sclerosis related data generated at the WTSI were called together as a second batch (22837). All multiple sclerosis related data generated outside the WTSI were called as a separate third batch (13049). In total 4149 samples failed QC (see below). The remaining samples included 1230 African Americans, 244 Hispanics, 321 South Asians, 742 individuals from 70 multiplex families, 1899 individuals from 633 trio families (an affected individual and both parents), 17445 unrelated cases and 19855 unrelated controls. We excluded the 1795 non-Europeans from this analysis (these data will be reported elsewhere in an independent analysis).
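The sample counts in this paragraph can be cross-checked against each other. The short sketch below simply re-adds the figures quoted above; the dictionary keys are descriptive labels chosen here for illustration.

```python
# All counts are copied from the genotyping paragraph above.
typed_by_site = {
    "Boston": 1824, "France": 1028, "Germany": 997, "Miami": 5383,
    "Virginia": 5416, "Vanderbilt": 3817, "WTSI": 27420,
}
called_by_batch = {
    "common controls (1958 cohort, blood service, HapMap)": 6894 + 3057 + 48,
    "MS data generated at WTSI": 22837,
    "MS data generated outside WTSI": 13049,
}
failed_qc = 4149
remaining = {
    "African American": 1230, "Hispanic": 244, "South Asian": 321,
    "multiplex family members": 742, "trio family members": 1899,
    "unrelated cases": 17445, "unrelated controls": 19855,
}

total_typed = sum(typed_by_site.values())
total_remaining = sum(remaining.values())
non_european = sum(remaining[k] for k in ("African American", "Hispanic", "South Asian"))

# Site totals, calling batches and post-QC groups should all reconcile.
assert total_typed == sum(called_by_batch.values()) == 45885
assert total_typed - failed_qc == total_remaining
assert remaining["trio family members"] == 3 * 633
print(total_typed, total_remaining, non_european)
```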

Sample quality control (QC)
Sample QC was performed in a hierarchical fashion. For each quality test applied, the number of samples excluded is shown, with the breakdown of this number in terms of cases and controls shown in parentheses: number excluded (cases:controls). Note that sample QC was done prior to any SNP QC.
Samples were excluded for the following reasons:
1) The observed gender based on sex chromosome markers disagreed with the reported gender: 171 (51:120). This assessment was based on calls generated using Illuminus [32] since at the time of analysis OptiCall [31] was not configured to process sex chromosome markers.
2) Call rate < 98% across all 192402 markers that generated data: 192 (72:120).
3) Autosomal heterozygosity more than 3 standard deviations from the mean value: 230 (119:111).
4) Ambiguity or inconsistency in the Sequenom fingerprint ID: 924 (189:735). In most cases we were able to establish that such errors resulted from technical issues such as plate swaps, faulty chips or sample handling errors and we were therefore able to attempt retyping of these samples.
5) Excessive Identity By Descent (IBD): 991 (441:544 and 6 unknown). Using the IBD command in PLINK we made all pairwise sample comparisons and excluded the sample with the lowest call rate from non-family pairs with PI_HAT ≥ 0.25, PI_HAT being defined as P(IBD=2)+0.5*P(IBD=1). In total 686 samples were judged to have been typed in duplicate (i.e. had PI_HAT > 0.90). Some families had to be excluded because they were related to other families or because their members were found to be unrelated to each other. Within the families surviving QC all pairwise comparisons had PI_HAT between 0.25 and 0.60.
6) Eigenstrat outliers: 1330 (721:609). Principal components (PC) were generated based on each of the 11 populations considered (see below) and individuals were excluded if they were more than 6 standard deviations from the mean on any of the first ten PCs within their respective population. All individuals were also projected onto HapMap PCs and excluded if they were outliers with respect to the European group. Some families had to be excluded because they were not of European ancestry.
7) Excessive Mendelian errors: 51 (18:33). On testing the trio and multiplex families we found and excluded 17 trio families with > 5000 Mendelian errors; there were no multiplex families showing this level of inconsistency.
8) Other: 165 (14:151). In some families the parents (one or both) had to be excluded because other key individuals in that family failed QC (e.g. the index case in a trio family). Some samples were removed because the individuals withdrew from the study.
A population-specific breakdown of these exclusions is provided in Supplementary Table 10.
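The relatedness rule in step 5 can be sketched as follows. This is a minimal illustration of the PI_HAT definition and the "drop the lower call rate sample" rule described above, not the actual pipeline: the tuple layout, the sample identifiers and the `same_family` lookup are hypothetical, and in practice the IBD probabilities would come from PLINK output.

```python
def pi_hat(p_ibd1, p_ibd2):
    """Proportion IBD as defined above: P(IBD=2) + 0.5 * P(IBD=1)."""
    return p_ibd2 + 0.5 * p_ibd1


def samples_to_drop(pairs, call_rate, same_family, threshold=0.25):
    """Return the lower call-rate sample from each related non-family pair.

    pairs       : iterable of (id_a, id_b, p_ibd1, p_ibd2)  (hypothetical layout)
    call_rate   : dict mapping sample id to call rate
    same_family : set of frozenset({id_a, id_b}) for known family pairs
    """
    dropped = set()
    for id_a, id_b, p1, p2 in pairs:
        if frozenset((id_a, id_b)) in same_family:
            continue  # expected relatives are not excluded by this rule
        if pi_hat(p1, p2) >= threshold:
            dropped.add(min((id_a, id_b), key=lambda s: call_rate[s]))
    return dropped


# Hypothetical example: s1/s2 look like duplicates, s3/s4 are a known
# parent-child pair, s5/s6 are effectively unrelated.
pairs = [("s1", "s2", 0.0, 1.0), ("s3", "s4", 1.0, 0.0), ("s5", "s6", 0.1, 0.0)]
call_rate = {"s1": 0.999, "s2": 0.985, "s3": 0.990, "s4": 0.995, "s5": 0.990, "s6": 0.990}
same_family = {frozenset(("s3", "s4"))}
print(samples_to_drop(pairs, call_rate, same_family))
```

With this definition a duplicate or identical twin has PI_HAT near 1, a parent-child or full-sibling pair near 0.5, and a second-degree pair near 0.25, which is why 0.25 serves as the exclusion threshold for supposedly unrelated samples.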

Shared external controls
The International Inflammatory Bowel Disease Genetics Consortium (IIBDGC, http://www.ibdgenetics.org) provided us with ImmunoChip genotypes from 20337 control individuals, of which 9799 were found to overlap with control individuals already included in our study. A further 433 of the IIBDGC samples failed our QC so that in total we included data from 10102 of the IIBDGC samples, increasing the number of controls available for all populations except Finland, France, Norway and the UK.

Overlap with existing GWAS
Through Identity By Descent calculations, it was determined that 8813 samples genotyped on ImmunoChip (2947 cases and 5866 controls) overlapped with samples which were part of the previous GWAS efforts. [10][11] These samples were removed from the discovery phase and included in the replication phase to enable an independent and robust replication using data from an already completed meta-analysis of all previous GWAS (14802 cases and 26703 controls).

SNP QC
The QC for SNPs was applied in each of the 11 population strata independently using just those samples that passed QC. SNP QC was performed in a hierarchal fashion. In each stratum, SNPs were excluded if they 1) Had a call rate < 98% 2) Showed significant evidence of deviation from Hardy-Weinberg Equilibrium, with p < 1.0 x 10 -5 3) Showed evidence of differential missingness between cases and controls with p < 1.0 x 10 -3 4) Were monomorphic Numbers excluded under each criteria are shown in Supplementary Table 11. Amongst the full set of 192402 SNPs there were 839 duplicates and 13952 that gave more than 1 Mendelian error across our 703 families (633 trio families and 70 multiplex families). These SNPs were excluded in all populations. Since the German controls were typed in two parts (KORAEX V and KORAF4) at different centres we compared these data. Inspection of the Quantile-Quantile (QQ) plots revealed 18 SNPs that showed notable deviation from the expected null distribution. Thus, we excluded these SNPs in the German analysis. The same process was repeated for Sweden and the UK (two other sites where controls were processed in two parts at different centres), this led to the exclusion of 8 and 6 SNPs in Sweden and the UK respectively. The external IIBDGC control data included genotypes from 8076 UK control subjects that had also been called as part of our effort. Comparing these duplicate calls identified 776 SNPs with a significant difference (p < 1.0 x 10 -3 ), these were excluded in each of the populations where the IIBDGC external controls were employed. The IIBDGC and multiple sclerosis controls were compared within each population where both were available. This identified a total of 678 SNPs showing a significant difference (p < 1.0 x 10 -5 ) in at least one population. These were excluded in all populations which included IIBDGC controls (i.e. all except Finland, France, Norway and the UK). 
After a preliminary case control analysis, potentially associated markers were identified and their cluster plots were visually inspected using Evoker (http://www.sanger.ac.uk/resources/software/evoker/). 33 In total we checked the cluster plots from 4896 SNPs and rejected 197 as inadequate; these 197 SNPs were excluded from subsequent analysis (see Supplementary Figure 98 for examples of rejected SNPs). After all these exclusions there were a total of 165892 SNPs available in at least one population. Markers available in only one population were excluded from analysis, resulting in 161328 SNPs analysed.
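The four per-stratum exclusion criteria listed earlier in this section can be sketched as a sequential filter. This is only an illustration under stated assumptions: the `SnpStats` structure, its field names and the example values are hypothetical, and the actual QC was presumably run with dedicated genotype QC software rather than code like this.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    """Per-stratum summary statistics for one SNP (hypothetical structure)."""
    name: str
    call_rate: float        # fraction of samples with a genotype call
    hwe_p: float            # Hardy-Weinberg equilibrium test p-value
    diff_missing_p: float   # case vs control missingness test p-value
    maf: float              # minor allele frequency in this stratum

def first_failed_criterion(snp):
    """Apply the criteria in order; return the first one failed, or None."""
    if snp.call_rate < 0.98:
        return "call_rate"
    if snp.hwe_p < 1.0e-5:
        return "hwe"
    if snp.diff_missing_p < 1.0e-3:
        return "differential_missingness"
    if snp.maf == 0.0:
        return "monomorphic"
    return None

# Made-up example values, not data from the study.
snps = [
    SnpStats("snp_a", 0.999, 0.40, 0.50, 0.12),   # passes all criteria
    SnpStats("snp_b", 0.970, 0.40, 0.50, 0.12),   # low call rate
    SnpStats("snp_c", 0.999, 1e-7, 0.50, 0.12),   # HWE deviation
    SnpStats("snp_d", 0.999, 0.40, 1e-4, 0.12),   # differential missingness
    SnpStats("snp_e", 0.999, 1.00, 0.50, 0.00),   # monomorphic
]
results = {s.name: first_failed_criterion(s) for s in snps}
kept = [name for name, reason in results.items() if reason is None]
```

Because the checks are applied in order, each excluded SNP is counted under exactly one criterion, which is one way a per-criterion tally could be produced.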

Principal Component Analysis
Principal Components (PCs) were calculated in each of the 11 strata independently, using a core set of 21468 SNPs. Only samples which passed all quality control were included. Because the purpose here was to account for population substructure in the association analysis, no HapMap samples were included, unlike in outlier detection. These 21468 SNPs remained after excluding SNPs:
1) That failed QC in any one of the 11 data sets
2) From the MHC region (all markers on chromosome 6 from 26 to 36 MB)
3) From other regions of extended LD, such as chromosome 8 from 6 to 16 MB and chromosome 17 from 40 to 45 MB
4) With a minor allele frequency (MAF) less than 0.01
5) Removed by LD pruning, so that no marker had pairwise r^2 > 0.1 with any other marker within a 100 SNP window.
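The positional and frequency exclusions above can be illustrated with a small filter. This is a minimal sketch: the function and constant names are hypothetical, only the three regions named explicitly in the text are encoded (the text says "such as", so others may have been used), and the QC and LD pruning steps are not covered.

```python
# Regions named in the text, as (chromosome, start in MB, end in MB).
EXCLUDED_REGIONS = [
    ("6", 26.0, 36.0),    # MHC
    ("8", 6.0, 16.0),     # extended LD region
    ("17", 40.0, 45.0),   # extended LD region
]
MIN_MAF = 0.01

def eligible_for_pca(chrom, pos_bp, maf):
    """Return True if a SNP survives the region and MAF exclusions."""
    pos_mb = pos_bp / 1e6
    for region_chrom, start_mb, end_mb in EXCLUDED_REGIONS:
        if chrom == region_chrom and start_mb <= pos_mb <= end_mb:
            return False
    return maf >= MIN_MAF

# Made-up positions for illustration only.
in_mhc = eligible_for_pca("6", 30_000_000, 0.25)
outside_mhc = eligible_for_pca("6", 50_000_000, 0.25)
rare = eligible_for_pca("1", 1_000_000, 0.005)
```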
By way of example, Supplementary Figure 97 shows the UK cases and controls plotted according to the first two PCs. After adjusting to a standard sample size, 34 only Denmark and Norway showed evidence of any substantial genomic inflation (see Supplementary Table 12). After including the first 5 PCs as covariates, only minimal evidence for genomic inflation in Denmark remained (final column in Supplementary Table 12).
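The text does not spell out how the genomic inflation factor was adjusted to a standard sample size. One widely used convention, assumed here rather than taken from the paper, rescales the excess inflation to a study of 1000 cases and 1000 controls. The function name and the example numbers below are hypothetical.

```python
def lambda_standardised(lambda_gc, n_cases, n_controls, n_std=1000):
    """Rescale a genomic inflation factor to n_std cases and n_std controls.

    Assumed formula (not confirmed by the text):
        1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/n_std + 1/n_std)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (2.0 / n_std)
    return 1.0 + (lambda_gc - 1.0) * scale

# Made-up example: the same raw inflation looks smaller once the
# larger sample size is taken into account.
adjusted = lambda_standardised(1.2, n_cases=2000, n_controls=2000)
```

The rescaling allows strata of very different sizes to be compared on a common footing, since raw inflation tends to grow with sample size.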

Secondary phenotypes
In addition to the primary analysis of susceptibility, we also performed an analysis of the ImmunoChip data from the cases with respect to severity, as measured in terms of the MSSS. 19 In this severity (MSSS) based analysis, data from each population were tested separately using the first five principal components together with age at onset and gender as covariates, and then corrected for genomic inflation before results were combined in a meta-analysis across the 11 populations.
No SNP reached genomewide significance in this secondary phenotype analysis. The most associated SNP in the MSSS ImmunoChip analysis was rs4092077 from an intergenic region on chromosome 4q35. Supplementary Figure 99 shows the QQ plot for the MSSS ImmunoChip analysis.

Trio families
In addition to the main case control samples, we also typed the ImmunoChip in a set of trio families (an affected individual and both parents). After QC this effort provided data from all three individuals in 633 trio families (621 from the UK and 12 from the USA). Transmission disequilibrium testing (TDT) in these families showed over-transmission of the risk allele apparent in the case control analysis for all but 16 of the 97 independent genomewide significant SNPs identified. None of the 16 discordant SNPs showed statistically significant evidence of deviation from the null, while 25 of the concordant SNPs showed a significant difference (one-sided p-value < 5.0 × 10^-2). The probability of seeing 81/97 concordant over-transmissions by chance alone is 6.2 × 10^-12. Supplementary Figure 100 shows the QQ plot for the TDT analysis.
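The chance probability quoted for 81 of 97 concordant SNPs is consistent with a one-sided exact binomial (sign) test in which each SNP is equally likely to be concordant or discordant under the null. The text does not state which test was used, so the sketch below is an assumption; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2 ** n

# 81 of the 97 SNPs showed over-transmission of the risk allele.
p_concordance = binomial_upper_tail(81, 97)
print(f"{p_concordance:.2e}")
```

Under these assumptions the result is roughly 6.2 × 10^-12, in line with the value reported above.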

Heritability Explained
In order to calculate the genetic variance currently explained in multiple sclerosis, we used a logistic regression model in the discovery phase data. Using a joint analysis of all discovery phase data for association with multiple sclerosis, we fit a null model with the first 5 principal components and indicator variables for the 11 country-level strata as covariates. We then fit an alternative model which additionally included the 109 non-MHC susceptibility alleles that were included on ImmunoChip (see Supplementary
In the 2011 GWAS 10 we calculated that collectively the four MHC and 57 non-MHC susceptibility alleles identified in that study accounted for 25% of the sibling recurrence risk (λs) observed epidemiologically, assuming a total λs of 6.3. 35 In addition, we used a liability threshold model and estimated that these same risk alleles explain 17% of the genetic variance.
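As a rough illustration of how a fraction of sibling recurrence risk might be expressed, one common convention takes the ratio of the logarithms of the explained and total λs, which treats locus effects as multiplicative. Whether this is the calculation used in the 2011 GWAS is an assumption, not something this text confirms. The sketch applies it to the total λs of 6.3 quoted above and the non-MHC λs of 1.271 reported in the following paragraph; the function name is hypothetical.

```python
from math import log

def fraction_of_lambda_s(lambda_explained, lambda_total):
    """Fraction of sibling recurrence risk explained on the log scale.

    Assumes multiplicative effects across loci (an assumption here).
    """
    return log(lambda_explained) / log(lambda_total)

# Non-MHC variants only; MHC effects are not included in this figure.
frac_non_mhc = fraction_of_lambda_s(1.271, 6.3)
print(f"{frac_non_mhc:.3f}")
```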
Using the same methods as described in the supplementary information file from the 2011 GWAS, 10 we repeated these calculations using the summary statistics from the 109 non-MHC susceptibility alleles that were included on ImmunoChip. These variants explain a λs of 1.271 and 12.8% of the genetic variance in liability. Combined with the effects attributable to the four MHC risk alleles, 10 we now explain a total of 28% of the sibling recurrence risk and 23% of the genetic variance in liability. Given the geometric relationship between recurrence risk and relatedness apparent in multiple sclerosis, 36,37 it is likely that a substantial fraction of the apparent heritability of the disease is phantom 38 (i.e. attributable to interactions rather than the additive marginal effects identified in GWAS), and therefore that the proportion of narrow sense heritability explained is substantially higher.
The meta-analysis of GWAS 11 was published prior to our main GWAS. 10
Top row rs9893808, middle row rs3785907 and bottom row rs7206912. For each SNP the left hand panel is based on the WTCCC common controls (n=9999), the centre panel on the multiple sclerosis cases and controls typed at WTSI (n=22837) and the right hand panel on the multiple sclerosis cases and controls that were typed at other centres (n=13049). All plots were generated using Evoker. 33