Integrating genetics with newborn metabolomics in infantile hypertrophic pyloric stenosis

Introduction Infantile hypertrophic pyloric stenosis (IHPS) is caused by hypertrophy of the pyloric sphincter muscle. Objectives Since previous reports have implicated lipid metabolism, we aimed to (1) investigate associations between IHPS and a wide array of lipid-related metabolites in newborns, and (2) address whether detected differences in metabolite levels were likely to be driven by genetic differences between IHPS cases and controls or by differences in early life feeding patterns. Methods We used population-based random selection of IHPS cases and controls born in Denmark between 1997 and 2014. We randomly took dried blood spots of newborns from 267 pairs of IHPS cases and controls matched by sex and day of birth. We used a mixed-effects linear regression model to evaluate associations between 148 metabolites and IHPS in a matched case–control design. Results The phosphatidylcholine PC(38:4) showed significantly lower levels in IHPS cases (P = 4.68 × 10−8) as did six other correlated metabolites (four phosphatidylcholines, acylcarnitine AC(2:0), and histidine). Associations were driven by 98 case–control pairs born before 2009, when median age at sampling was 6 days. No association was seen in 169 pairs born in 2009 or later, when median age at sampling was 2 days. More IHPS cases than controls had a diagnosis for neonatal difficulty in feeding at breast (P = 6.15 × 10−3). Genetic variants known to be associated with PC(38:4) levels did not associate with IHPS. Conclusions We detected lower levels of certain metabolites in IHPS, possibly reflecting different feeding patterns in the first days of life. Supplementary information The online version of this article (doi:10.1007/s11306-020-01763-2) contains supplementary material, which is available to authorized users.


INTRODUCTION
Infantile hypertrophic pyloric stenosis (IHPS), a disease typically appearing between two and eight weeks after birth, is characterized by hypertrophy of the pyloric sphincter smooth muscle, which leads to obstruction of the gastric outlet 1 . IHPS has a population incidence between 1 and 4 per 1000 live births 2,3 , a 4-to 5-fold higher risk in boys than girls 1 , and is the most common disease requiring surgery in the first months of life 4 . Despite its well-established clinical presentation, diagnosis and treatment, IHPS etiology remains incompletely understood, with both genetic and environmental factors contributing to disease pathogenesis. IHPS familial aggregation of 20-fold increased risk among siblings 5 , twin heritability estimates of 87% 5 and genome wide SNP heritability of 30% 6 suggest a strong genetic component for IHPS. Furthermore, perinatal risk factors such as exposure to macrolide antibiotics 7 , being first-born, preterm, delivery by cesarean section 8 , and bottle-feeding 9,10 highlight the importance of modifiable early environmental exposures on the risk of IHPS 11 .
Previous findings from our group and others showed an association between IHPS and lipid metabolism. Briefly, we detected genetic variants at the APOA1 locus in which known cholesterollowering alleles were associated with increased risk of IHPS 12 . Moreover, in the same study we found lower levels of total cholesterol in umbilical cord blood in cases vs. controls 12 . In our most recent genome-wide association study (GWAS) meta-analysis of IHPS, we also detected a genome-wide positive genetic correlation with high-density lipoprotein (HDL) cholesterol and a negative correlation with very low-density lipoprotein cholesterol (VLDL) 6 . Moreover, 14% of children with Smith-Lemli-Opitz syndrome (SLOS), a malformation syndrome caused by an inborn defect of cholesterol biosynthesis 13 , are reported to have IHPS 14 .
. CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint Following on these previous findings, this study had two goals. First, we wanted to investigate associations between IHPS and lipid-centric targeted metabolites in the blood of neonates from dried blood spot samples obtained through the routine Danish neonatal screening program 15 . Second, we wanted to address the question of whether detected differences in metabolite levels were likely to be driven by genetic differences between IHPS cases and controls or by differences in early life feeding patterns.

Study Participants
Study participants belong to a Danish ancestry cohort previously described in our genome-wide metaanalysis of IHPS 6 . Briefly, IHPS cases were defined as having a pyloromyotomy in their first year of life, were singletons born in Denmark with parents and grandparents born in Northwestern Europe, and were born with no severe pregnancy complications nor major congenital malformations. For controls we used the same selection criteria as for cases, but without any IHPS diagnosis or surgery code. Our initial sample size comprised a total of 588 samples taken from dried blood spots (DBS) (294 pairs of cases and controls matched by sex and day of birth), of which we removed sequentially the following samples: (a) 2 pairs due to low sample quality in the cases; (b) 19 pairs due to at least one of the samples in each pair not passing standard GWAS QC (e.g. genetic variant missingness, ancestry outlier or genetic relatedness) 6 ; (c) 5 pairs in which the controls had missing parity information; and (d) 1 pair in which the case had missing gestational age information. After this filtering, the discovery cohort comprised 267 pairs of IHPS cases and controls matched by sex and day of birth (range from 1997 to . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) The copyright holder for this preprint

Metabolite profiling and nomenclature
The DBS samples consist of whole blood blotted onto filter paper, dried at room temperature for at least 3 hours before being sent by mail for newborn screening at Statens Serum Institut. After newborn screening, the samples were then stored at -20°C in the Danish National Biobank. On the day of the analysis, a 3.2-mm punch was collected using a Panthera-PuncherTM 9 blood spot punching system (PerkinElmer, Waltham, MA, USA) directly into the 96-well kit plates.
Targeted metabolite quantification followed the manufacturer's user manual for the AbsoluteIDQ® p400 Kit (Biocrates Life Sciences AG, Innsbruck, Austria), adapted for DBS analysis by the manufacturer 19 . The kit has a high-performance liquid chromatography (LC) separation step and a flow injection analysis (FIA) step, both followed by mass spectrometry (MS) analyses. Mass detection and compound identification are performed by multiple reaction monitoring. The method of the . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint AbsoluteIDQ® p400 kit has comparable results to the AbsoluteIDQ® p180 kit 20 , a kit with reported high interlaboratory reproducibility 21 , and validated for human blood plasma and DBS 22   . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint We first injected extracts for LC-MS measurements on one dedicated injector (amino acids, biogenic amines). FIA was conducted the next day on the second dedicated injector (acylcarnitines, glycerophospholipids, sphingolipids, hexoses, cholesterol esters, glycerides). Eight 96-well plates were run, with wells measured in a fixed order (from #1 to #96). A matched case and control would always be on the same plate but not necessarily injected sequentially. LC-MS data was pre-processed using

Statistical Analysis
For each metabolite m and plate p, the limit-of-detection (LOD m,p ) was defined as the median nonnormalized values of metabolite m for paper-based blanks (done in triplicate for each plate) on the relevant plate p. This means that LODs were calculated separately for each plate. We excluded a metabolite m, if more than 20% of non-normalized concentrations in cases and controls were ≤ LOD.
MetIDQs data normalization procedure was implemented in R 23 , to scale out differences between plates. Briefly, the median value per metabolite per plate of the five replicated injections of the QC2 (A m,p ) was calculated (only for cells above LOD) and divided that by the metabolite target value (TV m ) of each metabolite, available in the MetIDQ database, leading to the correction factor by plate (A m,p /TV m = C m,p ). Then to normalize the data, the raw concentration values of all samples were divided by the plate-specific correction factor: Norm m,p = Raw m,p /C m,p . We also did metabolite filtering by coefficient of variation (CV) based on the normalized values of the QC2 replicates. For each metabolite m, we calculated its CV for all plates together, i.e. for 5 QC2 samples and 8 plates, the CV . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint was based on 40 observations. If CV m > 25% we excluded metabolite m. Thus, of the 408 metabolites measured with the kit, we therefore selected 148 metabolites that were above the limit of detection (LOD) in at least 80% of the samples (in cases or controls separately) and had a coefficient of variation of the QC2 replicates below 25%.
Due to the non-normal distribution of metabolite concentrations (reported in µM), we quantile transformed the normalized metabolite concentrations to a standard normal distribution prior to all statistical analyses. Estimation of association of metabolite levels with IHPS was performed with a linear mixed-effects model, implemented in the lmer function from the lme4 R package 24 , adjusted for sex (as factor), year of birth (as factor), parity, and gestational age (in weeks) as fixed effects, and IHPS case-control pair as a random effect. A statistically significant association was defined as P < 0.05/148, to adjust for the 148 metabolites tested (Bonferroni correction).

Diagnosis codes and co-occurrence with IHPS
All International Classification of Diseases (ICD) diagnosis codes from our 267 pairs of IHPS cases and controls were extracted from the Danish National Patient Register 18 . Fisher's exact test for count data implemented in R 23 was used to calculate the odds-ratio of occurrence of the ICD-code P92.5 in IHPS cases vs. controls.

Genetic variants associated with metabolite levels
Assessing the effect of FADS1 genetic variant rs174547 25-27 on IHPS was done doing a lookup in our previous genome-wide meta-analysis of IHPS 6 . The association of rs174547 with metabolite levels in our cohort was determined in a linear model adjusted for IHPS status, sex, year of birth (as factor), parity, and gestational age (in weeks). Stratified analysis was also conducted separately for individuals . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint .
1 0 born from 2009 onwards (lower age at sampling) and for individuals born before 2009 (higher age at sampling).

Description of the cohort
A cohort comprising 267 pairs of IHPS cases and controls matched by sex and day of birth was sampled from DBS samples taken a few days after birth (Methods, Supplementary Methods). Key characteristics of the cohort are described in Table 1 and Methods.

Targeted metabolomics and IHPS association
Based on our previous findings, in which we detected an association between IHPS and lipid  . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

IHPS-associated metabolites may tag different feeding patterns between cases and controls
is the (which was not peer-reviewed) The copyright holder for this preprint  Figure 5). This suggests that other factors influencing PC(38:4) levels underlie the IHPS association, and feeding pattern could be one such factor. Although we do not have general information on feeding patterns in our cohort, a larger number of IHPS cases than controls had a diagnosis code for neonatal difficulty in feeding at breast (ICD-10 code P92. 5 Table 4, Methods).

DISCUSSION
In this case-control metabolomics study of IHPS, we identified seven metabolites with significantly lower levels in newborns who later developed the disease. The metabolites included five PCs, acylcarnitine AC(2:0), and the amino acid histidine. The associations were driven by 98 case-control pairs born before 2009, when median age at sampling was 6 days. No significant associations were seen in the 169 case-control pairs born in 2009 or later, when median age at sampling was 2 days.
Genetic variation known to affect levels of our sentinel IHPS significant metabolite PC(38:4) in blood of adults 27 was not associated in our neonatal cohort, nor was associated with IHPS in our previous GWAS meta-analysis 6 . This suggests non-causality of the genetically determined levels of the metabolite PC(38:4) with IHPS, requiring another explanation for the observed association between PC(38:4) levels and IHPS. The fact that IHPS cases had more diagnoses for neonatal difficulty in . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint feeding at breast than controls supports the hypothesis that a higher proportion of cases were primarily bottle-fed. Furthermore, the phosphatidylcholine PC(38:4) is known to be at higher levels in plasma 28 and dried blood spot 29 samples from breast-fed infants compared to bottle-fed infants, and bottlefeeding is a well-known risk factor for IHPS [9][10][11] . Thus, IHPS risk could be exacerbated by introducing bottle-feeding, which may induce lower levels of phosphatidylcholines, essential in cholesterol metabolism 30 . This is in line with the argument that, right after birth, newborn's milk intake (whether formula or breast-milk) is very low but increases rapidly within the first week of life 31,32 , which could explain the significant metabolite associations only for individuals born before 2009, as they were sampled at a later age. Moreover, the fact that the ratio of PC(38:4)/PC(38:3) was significantly associated with FADS1 genetic variation for individuals born from 2009 (samples about 2 days after birth) but not for individuals born before 2009 (sampled at a later age) suggests that the effects of feeding patterns about one week after birth influence the ratio so much that the effect of FADS1 genetic variation is masked. The association between PC(38:4) levels and IHPS (seen about 6 days after birth) is therefore more likely to be attributable to feeding differences between IHPS cases and controls. Similar to phosphatidylcholines, histidine and the short-chain acylcarnitine AC(2:0) were also detected to be at significant lower concentrations in IHPS ( Figure 1, Table 2, Supplemental Figures 6-7). The lower histidine concentration levels in IHPS cases could be further evidence that feeding differences exist between cases and controls. Histidine being an essential amino acid 33 , it cannot be synthesized de novo in humans and therefore, needs to be supplied in the diet. Moreover, histidine content is known to be higher in breast milk compared to standard liquid infant formulas 34 (preferred formulas in the first days of life), suggesting that controls had a higher degree of breast-feeding than cases. Regarding AC(2:0), it has been reported that infants who are breast-fed have higher concentrations of this . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint 1 4 metabolite in their plasma compared to bottle-fed infants 35 , which further suggests the hypothesis that controls are being more breast-fed compared to cases.
Despite the epidemiological evidence that boys have a 4-to 5-fold higher IHPS risk than girls 1 , and that sex differences exist in serum lipids and lipoproteins at birth 36 , we did not detect sex differences in the concentrations of the IHPS associated metabolites.
Since our results seem to reflect mainly the influence of feeding practice on IHPS risk, our previous findings of lower levels of total cholesterol (TC) in umbilical cord blood in cases vs. controls 12 and positive genetic correlation of IHPS with HDL cholesterol 6 and negative genetic correlation with VLDL 6 could not be explained by the results of the current study. However, together with the reported high incidence of IHPS in children with Smith-Lemli-Opitz syndrome (SLOS) 14 , caused by a cholesterol biosynthesis defect 13 , it appears that both genetic and environmental factors can contribute to the pathophysiology of IHPS through disturbances of the lipid metabolism of newborns.
Our study had some limitations, including the use of small amounts of sample material from DBS samples and the use of a targeted approach in which other relevant metabolites could have been overlooked. We also did not have general information on breast-feeding vs. formula feeding so we were not able to study the feeding effect directly. In addition, genetic variation known to associate with levels of metabolites queried here is only known in adults, and so we might have missed genetic associations specific to newborn metabolite levels. Strengths of the study include the use of a wellestablished targeted metabolomics platform (Biocrates) to investigate the association between IHPS and 148 metabolites from DBS samples taken a few days after birth. To our knowledge, this is the first time that extensive newborn metabolomics profiling is being used to study IHPS. Another strength of . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22.20026724 doi: medRxiv preprint the study is the use of well-defined codes from nation-wide Danish registers to define outcome, exclusions, and covariates. Finally, our study design in which we performed 1-to-1 matching of IHPS cases and controls based on sex and exact day of birth is also a clear strength of the study.
Our observations have implications for ongoing studies of complex diseases in which both genetics and environmental factors play a role in disease etiology. In time, a detailed molecular dissection of IHPS risk factors could potentially be used as a basis for more rapid identification of particularly susceptible newborns. While our study represents a step in that direction, further efforts are needed to replicate our findings in other populations and to better understand the interplay of genetics and early feeding patterns and consequent progression to IHPS. Specifically, the use of wide lipid profiling in nutritional intervention studies could help to unravel causal relationships and underlying mechanisms. Moreover, since AC(2:0) is already present on the list of most of newborn screening panels worldwide 37 , future work could address the predictive power of AC(2:0) as a newborn screening test for IHPS.

CONCLUSIONS
Although IHPS is highly heritable 5,6 , it is a complex disease that does not have Mendelian transmission through families. Common genetic variants 6,12,16 and environmental risk factors [7][8][9][10][11] have been previously discovered, but IHPS etiology remains incompletely understood. In this case-control metabolomics study of IHPS we have detected lower levels of PC(38:4), together with correlated PCs, histidine and AC(2:0), in IHPS. Several lines of evidence suggest that environmental factors affecting these metabolites such as feeding pattern in the first days of life may underlie the associations. Further understanding of the complex interplay of genetics and early life environmental factors is critical to the . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . 1 6 understanding of the pathophysiology, diagnosis and possible treatment of IHPS, and can serve as a model for other complex diseases.
. CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint     . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint  Table 3. Main IHPS association results for the 148 metabolites tested for individuals born in 2009 or later. Table 4. Number of IHPS cases and controls with and without the ICD-10 code P92.5 describing neonatal difficulty in feeding at breast. Color-coded linkage disequilibrium is shown for the sentinel SNP rs174547. Linkage disequilibrium determined for the 1000 Genomes EUR population, November 2014 freeze, as described at http://locuszoom.sph.umich.edu/. The x-axis represents the genomic region (hg19 assembly) surrounding 400kb of rs174547, while the y-axis represents the strength of the association in −log10(Pvalue). . CC-BY-NC-ND 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Supplemental
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.02. 22