Power comparison between population-based case–control studies and family-based transmission–disequilibrium tests: An empirical study

BACKGROUND: There are two major classes of genetic association analyses: population based and family based. Population-based case–control studies have been the method of choice due to the ease of data collection. However, population stratification is one of the major limitations of case–control studies, while family-based studies are protected against stratification. In this study, we carry out extensive simulations under different disease models (both Mendelian and complex) to evaluate the relative powers of the two approaches in detecting association.
MATERIALS AND METHODS: The power comparisons are based on a case–control design comprising 200 cases and 200 controls versus a Transmission Disequilibrium Test (TDT) or Pedigree Disequilibrium Test (PDT) design with 200 informative trios. We perform the allele-level test for case–control studies, which is based on the difference of allele frequencies at a single nucleotide polymorphism (SNP) between unrelated cases and controls. The TDT and the PDT are based on preferential allelic transmissions at a SNP from heterozygous parents to the affected offspring. We consider five modes of inheritance: (i) recessive with complete penetrance, (ii) dominant with complete penetrance, and (iii), (iv) and (v) complex diseases with varying levels of penetrance and phenocopies.
RESULTS: We find that while the TDT/PDT design with 200 informative trios is in general more powerful than a case–control design with 200 cases and 200 controls (except when the heterozygosity at the marker locus is high), it may be necessary to sample a very large number of trios to obtain the requisite number of informative families.
CONCLUSION: The current study provides insights into power comparisons between population-based and family-based association studies.

Association mapping of susceptibility genes underlying complex disorders is an active area of current research in genetic epidemiology. Compared with Mendelian disorders, there has been limited success in identifying genes involved in complex disorders, as these traits are believed to be controlled by multiple loci, some with minor gene effects, and genetic variation at any one locus does not completely determine the trait. Moreover, epistatic as well as gene-environment interactions often modify the risk of developing the disease. While linkage analyses [1] have traditionally been successful in identifying rare variants with large genetic effect sizes characterizing Mendelian disorders, they have been relatively unsuccessful in detecting common variants with moderate effect sizes characterizing complex disorders. There is evidence that association studies, which measure the extent of linkage disequilibrium (LD) between alleles of two loci, [2] are statistically more powerful than linkage studies in gene mapping of complex traits. [3] This is because LD exists over small distances on the genome, while linkage exists over larger distances. Thus, a positive association finding gives a more precise location of a locus responsible for the trait. The most popular design for genetic association studies is the population-based case-control study, owing to the ease of data collection and the simplicity of the statistical methodology for testing association. However, such studies suffer from a major inherent limitation: the problem of population stratification. [4] If the sample is a mixture of genetically heterogeneous subpopulations (i.e., there is heterogeneity in allele frequencies at the SNPs across subpopulations), the association finding may be spurious.
This problem is of specific relevance for studies on Indian populations due to the increasing evidence of genetic heterogeneity among different ethnic populations in India. [5][6][7] While there are some statistical methods [8][9][10] to adjust for population stratification, the optimal number of genome-wide markers required to evaluate the level of stratification, and the extent to which the relevant statistics can be corrected, remain unclear.
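The way stratification can produce a spurious association can be illustrated with a small worked calculation. The sketch below uses hypothetical numbers (two subpopulations, their allele frequencies, and the sampling proportions are all assumptions for illustration, not values from this study). Within each subpopulation the allele frequency is identical in cases and controls, yet the pooled samples differ because cases and controls are drawn in different proportions from the two subpopulations.

```python
def pooled_allele_frequency(strata):
    """Expected allele frequency in a sample mixed from several subpopulations.

    strata: list of (number_of_individuals, allele_frequency) pairs.
    """
    total = sum(n for n, _ in strata)
    return sum(n * freq for n, freq in strata) / total


# Hypothetical allele frequencies; the same value applies to cases and
# controls within a subpopulation, so there is no true association.
FREQ_POP1, FREQ_POP2 = 0.6, 0.3

# Hypothetical sampling: cases come mostly from subpopulation 1,
# controls mostly from subpopulation 2.
cases = [(150, FREQ_POP1), (50, FREQ_POP2)]
controls = [(50, FREQ_POP1), (150, FREQ_POP2)]

case_freq = pooled_allele_frequency(cases)
control_freq = pooled_allele_frequency(controls)
spurious_difference = case_freq - control_freq

print(f"cases: {case_freq:.3f}, controls: {control_freq:.3f}, "
      f"difference: {spurious_difference:.3f}")
```

Under these assumed numbers the pooled case and control allele frequencies differ even though the difference is zero within each subpopulation, which is the pattern an unadjusted case-control test would report as association.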
Thus, it has been of interest to explore family-based alternatives that attempt to detect preferential transmission of a specific parental allele to the offspring, the best known being the Transmission Disequilibrium Test (TDT). [11] The major advantage of this test is that it is protected against population stratification, although its data requirements are more demanding than those of case-control studies.
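As a minimal sketch of how preferential transmission is tested, the TDT is commonly written in its McNemar form: among heterozygous parents of affected offspring, the number transmitting the allele of interest is compared with the number transmitting the other allele, and the statistic is referred to a chi-square distribution with 1 degree of freedom. The function names and the counts below are hypothetical and chosen only for illustration; they are not results from this study.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def tdt_statistic(transmitted, not_transmitted):
    """McNemar-type TDT statistic from heterozygous-parent transmissions.

    transmitted: heterozygous parents transmitting the allele of interest.
    not_transmitted: heterozygous parents transmitting the other allele.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no heterozygous (informative) parents")
    return (transmitted - not_transmitted) ** 2 / total


# Hypothetical transmission counts, for illustration only.
stat = tdt_statistic(transmitted=130, not_transmitted=90)
p_value = chi2_1df_pvalue(stat)
print(f"TDT statistic = {stat:.3f}, p = {p_value:.4f}")
```

Only heterozygous parents contribute to the statistic, which is why the number of informative trios, rather than the number of trios collected, governs the power of the test.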
In this study, we carry out extensive simulations to compare the statistical powers of population-based case-control analyses and the family-based TDT and Pedigree Disequilibrium Test (PDT) [12] for a wide spectrum of genetic disease models. The major challenge lies in the fact that a direct and straightforward power comparison is not possible in the strict statistical sense because the study designs are different with respect to data requirements.

Materials and Methods
We have performed the allele-level test for case-control studies, which is based on the difference of allele frequencies at a single nucleotide polymorphism (SNP) between unrelated cases and controls.
In this light, the current study provides an alternative framework for statistical comparison based on power.
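The allele-level test named under Materials and Methods can be sketched as a Pearson chi-square test on the 2 x 2 table of allele counts in cases and controls. This is one standard way to test a difference in allele frequencies; the exact form used in the study is not stated in the text, so the code below is an assumption, and the genotype counts are hypothetical.

```python
import math


def allele_counts(n_AA, n_Aa, n_aa):
    """Convert genotype counts into counts of allele A and allele a."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df) on the 2 x 2 table of allele counts."""
    a, b = allele_counts(*case_genotypes)      # cases: A, a
    c, d = allele_counts(*control_genotypes)   # controls: A, a
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the allele table is empty")
    return n * (a * d - b * c) ** 2 / denominator


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical genotype counts (AA, Aa, aa) for 200 cases and 200 controls.
cases = (50, 100, 50)
controls = (32, 96, 72)

stat = allelic_chi_square(cases, controls)
p_value = chi2_1df_pvalue(stat)
print(f"allelic chi-square = {stat:.3f}, p = {p_value:.4f}")
```

Each individual contributes two alleles, so 200 cases and 200 controls give a table with 800 allele observations in total.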
We found that the TDT or PDT based on a set of informative trios is more powerful in detecting association than a case-control design in which the number of cases, and likewise the number of controls, equals the number of informative trios, except when the heterozygosity of the marker locus is very high. However, a fairer statistical comparison sets the total number of trios screened in the TDT or PDT analysis against the number of cases (or controls) required in a case-control design to obtain equivalent power, and on this comparison the case-control design requires far smaller samples. Moreover, it needs to be emphasized that while a case-control design comprising N cases and N controls requires genotyping of 2N individuals, a TDT or PDT design with N trios requires an expected genotyping of (2 + p)N individuals, where p is the proportion of informative trios. Thus, given that the case-control design yields more power than the TDT/PDT when the number of cases (or controls) equals the total number of trios, the relative gain from a case-control design is even greater when genotyping costs are taken into consideration. We would like to highlight that while family-based association analyses are protected against population stratification with respect to false-positives, they may be adversely affected with respect to false-negatives. Thus, it is of interest to compare the powers of the case-control design and the TDT/PDT in the presence of population stratification. This is statistically challenging because population stratification inflates the rate of false-positives in the case-control framework; hence, a direct comparison of powers without adjusting the distributional thresholds for stratification is not statistically valid.
We plan to carry out extensive simulations under population stratification and compare the powers of the two procedures after adjusting for stratification in the case-control analyses based on a principal components approach. [10]
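The genotyping comparison made in the discussion above can be put into numbers with a short sketch. The proportion of informative trios used here is a hypothetical value, and the expected number of trios to screen assumes that each trio is informative independently with that same probability; neither is taken from the study's results.

```python
def case_control_genotypes(n_cases):
    """Individuals genotyped with n_cases cases and as many controls (2N)."""
    return 2 * n_cases


def trio_genotypes(n_trios, informative_fraction):
    """Expected individuals genotyped for n_trios screened trios.

    Follows the (2 + proportion informative) * N expression in the text.
    """
    return (2 + informative_fraction) * n_trios


def trios_to_screen(n_informative, informative_fraction):
    """Expected trios to screen to obtain n_informative informative trios.

    Assumes each trio is informative independently with the same probability.
    """
    if not 0 < informative_fraction <= 1:
        raise ValueError("informative_fraction must be in (0, 1]")
    return n_informative / informative_fraction


INFORMATIVE_FRACTION = 0.4  # hypothetical proportion of informative trios

screened = trios_to_screen(200, INFORMATIVE_FRACTION)
family_total = trio_genotypes(screened, INFORMATIVE_FRACTION)
case_control_total = case_control_genotypes(200)

print(f"trios screened: {screened:.0f}")
print(f"genotypes, trio design: {family_total:.0f}")
print(f"genotypes, case-control design: {case_control_total}")
```

With this assumed proportion, obtaining 200 informative trios means screening about 500 trios, so the trio design genotypes roughly three times as many individuals as a design with 200 cases and 200 controls.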