Challenges and recommendations for conducting epidemiological studies in the field of epilepsy pharmacogenetics

Epilepsy is one of the most prevalent neurological disorders, afflicting approximately 50 million people worldwide. Owing to their affordability and easy availability, the use of first-generation antiepileptic drugs (AEDs) is heavily encouraged for the treatment of epilepsy in resource-limited countries such as India. Although first-generation AEDs are on par with second-generation AEDs in terms of efficacy, adverse drug reactions (ADRs) are quite common with them. This could be attributed to their less favorable pharmacokinetic properties, such as nonlinear metabolism, a narrow therapeutic index and the formation of toxic intermediates. In addition, epilepsy patients may differ in their pharmacokinetic and pharmacodynamic profiles, and about one-third of patients fail to respond to treatment. Part of this interindividual variability in response may be explained by genetic heterogeneity in the activity and expression of a network of proteins, such as drug-metabolizing enzymes, transporters and AED targets. Over the last two decades, the scientific community has made considerable efforts to unravel this genetic basis of variable response to AEDs. However, the results of such genetic association studies conducted in different parts of the world have been inconsistent. Several factors may underlie the poor replicability of these studies, chiefly nonuniform phenotypic definitions, small sample sizes and interethnic variability. In the present review article, we provide an overview of heterogeneity in the designs of pharmacogenetic studies. In addition, we briefly discuss critical recommendations for overcoming the challenges posed by pharmacogenetic epidemiological studies.


Introduction
Epilepsy, characterized by recurrent unprovoked seizures, is one of the most common brain disorders.
However, commonly available AEDs are effective in only 60-70% of epilepsy patients and are often associated with ADRs. Pharmacogenetic studies may provide vital clues for delivering optimal, beneficial treatment with minimum risk of drug-related side effects.
Following the Human Genome Project and the advent of high-throughput genotyping chips, genetic studies have garnered enormous attention and are increasingly being used to identify genetic variants that might influence drug response and predisposition to ADRs in patients on AEDs. [1] Prominent among these sequence variants are millions of single nucleotide polymorphisms (SNPs), which have emerged as strong candidates for drug-response studies. The availability of such an enormous wealth of data has fuelled pharmacogenetic epidemiological studies. However, such studies often come under scrutiny owing to a lack of reproducibility of their results. Several key issues need to be adequately addressed to ensure the validity, accuracy and reliability of such results before scientifically relevant conclusions are drawn. These issues range from population stratification, small sample sizes and inconsistent phenotypic definitions across studies to highly heterogeneous clinical symptoms within a given study design and failure to account for all environmental variables. [2] Further, the selection and prioritization of candidate genes and SNPs and the use of appropriate statistical tools can be major factors in detecting true positive associations. We provide an assessment of variability in study designs, accounting for confounding factors and the use of bioinformatics and statistical tools, with an emphasis on pharmacogenetic studies of epilepsy. Examining such methodological differences may help resolve inconsistencies among replication studies and extend laboratory findings to clinical practice.

Clinical study design and phenotypic data collection
A prerequisite for a successful genetic study, especially for complex genetic disorders such as epilepsy, is a large cohort of patients with well-defined phenotypes. One of the pressing challenges faced by geneticists today is the clinical heterogeneity of seizure and syndrome types as well as of their underlying etiology. [3] In addition, the complexity of the classification and terminology of epilepsies makes it highly unlikely that the standard classification will be used for non-diagnostic purposes.
To confront this serious issue, several research groups have developed their own phenotypic classifications based on parameters such as seizure frequency, time to first seizure, time to seizure remission, time to drug withdrawal and number of drugs tried. [4][5][6] Further, the period over which these parameters are evaluated varies considerably, from 3 to 12 months. [4][5][6] Several other key issues also need to be addressed to achieve uniform study designs. Most pharmacogenetic studies have failed to exclude patients with symptomatic epilepsies, which may confound clinical outcome because such patients can respond poorly to AED treatment irrespective of the type, dose and duration of drug therapy.
In addition to the nature and regimen of drug therapy, the brand of drug administered, concurrent hormonal therapy and treatment history could all have a major impact on the improvement in clinical symptoms or phenotypes observed during the course of the study. [7] Furthermore, stratification by gender and age is crucial in epidemiological studies. [7] The most common phenotype in drug response studies has been resistance to AEDs.
However, such studies may not yield meaningful interpretations because multiple drugs are tried on the same patients, and AED-AED interactions are fairly common in epilepsy patients. [7] Very few studies have attempted to collect pharmacogenetic data on patients receiving monotherapy. [4] Hence, there is a need to develop the concept of endophenotypes, which divide disease symptoms into more stable phenotypes and thereby lead to robust study designs with more powerful test statistics. [8] In summary, all the variables that directly or indirectly influence a patient's phenotypic characteristics must be given appropriate weightage before deciding on the inclusion and exclusion criteria for enrolling patients.
Further, these clinical parameters must be measured as part of the study design and accounted for in statistical analysis and interpretation.
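Stratification by covariates such as gender or age can also be carried through to the analysis. As a minimal sketch, assuming each stratum is summarized as a 2x2 table of variant carriers versus noncarriers among nonresponders and responders, the Mantel-Haenszel estimator pools the genotype-response odds ratio across strata. The table layout, the function names and the counts below are hypothetical illustrations, not data or methods from the cited studies.

```python
from typing import Sequence, Tuple

# One 2x2 table per stratum (e.g. a gender or age band), as (a, b, c, d):
#   a = variant carriers among nonresponders, b = noncarriers among nonresponders,
#   c = variant carriers among responders,    d = noncarriers among responders.
Table = Tuple[int, int, int, int]


def crude_odds_ratio(strata: Sequence[Table]) -> float:
    """Odds ratio after collapsing all strata into one table (ignores the covariate)."""
    a = sum(t[0] for t in strata)
    b = sum(t[1] for t in strata)
    c = sum(t[2] for t in strata)
    d = sum(t[3] for t in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: Sequence[Table]) -> float:
    """Mantel-Haenszel odds ratio pooled across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator


# Hypothetical counts: within each stratum carriers and noncarriers respond
# equally (stratum OR = 1), but the strata differ in both carrier frequency
# and response rate, so collapsing them creates a spurious association.
strata = [
    (30, 10, 3, 1),    # stratum 1
    (1, 9, 10, 90),    # stratum 2
]
print(round(crude_odds_ratio(strata), 2))            # 11.42
print(round(mantel_haenszel_odds_ratio(strata), 2))  # 1.0
```

In this toy example neither stratum shows any association, yet the collapsed table suggests a strong one because the strata differ in both carrier frequency and response rate; this is the kind of confounding that stratified recruitment and analysis are meant to prevent.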

Prioritization of genes and genetic loci
Recent genetic association studies have adopted both candidate gene approaches and genome-wide association (GWA) studies to identify associated genetic variants. There are several popular strategies for candidate gene selection, including positional and functional approaches. The former relies on linkage, which does not require any assumptions regarding the disease mechanism.
Chromosomal regions showing strong linkage to drug response are then searched for relevant genes residing at those loci, using linkage disequilibrium (LD) gene mapping. The functional approach instead selects genes by their known role in drug disposition or action; for example, genes encoding drug-metabolizing enzymes such as EPHX1 and UGT2B7 serve as prime candidates for testing the influence of genetic variability on drug response. [1,9,10] So far, most studies have focused on the role of functional alleles of these genes in decreasing or increasing the metabolism of AEDs.
However, studies exploring the direct role of these variants in seizure control are very limited; most studies of seizure control have instead focused on transporter genes (ABCB1, ABCC1 and ABCC2) and drug targets such as sodium channels (SCN1A). [1,9,10] This could be due to the difficulty of measuring the AED levels to which different brain regions are exposed. In addition, distinguishing the contribution of drug-target sensitivity from that of blood-brain barrier permeability to drug response is difficult and complicated. Another approach is a case-control study of genome-wide SNPs, which generates information that is not biased by prior hypotheses. However, GWA studies have their own limitations, including high cost, multiple hypothesis testing and the large sample sizes required for adequately powered studies; moreover, the variants identified by GWA studies explain only a fraction of disease heritability. [11] Recently, meta-analyses of several GWA studies have gained considerable importance for identifying disease-susceptibility loci with higher confidence. So far, GWA studies of drug response in epilepsy patients are lacking, owing to small sample sizes in the different phenotypic categories, which require a minimum of hundreds, if not thousands, of epilepsy patients in each group. More recently, studies have begun to take advantage of large-scale deep resequencing to better understand the human genome, and such approaches will very likely be used in future pharmacogenomic studies.
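Meta-analysis combines per-study estimates for the same variant. As a sketch, assuming each study reports a log odds ratio and its standard error for the same allele, the common inverse-variance fixed-effect model pools them as shown below, with Cochran's Q as a basic heterogeneity check. This is one standard model, not necessarily the one used in the meta-analyses referred to above; random-effects models are often preferred when heterogeneity is substantial, for example across ethnic groups. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios for the same SNP allele)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, two-sided p-value, Cochran's Q).
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies weigh more
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # a simple check for between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


# Hypothetical: two studies, each with log OR 0.2 and SE 0.1 (p ~ 0.046 alone)
beta, se, p, q = fixed_effect_meta([0.2, 0.2], [0.1, 0.1])
print(round(beta, 3), round(se, 4), round(p, 4), round(q, 6))
```

Two studies that are each only nominally significant yield a pooled p-value below 0.01, which illustrates why pooling helps when individual cohorts are small.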

Use of public SNP resources and bioinformatics tools
With swiftly evolving databases and state-of-the-art tools, bioinformatics is rapidly becoming an […] genomes. [13] These efforts have led to consistent representation of gene information across the NCBI, Ensembl and UCSC genome browsers, which is essential for maintaining high standards of reliability and biological accuracy. Further, online databases such as the HuGE Navigator, the NIH Genetic Association Database and the Epilepsy Genetic Association Database (epiGAD) offer systematic tabulation and display of data, with detailed information such as protective and risk alleles, epilepsy syndrome, study duration and sample size that is highly relevant to epilepsy researchers. [14]

Statistical analysis and interpretation
Statistical analysis plays a fundamental role in interpreting the findings of complex genetic research, and several statistical issues need to be addressed at the study design stage itself to prevent erroneous results.
In this section, we discuss some key statistical issues in genetic study designs. […] or population stratification. [15] In the absence of these confounding effects, deviation from Hardy-Weinberg equilibrium (HWE) in cases can provide evidence for association when the true genetic effect of the SNP does not follow a multiplicative model. [15] However, because affected individuals are over-represented in such studies relative to a random population sample, inflated type I error in HWE tests may well lead to the exclusion of potentially associated markers from the study. [15]

Population stratification
Population stratification has become a crucial statistical issue, as it can lead to spurious results, especially in case-control designs. Stratification refers to the existence of subpopulations with different allele frequencies, which might result from founder effects, genetic drift or recent admixture. [16] Association studies with such unmatched subjects may show statistical associations between a disease phenotype and arbitrary markers that have no physical linkage to the causative loci. Avoiding stratification is therefore a necessity rather than an option in a case-control association study, and requires recruiting subjects from a genetically homogeneous population. In addition, several tools and algorithms have been devised to check for population stratification in such studies. Pritchard and Rosenberg proposed the use of a set of unlinked markers that are unrelated to the disease or the drug response. [16] In the absence of stratification, these unlinked markers should not show significant differences in genotype or allele frequencies between the responder and nonresponder groups, or relative to control individuals; significant differences at such markers would indicate population stratification.
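The two checks discussed above can be sketched in a few lines. The HWE test below is the usual 1-df chi-square goodness-of-fit test on genotype counts (exact tests are generally preferred when genotype counts are small). The stratification check follows the spirit of the unlinked-marker approach: allelic chi-square statistics from markers assumed to be unrelated to drug response, and unlinked to each other, are summed and compared with a chi-square distribution with one degree of freedom per biallelic marker. All function names and counts are hypothetical, and the chi-square tail probability is implemented directly only to keep the sketch dependency-free.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Test genotype counts against Hardy-Weinberg proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return stat, chi2_sf(stat, 1)


def allelic_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 allele-count table.

    a, b: counts of allele 1 and allele 2 among responders
    c, d: counts of allele 1 and allele 2 among nonresponders
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def unlinked_marker_check(tables: Sequence[Tuple[int, int, int, int]]) -> Tuple[float, float]:
    """Sum allelic chi-squares over null markers; a small p-value suggests stratification."""
    total = sum(allelic_chi_square(*t) for t in tables)
    return total, chi2_sf(total, len(tables))  # one df per biallelic marker


# Hypothetical genotype counts (AA, AB, BB)
print(hwe_chi_square(25, 50, 25))  # exact Hardy-Weinberg proportions
print(hwe_chi_square(40, 20, 40))  # heterozygote deficit

# Hypothetical allele counts at three unlinked markers (responders vs nonresponders)
null_markers = [(60, 40, 62, 38), (30, 70, 28, 72), (45, 55, 47, 53)]
print(unlinked_marker_check(null_markers))
```

The first genotype set gives a statistic of 0, the second a statistic of 36 with a very small p-value, and the three null markers give a large p-value, i.e. no evidence of stratification in this made-up sample.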
Genomic control (GC) approaches have been proposed to adjust for the confounding effects of population stratification, but they are less sensitive to moderate stratification and to subtle substructure within the studied population. [17] In addition, tools such as STRUCTURE, principal component analysis (PCA) and multidimensional scaling (MDS) have proved effective and are commonly used to address this issue. [17]

Power and sample size
Another vital component of a genetic study is the computation of statistical power. The power of an association study is the probability of correctly detecting a genuine association; it is often estimated before the study is carried out, to determine the sample size required to detect a true genetic effect. [18] Most studies aim for a power of 80%, and the predicted sample sizes […]
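Two of the calculations mentioned above can be sketched as follows, assuming 1-df association statistics and an allele-based test. Genomic control estimates an inflation factor λ as the median of the observed statistics divided by the median of the null χ² distribution with 1 df (about 0.455), then divides each statistic by λ. The number of alleles needed per group at a given power is approximated with the standard normal-approximation formula for comparing two proportions. Function names, allele frequencies and statistics are hypothetical, and dedicated power software may use other genetic models (e.g. genotypic or dominant tests).

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(stats: Sequence[float]) -> Tuple[float, List[float]]:
    """Estimate the inflation factor lambda and deflate 1-df chi-square statistics."""
    lam = median(stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # conventionally, statistics are not inflated when lambda < 1
    return lam, [s / lam for s in stats]


def alleles_per_group(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Alleles needed per group to detect allele frequencies p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    term = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(term ** 2 / (p1 - p2) ** 2)


# Hypothetical statistics whose median is twice the null median
lam, adjusted = genomic_control([0.1, 2 * CHI2_1DF_MEDIAN, 5.0])
print(round(lam, 3))  # 2.0

# Hypothetical: risk-allele frequency 0.30 in nonresponders vs 0.20 in responders
n_alleles = alleles_per_group(0.30, 0.20)
n_patients = math.ceil(n_alleles / 2)  # two alleles per patient, assuming HWE
print(n_alleles, n_patients)  # 294 147
```

With these assumed frequencies, 80% power at α = 0.05 requires about 147 patients per group for a single test; a multiple-testing-corrected α would raise this number substantially.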

Correction for multiple testing
Another key issue in association studies is setting the threshold for significant results. Although a nominal significance level of 5% is generally acceptable, it leads to inflated type I error, i.e. detection of false positives, when multiple independent tests are performed. One of the earliest methods proposed to overcome this limitation was the Bonferroni correction, in which the significance threshold is divided by the number of independent tests performed. [22] However, the Bonferroni correction has received much criticism as overly conservative. The SNPs studied in an association study might not be entirely independent; rather, they may be correlated through LD. Treating them as independent might result in inflated type II error, i.e. an increase in false negatives and hence missed associations. Nyholt's method takes the background LD into account when calculating significance thresholds, but is still conservative under moderate LD. [23] Other popular methods proposed to overcome the limitations of multiple testing include the false discovery rate (FDR), LD block-based corrections and permutation testing.
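The corrections described above can be compared on toy data. The sketch below, which assumes NumPy is available, implements the Bonferroni threshold, the Benjamini-Hochberg step-up procedure for FDR control, and Nyholt's effective number of independent tests (Meff), computed from the eigenvalues of the SNP correlation matrix and then used in place of the raw number of tests in a Bonferroni-type threshold. Function names, p-values and the correlation matrix are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def nyholt_meff(corr: np.ndarray) -> float:
    """Nyholt's effective number of independent tests from an SNP correlation matrix.

    corr: M x M matrix of pairwise correlations (r) between SNP genotypes.
    Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M)
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
    # Eigenvalues of a correlation matrix average 1 (its trace equals M).
    var = np.sum((eigvals - 1.0) ** 2) / (m - 1)
    return float(1.0 + (m - 1) * (1.0 - var / m))


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag discoveries while controlling the false discovery rate at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose p-value satisfies p_(k) <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    flags = [False] * m
    for i in order[:k_max]:
        flags[i] = True
    return flags


# Hypothetical p-values from 10 SNP tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(p <= bonferroni_threshold(0.05, len(pvals)) for p in pvals))  # 1
print(sum(benjamini_hochberg(pvals)))                                   # 2

# Four SNPs: two pairs in strong LD (r = 0.9), the pairs independent of each other
corr = np.array([[1.0, 0.9, 0.0, 0.0],
                 [0.9, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.9],
                 [0.0, 0.0, 0.9, 1.0]])
meff = nyholt_meff(corr)
print(round(meff, 2), bonferroni_threshold(0.05, meff))
```

Here Bonferroni retains one SNP and Benjamini-Hochberg retains two; for four SNPs forming two tightly linked pairs, Meff is about 3.19 rather than 4, giving a slightly less stringent per-test threshold (about 0.0157).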

Functional characterization
Functional characterization helps in discrimination of a causal SNP from an association due to linkage, and