The creeping Pearl: Why has the rate of contraceptive failure increased in clinical trials of combined hormonal contraceptive pills?

BACKGROUND
Despite several drawbacks, the Pearl Index continues to be the most widely used statistical measure of contraceptive failure. However, Pearl indices reported in studies of newer hormonal contraceptives appear to be increasing.


STUDY DESIGN
We searched PubMed and Medical Intelligence Solutions databases for prospective trials evaluating oral contraceptive (OC) efficacy to examine potential factors that could contribute to increasing Pearl indices.


RESULTS
Numerous potential factors were identified, including an increased rate of failures of newer OCs, deficiencies in methods of calculating contraceptive failure rates, differences in study design and changes in patient populations resulting in increased rates of contraceptive failures due to the inappropriate or inconsistent use of the method.


CONCLUSIONS
The two contributors most likely to be important in the increase in Pearl indices are more frequent pregnancy testing with more sensitive tests and less adherent study populations. Because study populations appear to be increasingly representative of the likely actual users once the product is marketed, we can expect to see even higher failure rates in ongoing and future studies. This result poses challenges for companies and regulatory agencies.


INTRODUCTION
Oral contraceptives (OCs) are the most widely used method of reversible contraception in the United States [1]. They are popular, safe, and when used correctly and consistently, highly effective for preventing pregnancy.
According to women who use OCs, effectiveness is the most important factor that influences their choice of contraceptive method [2]. In fact, in a recent US survey of 574 women seeking abortions, effectiveness was the contraceptive feature considered extremely important by the largest percentage of these women (84%) [3].
Contraceptive efficacy is demonstrated by the rate of contraceptive failures during a specified period of exposure in a clinical trial. Currently two methods are used for measuring contraceptive efficacy in clinical trials: the Pearl Index and the life table. Because of variations in study design, study populations, and data collection and analyses, rates of contraceptive failure reported in clinical trials are difficult to interpret and compare. Nonetheless, contraceptive failure rates reported in clinical trials of new hormonal contraceptives submitted to the FDA for approval seem to have increased in recent years, especially in the past decade [4]. Table 1 shows Pearl indices, but as discussed below there has also been an increase in life table failure rates.
In this paper, we explore the numerous factors that could contribute to this apparent trend. Potential factors include an increased rate of failure of the newer oral contraceptives, deficiencies in methods of calculating contraceptive failure rates, differences in study design and patient populations of hormonal contraception trials, and an increased rate of contraceptive failures due to the inappropriate or inconsistent use of the newer methods [4].

CHANGED?
One possible explanation for the increase in Pearl indices in recent applications for FDA approval for hormonal contraceptives is that the newer methods have less inherent efficacy than older hormonal methods and, thus, are more likely to fail when used correctly. Clinical trials supporting OCs that were FDA approved in the 1960s showed contraceptive failure rates of <1 as estimated by the overall Pearl Index [4]. In fact, prior to 1975, the Obstetrics and Gynecology Division of the FDA recommended that Pearl indices of <1.5 were required to establish an acceptable level of efficacy [4]. Over time, decreasing estrogen and progestin doses have coincided with an increased number of method failures, and OCs are now approved with Pearl indices higher than 2.0.
However, the approval of any drug product depends on an analysis of its risks and benefits, as well as its efficacy and safety. Although OCs with higher estrogen and progestin doses may exhibit slightly greater efficacy and a more favorable margin for user error, lower-dose OCs are likely to have a better tolerability and safety profile with a lower risk of serious thromboembolic and cardiovascular events.
The efficacy of hormonal contraceptives refers to how well they work in ideal conditions, while the effectiveness refers to how well they work in actual practice. It has been assumed that clinical trials yield estimates of efficacy, but as discussed below this may no longer be the case. If changes in hormonal content of OCs were responsible for changes in contraceptive efficacy, a parallel decrease in contraceptive effectiveness would be expected [5], because lower-dose products would be likely to be less forgiving of imperfect use. The National Survey of Family Growth (NSFG) has provided useful estimates of contraceptive effectiveness in women aged 15 to 44 years, indicating that estimated rates of OC failure increased from 2.0% in 1973 to 8.8% in 1995 and 8.7% in 2002, although the former estimate included only married women while the latter estimates included all women (and were corrected for under-reporting of abortion, which made little difference) [6,7]. In addition, the more recent Contraceptive Choice Study has yielded cumulative failure rates of 4.8%, 7.8%, and 9.4% at years 1, 2, and 3, respectively, for combined hormone contraception (pills, patches, and rings) in a large population [8].
However, the reduction in hormone doses in OCs is not believed to have contributed to the rise in contraceptive failure because higher-dosed OCs (ethinyl estradiol [EE] doses ≥35 mcg) were the most commonly prescribed OCs during the period in which decreases in contraceptive effectiveness were observed [5]. Lower-dose OCs did not increase in popularity until after 1995, after the decrease in contraceptive effectiveness was first detected [5].
As their popularity has increased, low-dose OCs have undergone changes to their dosing regimen and cycle duration for improved tolerability, adherence, and efficacy. Modifying the traditional 21/7 regimen by shortening the number of placebo days can potentially enhance ovarian follicular suppression and decrease the risk of escape ovulation [9], while enabling a withdrawal bleeding episode to reassure women they are not pregnant. Several standard 28-day cycle OCs with reduced placebo days have been introduced, such as 24/4 regimens of drospirenone/EE and norethindrone acetate/EE that include 24 active combination tablets and 4 placebo tablets, as well as an LNG/EE regimen that includes 21 active combination tablets followed by 2 placebo tablets and 5 EE 10-mcg tablets [10][11][12].
Extended-cycle OCs reduce symptoms related to hormone withdrawal (e.g. headache, dysmenorrhea) as well as the number of scheduled bleeding days [13,14]. Since missing the first pills of the subsequent 28-day pill packs is a cause of pill failure and unintended pregnancy [15], dispensing 91 days of contraception in a real-world setting may help women avoid missed first pills because the number of cycles will be reduced. While one might speculate that extending the number of active pills to 24 or 84 might increase contraceptive effectiveness, such an impact in a real-world setting would need to be confirmed in large-scale post-marketing surveillance. Incorporating both period-stabilizing estrogen and an extended cycle, a novel, 91-day, ascending-EE-dose/levonorgestrel (LNG) OC is in development that increases estrogen exposure during the time in the cycle when unscheduled bleeding most frequently occurs in extended cycles, reducing the frequency of unscheduled bleeding episodes. Data indicate that using period-stabilizing estrogen may reduce unscheduled breakthrough bleeding [16]. As new methods of hormonal contraception are introduced, it becomes increasingly important to evaluate the impact of the multiple factors that influence OC efficacy and effectiveness.

WHAT ARE THE LIMITATIONS OF THE PEARL INDEX IN EVALUATING CONTRACEPTIVE EFFICACY?
The Pearl Index was originally introduced in 1933 by Raymond Pearl, a Johns Hopkins biologist. Due to its ease of calculation, it continues to be the most widely used statistical measure of contraceptive failure [17][18][19]. The Pearl Index represents the number of failures per 100 woman-years of exposure [17]. The numerator in the Index is the number of pregnancies, and the denominator is the cumulative number of months or cycles of exposure from the start of the method until the completion of the study, discontinuation of the method, or pregnancy. The quotient is multiplied by 1,200 if the denominator is reported in months or by 1,300 if the denominator is reported in cycles [20].
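The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated analysis tool; the function name and the trial figures in the usage line are hypothetical.

```python
# Multipliers from the text: 1,200 for months, 1,300 for 28-day cycles.
MULTIPLIER = {"months": 1200, "cycles": 1300}

def pearl_index(pregnancies, exposure, unit="cycles"):
    """Return failures per 100 woman-years of exposure."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return pregnancies * MULTIPLIER[unit] / exposure

# Hypothetical trial: 5 pregnancies over 3,250 cycles of exposure.
print(pearl_index(5, 3250, "cycles"))  # 2.0
```

Because pregnancies are rare relative to exposure, one or two additional pregnancies in the numerator shift the index noticeably.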
The Pearl Index is easy to calculate, but as a measure of contraceptive effectiveness, it is deeply flawed [20]. One of its major limitations is that it generally decreases with the duration of the clinical trial because the likelihood of pregnancy decreases over time. This observation results from the fact that women who are most likely to conceive do so after shorter durations of contraceptive use and exit from observation [20]. Two explanations for the decline in pregnancy rates with duration of use are possible. In the first, individual women's proficiency of use is constant over time, but individual women vary in their propensity to fail, with higher propensities due to relatively higher fecundity or coital frequency or less adherence [18]. Conversely, women still using the contraceptive method after long durations are unlikely to become pregnant, due to lower fecundity, lower coital frequency, or greater adherence [18,20]. Another potential explanation is that individual women become more proficient with the correct use of contraceptives as they gain more experience [20]. Consequently, Pearl indices could theoretically be driven towards zero simply by extending the duration of the trial, and the end result is that Pearl indices of studies of different lengths cannot be meaningfully compared [20]. Historically, OC trials often lasted up to 2 years, but more recent trials have typically lasted 12 months, and occasionally as little as 6 months [5]. However, Table 1 shows no consistent decline in trial length, so this factor cannot be responsible for the observed increase in failure rates.
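The first explanation (constant individual risk, but variation among women) can be illustrated with a small deterministic sketch. The cohort sizes and monthly failure probabilities below are hypothetical, and the sketch assumes that women leave observation only through pregnancy.

```python
def expected_pearl_index(groups, months):
    """Expected Pearl Index for a cohort observed for `months` months.

    groups: list of (number_of_women, monthly_failure_probability).
    Assumes women leave observation only through pregnancy.
    """
    pregnancies = 0.0
    exposure = 0.0
    for n_women, p_fail in groups:
        at_risk = float(n_women)
        for _ in range(months):
            exposure += at_risk          # woman-months contributed this month
            failures = at_risk * p_fail  # expected pregnancies this month
            pregnancies += failures
            at_risk -= failures          # pregnant women exit observation
    return 1200 * pregnancies / exposure

# Hypothetical cohort: 200 higher-risk and 800 lower-risk women.
cohort = [(200, 0.02), (800, 0.001)]
for months in (6, 12, 24):
    print(months, round(expected_pearl_index(cohort, months), 2))
```

With a single homogeneous group the index does not depend on trial length. With two groups it falls as the trial lengthens, because the higher-risk women contribute a shrinking share of the exposure.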
Because of these limitations, life table analysis should be used instead of the Pearl Index as a method for estimating rates of contraceptive failure. The life table analysis provides the contraceptive failure rate for each month of use and can provide a cumulative failure rate for any duration of exposure [20]. By using the life table method, early contraceptive failure and efficacy over time can be identified. Most trials of OCs now typically report contraceptive failures using the life table method as well as the Pearl Index, and regulatory agencies require both analyses in drug development studies [4,21]. Therefore, for most clinical trials, single-decrement life table analyses are most relevant [20]. Note that there has been an increase in Pearl indices in those trials lasting 13 cycles/one year, so life-table estimates of failure at the end of a year have also increased.
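A single-decrement life table can be sketched as follows. The record layout and function name are assumptions made for illustration. As a simplification, a woman who leaves observation during a month is treated as at risk for that whole month; a production analysis would handle censoring more carefully.

```python
def cumulative_failure(records, n_months):
    """Single-decrement life-table estimate of cumulative failure.

    records: one (months_observed, became_pregnant) pair per woman, where
    months_observed is the month in which she became pregnant or left
    observation. Entry t-1 of the returned list is the cumulative
    probability of failure by the end of month t.
    """
    surviving = 1.0
    result = []
    for month in range(1, n_months + 1):
        at_risk = sum(1 for m, _ in records if m >= month)
        failures = sum(1 for m, preg in records if m == month and preg)
        if at_risk > 0:
            surviving *= 1 - failures / at_risk  # monthly survival probability
        result.append(1 - surviving)
    return result

# Hypothetical records for five women followed for up to three months.
records = [(1, True), (2, False), (3, True), (3, False), (3, False)]
print(cumulative_failure(records, 3))
```

Unlike the Pearl Index, the output is tied to a stated duration, so a 6-month and a 12-month cumulative rate are not mistaken for the same quantity.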

HOW DO DIFFERENCES IN STUDY POPULATIONS AFFECT PREGNANCY RATES?

Imperfect use
The characteristic that is most likely to affect pregnancy rates is a woman's likelihood of using the contraceptive method inconsistently or incorrectly [22]. Distinguishing "user" failures from "method" failures is critical to an accurate estimation of the contraceptive's inherent efficacy. However, most investigators in clinical trials have calculated "method" and "user" failure rates incorrectly [22].
Conventionally, pregnancies that occur during a month in which a method was used properly are classified as method failures; all others are classified as user failures. By definition, method failures cannot occur during cycles of imperfect use, and user failures cannot occur during cycles of perfect use [18]. This convention is straightforward and yields the correct numerators for method and user failure rates. The error in most studies occurs in the determination of exposure, the denominator in the calculation of failure rates [22].
Logically, method failure rates should include only exposure during perfect use in the denominator. However, many investigators do not inquire about perfect use except when a pregnancy occurs. In order to compute failure rates during perfect use, investigators need to classify all cycles as either perfect- or imperfect-use cycles.
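The denominator problem can be made concrete with a short sketch. It assumes that every cycle has been classified as perfect or imperfect use, and it expresses rates per 100 woman-years using the 1,300 multiplier for cycles. The third figure divides method failures by all cycles, which we take here to be the kind of shortcut the text warns against; all names and data are hypothetical.

```python
def rate_per_100_woman_years(pregnancies, cycles):
    """Pearl-style rate; returns None when there is no exposure."""
    return 1300 * pregnancies / cycles if cycles else None

def failure_rates(cycles):
    """cycles: list of (perfect_use, pregnancy) booleans, one per cycle."""
    perfect = [preg for perfect_use, preg in cycles if perfect_use]
    imperfect = [preg for perfect_use, preg in cycles if not perfect_use]
    return {
        # Correct: method failures over perfect-use exposure only.
        "perfect_use": rate_per_100_woman_years(sum(perfect), len(perfect)),
        # Correct: user failures over imperfect-use exposure only.
        "imperfect_use": rate_per_100_woman_years(sum(imperfect), len(imperfect)),
        # Assumed shortcut: method failures over all exposure.
        "method_failures_over_all_cycles": rate_per_100_woman_years(
            sum(perfect), len(cycles)
        ),
    }

# Hypothetical data: 1,000 perfect-use cycles with 1 pregnancy and
# 300 imperfect-use cycles with 3 pregnancies.
data = ([(True, False)] * 999 + [(True, True)]
        + [(False, False)] * 297 + [(False, True)] * 3)
print(failure_rates(data))
```

Dividing by all cycles understates the failure rate during perfect use whenever any imperfect-use exposure is present.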

Body Mass Index (BMI)/Weight
Elevated body mass index or weight is one factor that has been posited to increase OC failure rates. It is well established that obesity rates in adult women have increased considerably over the past several decades. According to data from the National Health and Nutrition Examination Survey, the prevalence of obesity in women ≥20 years of age has more than doubled from 15.8% in 1960-1962 to 36.1% in 2009-2010 [23]. Among women aged 20 to 39 years, the prevalence of obesity increased by nearly 20% in relative terms between 1999 and 2008 (from 28.4% in 1999-2000 to 34.0% in 2007-2008) [24].
Data indicate that increasing body weight affects rates of estradiol metabolism in young women [25]. Because the time to reach steady-state levels of LNG is increased among obese women, the interval preceding ovarian suppression theoretically may be lengthened, placing these women at a higher risk for ovulation [25].
However, recent studies and reviews of the existing literature have found no convincing evidence that obese women have a higher risk of hormonal contraceptive failure during perfect use, even among women using the lowest dose formulations [26][27][28][29]. Nonetheless, there is some evidence that the obese population may have poorer adherence to medication [30]. Although there is a lack of clear evidence linking obesity to contraceptive failure, many studies of hormonal contraceptives have excluded obese women. The recent recommendation of an FDA advisory committee that subjects in all future trials of hormonal contraceptives must be representative of the population who will actually use the products has resulted in a strong recommendation by the FDA to sponsors to that effect. Future trials of hormonal contraceptives will thus likely include obese and overweight women [31].

Other factors
Other factors that are likely to affect a woman's pregnancy rates include frequency of intercourse, age, fecundity, motivation to avoid pregnancy, sociodemographics, geographic study location, and prior use of hormonal contraceptives.
Among women who use a contraceptive method correctly and consistently, the frequency of intercourse is the subject characteristic that is the greatest determinant of the risk of pregnancy [22], and this frequency is not usually determined in a clinical trial setting. However, new technologies such as cell phone apps that allow respondents to report behaviors that occur once or more than once per day should result in much better quality reporting than conventional diaries.
Because a woman's capacity to conceive and coital frequency decline with age (and marital duration), the risk of contraceptive failure also decreases with age [22]. The woman's desire to avoid pregnancy (including no desire for future children) may also affect rates of contraceptive efficacy by increasing correct and consistent use [5]. With regard to sociodemographic factors other than age, reported contraceptive failure rates have been shown to be lower among less impoverished women and married women. With respect to race/ethnicity, contraceptive failure rates are highest among blacks, lowest among whites, and in between for Hispanics [6]. While the difference between whites and non-whites could result from differences in health literacy, health literacy among blacks is higher than among Hispanics [32].
Geographic study location may also affect pregnancy rates, as lower rates of OC failure have been noted in studies conducted in Europe compared to those conducted in the US [33]. The most probable reason for this is that OCs are prone to be used more correctly and consistently in Europe. Higher failure rates in US trials likely reflect differences in health care systems. Uninsured women in the US may be inclined to join a clinical trial in order to get the health care provided from the trial and receive compensation for their participation. However, a lack of insurance might not affect participation in population-based studies such as those based on the NSFG in the US [6].
With respect to prior use, women who use OCs do not regain full fertility immediately. On average, women who use methods other than the pill become pregnant 2 to 3 months earlier than those who stop using the pill [20]. Thus, women who switch to the pill from another method in a clinical trial will experience higher failure rates than women who switch between types of OCs, provided that an adequate washout period was not included in the trial. Moreover, switchers have been successful users and are selected, for reasons given above, for a lower propensity to fail than are fresh starters. Therefore, including women with very recent or immediate prior use of OCs in an analysis will drive failure rates downward [20]. The correct analytical approach for handling switchers is to enter them into the life table at their current duration of use [6], but we have not seen clinical trials analyzed in this way.

HOW DO DIFFERENCES IN STUDY DESIGN AFFECT ESTIMATES OF CONTRACEPTIVE EFFICACY?
There is no single accepted and uniformly used design for clinical trials of hormonal contraception, making it difficult to compare the efficacy and tolerability of different products.
In an ideal trial, a random sample of women, with no refusals and therefore no selection or volunteer bias, would participate in a randomized prospective trial in which all subjects would be followed until the completion of the study or several weeks after they discontinued use. They also must "forget their past history with contraception so that they enter the trial as virgin users of contraceptives" [20]. Obviously, there are ethical and logistical challenges to conducting such a trial. For example, women cannot be forced to participate in a trial. The characteristics of contraceptive methods make true double-blind designs challenging, and in almost all studies, a fraction of women become lost to observation or lost to follow-up [20]. In addition, a placebo-controlled trial, which would be necessary to estimate absolute efficacy, is unethical (unless conducted in women who do not mind getting pregnant).
Because of these limitations, clinical trials of contraceptives are likely to be subject to several biases, the most important of which is selection bias. Women who participate in clinical trials are likely to be different from the population using contraceptives. And women who participate in clinical trials of less effective methods such as spermicides or moderately effective methods such as OCs are likely to be very different in ways both observable and unobservable from those currently using intrauterine contraceptives or implants.
Another important source of variation among trials is the determination of which pregnancies to count when calculating failure rates [20]. Varying procedures for reporting pregnancies in clinical trials can substantially affect pregnancy rates, and since pregnancy is a relatively rare event in OC trials, even a few pregnancies can substantially affect reported failure rates. Three key sources of variation exist in pregnancy reporting: variations in procedures for detecting pregnancies, pregnancies in the lost-to-follow-up group, and unreported post-study pregnancies [5].

Variation in procedures for detecting pregnancies
In many older studies, only the pregnancies observed and reported by the women in the study are counted. However, if a pregnancy test were administered every month, the number of pregnancies would increase because the early fetal losses that may not be noticed by the women in the study would be detected. More recent trials have included routine pregnancy testing and have reported higher pregnancy rates than studies without such testing. In fact, a study of the incidence of early pregnancy loss found that 22% of pregnancies ended before they were reported by patients or diagnosed by physicians [34]. More recent studies are likely to use more sensitive pregnancy tests than older studies. As a result, failure rates reported in these studies with routine testing are not comparable to those reported in earlier studies. Also, transvaginal ultrasound resolution has improved significantly over the last two decades, making the detection of very early pregnancy a possibility, even during incidental examinations.

Thoroughness of follow-up procedures
Traditionally, it is assumed that women who are lost to follow-up experience the same rate of accidental pregnancy as those who continue to be observed [20]. However, this assumption may not be accurate. In one study of the efficacy of the calendar rhythm method, the pregnancy rate rose from 9.4 to 14.4 per 100 women-years of exposure when cases lost to follow-up were resolved [35]. In some trials, 20% or more of subjects in the trial are lost to observation, and in such cases, an increase in pregnancy rates among patients lost to follow-up could substantially affect the outcome of the trial [18,20]. Consequently, studies with more thorough follow-up procedures may report higher pregnancy rates.
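A simple sensitivity calculation shows how much the conventional assumption matters. The sketch below treats the overall rate as an exposure-weighted average of the rate among women who remain under observation and an assumed rate among women lost to follow-up. The function name and the inputs are hypothetical, and the relative rate among women lost to follow-up is exactly the quantity that is unknown in practice.

```python
def adjusted_rate(observed_rate, lost_share, relative_rate_in_lost):
    """Overall failure rate under an assumed rate among women lost to follow-up.

    observed_rate: rate among women who remained under observation.
    lost_share: share of total exposure belonging to women lost to follow-up.
    relative_rate_in_lost: assumed ratio of the rate among lost women to
        the observed rate (1.0 reproduces the conventional assumption).
    """
    rate_in_lost = observed_rate * relative_rate_in_lost
    return (1 - lost_share) * observed_rate + lost_share * rate_in_lost

# Hypothetical trial: observed rate 2.0, 20% of exposure lost to follow-up.
for ratio in (1.0, 2.0, 3.0):
    print(ratio, round(adjusted_rate(2.0, 0.2, ratio), 2))
```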

Definition of "on-study pregnancy"/post-study
Clinical trials do not evaluate post-study pregnancies uniformly. Consequently, studies that conduct pregnancy testing on all subjects after the end of the study will tend to have higher pregnancy rates than those that do not. Studies of this type are superior because they are more likely to detect unobserved pregnancies that result in spontaneous abortions and reduce the likelihood of unreported induced abortions.
Further, inconsistencies exist even among studies that conduct post-study pregnancy testing, as trials differ in the timing of the test results that are considered. What is defined as an "on-study pregnancy" varies from study to study. Regulatory agencies are inconsistent in their requirements for assessing pregnancies that are detected after the end of the study, but they do want to make sure that women who become pregnant while on study medication are counted even though these pregnancies may not be immediately identified. In some studies, pregnancies detected within 7 days after the last dose of active treatment are included while in others pregnancies detected within 14 days are included [36][37][38][39].
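The effect of the window definition can be illustrated with a short sketch. The detection times and exposure below are hypothetical, and the function simply counts pregnancies detected on or before a chosen number of days after the last active dose.

```python
def count_on_study(days_after_last_dose, window_days):
    """Count pregnancies detected within `window_days` of the last active dose.

    days_after_last_dose: days from last active dose to detection; values
    of zero or less mean the pregnancy was detected during treatment.
    """
    return sum(1 for d in days_after_last_dose if d <= window_days)

# Hypothetical detection times for six pregnancies in one trial.
detected = [-40, -3, 5, 9, 12, 20]
cycles_of_exposure = 3250  # hypothetical total exposure

for window in (7, 14):
    n = count_on_study(detected, window)
    print(window, n, 1300 * n / cycles_of_exposure)
```

The same trial data yield different Pearl indices under the two definitions, which is one reason results from trials using different windows are hard to compare.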

Exclusion for non-adherence
Pregnancy rates are also calculated differently from study to study. The definition of method failure and user failure may vary as some studies may exclude pregnancies from being classified as user failures if only 1 pill was missed, while others may exclude them even if 3 pills were missed [5].
Studies also assess adherence differently and methods for doing so vary in accuracy, including paper diaries, electronic diaries, and interactive voice responses [40]. In one study, there was only 45% agreement between electronically monitored pill use and the pill use and missed pills self-reported in patient diaries [41]. In this study, the electronic estimate of the number of women who missed at least 3 pills in a cycle was 3 times as high as the diary-derived estimate [41]. A more recent study on the use of daily text message reminders to improve OC adherence yielded similar results, as an electronic monitoring device indicated much poorer adherence than that recorded by patient diaries [42]. While most patients honestly keep track of pill taking and diary recordkeeping, many investigative sites describe patients who routinely fill out months of diary information retrospectively, even while in their car before appointments at clinical centers. Such practice is notoriously prone to recall bias and overestimation of compliance. Categorization of failures is thus imprecise because of the poor correlation between reported and actual adherence. Here again, new cell phone apps should increase the quality of reporting.
Use of other contraceptives while on study medication is also an important variable as the simultaneous use of back-up contraceptives dramatically reduces the risk of unintended pregnancy, particularly if back-up methods are used properly and consistently [22].

SUMMARY
Numerous factors have likely contributed to the increases in Pearl indices that have been observed in recent decades. Two factors seem to us to be the biggest contributors. The first is more frequent pregnancy testing with more sensitive pregnancy tests. We favor study designs that include such testing.
The second is a decrease in adherence among the study population due to changes in study populations over time.
A recently completed phase III trial starkly illustrates the challenges facing companies and regulators as patient populations become more diverse [43]. In this 13-cycle trial, women were randomly assigned to either a new low-dose EE-LNG patch or an approved OC containing 20 µg EE and 100 µg LNG; women in the OC arm used OCs for 6 cycles and then were switched to the patch. Only 17% switched from another method of hormonal contraception; only 57% were white non-Hispanic; and significant non-adherence was demonstrated by lab results. The overall Pearl indices were 4.96 in the patch group and 4.02 in the OC group. In order to insure themselves against results like this, companies will need to conduct adequately powered equivalence studies with an already approved product as a comparator. And regulatory agencies must recognize that if they insist on more diverse study populations, the rates of contraceptive failure in methods requiring adherence will be much higher than those previously observed.
Concerns about counting pregnancies detected after the study drug is discontinued can be addressed easily, at least for women completing the trial. To avoid this concern, studies could last 14 cycles and have a 13-cycle life-table analysis of probability of failure as the primary endpoint.
We would also encourage investigators to develop new technologies such as cell phone apps to allow real-time reporting of such behaviors as adherence (so that failure rates during perfect use can be calculated) and coital frequency.
Finally, it would be very helpful if the regulatory agencies (particularly the FDA and the European Medicines Agency) would collaborate to make their requirements consistent. A joint advisory committee could produce recommendations about study design and analysis that would make studies of the same or similar products easier to compare.