THE 6-MINUTE WALK TEST AND OTHER CLINICAL ENDPOINTS IN DUCHENNE MUSCULAR DYSTROPHY: RELIABILITY, CONCURRENT VALIDITY, AND MINIMAL CLINICALLY IMPORTANT DIFFERENCES FROM A MULTICENTER STUDY

Introduction: An international clinical trial enrolled 174 ambulatory males ≥5 years old with nonsense mutation Duchenne muscular dystrophy (nmDMD). Pretreatment data provide insight into reliability, concurrent validity, and minimal clinically important differences (MCIDs) of the 6-minute walk test (6MWT) and other endpoints. Methods: Screening and baseline evaluations included the 6-minute walk distance (6MWD), timed function tests (TFTs), quantitative strength by myometry, the PedsQL, heart rate–determined energy expenditure index, and other exploratory endpoints. Results: The 6MWT proved feasible and reliable in a multicenter context. Concurrent validity with other endpoints was excellent. The MCID for 6MWD was 28.5 and 31.7 meters based on 2 statistical distribution methods. Conclusions: The ratio of MCID to baseline mean is lower for 6MWD than for other endpoints. The 6MWD is an optimal primary endpoint for Duchenne muscular dystrophy (DMD) clinical trials that are focused therapeutically on preservation of ambulation and slowing of disease progression. Muscle Nerve 48: 357–368, 2013

families. Ambulation has been noted to be a prerequisite for normal physical functioning in humans 13 ; the major goal of medical and physical therapy intervention during the ambulatory phase of DMD is to maintain ambulation for as long as possible. 4,5,14,15 Given that ambulatory compromise is a key component of the DMD disease process and that ambulation measures the function of multiple muscle groups as well as cardiovascular activity, ambulationrelated outcome measures are the most relevant endpoints in DMD patients who are still able to walk. Typically, evaluation of ambulation in DMD features short-term assessments, such as the 10-meter run/ walk, 14,16 which measure transient peak activities. The 10-m run/walk test is well accepted and commonly employed in assessing disease progression, but it does not measure endurance, a crucial aspect of ambulatory functioning.

DEVELOPMENT OF THE 6-MINUTE WALK TEST IN DMD
The 6-minute walk test (6MWT), a well-established outcome measure in a variety of diseases. It is accurate, reproducible, simple to administer, and well tolerated. 17 It was originally developed as an integrated global assessment of cardiac, respiratory, circulatory, and muscular capacity. 17 More recently, it has been used to evaluate functional capacity in neuromuscular diseases [18][19][20][21][22][23][24][25] and has been the basis for regulatory registration of several drugs. 19,21,24,25 Importantly, the 6MWT assesses function and endurance, which are important aspects of DMD patients' disease status. The 6MWT is a robust assessment tool for use in clinical trials given its ability to quantitatively evaluate ambulation in a controlled environment.
In advance of this study, the American Thoracic Society (ATS) version of the 6MWT was modified specifically for DMD. 26 Also, an orientation video was developed to assist the pediatric subjects (some of whom have cognitive delay) in their understanding of the nature and expectations of the test. In an earlier short-term study, we reported that the modified 6MWT is feasible, safe, and reliable in boys with DMD who have not yet transitioned to full-time wheelchair use. 26, 27 We also documented that they have markedly compromised ambulation relative to healthy boys and correlated 6-minute walk distance (6MWD) with age, anthropometric characteristics, and measures, which change with disease progression, including stride length and cadence. 26,27 In a follow-up longitudinal study, 27 we documented that changes in 6MWD depended on stride length and age; improvements in 6MWD usually occurred up to 7 years of age in both healthy subjects and patients with DMD. However, the 6MWD of older DMD subjects worsened, whereas the 6MWD of older healthy subjects tended to either increase or remain stable. Subsequent studies have demonstrated that the 6MWD correlates with other clinical endpoints in DMD, such as timed function tests and the North Star Ambulatory Assessment (NSAA). 28,29

MINIMAL CLINICALLY IMPORTANT DIFFERENCE OF ENDPOINTS
Interpretation of functional changes in walk tests can guide clinical management and be primary endpoints in interventional studies. It thus is important to determine whether a change in function is clinically relevant. Data from this study will allow us to determine quantitatively the minimal clinically important difference (MCID) for the test. The MCID is a concept defined as "the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in patient management." 30 The MCID is different from the minimal detectable change, which indicates the amount of change required to exceed measurement variability. 31,32 When interpreting clinical measures, it is important to consider that, although small changes may be significant statistically, they may not be relevant clinically. 31,33 Numerous methods to derive the MCID have been described. 31,32,[34][35][36][37][38][39][40][41][42][43][44] These include anchorbased methods, 31,36,39,40 which compare a patient's change score with another measure of clinically relevant change, and distribution-based methods, 36,[41][42][43] such as the standard error of measurement (SEM), effect size, and one third the standard deviation (SD) at baseline, which are built on the statistical distribution and psychometric properties of the measure in a population. Anchor-based and distribution-based methods are seen as complementary approaches for MCID determination. 44

AIMS
There are 3 aims in this report: (1) to assess safety and feasibility of the 6MWT in a large multicenter context; (2) to assess the reproducibility and concurrent validity of the 6MWT results in comparison with other commonly used clinical endpoints utilizing pretreatment results from the international multicenter, randomized, placebocontrolled study of ataluren in ambulatory boys with nmDMD (PTC124-GD-007-DMD, Study 007); and (3) to estimate the MCID using distributionbased methods. These analyses lay the groundwork for longitudinal studies of the 6MWD in DMD.

METHODS
Participants. The pretreatment data under evaluation were derived from Study 007; this study enrolled males !5 years old at 37 sites in 11 countries (Australia, Belgium, Canada, France, Germany, Israel, Italy, Spain, Sweden, UK, and USA). All patients had phenotypic evidence of dystrophinopathy; had a nonsense mutation in the dystrophin gene as determined by gene sequencing; and walked !75 m unassisted during a 6MWT at screening. There was an inclusion criterion of phenotypic evidence of more severe dystrophinopathy based on the onset of characteristic clinical symptoms (i.e., proximal muscle weakness, waddling gait, and Gower maneuver) by 9 years of age, an elevated serum creatine kinase (CK), and ongoing difficulty with ambulation. Patients on systemic glucocorticoids were required to be on a stable dose for 6 months prior to study entry. Institutional review boards/institutional ethics committees and health authorities approved the study protocol. All parents/participants provided signed informed consent/assent before study initiation.
Overall Study Design and Procedures. Study 007 was a phase 2b, international, multicenter, randomized, double-blind, placebo-controlled, dose-ranging study to evaluate the efficacy and safety of ataluren in ambulatory male patients with nmDMD !5 years old.
Initial study evaluations (the subject of this report) were performed at screening and baseline separated by up to 6 weeks. Test-retest reliability was determined using the baseline and screening values for all endpoints. Concurrent validity and MCIDs were determined using pretreatment data.
Subject Disposition and Characteristics. There were 174 randomized patients. All patients screened and randomized were males, ranging in age from 5 to 20 years (Table 1). Approximately 56% of patients were age <9 years, 57% had a baseline 6MWD !350 m, and 71% were receiving glucocorticoids. Nonsense mutations were distributed across the 79 exons of the dystrophin gene, with no mutational hotspots identified, and represented all 3 types of premature stop codons.
Outcome Measures. Before the study began, clinical evaluators from each of the 37 participating clinical Table 1. Grading used during timed function tests (grades 1-6).
Standing from supine sites participated in a clinical endpoint training and standardization session to harmonize the testing protocol and logistics across sites. A centralized retraining session was also held $1 year after study start. This report focuses on the outcome measures obtained from the patients during screening and baseline.
6-Minute Walk Test/6-Minute Walk Distance. Ambulation was assessed via the 6MWT following standardized procedures as developed at the University of California Davis, 26 by measuring the 6MWD in meters. The 6MWT for this study included modifications to the method recommended by the ATS for use in adults described in Appendix 2 in the Supporting Information. 17 Timed Function Tests. Timed function tests (TFTs) included time taken to stand from a supine position, time taken to run/walk 10 m, time taken to climb 4 standard-sized stairs, and time taken to descend 4 standard-sized stairs. [45][46][47][48][49][50][51] TFTs provide a measure of functional capability in ambulatory patients that is complementary to the 6MWT. The tests are reproducible, simple to administer, and have documented response to therapeutic intervention with steroids. 47,48 A stopwatch was used to time the 10m run/walk, standing from supine, and 4-stair climb/ descend. For standing from supine the velocity was calculated as 1 divided by the time to complete the task. For the total task of climbing 4 standard stairs, velocity was calculated as 1 divided by the time to complete the task. Subjects were given 30 s to complete all tasks. Use of velocities for timed function measures results in a linear pattern of decline that adequately represents the impact of the "zero velocities" of individuals who are unable to perform the evaluation. 52 Timed Function Test Grades. Functional adaptations employed by patients during the TFTs were evaluated and graded by clinical evaluators according to standardized scales developed by one of the investigators (M.E.). Table 1 provides a description of the standardized scales.
Myometry. Upper and lower extremity myometry was performed using a hand-held myometer following standardized procedures. [53][54][55][56] Muscle groups evaluated included knee flexors, knee extensors, elbow flexors, elbow extensors, and shoulder abductors. Bilateral assessments were done, and 3 measurements (in pounds) were recorded from each muscle group on each side.
Health-Related Quality of Life. Health-related quality of life (HRQL) was measured via the Pediatric Quality of Life Inventory (PedsQL). [57][58][59][60] The generic core module comprises 23 questions. The PedsQL is available in all languages relevant for this study and was to be completed by both the patient and parent/ caregiver. The appropriate age-specific version was completed. It was planned that a patient would be evaluated with the same age-specific form even if during the study an age change made him eligible for a different form. If the patient lacked the ability to complete the PedsQL, the parent/caregiver was still to complete the instrument. If possible, the same parent/caregiver was asked to complete the instrument each time. HRQL was measured by all domains of the PedsQL (Physical, Emotional, Social, and School Functioning domain scores); however, only physical scores are included in this report due to the relative insensitivity of the other domains to disease progression in DMD. 61 Data Analysis. Available pretreatment data for all 174 patients from all sites were pooled for analysis of reliability (screening versus baseline performed within 6 weeks), concurrent validity (6MWD in comparison to selected secondary endpoints), and MCID determination for clinical endpoints. For the testretest analysis in boys with DMD, subjects who had observations at both visits for the parameter of interest were included. Pearson r and intraclass correlations (ICCs) for visits 1 and 2 were recorded. For concurrent validity, either the Pearson r or Spearman rho (r s ) rank order correlations were calculated. MCID was determined for clinical endpoints using 2 distribution methods: (1) the standard error of measurement method [baseline SD Á ͱ(1 2 r)]; and (2) one third of SD method (baseline SD Á 1 = 3 ).
Percent Predicted 6MWD. To account for maturational effects, including age, height, and associated stride length, 26,27 we calculated a percent predicted 6MWD. 62,63 This prediction equation has been validated in DMD 62 using the same DMD modified 6MWT protocol as in this study.
Energy Expenditure Index. Mean heart rate was measured before (during 5-min rest), during, and for 3 min after the 6MWT with a Polar RS400 heart rate monitor. Using these data, a post hoc analysis of energy expenditure index (EEI) was performed. EEI is the active heart rate (beats per minute) minus resting heart rate (beats per minute) divided by walking velocity (meters per minute). Thus, EEI was measured in units of beats per meter. EEI has been documented by Rose and colleagues to be a validated measure of energy expenditure in comparison to oxygen uptake by a metabolic cart in disabled children. [64][65][66][67][68] RESULTS Patient Characteristics. Patient characteristics are shown in Table 2. All patients were males, ranging from 5 to 20 years of age. All 3 premature stop codon types were represented in the study population.

Test-Retest
Reliability of Clinical Endpoints.
Pretreatment screening and baseline tests were compared for test-retest reliability. The median (range) between-test interval was 42 (0-91) days. Data were available from 174 subjects enrolled at 37 study sites, as shown in Table 3. In general, test-retest reliability for most measures was high. The 6MWD had the highest test-retest reliability of any clinical endpoint (ICC 5 0.92), as shown in Figure 1. ICCs for 6MWD, TFTs, and hand-held myometry were all strong (0.72-0.92), as shown in Table 3.

Invalid 6MWT Values due to Musculoskeletal
Injuries. Two patients had baseline 6MWD values that deviated markedly from their values at screening (up to 6 weeks earlier) and their first ontreatment values (at week 6). Both patients were documented to have sustained lower limb injuries prior to the baseline test (sprained ankle and right knee injury), which negatively affected their performances at baseline. Their 6MWD values at screening, baseline, and week 6 were 303, 125, and 309 for the first patient and 395, 309, and 481 m for the second patient, respectively. Lower limb injuries are a known source of 6MWD variability. Through a comprehensive analysis, it was verified that these were the only patients in whom the baseline 6MWT results were affected by lower limb injuries. Considering the strong influence the injuries had on their baseline 6MWD values, it was considered appropriate to declare the baseline test for these patients invalid and to use the screening value as a more accurate reflection of their 6MWD at baseline. Concurrent Validity of Clinical Endpoints. For concurrent validity, the following were evaluated: 6MWD vs. Timed Function Tests. Interpretation of time values (in seconds) obtained from timed function testing is limited to those patients who are able to complete the testing, thus creating results biased in favor of more functional individuals. As a result, time scores (in seconds) were converted into velocities. Table 4a shows correlations between 6MWD and velocities for TFTs at baseline (using   Pearson r). Table 4b shows Spearman rank correlations between method of TFT and the respective time to perform the test. Figure 2 shows the relationship between the baseline velocity during the 10-m run/walk test and baseline 6MWD. It should be noted that a mean value of 358 m on the 6MWD corresponds to a velocity of 1.64 m/s on the 10-m run/walk, which is a time of 6 s for this test. Timed Function Test Comparisons. As shown in Table 4a, all TFT velocities had a moderate to high correlation with one another and strong correlations with the specific methods for performance of the TFTs.
Myometry vs. 6MWD and Timed Function Tests. Table 4 shows the correlations between right and left myometry measures and 6MWDs, TFTs, and methods of timed function. Knee extension strength correlated better with timed function velocities and with timed function grades based on function, whereas knee flexion strength had lower correlations with timed function velocities and methods of timed function.
6MWD vs. Knee Extension Strength. Figure 3a and b shows the relationships between knee extension strength and ambulatory function as measured by percent predicted 6MWD. Figure 3a depicts absolute quantitative knee extension strength (in pounds). Figure 3b depicts knee extension strength normalized to body weight. It should be noted that the relationship between these 2 variables (strength and 6MWD) is not linear but logarithmic. In DMD, at reduced 6MWD values below 50-55% predicted (based on age and height), there are substantial declines in ambulatory function as measured by the 6MWD. These declines occur despite relatively small changes in knee extension strength values, which have reached a floor effect.
6MWD vs. Energy Expenditure Index. Figure 4 shows the relationship between heart rate-derived EEI and percent predicted 6MWD in 174 DMD subjects at baseline. The relationship between percent predicted 6MWD and heart rate-derived EEI is logarithmic. When percent predicted 6MWD approaches 50% of control values, there appears to be a precipitous increase in the energy cost of locomotion as measured by the EEI. Thus, the relationship between 6MWD and the EEI is well represented by a negative exponential model.    moderate association between 6MWD and the patient-derived or parent-proxy PedsQL Physical Function Scale (r 5 0.47).
Minimal Clinical Important Differences. Distributionbased methods were applied to pretreatment data on all subjects to generate a DMD-specific estimate of the MCID for the following clinical endpoints: 6MWD: Based on the standard error of measurement [defined as baseline SD Á ͱ(1 2 r) where r is test-retest reliability], the MCID for 6MWD in DMD is estimated to be 28.5 m (Table 5). Based on a definition of one-third of the standard deviation at baseline, the estimated MCID for 6MWD in DMD is 31.7 m (Table 5). These MCID values represent 8.0% and 8.9% of the mean baseline 6MWD.
TFT: Using similar distribution-based methods for all baseline data for time to stand from supine, time to climb 4 stairs, and time to run/walk 10 m (Table 5), the MCID values represent 18.9-33.9% of the mean baseline TFT values.
Knee extension: Using these same distributionbased methods, Table 5 shows the MCID values for knee extension strength by hand-held myometry to be 15.7% and 17.9% of mean baseline knee extension strength.

DISCUSSION
The findings from this report reflect an evaluation of the largest data set collected to date in a multicenter context for determination of reliability and concurrent validity of 6MWD and other clinical endpoints in DMD. In addition, the investigation has addressed the distribution-based MCID for commonly employed endpoints, including TFTs and quantitative knee extension strength measures, as well as the 6MWD, which is the most common primary endpoint for ambulatory DMD clinical trials.
Test-Retest Reliability of Measures. The 6MWT showed the highest test-retest reliability of any endpoint used in the study. TFTs, such as the 10-m run/walk test, are easy to administer and conveniently applied in a clinical setting. However, these tests have inherent disadvantages for clinical trials, including slightly reduced test-retest reliability relative to the 6MWT. Advantages of the TFTs as secondary endpoints for DMD trials include previous steroid-naive natural history data 16 and   contemporary long-term natural history data available through the Cooperative International Neuromuscular Research Group (CINRG) Duchenne Natural History Study, 52 relating the measures to loss of ambulation, the ease of utilizing the measures in the clinic in everyday clinical practice, and their potential inclusion as core measures for registries. Approximately 20% of patients, however, were unable to perform stand from supine at study entry.
To maximize reproducibility, the 6MWT should not be performed in the setting of an acute condition (e.g., musculoskeletal injury) that affects walking ability. In addition, future inclusion criteria can reduce variability by requiring that screening and baseline 6MWD values be within a certain percentage of one another (e.g., 20%).

Concurrent Validity of Clinical Endpoints.
With regard to concurrent validity, the 6MWD was shown to be associated with other measures of disease progression in DMD, but in general it showed closer correlation with TFTs when compared with quantitative strength measures. Time to climb 4 stairs was the timed function measure most highly correlated with knee extension strength, so this may be a particularly useful secondary endpoint for DMD trials. Concurrent validity between the NSAA and 6MWD has been demonstrated previously. 28,29 In this multicenter study the 6MWD correlated highly with the graded methods of performing TFTs, which are evaluator-derived DMD-specific measures of disease progression analogous to several components of the NSAA. Comparisons with method of timed function use Spearman rho rank order correlations (r s ); all other comparisons done using Pearson r. 6MWD, 6-minute walk distance.
P < 0.0001 except where noted (NS); *P < 0.05 † P < 0.01 and ‡ P < 0.001. Myometry has been found previously to be less sensitive to changes in disease status than TFTs in ambulatory DMD boys. 69 Our study has shown that the 6MWT has obvious advantages over quantitative strength measures as an endpoint in DMD due to its sensitivity to detect change in children who have a decline in ambulatory function. In DMD, at reduced 6MWD values below 50-55% predicted (based on age and height), there are significant continued declines in ambulatory function as measured by the 6MWD that occur despite relatively small changes in knee extension strength values, which appear to have approached a floor effect. An alternative concept is that, once lower extremity strength reaches a critically low threshold value, a more precipitous deterioration in ambulatory function may occur over 12 months with relatively little incremental loss of strength during that time.
Although there is a moderate correlation between 6MWD and the parent-proxy-reported PedsQL Physical Function Scale, the relationship is not as strong as that reported between 6MWD or walking speed and other patient-reported outcomes, such as the Transfers/Basic Mobility scale, Sports Physical Functioning Scale, or Global scale from the POSNA/PODCI Pediatric Outcomes Instrument. 61,70 6MWT Is an Integrated Global Measure of Ambulatory Function and Metabolic Efficiency. Short-term assessments in DMD that measure transient peak physical activities, such as 10-m run/walk, do not measure endurance, a crucial aspect of ambulatory functioning. Due to the combination of strength loss and cardiopulmonary involvement, children with DMD experience increases in the energy cost of locomotion (more metabolic energy consumed per distance traveled) with increasing disease progression. Other investigators have validated the use of the heart rate-determined EEI as a measure of energy cost in disabled children. [64][65][66][67][68] Heart ratedetermined EEI has been validated previously in DMD using a COSMED portable metabolic cart. 71 Other studies also showed increased energy cost of locomotion in DMD. 72 Our study has shown that the 6MWD can be considered a proxy measure for the energy cost of locomotion in DMD. In general, higher EEI is associated with more metabolically inefficient ambulation and more impaired endurance. A recent report demonstrated the 6MWT to also be highly correlated with an assisted 6-minute cycling test, 73 which is a measure of endurance in both ambulatory and non-ambulatory patients with DMD. The 6MWT is therefore an integrated global measure of ambulatory function that is influenced by decreased lower extremity strength, biomechanical inefficiencies during gait, diminished endurance, and compromised cardiorespiratory status.
Minimal Clinically Important Differences. MCID is a construct that can be determined by statistical distribution approaches, anchor-based methods with patient-reported outcome measures, and determination of clinically meaningful changes with treatments. Prior to the acquisition of disease-specific data, the study of MCID in other diseases is instructive. As shown in Appendix 3 in the Supporting Information, data from placebo-controlled studies of laronidase for mucopolysaccharidosis type I (MPS I), idursulfase for MPS II, bosentan for primary pulmonary hypertension, and alglucosidase-alpha for Pompe disease support the clinical meaningfulness of a 30-m treatment effect for the 6MWT. 19,21,24,25 In these studies, differences in mean changes in 6MWD in drug-treated patients versus placebotreated patients ranged from 28 to 44 m, or 8-13% of baseline 6MWD (Appendix 3 in Supplementary Material). The data from the trials in MPS I, MPS II, and Pompe disease are especially relevant given that patient activity limitations in these diseases and those in DMD result from disease-related impairments in neuromuscular and pulmonary systems.
Subsequent to the initiation of the ataluren Study 007 in 2008, additional research had been conducted to define the MCID for 6MWD across multiple diseases, including interstitial pulmonary fibrosis, coronary artery disease, chronic obstructive pulmonary disease, and parenchymal lung disease. [74][75][76][77] In each of these diseases, as in DMD, patients experience disease-related deficits in 6MWD relative to healthy controls. Those studies employed multiple methods to determine the MCID for 6MWD, including distribution-based methods utilizing statistical properties (e.g., effect size, standard error of measurement). In these pulmonary and cardiac diseases, estimates of the MCID for 6MWD ranged from 23 to 45 m (Appendix 3 of Supporting Information). [74][75][76][77] These estimates of the MCID for 6MWD correspond to 4.7-11.6% of mean baseline 6MWD.
Distribution-based methods are commonly used for initial MCID determination. In this study, we addressed initially the question of MCID for the 6MWD-a relatively new endpoint in DMD-by applying 2 commonly utilized distribution-based methods for DMD-specific MCID determination. The clinical trials that used the 6MWD in other diseases formed the basis of the a priori choice of a 30-m 6MWD treatment effect when originally powering the ataluren trial for ambulatory DMD subjects. In the present study of placebo-treated patients, the MCIDs for 6MWD in DMD (corresponding to 8.0% and 8.9% of the mean baseline 6MWD) are both mid-range values compared with the MCIDs determined for the other diseases. Thus, the statistical distribution data provide support for a 6MWD MCID of 30 m in DMD and a targeted 30m difference between treatment arm and placebo for DMD therapeutic trials. Future longitudinal investigations of anchor-based approaches to MCID are suggested to validate these initial statistical distribution approaches and compare specific changes in 6MWD with the occurrence of both clinically meaningful milestones and significant changes in patient-reported outcomes. These studies may support a lower MCID for 6MWD than 30 m based on distribution-based methods. In addition, more experience evaluating the 6MWD in DMD patients treated with therapeutic agents known to be efficacious will help refine the determination of the MCID for this endpoint.
In this large, multicenter, international clinical trial, the 6MWT proved to be feasible and highly reliable, and it showed excellent concurrent validity with other commonly used clinical endpoints in DMD such as timed function tests and quantitative strength measures. The 6MWD proved to be a more sensitive clinical endpoint when compared with timed function and quantitative strength measures. Statistical distribution approaches support an MCID of $30 m for DMD. The 6MWT is an integrated global measure of ambulatory function that is influenced by lower extremity strength, biomechanical inefficiencies, endurance, and cardiorespiratory status. This study and additional longitudinal natural history data 78 from the ataluren clinical trial (Study 007) support acceptance of the 6MWT as the primary outcome measure of choice for ambulatory DMD clinical trials.