Differential Sensitivity of Mindfulness Questionnaires to Change With Treatment: A Systematic Review and Meta-Analysis

In support of the construct validity of mindfulness questionnaires, meta-analytic reviews have reported that scores increase in mindfulness-based interventions (MBIs). However, several studies have also found increased mindfulness scores in interventions with no explicit mindfulness training, raising a question about differential sensitivity to change with treatment. We conducted a systematic review and meta-analysis of 37 randomized controlled trials in which mindfulness questionnaires were administered before and after an evidence-based MBI and a nonmindfulness-based active control condition. The central question was whether increases in mindfulness scores would be greater in the MBI than in the comparison group. On average, participants in MBIs showed significantly greater pre–post changes in mindfulness scores than were seen in active control conditions with no explicit mindfulness elements, with a small overall effect size. This effect was moderated by which mindfulness questionnaire was used, by the type of active control condition, and by whether the MBI and control were matched for amount of session time. When mindfulness facet scores were analyzed separately, MBIs showed significantly greater pre–post increases than active controls in observing, nonjudging, and nonreactivity but not in describing or acting with awareness. Although findings provide partial support for the differential sensitivity of mindfulness questionnaires to change with treatment, the nonsignificant difference in pre–post change when the MBI and control were matched for session time highlights the need to clarify how mindfulness skills are acquired in MBIs and in other interventions and whether revisions to mindfulness questionnaires would increase their specificity to changes in mindfulness skills.

called the what and the how of mindfulness (Linehan, 1993), are shown in Table 1 and suggest that mindfulness involves paying attention to the present moment with qualities of openness, nonjudgment, acceptance, friendliness, curiosity, kindness, and compassion.
The assessment of mindfulness is important in understanding its relationships with other variables and its role in health and well-being (Baer, 2011; Park, Reilly-Spong, & Gross, 2013; Quaglia, Braun, Freeman, McDaniel, & Brown, 2016). Measurement of mindfulness relies largely on self-report questionnaires designed to assess a general disposition or trait-like tendency to be mindful in daily life. This tendency is understood to vary in the general population and to be susceptible to change with training and practice. The most commonly used mindfulness questionnaires are the Mindful Attention Awareness Scale (MAAS; Brown & Ryan, 2003), the Five Facet Mindfulness Questionnaire (FFMQ; Baer, Smith, Hopkins, Krietemeyer, & Toney, 2006), the Kentucky Inventory of Mindfulness Skills (KIMS; Baer, Smith, & Allen, 2004), the Freiburg Mindfulness Inventory (FMI; Buchheld, Grossman, & Walach, 2001), and the Cognitive Affective Mindfulness Scale-Revised (CAMS-R; Hayes & Feldman, 2004).
The psychometric properties of these questionnaires have been widely studied. In a review, Park et al. (2013) described evidence for their internal consistency as strong, meaning that multiple studies of good quality have reported Cronbach's alphas ≥ .70 for unidimensional scales or subscales. Test-retest reliability has been examined less often. Park et al. reported adequate values for three of four KIMS subscales (intraclass correlation [ICC] ≥ .70 or Pearson's r ≥ .80). Mixed findings have been reported for the FMI, with an unspecified coefficient of .67 in a Chinese sample (Chen & Zhou, 2014) and ICC = .80 in a French sample (Trousselard et al., 2010). The most comprehensive studies of test-retest reliability were reported by Jensen, Krogh, Westphael, and Hjordt (2019) and Jensen, Niclasen, Vangkilde, Petersen, and Hasselbalch (2016), who examined both the MAAS and the FFMQ in Danish student and community samples. Test-retest reliabilities were good over a 2-week interval (MAAS: ICC = .88; FFMQ: ICCs ≥ .82 for all facets) and satisfactory over a 6-month interval (MAAS: ICC = .74; FFMQ: ICCs ≥ .74 for all facets). Both instruments showed greater 6-month stability than was seen for a measure of psychological distress.
Overall, mindfulness questionnaires have performed reasonably well on a variety of psychometric tests. However, a question has arisen about their differential sensitivity to change with intervention, with several studies showing that mindfulness scores increased about equally in MBIs and in other active treatments. For example, Goldberg et al. (2016) found that FFMQ scores showed similar increases in MBSR and in the Health Enhancement Program (HEP; MacCoon et al., 2012), an active control designed to match many aspects of MBSR (group size, session time, home practice, etc.) while including no mindfulness training. Both groups showed larger increases in FFMQ scores than were seen in a waitlist control group. In a meta-analysis, Visted et al. (2015) found no significant differences in mindfulness scores between mindfulness training and active control groups. In contrast, other studies have shown higher posttreatment mindfulness scores in MBIs than in other treatments. In adults with generalized anxiety disorder, Hoge et al. (2015) reported higher mindfulness scores in MBSR than in a stress management education group. Johns et al. (2016) found similar results in adults with cancer and fatigue.
To investigate these conflicting findings, we conducted a systematic review and meta-analysis of differential sensitivity of mindfulness questionnaires to change with intervention. We included only randomized controlled trials that compared an evidence-based MBI to an active control with no explicit mindfulness training. We hypothesized that mindfulness questionnaires would show greater pre-post increases in MBIs than in active controls. Confirmation of this hypothesis would add to the evidence supporting the construct validity of mindfulness questionnaires by showing that scores increase as expected with mindfulness training but not with other types of intervention. Disconfirmation of the hypothesis (i.e., similar changes in mindfulness scores for MBIs and comparison conditions) might suggest that the questionnaires, though written to be specific to mindfulness skills, are sensitive to changes in other constructs, such as distress, that improve with a variety of interventions.

Table 1
Source | What | How
Kabat-Zinn, 1994 | Paying attention, or the awareness that arises through paying attention | On purpose, in the present moment, and nonjudgmentally; with an affectionate, compassionate quality, a sense of openhearted friendly presence and interest
Marlatt & Kristeller, 1999 | Bringing one's complete attention to present experiences | On a moment-to-moment basis, with an attitude of acceptance and loving kindness
Bishop et al., 2004 | Self-regulation of attention so that it is maintained on immediate experience | With an orientation characterized by curiosity, openness, and acceptance
Germer, Siegel, & Fulton, 2005 | Awareness of present experience | With acceptance: an extension of nonjudgment that adds a measure of kindness or friendliness
Linehan, 2015 | The act of focusing the mind in the present moment | Without judgment or attachment, with openness to the fluidity of each moment
Alternatively, other programs may implicitly teach mindfulness or related skills such as awareness of thoughts and feelings and willingness to experience them. Differential sensitivity to change with treatment can be tested more clearly when there is a high level of confidence that the MBI should teach mindfulness skills. For this reason, we included only studies of MBSR, MBCT, and well-established variants that have a strong evidence base and are consistent with the defining features of MBIs as described by Crane et al. (2017). These features include intensive training in mindfulness meditation through in-session and home practice over several weeks, an experiential inquiry-based learning process, and other exercises designed to help participants develop a new relationship to present-moment experience based on friendly interest, decentering, equanimity, and compassion (see Crane et al. for more detail). Exclusion of single-session and laboratory-based mindfulness inductions and other mindfulness trainings with little empirical support provides a clearer test of the hypothesis by strengthening the expectation that the MBI should lead to increased mindfulness skills and minimizing the possibility that the two interventions yielded similar mindfulness scores because of inadequate mindfulness teaching in the MBI.
We expected that differential sensitivity of mindfulness questionnaires to change with treatment could be influenced by aspects of the questionnaires themselves, aspects of the comparison treatments, or the design of the trials in which the questionnaires were used. We conducted planned moderator analyses for four such variables. First, measures differ in their conceptualization of mindfulness, and some may have better differential sensitivity to change with intervention than others; therefore, we examined whether findings differed depending on which mindfulness measure was used. Second, we expected that the type of active control intervention could affect the extent to which mindfulness questionnaires show differing levels of change in the two groups. Some comparison treatments might cultivate mindfulness-related skills, such as awareness of thoughts and feelings and willingness to experience them, even if they include no explicit mindfulness training. In particular, cognitive-behavioral therapy (CBT) is known to cultivate decentering, which is strongly correlated with mindfulness (Carmody, Baer, Lykins, & Olendzki, 2009) and improves in both CBT and MBIs (Carmody et al., 2009; Farb et al., 2018; Fresco, Segal, Buis, & Kennedy, 2007). We predicted that differences between interventions in cultivation of mindfulness skills would be smaller when comparing MBIs to CBT but larger when comparing MBIs to medication, which is not intended to teach skills, and to psychosocial interventions that are not designed to teach mindfulness or decentering.
Another aspect of the control intervention that might influence differential sensitivity to change with treatment is whether it is matched to the MBI for number and duration of sessions. If mindfulness questionnaires show differential sensitivity to change when session time is matched, it would suggest that the questionnaires measure something that changes with mindfulness training but not with treatments that provide equal time for the development of other skills or nonspecific factors such as support. On the other hand, if differential sensitivity is seen only when the MBI has greater session time, the possibility would remain that the non-MBI might have led to similar increases in mindfulness scores if more session time had been provided. This would suggest either that the questionnaires are sensitive to change in constructs other than mindfulness skills, or that both treatments cultivate mindfulness skills. Therefore, we examined whether matching for session time moderated the findings. Finally, cultivation of mindfulness skills may be stronger when the mindfulness training adheres to an evidence-based protocol. All included studies used MBIs with well-established protocols, but not all included fidelity checks; therefore, we examined whether findings differed depending on whether the study included a formal check of fidelity to the MBI protocol (with fidelity checks used as a proxy for quality of protocol adherence). A significant moderating effect would support the differential sensitivity of the mindfulness questionnaires by showing that scores increase more when there was an indication that mindfulness skills were well taught.
Our review focused only on differential sensitivity to change with treatment and did not address other aspects of validity, which have been reviewed elsewhere. We did not analyze effects on clinical outcomes because we assumed that MBIs should teach mindfulness skills regardless of whether the intervention led to clinically meaningful reductions in symptoms, and because numerous meta-analyses examining the effects of MBIs on clinical outcomes are available. We included only measures of mindfulness and did not include measures of decentering, self-compassion, or other related constructs, which have been used less often in trials of MBIs.
Our review adds to previous meta-analyses in several ways. Khoury et al. (2013) focused primarily on clinical outcomes and did not examine differences between mindfulness questionnaires in sensitivity to change. Quaglia et al. (2016) collapsed across questionnaires to test common dimensions of mindfulness rather than examining each questionnaire separately and did not consider differences between types of active controls. Visted et al. (2015) did not exclude active controls with explicit mindfulness elements and included only 12 studies, whereas we found 37 comparing an MBI to an active control. Our review is unique in including only MBIs based on the gold-standard curricula of MBSR, MBCT, or close variants, which are intensive courses designed to teach mindfulness skills. Our review is also unique in testing differences between mindfulness questionnaires and the effects of different types of active control groups on differential sensitivity to change with treatment.

Method
The protocol for this meta-analysis was registered on PROSPERO (Registration Number: CRD42017065786) and conducted in accordance with the PRISMA guidelines (Moher, Liberati, Tetzlaff, & Altman, 2009).

Search Strategy
The following databases were searched for studies up to 12 December 2017: PsycINFO, Scopus, Web of Science, and Medline. Abstracts or titles were searched using the following search term: ("mindfulness-based" OR MBCT OR MBSR OR Breathworks OR MBLC OR MBCP OR MBRP) AND random*. Clinical trial registers (ClinicalTrials.gov, ISRCTN.com) were also searched, using the search term mindfulness, to identify unpublished, completed interventional studies of MBIs which recruited adults. Corresponding authors of the final set of papers were e-mailed for any additional unpublished data (e.g., facet scores in addition to total scores), if sufficient data were not reported (e.g., papers reporting only baseline data), and for clarification (e.g., on number of participants in each condition). When authors failed to respond to the initial request for data, a further e-mail was sent. Where findings from a trial have been reported across multiple papers, we selected the paper in which mindfulness data are reported or the study with the larger sample size. Reference lists of the final set of papers were searched manually to identify additional papers not identified in the original search.

Inclusion and Exclusion Criteria
We included studies that (a) were randomized controlled trials, (b) recruited adults (aged 18 years or over), (c) compared an MBI to an active control condition (face-to-face or non-face-to-face condition) that did not include explicit mindfulness training, where "active control" is defined in line with the Cochrane Handbook 5.1 as a different kind of therapy or treatment (Higgins & Green, 2011), (d) included an empirically supported measure of mindfulness, and (e) evaluated MBSR, MBCT, or a well-established variant (Breathworks, mindfulness-based living course, mindfulness-based childbirth and parenting, and mindfulness-based relapse prevention).
We excluded studies that were not reported in English or that evaluated an MBI that (a) was not delivered in person (self-help or online MBIs), (b) was not delivered in a group format, or (c) had fewer than eight sessions or less than 12 hr of face-to-face contact with a trained MBI facilitator. We also excluded studies that (d) compared the MBI only to an inactive control condition, where inactive control is defined in line with the Cochrane Handbook 5.1 as a placebo, no treatment, standard care, or a waiting list control. Group format was required because the well-established MBIs we tested were designed for group delivery and the evidence base supporting them is based almost entirely on this format.
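The inclusion and exclusion rules above can be expressed as a single screening predicate. The sketch below is illustrative only: the `Study` record and all of its field names are hypothetical, and nothing in the review suggests that screening was automated. Requiring an active control under inclusion criterion (c) also covers exclusion criterion (d), inactive-only comparisons.

```python
from dataclasses import dataclass

# Well-established MBIs accepted under inclusion criterion (e).
ELIGIBLE_MBIS = {"MBSR", "MBCT", "Breathworks", "MBLC", "MBCP", "MBRP"}


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    randomized: bool
    min_age: int
    mbi: str
    control_is_active: bool
    control_has_mindfulness: bool
    validated_measure: bool
    in_english: bool
    mbi_in_person: bool
    mbi_group_format: bool
    mbi_sessions: int
    mbi_contact_hours: float


def is_eligible(s: Study) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = (
        s.randomized                          # (a) randomized controlled trial
        and s.min_age >= 18                   # (b) adults only
        and s.control_is_active               # (c) active control; also covers exclusion (d)
        and not s.control_has_mindfulness     # (c) no explicit mindfulness training
        and s.validated_measure               # (d) empirically supported mindfulness measure
        and s.mbi in ELIGIBLE_MBIS            # (e) MBSR, MBCT, or a listed variant
    )
    excluded = (
        not s.in_english
        or not s.mbi_in_person                # (a) self-help or online MBIs
        or not s.mbi_group_format             # (b) not delivered to a group
        or s.mbi_sessions < 8                 # (c) fewer than eight sessions
        or s.mbi_contact_hours < 12           # (c) or less than 12 hr of contact
    )
    return included and not excluded


example = Study(True, 18, "MBCT", True, False, True, True, True, True, 8, 16.0)
print(is_eligible(example))  # True
```

Changing `mbi_sessions` to 6, or `control_has_mindfulness` to True, makes the same record ineligible.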

Data Extraction and Analysis
For each condition, baseline and postintervention means, standard deviations, and number of participants for measures of mindfulness were extracted and entered into Comprehensive Meta-Analysis (Version 3.0; Borenstein, Hedges, Higgins, & Rothstein, 2013). Study characteristics for the moderator analyses described later also were entered. All data were extracted by Jenny Gu and any uncertainties or queries that arose were resolved in discussion with the other authors.
Pre-post between-groups Hedges' g effect sizes, 95% confidence intervals (CIs), and z and p values were computed. The pre-post between-groups effect size reflects the difference between pre-post change in the MBI group and pre-post change in the active control. By convention, a small effect size is considered to be 0.2, a medium effect size is 0.5, and a large effect size is 0.8 (Cohen, 1988). The overall Hedges' g effect size was computed using a random effects model because of differences between included studies (e.g., in the mindfulness measure used, control group). Under a random effects model, the pooled effect size is the weighted average of individual Hedges' g effect sizes, with each study weighted by the inverse of its variance (sum of within-study and between-study variance).
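As a rough illustration of these computations, the sketch below computes a pre-post between-groups effect size and pools study effects under a random effects model. Two assumptions should be flagged. The effect size follows Morris's (2008) pre-post-control estimator, which standardizes the difference in change scores by the pooled pretest SD and applies the small-sample correction that turns d into g. The between-study variance uses the DerSimonian-Laird moment estimator. Comprehensive Meta-Analysis may compute either quantity differently (for example, the sampling variance of g depends on a pre-post correlation), so study variances are taken as inputs here rather than derived.

```python
import math


def hedges_g_ppc(pre_t, post_t, sd_pre_t, n_t, pre_c, post_c, sd_pre_c, n_c):
    """Pre-post-control effect size (Morris, 2008): difference in mean change,
    divided by the pooled pretest SD, times the small-sample correction."""
    df = n_t + n_c - 2
    sd_pre_pooled = math.sqrt(((n_t - 1) * sd_pre_t**2 + (n_c - 1) * sd_pre_c**2) / df)
    correction = 1 - 3 / (4 * df - 1)
    return correction * ((post_t - pre_t) - (post_c - pre_c)) / sd_pre_pooled


def random_effects_pool(effects, variances, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]          # within + between variance
    mean = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2))                # two-tailed normal p value
    return {"g": mean, "se": se, "ci": (mean - z_crit * se, mean + z_crit * se),
            "z": z, "p": p, "tau2": tau2, "q": q}
```

For example, `random_effects_pool([0.0, 1.0], [0.1, 0.1])` gives a pooled g of 0.5 with tau² = 0.4: the between-study variance is added to each study's variance, which flattens the weights and widens the CI relative to a fixed-effect model.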
Data were extracted and meta-analyses were performed for six outcomes: the total mindfulness score from any empirically supported measure of mindfulness, facet scores for observing, describing, acting with awareness, and nonjudging from the KIMS or FFMQ, and the nonreactivity facet from the FFMQ. Where standard deviations were not provided, they were calculated from standard errors and confidence intervals.
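The conversions used when only standard errors or confidence intervals were reported follow from the definition SE = SD / √n. A minimal sketch, assuming a symmetric 95% CI around a single group mean; the function names are hypothetical, and the optional `crit` argument reflects the Cochrane Handbook's advice to replace 1.96 with a t quantile in small samples.

```python
import math


def sd_from_se(se, n):
    """SD of a group mean recovered from its standard error: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, crit=1.96):
    """SD recovered from a confidence interval for a group mean.

    The interval width equals 2 * crit * SE, so SE = (upper - lower) / (2 * crit).
    For small groups, pass the t quantile instead (about 2.064 for n = 25).
    """
    se = (upper - lower) / (2 * crit)
    return sd_from_se(se, n)
```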
Forest plots of pre-post between-groups effect sizes were produced for each of the six outcomes and for moderator analyses. Heterogeneity of effect sizes was assessed using the chi-square statistic (Cochran's Q) and the I² index. A significant Q value indicates heterogeneity of effect sizes. I² indicates the percentage of variance in effect sizes attributable to true between-study heterogeneity rather than sampling error or chance. I² values of around 25%, 50%, and 75% can be considered as indicating low, moderate, and high heterogeneity, respectively (Higgins & Thompson, 2002).
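Q and I² can be computed directly from study effect sizes and their variances. In this sketch (function names hypothetical), Q uses fixed-effect inverse-variance weights, and I² = (Q − df) / Q, truncated at zero and expressed as a percentage.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the fixed-effect mean."""
    w = [1 / v for v in variances]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    return sum(wi * (g - mean) ** 2 for wi, g in zip(w, effects))


def i_squared_from_q(q, df):
    """Higgins & Thompson (2002) I^2 as a percentage, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


print(round(i_squared_from_q(82.86, 32), 2))  # 61.38
```

With the heterogeneity statistic reported in the Results for total scores, Q(32) = 82.86, the formula reproduces the reported I² of 61.38%.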
Moderator analyses were planned for (a) which mindfulness measure was used (e.g., FFMQ, MAAS, KIMS), (b) type of control condition (CBT or CBT-based, medication, other), (c) whether the control intervention was matched to the MBI for same/greater amount of face-to-face contact and number of sessions, and (d) whether formal fidelity checks for the MBI were reported. Subgroup effect sizes are reported when the moderator analysis is significant.
To address publication bias, a funnel plot was produced and the trim and fill method was used. Rosenthal's (1979) Fail-Safe N and Begg and Mazumdar's (1994) rank correlation test were also computed for the analysis of mindfulness total scores. Funnel plots display study effect sizes against their standard errors; points evenly distributed around the mean effect size (represented as a vertical line) and forming a symmetrical inverted funnel shape indicate that publication bias is unlikely. Publication bias is suggested if the funnel shape is distorted such that there is a disproportionate number of studies with larger standard errors (generally studies with smaller samples) on the side of the mean favoring the intervention condition. This would suggest that smaller studies are more likely to be published if they found larger effects and that studies with effects favoring the control condition may be missing from the published literature. The trim-and-fill method provides an estimate of the number of missing studies and an adjusted overall mean effect size. Rosenthal's Fail-Safe N estimates the number of unpublished studies with similar sample sizes and with effect sizes of zero that would be needed to reduce the mean effect size to nonsignificance. Effect sizes can be considered robust if the required number of unpublished studies is greater than or equal to 5k + 10, where k is the number of studies in a meta-analysis (Rosenberg, 2005). Begg and Mazumdar's rank correlation test examines the rank correlation between standardized effect sizes and their standard errors using Kendall's tau. Publication bias would be indicated by a significant correlation between effect size and standard error, with smaller studies (with larger standard errors) associated with larger effect sizes.
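The fail-safe N and rank correlation statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the Comprehensive Meta-Analysis implementation: the fail-safe N uses Stouffer's combined Z with a one-tailed α of .05 (z = 1.645) by default, and the Begg and Mazumdar test correlates standardized deviates with sampling variances, which gives the same ranks as correlating with standard errors. Ties are not handled (tau-a), and the function names are hypothetical.

```python
import math


def rosenthal_fail_safe_n(z_values, z_alpha=1.645):
    """Null (Z = 0) studies needed to pull Stouffer's combined Z down to z_alpha.

    Combined Z = sum(Z) / sqrt(k + N); solving for N gives sum(Z)^2 / z_alpha^2 - k.
    """
    k = len(z_values)
    return sum(z_values) ** 2 / z_alpha**2 - k


def rosenthal_tolerance(k):
    """Rule of thumb: a fail-safe N of at least 5k + 10 suggests a robust effect."""
    return 5 * k + 10


def begg_mazumdar(effects, variances):
    """Kendall's tau (tau-a) between standardized deviates and variances, with the
    normal approximation used by Begg and Mazumdar (1994)."""
    w = [1 / v for v in variances]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    var_mean = 1 / sum(w)
    std = [(g - mean) / math.sqrt(v - var_mean) for g, v in zip(effects, variances)]
    k = len(effects)
    s = 0  # concordant minus discordant pairs
    for i in range(k):
        for j in range(i + 1, k):
            prod = (std[i] - std[j]) * (variances[i] - variances[j])
            s += (prod > 0) - (prod < 0)
    tau = s / (k * (k - 1) / 2)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value
    return tau, z, p
```

With k = 33, `rosenthal_tolerance(33)` returns 175, the threshold against which a fail-safe N is compared.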
The Cochrane Collaboration's risk of bias tool was used to assess the risk of bias in each study (low, unclear, or high risk of bias) using the following seven criteria: adequacy of random sequence generation, concealment of the allocated intervention from participants and investigators, blinding of participants and personnel to the intervention allocation, blinding of outcome assessors to intervention allocation, completeness of outcome data (whether attrition, exclusions, and missing data were adequately addressed), evidence of selective outcome reporting, and other sources of bias. Risk of bias was not assessed for one unpublished study (Simshauser, Luking, Kaube, Schultz, & Schmidt, in press; ClinicalTrials.gov: NCT00826475). A total quality score was computed for each study, with 1 point awarded for low risk of bias and 0 points awarded for high or unclear risk for each of the seven criteria. Quality scores ranged from 0 to 7. Correlations (Pearson's r) between quality scores and effect sizes for total mindfulness scores were computed.

Results

Figure 1 shows the study selection process. Searching of databases using the terms described earlier yielded 2,343 records. An additional 1,249 papers were identified through clinical trials registers, and 3 were identified by contacting authors. After removing duplicates, 2,401 records remained. Of these, 1,361 were excluded based on the title and 833 were excluded based on the abstract. The full texts of the remaining 207 papers were examined and inclusion and exclusion criteria applied. After exclusions for the reasons detailed in Figure 1, 37 studies remained for inclusion in the meta-analysis.

Study Characteristics
Characteristics of the 37 included studies are displayed in Table 2. All measured mindfulness pre- and posttreatment in a randomized trial comparing an MBI to an active control. The most commonly used mindfulness measure was the FFMQ (k = 19), followed by the MAAS (k = 9), FMI (k = 3), KIMS (k = 3), CAMS-R (k = 2), and the Toronto Mindfulness Scale (TMS; Lau et al., 2006; k = 1). The total number of participants was 4,108 at baseline; 2,056 of these were randomized to MBIs and 2,052 to control conditions. Mean age ranged from 29 to 75 years. In most studies, participants were experiencing a current episode of a diagnosed mental health disorder (k = 10) or a diagnosed physical health condition (k = 11). Other studies included participants who were currently in remission from a diagnosed mental health disorder (k = 6) or community samples (k = 3). Seven studies recruited participants who did not clearly fall under these subgroups (e.g., caregivers scoring above a threshold on a measure of strain, current cigarette smokers). Almost all studies examined MBSR (k = 21) or MBCT (k = 15). One study examined MBRP (Witkiewitz et al., 2014). Most studies (k = 24) used modified protocols of MBCT or MBSR. Modifications included adaptations for the population, providing more than eight sessions, shortening the duration of sessions, and omitting the all-day retreat. The number of weekly sessions for MBIs ranged from 8 to 16, and the total number of in-session hours ranged from 12 to 30. Of the 37 included studies, 18 used active control interventions matched for the same or greater amount of face-to-face contact time and number of sessions as the MBI. There was a range of active control conditions, including exercise programs, medication, group health enhancement or education programs, group CBT, and self-help materials.

Figure 1 (study selection flow diagram), reasons for exclusion at full-text screening: study did not measure or provide data on mindfulness (n = 46); study did not compare an MBI to an active control condition (n = 39); relevant data not available (for studies identified in clinical registers) (n = 31); study was embedded in an included trial (n = 18); study did not use a well-established MBI (n = 11); MBI had fewer than eight sessions or less than twelve hours of contact time (n = 8); control included mindfulness elements (n = 5); study was not an RCT (n = 5); MBI was not delivered in person or in group format (n = 2); study did not use a validated measure of mindfulness (n = 1); study did not specify control group (n = 1); study was a trial protocol (n = 1); study was not reported in English (n = 1); study full text was not available (n = 1).

Table 2 notes. a The total number of hours in the MBI includes the retreat. If the length of the retreat was not mentioned, this was assumed to be six hours. b Whether the active control condition was matched or unmatched for same or greater amount of face-to-face contact time and number of sessions as the MBI. c Whether formal checks of fidelity to the MBI are reported in the paper.
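Several counts in the study selection and study characteristics text must add up, so a short arithmetic check helps confirm that they are internally consistent. The numbers below are copied from the text and the Figure 1 exclusion reasons; only the variable names are ours.

```python
# Counts copied from the study selection text.
database_records = 2343
register_records = 1249
author_records = 3
after_duplicates = 2401
excluded_on_title = 1361
excluded_on_abstract = 833

# Full-text exclusion reasons listed with Figure 1 (n per reason).
full_text_exclusions = [46, 39, 31, 18, 11, 8, 5, 5, 2, 1, 1, 1, 1, 1]

# Number of included studies using each mindfulness questionnaire.
measure_counts = {"FFMQ": 19, "MAAS": 9, "FMI": 3, "KIMS": 3, "CAMS-R": 2, "TMS": 1}
mbi_participants, control_participants = 2056, 2052

identified = database_records + register_records + author_records
full_texts = after_duplicates - excluded_on_title - excluded_on_abstract
included = full_texts - sum(full_text_exclusions)

print(identified, full_texts, included)          # 3595 207 37
print(sum(measure_counts.values()))              # 37
print(mbi_participants + control_participants)   # 4108
```

Each stage reconciles: 207 full texts minus 170 full-text exclusions leaves the 37 included studies, and the per-questionnaire counts and per-arm sample sizes match the reported totals.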

Meta-Analysis Results for Mindfulness Total Scores
Mean effect sizes (weighted by sample size) for mindfulness total scores are shown in Table 3. A random effects model on the 33 studies that reported mindfulness total scores (see Figure 2) showed a pre-post between-groups difference in favor of the MBI over the active control condition. The effect size was small (Hedges' g = 0.19, 95% CI [0.08, 0.30]) and statistically significant (z = 3.25, k = 33, p < .001). Heterogeneity was significant and moderate-high, Q(32) = 82.86, p < .001; I² = 61.38%. Moderator analyses were conducted to examine potential sources of heterogeneity.

Moderator Analyses
Moderator analyses were conducted only for total mindfulness scores because fewer studies reported facet-level scores. Mean effect sizes for mindfulness total scores for each questionnaire are shown in Table 3; mean effect sizes for the other potential moderators are shown in Table 4. Effect sizes for individual studies reporting total scores, classified by the four potential moderators, are shown in Table 5. Forest plots are shown in Supplemental Figures S1-S9 in the online supplemental material.
Matching for number and duration of sessions. Pharmacotherapy typically does not involve lengthy sessions and is not designed to match the session time of psychosocial interventions. Therefore, this analysis excluded the five studies for which medication was the control condition, leaving 28 studies. To provide a rigorous test of matched session time as a moderating variable, studies in which session number and duration for the active control group equaled or exceeded the MBI were coded as matched; studies in which the active control had less session time than the MBI were coded as unmatched. Moderator analysis showed a significant difference in pre-post change in mindfulness between studies that were matched or unmatched for number and duration of sessions, Q(1) = 7.83, p = .005. MBIs showed significantly greater pre-post change in mindfulness scores when compared to unmatched control conditions, with a small-medium effect size (Hedges' g = 0.34, 95% CI [0.17, 0.51], z = 3.90, k = 12, p < .001), but not when compared to matched active control conditions (Hedges' g = 0.02, 95% CI [−0.16, 0.16], z = 0.33, k = 16, p = .74).

Fidelity checking for the MBI. Moderator analysis showed that the difference in pre-post mindfulness change between studies reporting and not reporting formal fidelity checks was not significant, Q(1) = 0.51, p = .48.
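The subgroup comparison can be approximated from the published subgroup estimates. The sketch below (function names hypothetical) recovers each subgroup's standard error from its 95% CI, computes a between-subgroup Q with inverse-variance weights, and converts a 1-df chi-square statistic to a p value using the identity P(χ² > q) = erfc(√(q/2)) for 1 df. Because the published estimates are rounded, the recomputed Q is about 7.2 rather than the reported 7.83; the reported statistic was presumably computed from unrounded study-level data.

```python
import math


def se_from_ci(lower, upper, crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * crit)


def q_between(estimates, ses):
    """Between-subgroup Q: weighted squared deviations of subgroup means from their
    inverse-variance weighted average; df = number of subgroups - 1."""
    w = [1 / se**2 for se in ses]
    g_bar = sum(wi * g for wi, g in zip(w, estimates)) / sum(w)
    return sum(wi * (g - g_bar) ** 2 for wi, g in zip(w, estimates))


def chi2_sf_df1(q):
    """Upper-tail p value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(q / 2))


# Rounded subgroup estimates reported for unmatched vs matched controls.
estimates = [0.34, 0.02]
ses = [se_from_ci(0.17, 0.51), se_from_ci(-0.16, 0.16)]

q = q_between(estimates, ses)
print(round(q, 1))                   # 7.2 (reported: Q(1) = 7.83)
print(round(chi2_sf_df1(7.83), 3))   # 0.005
print(round(chi2_sf_df1(0.51), 3))   # 0.475 (fidelity moderator, reported p = .48)
```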

Meta-Analysis Results for Mindfulness Facet Scores (FFMQ/KIMS)
Random effects models were examined for each of the five facets of mindfulness as measured by the KIMS or FFMQ. Mean effect sizes for mindfulness facet scores are shown in Table 3.

Table 5 (fragment): Simshauser, −.03; Hou, .07; Morone, −.28.
Note. Only one study used the TMS, and this was excluded from the mindfulness measure moderator analysis.

Publication Bias
The trim-and-fill method indicated that two studies would need to be imputed to the left of the mean effect size to make the funnel plot symmetrical (see Figure 3). In a random-effects model, the imputed mean effect size would be Hedges' g = 0.17, 95% CI [0.05, 0.28]. Rosenthal's Fail-Safe N analysis found that an additional 193 unpublished studies with effect sizes of zero would be needed to reduce the mean effect size for mindfulness total scores to nonsignificance. This figure is greater than 175 (5k + 10, where k = 33), which suggests that effect sizes can be considered robust (Rosenberg, 2005). Kendall's tau was small and nonsignificant (τ = .09, k = 33, p = .439). Taken together, these results do not indicate the presence of publication bias.

Relationship Between Study Quality and Effect Size for Mindfulness Total Scores
Total quality scores for each study, based on risk of bias, are shown in Table 2. Scores for each criterion are shown in Supplemental Table S1 and Supplemental Figure S10 (in the online supplementary materials). Supplemental Figure S11 (in the online supplementary materials) displays percentages of studies with low, unclear, and high risk of bias for each criterion. Most studies had a low risk of bias for all criteria apart from the 'selective outcome reporting' criterion, for which most had an unclear risk of bias.
The correlation between study quality scores and pre-post between-groups effect sizes for mindfulness total scores was nonsignificant, r(30) = −.02, p = .935. This suggests that greater risk of bias, indicated by lower quality scores, is not associated with larger effects.
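The quality score and its correlation with effect size can be sketched as below. The criterion labels paraphrase the seven Cochrane criteria listed in the Method, the ratings dictionary format and function names are hypothetical, and the t statistic is the standard test of Pearson's r against zero with df = number of studies − 2 (so 32 rated studies would be consistent with the reported df of 30).

```python
import math

# The seven risk-of-bias criteria listed in the Method (labels paraphrased).
CRITERIA = (
    "random sequence generation",
    "allocation concealment",
    "blinding of participants and personnel",
    "blinding of outcome assessors",
    "incomplete outcome data",
    "selective outcome reporting",
    "other bias",
)


def quality_score(ratings):
    """1 point per criterion rated 'low' risk; 'unclear', 'high', or missing score 0."""
    return sum(1 for c in CRITERIA if ratings.get(c) == "low")


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, df):
    """t statistic for H0: rho = 0, with df = number of pairs - 2."""
    return r * math.sqrt(df / (1 - r**2))
```

For a study rated low risk on every criterion except selective outcome reporting (the pattern most studies showed), `quality_score` returns 6.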

Discussion
The psychometric properties of mindfulness questionnaires are generally well supported; however, studies showing that self-reported mindfulness sometimes improves in interventions with no explicit mindfulness training have raised a question about their differential sensitivity to change with treatment (Goldberg et al., 2016; Visted et al., 2015). We synthesized 37 studies to examine whether interventions explicitly designed to teach mindfulness lead to greater changes in self-reported mindfulness skills than comparison interventions with no explicit mindfulness training. When all studies were included in the analysis, results were as expected. That is, participants in MBIs showed significantly greater pre-post improvements in mindfulness scores than were seen in active control conditions with no explicit mindfulness elements. The mean effect size was small. The trim-and-fill method and Rosenthal's fail-safe N suggested that publication bias was not a concern. However, the overall finding was moderated by several variables, including which mindfulness questionnaire was used, the type of treatment offered in the control condition, and whether the MBI and control condition were matched for session time. The implications of each of the moderator analyses are discussed in turn.

Which Questionnaire Was Used
The FFMQ and CAMS-R showed significant differential sensitivity to change with treatment but the other measures did not. For the KIMS, mean effect size was larger than for the FFMQ but was not statistically significant, perhaps because the KIMS was used in only three studies. The TMS also showed a medium effect size but was used in only one study. In contrast, mean effect sizes for the MAAS and FMI were near zero. Of the nine effect sizes for the MAAS, one was large whereas eight were close to zero or in the unexpected direction. Of the three effect sizes for the FMI, two were near zero and one was small.
It is unclear why some of the mindfulness questionnaires showed better differential sensitivity than others. Facet-level analyses showed significant effect sizes for observing, nonjudging, and nonreactivity but not for describing or acting with awareness. It is possible that the multifaceted instruments more fully represent the breadth of the mindfulness construct, and therefore are better able to capture skills that change more with mindfulness training than with other interventions. This could explain the larger effect sizes for the FFMQ and KIMS. The CAMS-R, though providing only a total score, also includes considerable breadth of content (present-moment focus, awareness of thoughts and feelings, nonjudgment, acceptance). In contrast, the MAAS, which had a mean effect size near zero, is more narrowly focused on general attentiveness. The FMI includes content related to awareness, nonjudging, and nonreactivity, but also includes more general items that may change with other interventions, such as impatience, staying calm under stress, considering different perspectives, and general self-acceptance (rather than acceptance of thoughts and feelings). This more general content may explain why the FMI showed similar increases in MBIs and other psychosocial interventions.
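The moderation analyses compare mean effect sizes across categories such as the questionnaire used. One common way to compute such a comparison is a between-subgroups Q statistic tested against a chi-square distribution. The sketch below is not necessarily the authors' model: it assumes fixed-effect inverse-variance weights within each subgroup. The effect sizes, variances, and labels are hypothetical.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail p value of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson sum.
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus a half-integer series (empty when df == 1).
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def subgroup_q_between(effects, variances, groups):
    """Between-subgroups Q test for a categorical moderator (fixed-effect weights)."""
    sum_w = defaultdict(float)   # total inverse-variance weight per subgroup
    sum_wg = defaultdict(float)  # weighted sum of effect sizes per subgroup
    for g, v, label in zip(effects, variances, groups):
        w = 1.0 / v
        sum_w[label] += w
        sum_wg[label] += w * g
    if len(sum_w) < 2:
        raise ValueError("need at least two subgroups")
    group_means = {lab: sum_wg[lab] / sum_w[lab] for lab in sum_w}
    grand_mean = sum(sum_wg.values()) / sum(sum_w.values())
    # Weighted squared deviations of subgroup means from the grand mean.
    q_between = sum(sum_w[lab] * (group_means[lab] - grand_mean) ** 2 for lab in sum_w)
    df = len(sum_w) - 1
    return group_means, q_between, df, chi2_sf(q_between, df)


# Hypothetical effect sizes, sampling variances, and questionnaire labels.
g_values = [0.35, 0.50, 0.28, 0.02, -0.05, 0.10]
v_values = [0.04, 0.05, 0.03, 0.04, 0.06, 0.05]
measure = ["FFMQ", "FFMQ", "FFMQ", "MAAS", "MAAS", "MAAS"]

means, q_b, df_b, p_b = subgroup_q_between(g_values, v_values, measure)
print(means, f"Q_between = {q_b:.2f}, df = {df_b}, p = {p_b:.3f}")
```

A mixed-effects version would add an estimated between-study variance to each study's variance before weighting. With few studies per category, as with the KIMS, TMS, and FMI here, either version has low power.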

Type of Active Control Condition
Mindfulness skills increased significantly more in MBIs than with medication. This was predicted because medication is not expected to teach mindfulness skills. However, pre-post change in mindfulness did not differ significantly between MBIs and CBT-based controls. This finding could be explained in several ways. The MBIs may have failed to teach mindfulness adequately, or the questionnaires may be sensitive to changes in distress, which improves in a wide range of interventions. Alternatively, the questionnaires may measure mindfulness-related skills that are taught explicitly in MBIs and cultivated implicitly in CBT. We argue that the latter explanation is the most likely, for several reasons.
First, although the studies provide little information about the adequacy of the mindfulness teaching, it seems unlikely that they failed to teach mindfulness skills. All used MBIs with strong empirical support that are consistent with the defining features of MBIs as described by Crane et al. (2017). Second, medication is expected to improve distress but does not directly teach skills; thus, the significant effect size for the comparison of MBIs to medication (g = .43) suggests that the mindfulness questionnaires measure something that changes with mindfulness training but not with medication. Third, CBT cultivates decentering (Farb et al., 2018;, which is strongly correlated with self-reported mindfulness (Carmody et al., 2009). This suggests that any intervention that increases decentering is likely to lead to increases in self-reported mindfulness skills, even if decentering is taught using nonmindfulness-based methods.
Finally, a randomized trial comparing MBSR and CBT for social anxiety (Goldin et al., 2016, included in this meta-analysis) showed that changes can occur in psychological processes that are not explicitly targeted by the treatment. The study found that MBSR and CBT were equally effective in reducing social anxiety and more effective than a waitlist control. Measures of potential mechanisms of action for both interventions were included.
Unexpectedly, both treatments led to significant and similar improvements in most of the potential mechanisms, including mindfulness skills, cognitive distortions, and cognitive reappraisal. That is, CBT led to increased mindfulness despite the absence of explicit mindfulness training; similarly, MBSR led to changes in cognitive reappraisal and cognitive distortions, despite the absence of explicit training in cognitive restructuring. Although this could be interpreted as a lack of differential sensitivity for the FFMQ, the Emotion Regulation Questionnaire (Gross & John, 2003), and the Cognitive Distortions Questionnaire (Morrison et al., 2015), Goldin et al. (2016) concluded that CBT and MBSR share more underlying psychological processes than is commonly recognized and that, in both interventions, some of these processes may change without explicit training.
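The MBI versus medication comparison is expressed as a pre-post between-groups effect size (g = .43). This section does not give the formula used. The Python sketch below therefore assumes one common variant: the difference in mean pre-post change between groups, divided by the pooled pretest standard deviation, then multiplied by Hedges' small-sample correction J = 1 − 3/(4df − 1). All inputs are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def prepost_hedges_g(pre_t, post_t, sd_pre_t, n_t, pre_c, post_c, sd_pre_c, n_c):
    """Pre-post between-groups Hedges' g (assumed variant: pooled pretest SD).

    Positive values mean the treatment group (here, the MBI) improved more
    than the control group.
    """
    change_diff = (post_t - pre_t) - (post_c - pre_c)
    d = change_diff / pooled_sd(sd_pre_t, n_t, sd_pre_c, n_c)
    df = n_t + n_c - 2
    j = 1 - 3 / (4 * df - 1)  # Hedges' small-sample correction factor
    return j * d


# Hypothetical questionnaire means, pretest SDs, and group sizes.
g = prepost_hedges_g(pre_t=120, post_t=132, sd_pre_t=18, n_t=40,
                     pre_c=121, post_c=126, sd_pre_c=20, n_c=40)
print(f"g = {g:.2f}")
```

Other variants standardize by the standard deviation of change scores. These can give noticeably different values when pre-post correlations are high, so effect sizes should only be compared across studies when they were computed the same way.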

Matching of Session Time
When the MBIs and the active psychosocial controls were matched for session time, there was no significant difference in pre-post change in mindfulness scores, suggesting a lack of differential sensitivity to change with treatment. This might suggest that the questionnaires are sensitive to changes that occur in a variety of interventions, such as reductions in psychological symptoms. It is also possible that when CBT-based and other psychosocial interventions are matched for session time with the MBI, the cultivation of decentering and other mindfulness-related skills approximates the cultivation of mindfulness in the MBIs, leading to similar increases in self-reported mindfulness. Only additional research can show whether either of these explanations is correct. Studies are needed to clarify the conditions that lead to acquisition of mindfulness skills in evidence-based MBIs and other interventions. Dismantling studies that allow testing of the effects of specific elements of MBIs and other interventions on self-reported mindfulness skills may be informative. Studies could also examine whether revisions to mindfulness questionnaires would increase their specificity to increases in mindfulness skills.

Presence or Absence of a Fidelity Check
The only nonsignificant moderation analysis compared MBIs with and without a fidelity check. We argued earlier that the examination of our central research question is less ambiguous when the MBIs can be expected to teach mindfulness skills effectively. Accordingly, we included only studies using evidence-based protocols that meet the definition of MBI proposed by Crane et al. (2017). Even with this restriction, it is possible that some studies implemented the MBI more skillfully than others. Because many studies do not include or report the results of fidelity checks, we had no direct information about how competently the MBIs were implemented and relied on presence or absence of a fidelity check as a proxy for adherence to the protocol. The nonsignificant moderation analysis may mean that presence of a fidelity check does not reflect competence in intervention delivery, or that competence in intervention delivery was not related to the cultivation of mindfulness skills, perhaps because of the restriction of range in therapists' competence.

Limitations
The included studies yielded a wide range of effect sizes and the moderating variables seem to account for only some of this heterogeneity. For example, when considering only studies using the FFMQ, effect sizes ranged from −.26 to .74 (see Table 5). For FFMQ studies that were matched for session time and included a fidelity check, effect sizes ranged from −.26 to .62. Variables other than the moderators we tested may be important in accounting for some of this heterogeneity. Additional work is necessary to identify factors related to differences between MBIs and other treatments in the cultivation of mindfulness skills.
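Heterogeneity of this kind is usually quantified with Cochran's Q, the between-study variance τ², and I². Below is a minimal sketch using the DerSimonian-Laird estimator. That estimator is an assumption, since this section does not name the one the authors used. The effect sizes are hypothetical; only their endpoints echo the FFMQ range reported above.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects summary with Cochran's Q, DerSimonian-Laird tau^2, and I^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fe_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    q = sum(wi * (g - fe_mean) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    # Share (%) of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    re_mean = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    re_se = math.sqrt(1.0 / sum(w_re))
    return {"Q": q, "df": df, "tau2": tau2, "I2": i2, "mean": re_mean, "se": re_se}


# Hypothetical FFMQ-style effect sizes (only the endpoints -0.26 and 0.74 echo the text).
g_values = [-0.26, 0.05, 0.20, 0.31, 0.45, 0.74]
v_values = [0.05, 0.04, 0.06, 0.03, 0.05, 0.04]
summary = dersimonian_laird(g_values, v_values)
print({name: round(value, 3) for name, value in summary.items()})
```

A large I² after moderators are entered would mean that much of the spread in effect sizes remains unexplained. That is the situation the paragraph above describes.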
The inclusion of only MBSR, MBCT, and evidence-based variants that meet the definition of MBI proposed by Crane et al. (2017) may be a limitation, in that it omits single-session mindfulness trainings, laboratory-based inductions, and other training with little empirical support. This decision was made to circumvent the difficulty in interpreting the findings when the MBI and the active control show similar increases in self-reported mindfulness. By including only well-established MBIs, we made it unlikely that an apparent lack of differential sensitivity of a mindfulness questionnaire could be attributed to poor teaching of mindfulness in the MBI. This leaves two other explanations, as noted earlier. First, the questionnaire may be sensitive to changes in a more general construct, such as distress, that improves with a variety of interventions. Second, the active control conditions may implicitly teach mindfulness-related skills. Our findings suggest that the second explanation is more likely, at least for some mindfulness questionnaires, because effect sizes were larger when comparing MBIs to medication controls, which reduce distress but are not expected to teach mindfulness skills, than to CBT or other psychosocial controls, which may implicitly teach skills related to mindfulness.
The number of studies available may be a limitation for the moderation analyses, which must be interpreted cautiously. Although three of the four moderation analyses were significant, they should be replicated when the number of available studies has grown. Moreover, we conducted only univariate moderation analyses despite the potential importance of combined effects of the proposed moderators. For example, it could be argued that the most stringent test of the differential sensitivity of mindfulness questionnaires would examine only studies that were matched for session time, included a fidelity check, and compared an MBI to a non-CBT and nonmedication control condition. Unfortunately, as shown in Table 5, there are only five such studies (three with the FFMQ, two with the MAAS). If we expand to include all types of comparison groups (and collapse across them), there are only nine studies (six with the FFMQ, three with the MAAS). Multivariate moderation analyses with such small cell sizes are likely to be misleading (Lipsey, 2003).

Conclusions
Although findings provide partial support for the differential sensitivity of mindfulness questionnaires to change with treatment, this effect was not found when the MBI and control were matched for session time. Potential explanations for this were suggested, but further research is needed to clarify whether revisions of mindfulness questionnaires would increase their specificity to the changes that occur with mindfulness training, or whether both MBIs and other psychosocial interventions cultivate mindfulness skills. The findings suggest that for continued work in this area, multifaceted mindfulness measures, particularly the FFMQ, may be helpful in discriminating changes in mindfulness skills attributable to explicit mindfulness training from changes attributable to implicit cultivation of related skills or other factors.