Hospital versus individual surgeon’s performance in laparoscopic hysterectomy

Purpose To compare hospital versus individual surgeon’s perioperative outcomes for laparoscopic hysterectomy (LH), and to assess the relationship between surgeon experience and perioperative outcomes. Methods A retrospective analysis was performed of all prospectively collected LHs carried out from 2003 to 2010 at one medical center. Perioperative outcomes (operative time, blood loss, complication rate) were assessed on both a hospital level and a surgeon level using Cumulative Observed minus Expected performance graphs. Results A total of 1618 LHs were performed, 16 % total laparoscopic hysterectomies and 84 % laparoscopic supracervical hysterectomies. Overall outcomes included mean (±SD) blood loss of 108.9 ± 69.2 mL and mean operative time of 95.4 ± 39.7 min; a complication occurred in 76 (4.7 %) cases. Suboptimal perioperative outcomes of an individual surgeon were not always detected on a hospital level. However, collective suboptimal outcomes were detected faster on a hospital level than on an individual surgeon’s level. Evidence of a learning curve was seen: for the first 100 procedures, operative time decreased as individual surgeon experience increased. Similarly, the risk of conversion decreased up to the first 50 procedures. Conclusion An individual outlier (i.e., a surgeon with consistently suboptimal performance) will not always be detected when monitoring outcome measures only on a hospital level. However, monitoring outcome measures on a hospital level will detect suboptimal performance earlier than monitoring only on an individual surgeon’s level. To detect performance outliers in a timely manner, insight into an individual surgeon’s outcomes and skills is recommended. Furthermore, an experienced surgeon is no guarantee of acceptable surgical outcomes.


Introduction
In an effort to improve patient safety in gynecologic surgery, there has been an increasing focus on measures of perioperative outcomes. As the field of minimally invasive surgery involves new and evolving technology, these procedures may be particularly vulnerable to adverse incidents [1]. Individual surgeon outcomes as well as hospital-wide complication rates have been reported; possible uses for this information range from quality improvement projects and credentialing to ranking lists and reimbursement profiles [2]. One of the main problems of this widely released data is the lack of an accurate case-mix correction (i.e., correction for patient characteristics that could influence outcomes). As referral hospitals perform more complex procedures and treat more challenging patients, this can potentially result in less optimal surgical outcomes [3]. This case-mix correction may be appropriate when analyzing data on a surgeon level as well, and has been recommended for parameters including uterine weight and BMI regarding laparoscopic hysterectomy (LH) [3]. In addition, many of the quality assessment registries focus solely on hospital outcome measures, merging all individual surgeon outcomes. This can result in a failure to detect lesser-skilled surgeons who may exhibit suboptimal performance. Furthermore, the experience of a surgeon is increasingly being used as a component in the assessment of surgical quality [4][5][6][7][8], and it is important to determine the value of an individual surgical skills factor [9]. The aim of this study is to compare hospital outcome measures versus individual surgeon outcomes for LH. Further, we aim to assess the relationship between surgeon experience and perioperative outcomes once corrected for case-mix characteristics.
S.Y. Brucker and F.W. Jansen share last authorship. Correspondence: Frank Willem Jansen, f.w.jansen@lumc.nl

Materials and methods
In this retrospective study, all consecutive cases of laparoscopic hysterectomy [laparoscopic supracervical hysterectomy (LSH) and total laparoscopic hysterectomy (TLH)] performed for benign uterine disease between January 2003 and December 2010 at the Department of Obstetrics and Gynecology of the University of Tübingen, Germany, were collected. Exclusion criteria were an indication of malignancy, deep infiltrating endometriosis or urogenital prolapse, in order to limit confounding that may be attributed to more complex operations.
The Ethics Committee of the Medical Faculty of the University of Tübingen approved this study.

Data analysis
Statistical analyses were performed using R statistical software, version 20 for Windows and SPSS version 22 (IBM Corp., Armonk, NY). In addition to descriptive statistics, we fitted regression models for the primary outcome measures. For the numerical outcomes of blood loss and operative time, a gamma regression model with a logarithmic link function was used. For the categorical outcome of perioperative complications (defined as none, level 1 or level 2), a multinomial regression model with a cumulative logistic link function was used. Adjustment factors were adapted from previous research [9]; all outcomes were adjusted for uterine weight. In addition, blood loss was adjusted for BMI and complications were adjusted for the number of previous abdominal surgeries. We computed a numerical complication score by rating a level 1 complication at 1 point and a level 2 complication at 2 points.
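As a rough illustration of the gamma regression with logarithmic link described above, the following pure-Python sketch fits log(E[y]) = b0 + b1·x by iteratively reweighted least squares. It is not the study's R/SPSS code: it assumes a single adjustment covariate (for example, uterine weight in units of 100 g), and the function names `fit_gamma_log_link` and `expected_outcome` are hypothetical. For the gamma family with a log link the working weights are constant, so each iteration reduces to an ordinary least-squares fit of the working response.

```python
import math

def fit_gamma_log_link(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x for a gamma outcome via IRLS.

    With a gamma variance function and log link, the IRLS weights are
    constant, so every step is an unweighted least-squares fit of the
    working response z = eta + (y - mu) / mu on x.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the intercept-only model
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_mean = sum(z) / n
        sxz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))
        new_b1 = sxz / sxx
        new_b0 = z_mean - new_b1 * x_mean
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

def expected_outcome(b0, b1, x):
    """Expected outcome (e.g., operative time in min) for covariate value x."""
    return math.exp(b0 + b1 * x)
```

The fitted coefficients give the expected outcome for each case, which is the "E" later subtracted from the observed value. A full analysis would use several covariates and an established statistics package rather than this single-covariate sketch.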
Upon fitting the regression models, we obtained expected outcomes (given the relevant patient characteristics) for each surgery. From these, we constructed individual performance graphs [cumulative Observed minus Expected (O-E)] for every surgeon per surgical outcome (operative time, blood loss and complication score). These individual O-E graphs provide an intuitive representation of risk-adjusted performance over time. Furthermore, we combined the results of all surgeons into a single O-E graph to show performance at the hospital level. Because the expected performance was estimated from the same data, the hospital's overall performance will, by construction, match the benchmark exactly. The combined graph nevertheless shows how performance progressed over time.
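The construction of the cumulative O-E graphs can be sketched as follows. This is a minimal illustration, assuming each case is available as a tuple of chronological order, surgeon identifier, observed outcome and model-based expected outcome; the function names and the tuple layout are hypothetical and not taken from the study.

```python
from collections import defaultdict
from itertools import accumulate

def cumulative_o_minus_e(observed, expected):
    """Running sum of (observed - expected); a rising line means worse than expected."""
    return list(accumulate(o - e for o, e in zip(observed, expected)))

def o_minus_e_curves(cases):
    """Build per-surgeon and hospital-level cumulative O-E curves.

    cases: iterable of (order, surgeon_id, observed, expected) tuples,
    where `order` is any sortable chronological key (hypothetical layout).
    """
    ordered = sorted(cases, key=lambda c: c[0])
    per_surgeon = defaultdict(lambda: ([], []))
    for _, surgeon, obs, exp in ordered:
        per_surgeon[surgeon][0].append(obs)
        per_surgeon[surgeon][1].append(exp)
    surgeon_curves = {
        s: cumulative_o_minus_e(obs, exp) for s, (obs, exp) in per_surgeon.items()
    }
    hospital_curve = cumulative_o_minus_e(
        [c[2] for c in ordered], [c[3] for c in ordered]
    )
    return surgeon_curves, hospital_curve
```

With invented operative times for two surgeons, one consistently slower and one consistently faster than expected, the individual curves diverge while the pooled hospital curve keeps returning to zero. This is the masking effect that motivates looking at both levels.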
Furthermore, we studied the learning effect by regressing the three outcomes on each surgeon's experience (i.e., number of previous LH performed) in addition to the above-mentioned patient characteristics. We modelled the effect of experience using penalized regression splines as implemented in the R package mgcv [12].
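To illustrate the idea behind a penalized smooth of outcome against experience, the sketch below uses a Whittaker smoother, which penalizes squared second differences of the fitted curve. This is a deliberately simpler technique than the penalized regression splines in mgcv that the study used, and it is shown only to make the roughness penalty concrete; the function names and the smoothing parameter `lam` are hypothetical.

```python
def roughness(values):
    """Sum of squared second differences, the quantity the penalty shrinks."""
    return sum(
        (values[i - 1] - 2 * values[i] + values[i + 1]) ** 2
        for i in range(1, len(values) - 1)
    )

def whittaker_smooth(y, lam):
    """Minimise sum((y - z)^2) + lam * roughness(z) over z.

    Solves (I + lam * D^T D) z = y, where D is the second-difference
    operator, using Gaussian elimination with partial pivoting.
    """
    n = len(y)
    a = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * coef[p] * coef[q]
    b = [float(v) for v in y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            if factor == 0.0:
                continue
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (b[r] - tail) / a[r][r]
    return z
```

Applied to a surgeon's consecutive operative times, a larger `lam` gives a smoother learning curve, and `lam = 0` returns the raw data. The approach in mgcv additionally chooses the amount of smoothing from the data and accommodates covariates and non-Gaussian outcomes, which this sketch does not attempt.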

Results
A total of 1618 LHs were performed by 12 gynecologists over the study period. Overall mean (±SD, range) blood loss was 108.9 (±69.2, 709) mL, mean operative time was 95.4 (±39.7, 390) min, and the complication rate was 4.7 %. The surgical experience of the 12 gynecologists ranged between 18 and 202 procedures at the end of the study period.
Table 1
For blood loss (Fig. 1a), the outcome measures were diverse and the graph line alternately moved downward and upward. A downward part of the graph line indicated a cumulative better outcome than expected; an upward part indicated a cumulative less optimal outcome than expected. For operative time (Fig. 2a), less optimal outcomes were observed for the first 2 years, indicating a learning curve. After 2 years, a cumulative operative time of 4900 min more than expected was observed. Thereafter, the graph line continued to move downward, indicating that cumulative outcomes for this hospital were better than expected.
For complications (i.e., level 1 and level 2 complications) (Fig. 3a), in the first year there was an upward trend in the graph, which indicated less optimal outcomes, with cumulative 3.9 complications more than expected. Thereafter, the graph line moved downward and the complication outcome measure for the hospital continued below zero, indicating that the complication score for the hospital was better than expected.
Comparing individual versus hospital outcome measures, suboptimal outcomes were detected more rapidly on the hospital level for all three outcomes (Figs. 1, 2, 3).
Individual outcome measures (Figs. 1b, 2b, 3b)
For blood loss (Fig. 1b), a considerable difference between all individual outcome measures was observed. Surgeon 8 can be considered an outlier, since the graph of this surgeon continued to move upward (ending with a cumulative 915 mL more blood loss than expected). The same applied to surgeon 4 (ending with a cumulative 873 mL more blood loss than expected). The best individual outcome measure for blood loss was observed for surgeon 5 (a cumulative 1537 mL less blood loss than expected).
With regard to operative time (Fig. 2b), an upward trend in the graphs of almost all individual surgeons was observed for the first 2 years, indicating less optimal performance. Thereafter, most of the surgeons performed better than expected, indicated by a descending graph line. However, surgeon 8 was an outlier, as the graph of this surgeon continued to move upward (ending with a cumulative 2267 min more operative time than expected). Surgeon 1 and surgeon 5 can be considered the better-skilled surgeons of this hospital, and their outcomes compensated for the suboptimal outcome of surgeon 8 (resulting in good outcome measures on a hospital level; i.e., descending graph, Fig. 2a).
Figure caption: Observed minus Expected (O-E) graphs for the outcome complication score. Explanation of the graphs: when the line drops, the surgeon/hospital performed better than expected; when the line rises, the surgeon/hospital performed less optimally than expected.
For complication score (Fig. 3b), three inferior outliers were observed (surgeon 4, surgeon 6 and surgeon 7) with a score of, respectively, 2.5, 3.9 and 3.92 more complications than expected. The graph line of these surgeons continued to move upward.
Surgeon's experience
Figures 4, 5, 6 and 7 show the log odds graphs of surgeon's experience per surgical outcome, corrected for case-mix characteristics. For blood loss, an association was observed between increasing surgical experience and decreased blood loss; however, this should be interpreted with caution given the large standard deviation observed (Fig. 4).
For operative time, up to 100 procedures a clear decrease was observed as experience increased (Fig. 5). A higher complication rate was found when experience increased; however, this was not statistically significant (Fig. 6). Up to 50 procedures a clear decrease was observed for conversion rate, with a plateau thereafter (Fig. 7).

Discussion
Surgeons and hospitals may be expected to provide evidence of the quality of care which they deliver by documenting outcome measures [13]. To date, most of the publicly reported quality indicators are based on hospital-level outcome measures, such as complication and reoperation rates. As demonstrated in our results, monitoring outcome measures exclusively on the hospital level will not always detect individual surgeons with extreme outcomes. We have demonstrated that suboptimal outcomes of a lesser-skilled surgeon will be compensated by the superior skills of other surgeons in the same hospital, resulting in a normal or good quality outcome measure for the hospital (Figs. 2, 3, e.g., surgeon 8 is compensated by surgeon 1 and surgeon 5). Therefore, to evaluate quality of care accurately, outcome measures should also be assessed on the individual surgeon's level. As we observed, good hospital outcome measures do not necessarily reflect good surgeon outcome measures and vice versa. However, when all surgeons of one hospital perform less optimally, this will be detected more quickly on a hospital level (Fig. 2). This can be considered a strength of monitoring outcome measures on a hospital level rather than on an individual level.
Surgical experience is often discussed as a proxy for quality assessment measurement [4][5][6][7][8]. Our data also showed a clear association between increased surgical experience and both a decreased operative time (after 100 procedures) and conversion rate (after 50 procedures). Compared to previous literature which has suggested a learning curve of 30 cases for LH, this demonstrates a slower rate of improvement [5,14]. One possible explanation for the longer learning curve found in this study is that a more experienced surgeon may take on more complex procedures, which can consequently cause more complications and less optimal outcomes [4]. The outcomes in this study were corrected for case-mix characteristics such as uterine weight, BMI and previous abdominal surgery, although there may be unknown variables for which no correction was applied such as severe endometriosis, age and other comorbidities [3]. Hence, our data suggest that experience alone is not sufficient to assure the quality of surgical care; individual skills may provide more information about the actual quality of individual surgical performance.
Strengths of this study include the correction for case-mix characteristics in all performed analyses, which makes the comparison of surgical outcomes more precise. Additionally, we were able to longitudinally follow all 12 surgeons and record all their consecutive procedures from the beginning of their (laparoscopic) career. A potential limitation of our study was the necessity to calculate blood loss from the drop in hemoglobin, as opposed to using the surgeon's estimated blood loss or a different objective marker. Furthermore, it is difficult to confirm the external validity of the complication rates, as our chosen definition of complications differs from the more frequently reported Clavien-Dindo scale. Other limitations inherent to the study of quality and performance include the issues of rare outcomes and small case numbers. For example, if the incidence of a particular adverse outcome is relatively low, one cannot presume that the absence of a complication in a small series of patients implies optimal care [15]. This phenomenon occurred in our results; two surgeons had a complication rate of 0 % (surgeon 10 and 12), which was based on only a few procedures (18 and 21 procedures, respectively). Additionally, if we look more closely at the surgeon with the highest mean operative time (surgeon 10), this was based on 18 procedures and the high mean was due to one single procedure with an operative time of 284 min. Therefore, small sample sizes should always be taken into account when measuring surgical quality [15]. Small sample size is in general a problem in (advanced) gynecologic surgery [16]. Therefore, surgical outcomes with a low incidence should be measured on both the hospital level and the individual level in an effort to detect consistently suboptimal performance in a timely manner.
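The small-series problem can be made concrete with a short calculation. The sketch below assumes, for illustration only, that every case independently carries the overall 4.7 % complication risk (ignoring case mix), and asks how often a surgeon would show zero complications in 18 or 21 procedures by chance alone. It also computes the exact one-sided upper confidence bound on the true rate when no events are observed. The function names are hypothetical.

```python
def prob_zero_events(rate, n):
    """Probability of observing no events in n independent cases with the given rate."""
    return (1.0 - rate) ** n

def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence bound on the true rate after 0 events in n cases.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

if __name__ == "__main__":
    for n in (18, 21):
        print(
            f"n={n}: P(0 complications)={prob_zero_events(0.047, n):.3f}, "
            f"95% upper bound on true rate={upper_bound_zero_events(n):.3f}"
        )
```

Under these assumptions, a complication-free series of about 20 procedures is quite likely for a surgeon whose true risk equals the hospital average, and it remains compatible with a true rate well above that average. This is why a 0 % rate in such a short series says little about quality.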
An important subject for future research is the definition of a performance outlier. Different methods have been defined to determine an outlier [17]. In our study we chose to define the outliers as the best and worst performers compared to their own benchmark. However, this does not necessarily mean that these surgeons are also more or less skilled compared to a national or worldwide benchmark. Therefore, before drawing any conclusions from quality assessment outcomes, the benchmark and the outlier definition should be established first, and we urge that international definitions be adopted. In addition, it is also important to define clinically relevant quality outcomes since, for example, a blood loss of 50-100 mL more or less is not always clinically relevant for the patient, and the same applies to operative time. However, recent studies have shown significant associations between increased operative time and complication rates or reoperations [18].
Although performance ratings may be useful, there is potential for falsely low or high ratings on both the surgeon and the hospital level. For this reason, reliable case-mix adjustment is of major importance to benchmark surgical outcomes correctly. Our study showed that measurement of quality on a hospital level would detect suboptimal performance more quickly and in a more consistent fashion. However, it is still possible to misidentify an individual surgeon who is either a high or low performer. Further insight into the individual surgeon's outcome measures and skills is required to detect suboptimal performance in a timely manner. Furthermore, experience alone is not a sufficient measure to assure surgical quality, and a very experienced surgeon is unfortunately no guarantee of acceptable surgical outcomes.

Compliance with ethical standards
Funding This study was not funded.
Conflict of interest All authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent Because the requested patient data were completely anonymous, not identifiable and retrospectively collected between 2003 and 2010, this study was exempt from the requirement for informed consent from participants.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.