Development and validation of a convenient formula evaluating the value and applicability of medical literature in clinical practice

Objective: Evidence-based medicine offers explicit methods to evaluate the evidence grades of literature. However, evidence grades do not meet all the practical needs of physicians. This study aimed to develop a convenient method for evaluating the clinical value of medical literature from the perspective of the clinician. Methods: A literature applicability equation was formulated through the Delphi method and the analytic hierarchy process. A consistency check was used to ascertain the efficacy of the formula. Three senior clinicians assessed 30 articles based on their clinical experiences and subjective opinions, while one independent researcher assessed the applicability of the same 30 articles using the evaluation formula. Results: The literature applicability equation was Y = 3.93X1 + 11.78X2 + 14.83X3 + 44.53X4 + 24.93X5, where Y = literature applicability, X1 = years since publication, X2 = target question covered or not, X3 = sample size, X4 = study type, and X5 = journal quality. Consistency index (CI) values for the first-level indicator (“literature applicability”) and the second-level indicators (“pertinence and timeliness” and “quality of results”) were 0.0325, 0.0012, and 0.0001, respectively. The weights used to calculate the matrix indicators had satisfactory accordance (random coincidence coefficient = 0.056). A consistency check for the efficacy of the formula revealed kappa = 0.749 and P < .001. Conclusion: The developed and validated literature applicability evaluation formula may be a useful and convenient tool for identifying clinically valuable medical literature.


INTRODUCTION
When making treatment decisions, clinicians consider not only their own experiences but also relevant studies, especially when they encounter new clinical problems. In recent decades, clinical research methods and trial registration systems have been greatly improved, 1,2 and evidence-based medicine (EBM) has been used to classify distinct evidence levels. 3,4 The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach was recently developed to clarify the evidence grades of outcomes in a systematic review. 5 However, although they offer explicit and reasonable methods to confirm the evidence strength of articles, EBM and the GRADE approach can be difficult and inconvenient for clinicians to apply in general practice.
Moreover, readers are confronted with thousands of results whenever they query a literature database. For instance, Bastian et al. reported that almost 75 clinical trials and 11 systematic reviews are added to PubMed each day. 6 Every study type has its own drawbacks that must be considered. 7,8 An absolute conclusion can rarely be drawn, even from randomized controlled trials, because the sample may be poorly representative. In practice, clinicians also have unique and varying perspectives when assessing the value of a study. For instance, a clinician might weight studies from leading scientists more heavily, or might disregard studies with high evidence grades if they are not consistent with his or her individual judgment criteria.
Sackett, one of the main initiators of EBM, stated that individual clinical expertise should be integrated with the best-available clinical evidence. 9 However, no research to date has examined how to integrate clinicians' experiences with the literature evidence grade. Thus, to understand the practical value of a study, it is important to consider both the evidence level of the literature and clinicians' expertise-based internal criteria. The present study was designed to explore a concise and convenient method for assessing the applicability of literature for clinicians.

METHODS
Determining evaluation indicators: Delphi method:
Evaluation indicators were determined by the Delphi method. 10 The Delphi method is a systematic, interactive forecasting method based on an expert panel. Multi-round questionnaires were sent to experts. After each round, the responses and reasons of each expert were summarized anonymously. In the next round, each expert was sent the summary of all experts' answers and was given the opportunity to adjust his or her answers. Finally, the 'correct' result was sought through consensus. We invited 12 physicians to participate in the Delphi method process. All of the participants were familiar with the fields of clinical research and epidemiology. The research group constructed the Delphi method outline and developed the questionnaire. All questionnaires were delivered by e-mail. Participants were asked to reply within 2 weeks. After every round, the research group compiled the results. Indicators endorsed by at least 70% of the experts in the last round were retained as the final indicators.
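The 70% rule can be illustrated with a minimal sketch, reading it as an endorsement threshold applied to the last round. The function name `retained_indicators`, the indicator labels, and the vote tallies are hypothetical; the study's actual round-by-round tallies are not reported here.

```python
from typing import Dict, List


def retained_indicators(endorsements: Dict[str, int],
                        n_experts: int,
                        threshold: float = 0.70) -> List[str]:
    """Return the indicators endorsed by at least `threshold` of the experts."""
    return [name for name, votes in endorsements.items()
            if votes / n_experts >= threshold]


# Hypothetical last-round tallies for a 12-expert panel (not the study's data).
last_round = {"indicator A": 11, "indicator B": 9, "indicator C": 8}
print(retained_indicators(last_round, n_experts=12))
```

With 12 experts, an indicator needs at least 9 endorsements to pass this threshold, since 8 of 12 is below 70%.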
Analytic hierarchy process (AHP):
After the indicators in different levels were confirmed by the Delphi method, they were randomly listed on a form that was delivered to as many clinicians as possible, including physicians, surgeons, and anesthesiologists. Clinicians were asked to list the indicators in descending order, according to the priority that they attributed to each indicator. The results of the survey were used to calculate the weights for the indicators by AHP. 11,12 After the weights were attributed, the literature evaluation formula was obtained.

Assessing the efficacy of the literature evaluation formula:
Three senior doctors were invited to supply one specific clinical question each and a certain number of articles that addressed their specific question. The doctors were asked to recommend a grade for each of the articles, with at least three papers for each grade. The recommendation grades were made on the basis of the clinical experiences and subjective opinions of the doctors. The grades were classified as "positive recommendation", "general recommendation", and "negative recommendation". A final total of three questions and 30 articles were obtained.
One independent researcher calculated the literature score for each of the articles with the evaluation formula. The scores of the articles were sorted in a descending manner and divided into three groups: the portion of articles with the highest one-third of scores was defined as "positive recommendation," the middle third as "general recommendation," and the lowest third as "negative recommendation". If the number of articles divided by 3 resulted in a remainder of 1, then one article was added to the "negative recommendation" grade; if the remainder was 2, then one article each was added to the "general recommendation" and "negative recommendation" grades.
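The grouping rule described above can be sketched as follows. This is an illustration only: the function name `grade_by_tertiles` and the example scores are hypothetical, and the handling of tied scores (left to the stable sort order) is an assumption, since the text does not say how ties were resolved.

```python
from typing import Dict


def grade_by_tertiles(scores: Dict[str, float]) -> Dict[str, str]:
    """Assign a recommendation grade to each article from its formula score."""
    # Sort article identifiers by score, highest first.
    ranked = sorted(scores, key=lambda article: scores[article], reverse=True)
    base, remainder = divmod(len(ranked), 3)

    n_positive = base
    # Remainder 2: one extra article goes to "general" (and one to "negative").
    n_general = base + (1 if remainder == 2 else 0)
    # Everything left over is "negative", which absorbs the extra article
    # when the remainder is 1 or 2.

    grades = {}
    for position, article in enumerate(ranked):
        if position < n_positive:
            grades[article] = "positive recommendation"
        elif position < n_positive + n_general:
            grades[article] = "general recommendation"
        else:
            grades[article] = "negative recommendation"
    return grades


# Hypothetical scores for ten articles (not the study's data).
example = {f"article_{i}": 10.0 * i for i in range(1, 11)}
for article, grade in grade_by_tertiles(example).items():
    print(article, grade)
```

For ten articles this yields three "positive", three "general", and four "negative" grades, which matches the remainder-of-1 rule.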
Finally, the agreement between the grades assigned by the senior doctors and the grades assigned by the formula was tested with a consistency check to assess the efficacy of the evaluation formula.

Method of blinding:
The study design involved several levels of blinding. The experts involved in the Delphi method process did not participate in the questionnaire survey. The researcher who calculated the literature score according to the evaluation formula did not know the recommendation level made by the senior doctors, and the senior doctors who provided the articles for evaluation did not know the literature score of the articles made by the formula. Participants remained blinded until after the score had been calculated.

Statistical analysis:
The Delphi method and the analytic hierarchy process were used to obtain the evaluation indicators and their weights, respectively. The consistency index (CI) was calculated to test whether logical errors existed among the indicators, with CI < 0.1 indicating no logical error. 11 The accordance of the matrix of the weights of the indicators was tested by the random coincidence coefficient (CR), with CR < 0.1 indicating satisfactory accordance. The efficacy of the literature evaluation formula was assessed by consistency check; agreement was considered absent with kappa = 0, poor with kappa < 0.4, and satisfactory with kappa ≥ 0.75. 13 Differences with P < .05 were considered statistically significant.
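As an illustration of how CI and CR are usually obtained in AHP, the sketch below uses Saaty's standard definitions, CI = (λmax − n)/(n − 1) and CR = CI/RI, where λmax is the principal eigenvalue of an n × n pairwise comparison matrix and RI is Saaty's random index. The function name `ahp_consistency` and the example matrix are hypothetical; the study's own comparison matrices are not reproduced here, and how the clinicians' rankings were converted into pairwise comparisons is not specified.

```python
from typing import List, Tuple

# Saaty's random index (RI) for matrices of order 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_consistency(matrix: List[List[float]],
                    iterations: int = 200) -> Tuple[List[float], float, float]:
    """Return (weights, CI, CR) for a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports matrices of order 3 to 9 only")

    def multiply(vector: List[float]) -> List[float]:
        return [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]

    # Power iteration: converges to the principal eigenvector (the weights).
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = multiply(weights)
        total = sum(product)
        weights = [value / total for value in product]

    # Estimate the principal eigenvalue from (A w)_i / w_i, averaged over i.
    product = multiply(weights)
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n

    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n]
    return weights, ci, cr


# Hypothetical 3 x 3 comparison matrix (not taken from the study).
example = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]
weights, ci, cr = ahp_consistency(example)
print([round(w, 3) for w in weights], round(ci, 4), round(cr, 4))
```

By Saaty's usual convention, a CR below 0.1 is taken as acceptable consistency; the hypothetical matrix above falls under that limit.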

RESULTS
Literature evaluation indicators:
In the first round of the Delphi method, all of the experts confirmed the literature applicability as the first-level indicator. Three second-level indicators were identified by experts: "pertinence and timeliness," "quality of results," and "credibility of study". The third-level indicators included "publication time," "target question was covered or not," "race or region of the participants," "sample size," "study type," 14 "journal quality," "study performed by a professional academic organization," and "h-index of corresponding author". 15 After three rounds of the Delphi method, the final indicators were determined. Table-I shows the final first-, second-, and third-level indicators that were included in this study.

Indicator weights:
The combination weights were calculated by the analytic hierarchy process (Table-II). The values of the indicators were assigned according to their relative clinical meaning. Each indicator was coded numerically, and the combination weights were multiplied by 100 for convenience. As a result of this process, the following formula for the literature applicability (Y) was obtained:
Y = 3.93X1 + 11.78X2 + 14.83X3 + 44.53X4 + 24.93X5
where X1 = years since publication, X2 = target question covered or not, X3 = sample size, X4 = study type, and X5 = journal quality. The CI values indicated that there were no logical errors in the determination of any of the indicator grades, and the CR value suggested that the matrix of indicator weights possessed satisfactory accordance (Table-III).
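A minimal sketch of how the formula can be evaluated for one article is given below. The weights are those of the published equation; the function name `applicability_score` is hypothetical, and the numeric coding of each indicator (defined in Table-II, which is not reproduced here) is left to the caller. The example values are placeholders, not the study's coding scheme.

```python
# Weights of the literature applicability formula, in the order X1..X5:
# years since publication, target question covered or not, sample size,
# study type, journal quality.
WEIGHTS = (3.93, 11.78, 14.83, 44.53, 24.93)


def applicability_score(x1: float, x2: float, x3: float,
                        x4: float, x5: float) -> float:
    """Compute Y = 3.93*X1 + 11.78*X2 + 14.83*X3 + 44.53*X4 + 24.93*X5."""
    values = (x1, x2, x3, x4, x5)
    return sum(weight * value for weight, value in zip(WEIGHTS, values))


# Placeholder indicator values on an assumed 0-to-1 scale (hypothetical coding).
score = applicability_score(x1=0.5, x2=1.0, x3=0.5, x4=1.0, x5=0.5)
print(round(score, 2))
```

Because the weights sum to 100, an article coded 1 on every indicator under this assumed 0-to-1 scale would score 100.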

Consistency check for the efficacy of the literature evaluation formula:
The results are shown in Table-IV. Some inconsistencies in the grading were noted. One article that was graded as "positive recommendation" by a senior doctor was graded as "general recommendation" by the formula. Two articles that were graded as "general recommendation" by doctors were graded as "positive recommendation" by the formula. Two articles that were rated as "negative recommendation" by doctors were graded as "general recommendation" by the formula. Overall, for the inconsistent results, the formula tended to elevate the recommendation grade of articles compared to the grades given by the doctors.
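The consistency check can be sketched as below, assuming it was an unweighted Cohen's kappa. The five off-diagonal counts are the disagreements described above. The diagonal counts are an assumption: they are chosen so that the formula places exactly 10 of the 30 articles in each grade, because the full cross-tabulation is not reproduced here. The function name `cohen_kappa` is hypothetical.

```python
from typing import List


def cohen_kappa(table: List[List[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table.

    Rows are the doctors' grades and columns are the formula's grades.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Grade order: positive, general, negative.
# Off-diagonal cells follow the text; diagonal cells are assumed.
assumed_table = [
    [8, 1, 0],   # doctors: positive
    [2, 7, 0],   # doctors: general
    [0, 2, 10],  # doctors: negative
]
print(round(cohen_kappa(assumed_table), 3))
```

Under these assumed marginals, 25 of 30 articles agree and kappa comes out at 0.75, close to the reported value of 0.749; a different split of the 25 agreeing articles would give a slightly different result.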

DISCUSSION
In this study, we developed a method for evaluating the clinical value of literature from the perspective of the clinician. We defined the gold standard of the "real value" of articles according to the opinions of senior clinicians. In addition to improving the practicability of the results, this gold standard criterion was concise and convenient. We used the Delphi method to obtain evaluation variables and determined the weights for these variables through the analytic hierarchy process. These procedures ensured the objective and scientific nature of the literature evaluation formula. Finally, to test the validity of the method, a consistency check was used to compare the formula with the opinions of the senior doctors (i.e., the gold standard). The results showed the satisfactory validity of the evaluation formula.
In descending order of weight, the clinicians prioritized "study type," "journal quality," "sample size," "target question covered or not," and "years since publication". The applicability of a paper depended on the confidence of the clinician regarding the objectivity and accuracy of its results, as evidenced by the high priority attributed to the "study type". The confidence in the results increased as the evidence strength increased from in vitro research to systematic reviews. 16 These findings are consistent with the main idea of EBM.
Studies published in higher-impact journals have typically passed more professional and stricter peer review. Although not all journals with high Impact Factors publish only high-quality articles, 17 manuscripts in high-level journals are more convincing to doctors. Journals in different academic fields might have different ranges of Impact Factors. Nevertheless, for one specific literature retrieval, the search field is relatively confined. Thus, it was reasonable for "journal quality" to be chosen as an important indicator.
"Sample size" was the third-most important indicator for applicability. A study with a larger sample size might have more representative and reliable results than a smaller trial. Use of a small sample size can result in inconclusive results. 18 For specific study types, an adequate size can be calculated by statistical methods. 18-20 However, a sample size is appropriate only if the sample is drawn from the right population. Use of a larger sample size than is necessary may result in more reliable conclusions, but more potentially confounding effects might occur during the data-collection process. These errors could, however, be reduced by applying a strict study design. Overall, it would be wise to add "appropriate sample size" as an important parameter influencing the literature applicability. This consideration might also be worthwhile for other literature evaluation systems, such as GRADE.
The factor "target question covered or not" was ranked in fourth place. This finding was somewhat inconsistent with our initial hypothesis. We had expected this indicator to be the most important, because unrelated articles seemed useless. This result might reflect the complexity of the clinical questions; it may be that not many eligible studies exactly covered the target questions. Clinicians have to retrieve literature that is specific for their purposes. Even among eligible studies, clinicians might hesitate to adopt the information because of discrepancies, for example, in the techniques or basic characteristics of the patients. Indirect evidence might be sufficient for clinicians to support their treatment strategies, as they prefer to obtain useful knowledge from the indirect original studies.
Finally, "years since publication" was retained as an indicator in the formula, although with the lowest weight. Clinicians were very cautious about adopting the conclusions of older articles, due to the ongoing development of techniques and therapy principles.
A consistency check was applied to test the validity of the applicability formula. The applicability grades calculated by the formula showed satisfactory consistency with the recommendation levels made by the senior doctors (defined as the gold standard in this study). After unblinding, we further investigated the reasons for differences between the recommendations by the formula and the doctors. Whereas the formula judged the quality of an article on the basis of its external characteristics, clinicians synthesized the overall information of a study, combined with their own knowledge, and then made a judgment. Thus, the judgment made by clinicians was drawn from internal information.
For example, for the "study type," the formula gave a randomized controlled trial (RCT) or a systematic review the highest score. In contrast, clinicians might be skeptical towards the results of an RCT without detailed methods, especially if there was no evidence of the methods of randomization and allocation concealment. Clinicians were also cautious of adopting the conclusions from systematic reviews that lacked expected negative results 7 and would downgrade such articles. These differences could explain why, compared to clinicians' grades, the formula tended to elevate the literature grades.
Overall, the process of seeking evidence for optimizing clinical practice is full of uncertainties. 21 This method is tightly related to clinical practice and not merely dependent on the evidence grade. The indicators in the formula are easy to obtain, and the results may be expressed in a variety of forms. For example, the formula may be displayed as an equation or a radar chart, or entered into an Excel table. By setting the formula Y = 3.93X1 + 11.78X2 + 14.83X3 + 44.53X4 + 24.93X5 into an Excel table and substituting the value of each Xi, users can easily obtain the score of any article in the literature. In our department, the information secretary regularly uses this formula to filter literature. The equation is extremely convenient and easy to use. Its use does not require a researcher to read the entire article, but only enough to determine the five key factors.
www.pjms.com.pk Hsiao-Pei Mok et al.
The present study offers a valid, convenient, and understandable method for evaluating literature according to its clinical relevance. Nevertheless, the sample size of this study was small, and the results require further verification.