Evaluation of Hip Preservation–related Patient Education Materials From Leading Orthopaedic Academic Centers in the United States and Description of a Novel Video Assessment Tool

Introduction: The readability, reliability, and quality of online hip preservation–related patient education materials from the top 20 orthopaedic academic centers in the United States were evaluated. Methods: The patient educational materials were evaluated with the following assessment tools: Flesch-Kincaid (FK) readability test, Flesch Reading Ease formula, LIDA instrument, and DISCERN tool. Videos were assessed using the Patient Educational Video Assessment Tool (PEVAT), an author-developed scoring system. Results: A total of 121 educational items were reviewed. Median (interquartile range) or mean ± SD of the FK level, Flesch Reading Ease, LIDA, and DISCERN scores were 11.00 (3.00), 47.32 ± 12.14, 41.00 (6.00), and 64.00 (7.00), respectively. Higher ranking was correlated with higher FK (ρ = −0.21, P value = 0.034), higher DISCERN score (ρ = −0.39, P value < 0.005), and a lower PEVAT score (r = 0.61, P value = 0.034). The PEVAT score found that 83% of videos were classified as high quality. Discussion: An analysis of the hip preservation patient education text articles found low readability. Overall, high ranking was associated with poorer readability, higher quality text content, and lower quality video content. Video content was found to be predominantly of high quality. Improving the educational accessibility and effect of hip preservation–related topics may result in improved treatment outcomes.

Developing technologies have revolutionized communication between health professionals and patients, fundamentally changing patient education. With the exponential growth in web-based, health-related information, patients can access information readily; however, concerns exist regarding the relevance, readability, and accuracy of that information. 1-3 Health literacy is a predictor of overall patient health status, and low health literacy has been associated with higher hospitalization rates and poorer outcomes. 4-8 Studies have shown that orthopaedic patients have a limited comprehension of both their pathologic condition and potential treatment interventions. 9-11 Therefore, recent studies have focused on evaluating and improving orthopaedic patient education materials. 12,13 Academic orthopaedic centers often provide reliable sources of information for patients on their websites, making trusted information readily available to patients. Studies have evaluated patient education resources from leading orthopaedic organizations 3,13-16 and orthopaedic academic centers 17 in isolated orthopaedic specialties and conditions; however, patient resources related to the burgeoning specialty of hip preservation have yet to be extensively assessed. Because patient education materials have been shown to influence patient decisions and compliance, 18,19 review of current materials is key to minimizing irrelevant or outdated information that may ultimately affect patient outcomes. Our goal was to evaluate the readability, reliability, and quality of hip preservation-related patient education materials provided by leading orthopaedic academic centers and to propose a novel method of video content assessment.

Methods
In December 2018, we searched and reviewed the hip preservation-related patient education materials from the top 20 orthopaedic academic centers, according to the 2017 to 2018 US News and World Report orthopaedic specialty rankings. 20 The rationale for studying the top centers was that they would provide a best-case sample of high-quality, accurate, and complete patient education materials, given the likely resources, infrastructure, and specialization of these centers.
These centers' websites were searched independently by two physicians for hip preservation-related patient education materials, including text articles and videos, and eligible items were noted for review. The patient education materials on the websites of the centers were found via two methods. First, some orthopaedic centers had dedicated "hip preservation" website subsections, and all materials from these subsections were included and reviewed. Second, for completeness, all patient education materials within the orthopaedic departments' websites were reviewed for "hip preservation" content, regardless of whether distinct "hip preservation" webpages existed. The hip preservation education materials included were organized into the following 12 categories: general information, impingement, arthroscopy, labral tear, surgical dislocation, hip osteotomy (including hip dysplasia), rehabilitation, snapping hip, trochanteric bursitis, sports hernia, groin pain, and osteonecrosis. Only content directed toward patient education was included; articles, videos, or web links directed toward healthcare professionals were excluded from the analyses.

Readability-Flesch-Kincaid and Flesch Reading Ease Assessments
Readability was assessed by the Flesch-Kincaid (FK) grade and Flesch Reading Ease (FRE) formula, which have been used extensively to determine an objective, numerical reading level. 12,13,21 The text of each educational item was copied into a Microsoft Office Word 2016 (Microsoft Corporation, Redmond, WA) document. The text was then edited to remove HTML tags as well as irrelevant text and punctuation not related to the subject. The articles were finally checked for spelling and grammar errors within Microsoft Word. This technique was originally presented by Badarudeen and Sabharwal 21 and has consistently been used to assess readability in the literature. The FK readability grade level and FRE formula calculations were done for each article using the Microsoft Word 2016 program, as previously described (Table 1). The FK grade level reports the level of academic education, expressed as a grade school level, necessary for an individual to read and comprehend the content of the article, with increasing FK grade levels equating to increasing comprehension difficulty. The FRE formula generates a result from 0 to 100, with higher numbers equating to increasing ease of reading.
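The two readability measures can be written out explicitly. The sketch below uses the standard published coefficients for the FRE and FK grade level formulas; the study itself relied on the built-in calculation in Microsoft Word, so the function names and the word, sentence, and syllable counts shown here are illustrative assumptions rather than the authors' procedure.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: higher values indicate harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short patient handout (not data from the study).
words, sentences, syllables = 200, 10, 320
fre = flesch_reading_ease(words, sentences, syllables)
fk = flesch_kincaid_grade(words, sentences, syllables)
print(f"FRE = {fre:.2f}, FK grade = {fk:.2f}")
```

Both formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and shorter words raise the FRE score and lower the FK grade.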

Usability-LIDA Score
The LIDA instrument (Minervation Ltd, Oxford, UK) was created to evaluate the usability, accessibility, and reliability of health-related websites, with each of the three analyses scored independently so that the evaluation can be customized. 22 Our study implemented the usability feature of the LIDA instrument to assess the following characteristics: clarity, consistency, functionality, and engagement. The articles were graded 0 to 3 (0 = never, 1 = sometimes, 2 = mostly, and 3 = always) for 18 independent questions, giving a maximum score of 54. A higher score indicated a clearer design, promoting accessibility and encouraging exploration of the website. 13,23

Quality-DISCERN Assessment
The DISCERN instrument was created to evaluate the reliability and quality of consumer health-related information on treatment choices. 24 Articles were graded 0 to 5 on a three-point Likert-type scale (0 = no, 3 = partial, and 5 = yes) for 15 independent questions, giving a maximum score of 75. The final DISCERN score was then reported as a percentage of the maximum score possible. A higher score indicated a higher quality publication, with concise, relevant aims and descriptive, thorough content. 25,26

Patient Educational Video Assessment Tool-Novel Video Assessment-Accessibility, Reliability, and Quality
Because there is a lack of validated video assessment tools, 27-29 we created a novel health-related video assessment tool called the Patient Educational Video Assessment Tool (PEVAT) to evaluate accessibility, reliability, and quality (Table 2). The tool's accessibility subscale contains 10 binary questions, each graded 0 (no) or 1 (yes), with a maximum score of 10 points. The tool's reliability subscale contains four binary questions, each graded 0 (no) or 1 (yes), with a maximum score of four points. The tool's quality subscale contains eight ternary questions, each graded 0 (no), 1 (partial), or 2 (yes), with a maximum score of 16. The three subscales are then added together to obtain a maximum video assessment score of 30.
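The PEVAT arithmetic described above can be sketched in a few lines. This is a minimal illustration that assumes only the subscale sizes and item ranges stated in the text; the class and function names are hypothetical, the individual questions from Table 2 are not reproduced, and the cutoff of more than 20 points is the preliminary threshold that the Discussion proposes for a high-quality video.

```python
from dataclasses import dataclass
from typing import Sequence

HIGH_QUALITY_CUTOFF = 20  # preliminary threshold: total score > 20


@dataclass(frozen=True)
class PevatScore:
    accessibility: int  # 10 binary items, maximum 10
    reliability: int    # 4 binary items, maximum 4
    quality: int        # 8 ternary items, maximum 16

    @property
    def total(self) -> int:
        """Sum of the three subscales (maximum 30)."""
        return self.accessibility + self.reliability + self.quality

    @property
    def is_high_quality(self) -> bool:
        return self.total > HIGH_QUALITY_CUTOFF


def _subscale_sum(name: str, items: Sequence[int], n_items: int, max_item: int) -> int:
    """Validate the item count and item range, then return the subscale sum."""
    if len(items) != n_items:
        raise ValueError(f"{name}: expected {n_items} items, got {len(items)}")
    if any(item not in range(max_item + 1) for item in items):
        raise ValueError(f"{name}: each item must be an integer from 0 to {max_item}")
    return sum(items)


def score_pevat(
    accessibility: Sequence[int],
    reliability: Sequence[int],
    quality: Sequence[int],
) -> PevatScore:
    return PevatScore(
        accessibility=_subscale_sum("accessibility", accessibility, 10, 1),
        reliability=_subscale_sum("reliability", reliability, 4, 1),
        quality=_subscale_sum("quality", quality, 8, 2),
    )


# Hypothetical ratings for a single video (not data from the study).
example = score_pevat([1] * 8 + [0] * 2, [1, 1, 1, 0], [2, 2, 2, 1, 1, 2, 2, 1])
print(example.total, example.is_high_quality)
```

Keeping the three subscale sums separate makes it possible to report accessibility, reliability, and quality individually as well as the combined total.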

Statistical Analysis
Statistical analysis was done using R Statistical software version 3.5.2 (Foundation for Statistical Computing, Vienna, Austria). Each patient education material was classified as either a text or video modality. It was then classified as one of the following 12 topic areas within hip arthroscopy: general, impingement, arthroscopy, labral tear, surgical dislocation, hip osteotomy, rehabilitation, snapping hip, bursitis, sports hernia, groin pain, and osteonecrosis. Finally, each patient education material was scored on the aforementioned assessment tools, as applicable: FK, FRE, LIDA, DISCERN, and PEVAT.
Study characteristics were reported as descriptive statistics (number and percentage). Parametric data are reported as mean, SD, and range, whereas nonparametric data are reported as median, interquartile range (IQR), and range. Inter-rater reliability was assessed using the intraclass correlation coefficient to determine the degree of agreement between the two physician raters. Academic center ranking was the independent or predictor variable and was treated as a continuous variable. The quantity of educational items (text articles and videos), quantity of topics covered, FK, FRE, LIDA, DISCERN, and PEVAT were the dependent or outcome variables and were also treated as continuous variables.
Normality of the continuous outcome variables was tested using the Shapiro-Wilk test to determine whether parametric or nonparametric analysis was necessary. The quantity of topics covered, FRE, and PEVAT were found to be parametric and were analyzed using the Pearson regression analysis, whereas the quantity of educational items, FK, LIDA, and DISCERN were found to be nonparametric and were analyzed using the Spearman regression analysis. This allowed characterization of the correlation between academic center rank (independent/predictor variable) and each of the five assessment tools (dependent/outcome variable). In addition, the number of topics covered was analyzed for correlation with rank and assessment tool scores. Finally, to determine whether subgroup differences existed among the top 20 centers, the centers were subdivided into four groups, that is, ranks 1 to 5, 6 to 10, 11 to 15, and 16 to 20, and analyzed with the analysis of variance test if parametric or the Kruskal-Wallis test if nonparametric; if a result was statistically significant, it was followed with a post hoc Tukey HSD test. P values < 0.05 were considered statistically significant.
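To make the distinction between the two correlation measures concrete, the sketch below computes a Pearson coefficient and a Spearman coefficient (Pearson applied to average ranks) in plain Python. It is an illustration only: the study used R, the Shapiro-Wilk normality check and the P value calculations are omitted, and the function names and example data are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: center rank versus number of educational items.
center_rank = [1, 2, 3, 4, 5]
item_count = [30, 12, 9, 9, 2]
print(f"Spearman rho = {spearman_rho(center_rank, item_count):.2f}")
```

Because a numerically lower rank denotes a higher ranked center, a negative coefficient here means that higher ranked centers tend to have larger values of the outcome.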

Results

Study Characteristics
A total of 121 educational items, including 109 text articles (90.1%) and 12 videos (9.9%), were retrieved and evaluated by two physicians. No significant observer differences were noted in the number of articles selected or the scores calculated from text or video evaluation (intraclass correlation coefficient = 0.8). The median (IQR) quantity of educational items (text articles and videos) per center was 4.00 (3.25). The quantity of items per center ranged from 0 (academic ranks 9 and 19) to 30 (academic rank 1) (Table 3). Regression analysis between rank and number of educational items was statistically significant, with a moderate negative correlation of r = −0.53 (P value = 0.017).

Evaluation of Text Materials
Readability-Flesch-Kincaid and Flesch Reading Ease Assessments
The median (IQR) FK level of the 109 text articles was 11.00 (3.00). By center, the articles from the 7th-ranked academic center were the easiest to read at an FK level of 6.50, and the articles from the 12th- and 13th-ranked academic centers were the most difficult to read at an FK level of 13.00 (Table 3 and Figure 1). In total, only 9 articles (7.4%) were at or below an eighth grade reading level, which is the average reading level in the United States (Table 4). Regression analysis between rank and FK score found a statistically significant weak negative correlation of r = −0.21 (P value = 0.034).
The mean FRE score of text articles by center was 47.32 ± 12.14. The articles from the 7th-ranked center were the easiest to read with an FRE score of 65.00, and the articles from the 13th-ranked center were the most difficult to read with an FRE score of 30.50 (Table 3). Regression analysis between rank and FRE score was not statistically significant (r = 0.12, P value = 0.215).

Usability-LIDA Score
The median (IQR) LIDA score of text articles by center was 41.00 (1.75). The 2nd-ranked center's website articles displayed the greatest usability with a LIDA score of 48.00, and the 13th-ranked center's website articles displayed the lowest usability with a LIDA score of 37.00. According to the LIDA score, 29 (26.6%) of all website articles demonstrated high usability (LIDA score > 44). Regression analysis between rank and LIDA score was not statistically significant (r = −0.10, P value = 0.295).

Quality-DISCERN Assessment
The median (IQR) DISCERN score of text articles by center was 64.00 (7.00) or 85.33%. The 3rd-ranked center's articles displayed the highest quality with a DISCERN score of 69.00 or 92.00%, and the 13th-ranked center's articles displayed the lowest quality with a DISCERN score of 58.00 or 77.33%. Overall, 86 (78.9%) of the text articles were deemed to be of "good" quality or higher (DISCERN score > 60 or 80%). Regression analysis between rank and DISCERN score found a statistically significant moderate negative correlation of r = −0.39 (P value < 0.005).

Evaluation of Topic Assessment
Across the 12 topics assessed, the mean number of topics covered per center was 3.85 ± 2.64. The most common topics were general information, 34 (28.0%), and impingement, 18 (14.9%), whereas the least common topics were sports hernia, 3 (2.5%), and surgical dislocation, 1 (0.8%) (Table 5). The topics with the highest median FK readability grade were osteotomy, 12.5, followed by a 12th grade level for arthroscopy, surgical dislocation, snapping hip, and sports hernia (Figure 2). The topics with the lowest median FK readability grade were rehab, 8; osteonecrosis, 9.5; and 10th grade for both labral tear and bursitis. Finally, the number of topics covered was not significantly associated with rank (P = 0.153) or any of the five assessment tools (P > 0.05).

Evaluation of Subgroup Analysis
When comparing the top 20 centers as four subgroups (ranks 1 to 5, 6 to 10, 11 to 15, and 16 to 20), three statistically significant relationships were found. First, ranks 11 to 15 had a median (IQR) FK score of 11.50 (2.75) that was higher than 10.00 (2.50) for ranks 16 to 20 (P = 0.038). Second, ranks 11 to 15 had a mean FRE score of 40.50 ± 10.48 that was lower than 53.79 ± 9.96 for ranks 16 to 20 (P < 0.005). Third, ranks 1 to 5 had a median (IQR) DISCERN score of 66.00 (7.00) that was higher than 60.50 (5.50) for ranks 11 to 15 (P < 0.005). No statistical difference was noted in the number of educational items (articles and videos) or the number of topics among the four groups.
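The subgroup comparison amounts to pooling article scores by rank band before summarizing them. The sketch below shows that grouping step only, under stated assumptions: the function names and the example FK scores are hypothetical, and the analysis of variance, Kruskal-Wallis, and post hoc tests reported above are not reproduced.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def rank_band(rank: int, width: int = 5) -> str:
    """Map a center rank (1 to 20) to its band label, for example '11 to 15'."""
    if not 1 <= rank <= 20:
        raise ValueError("rank must be between 1 and 20")
    low = ((rank - 1) // width) * width + 1
    return f"{low} to {low + width - 1}"


def median_by_band(scores_by_rank: Dict[int, List[float]]) -> Dict[str, float]:
    """Pool article-level scores within each rank band and return the medians."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for rank, scores in scores_by_rank.items():
        pooled[rank_band(rank)].extend(scores)
    return {band: median(values) for band, values in pooled.items()}


# Hypothetical FK grade levels keyed by center rank (not data from the study).
fk_by_rank = {12: [11.0, 12.5], 13: [13.0], 17: [8.0, 10.0], 18: [10.5]}
band_medians = median_by_band(fk_by_rank)
print(band_medians)
```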

Discussion
Use of the internet for patient education has greatly improved the reach of health information; however, the variability of educational resources is substantial. 1 Orthopaedic-related patient educational materials have recently been evaluated, but little research has focused on educational materials for hip preservation-related content. Our goal was to evaluate the readability, reliability, and quality of hip preservation-related patient education materials provided by leading orthopaedic academic centers and to propose a novel method of video content assessment. Overall, the quantity and readability of materials from the top 20 orthopaedic academic centers were highly variable. The quantity of educational items per center ranged from 0 to 30, with a large IQR of 3.25. Furthermore, higher ranked (lower numerical value) centers were moderately associated with a higher number of educational items (r = −0.53, P value = 0.017).

Figure 1. Graph demonstrating the FK readability grade levels by academic center, shown as median and range. The green line represents the average US reading level, eighth grade. FK = Flesch-Kincaid
Among the 109 text articles evaluated, readability was assessed by the FK and FRE assessments, both of which found low readability with substantial variability. The median FK grade level readability score was 11.00 for hip preservation-related text articles, which is three grade levels above the eighth grade national average reading level in the United States 30,31 and five grade levels above the sixth grade reading level recommended by the National Institutes of Health and the American Medical Association for health-related educational information. 32 Regression analysis found that higher ranked (lower numerical value) centers were weakly associated with a higher FK level, reflective of a lower readability (r = −0.21, P value = 0.034). This was supported by the subgroup analysis, in which the top 20 orthopaedic centers were divided into groups of five, finding that the centers ranked 11 to 15 had lower readability than those ranked 16 to 20.
Unfortunately, only 9 articles (7.4%) were at or below the eighth grade reading level, showing that hip preservation patient education material is predominantly written at a level far above the US average reading comprehension level. The only three academic centers that averaged an FK grade level readability score at or below the eighth grade level were ranked 7th (FK, 6.50), 3rd (FK, 8.00), and 17th (FK, 8.00). This finding demonstrates that presenting hip preservation-related health information at or below the eighth grade reading level is achievable but challenging. The health content being described is, at times, difficult to convey accurately without medical definitions and terminology, which is why other orthopaedic subspecialties have seen similarly low readability scores. 13,33,34 In addition, this problem is not exclusive to orthopaedics because many surgical subspecialties have uniformly low readability scores. 35-37

Similarly, the LIDA score was used to evaluate the usability of text articles and found moderate usability with a median score of 41.00. Only 27% of the articles met the threshold for high usability. Interestingly, the LIDA score was not associated with rank, indicating that the patient education materials produced by subject matter experts at the top 20 orthopaedic centers are similarly usable, regardless of rank. Many scores were considered "fair," which, although technically acceptable, may be concerning because the usability of patient education information could influence patient care. If a website presents information in a way that is difficult for users to find or understand, they may not return to the website for information. 23,38 Furthermore, an additional reduction in health literacy may result if patients discontinue the use of trustworthy educational resources.
The quality of the text articles was assessed by the DISCERN score, which found favorable results. With a median of 64.00 or 85.33%, 79% of the articles were found to have a "good" or higher rating. Furthermore, higher academic center rank (lower numerical value) was moderately associated with higher quality text materials (r = −0.39, P value < 0.005). This was supported by the subgroup analysis, which found that the centers ranked 1 to 5 had a higher quality than those ranked 11 to 15. The reasoning behind this correlation is unknown; however, higher quality publications were identified by the DISCERN score as having relevant aims and thorough content. Highly ranked orthopaedic surgery centers may have more resources, infrastructure, specialization, and subject matter expertise that facilitate higher quality patient education publications. However, further research is necessary to identify areas that academic centers can improve on.
Few tools have been implemented for evaluating video content, and to the authors' knowledge, none are known to exist specifically for health-related video assessments. 27-29 For this reason, we created a novel health-related video assessment tool to evaluate accessibility, reliability, and quality called the PEVAT (Table 2). The unique video assessment done in this study was integral in capturing the full scope of patient education material provided by academic centers. We have preliminarily defined a high quality, useful video score to be > 20, but further study is necessary to validate the use and interpretation of this tool. Unfortunately, only 25% of the academic centers include video materials as part of their online patient education. However, 83% of those videos met the threshold of high quality, with direct aims and simple descriptions. This is encouraging because videos may be preferred over text materials and can be universally used by patients, regardless of literacy or reading level 29,39; analyses of video scores were not done because of the small number of videos retrieved and assessed. Interestingly, higher rank (lower numerical value) was negatively associated with the PEVAT score, indicating that the lower ranked centers produced higher quality video materials. The reason for this association is unknown and may be due to the small sample size.

When readability was assessed by hip preservation topic, the variation in scores was similar to the analysis by academic center. The topic of "osteotomy" demonstrated the most difficult readability, whereas articles associated with "rehabilitation" showed the easiest readability. Intuitively, osteotomy-related content may be more challenging to explain simply than rehabilitation content, which may account for the range in readability scores seen when analyzing topics. However, on average, every topic was written at a higher level than recommended, regardless of the perceived complexity.
Literature investigating web-based orthopaedic patient education materials exists, yet most studies only evaluate the readability of the information analyzed. 3,14-17,21,33,34 Previous studies have implemented the FK, FRE, LIDA, and DISCERN scores independently for educational resource evaluation, but rarely has a comprehensive evaluation of patient education materials been done with all assessment instruments. Our study has the advantage of evaluating educational materials with all four of the aforementioned tools. In addition, studies have evaluated the patient education material from national orthopaedic organizations, 3,13-16 from orthopaedic implant manufacturers, 40 and from a handful of select academic centers, 17 but our study has the benefit of evaluating the resources from the top 20 orthopaedic academic centers. This decreases selection bias, making our results more generalizable. In addition, two physician reviewers were used to procure educational materials for evaluation, which significantly limits sampling variability. Because the number of articles selected and scores from article evaluation were not deemed significantly different from each other between the two reviewers, the authors were confident that consistency was maintained throughout the study.
Although this study is the first to evaluate the readability, usability, and quality of all hip preservation-related patient education materials, the authors recognize that limitations to this study exist. The readability measures consider the number and length of words and sentence length, which has limitations because shorter words and sentences can still be difficult to understand. This is especially true with medical jargon; however, the tools used have been validated and routinely used in the literature as an effective and consistent method of evaluating readability. In addition, because the LIDA and DISCERN tools are not completely objective, variation may be seen when articles are evaluated, but consistency has still been shown when ratings from multiple observers are compared. 24,41 Third, a selection bias of focusing exclusively on top 20 ranked orthopaedic programs exists. This bias may impair the external validity of the study to hip arthroscopy patient educational materials at other orthopaedic programs. This highlights the need for future study of additional orthopaedic programs. However, this may serve to further emphasize the effect of this study's result because even the top orthopaedic programs in the country have not achieved appropriate patient educational materials.

Recently, two studies have evaluated the readability only of arthroscopy-related topics, with one assessing hip arthroscopy readability specifically. 12,42 Our study differs from these in that our expanded evaluation of 12 areas of hip preservation spans more than just hip arthroscopy itself. In addition, these previous studies evaluated material from internet search engines such as Google, Yahoo!, and Bing, whereas ours assessed online academic center materials. Relying on search engine material presents the following two problems: (1) the potentially low quality and reliability of the online material and (2) the minimal direct effect one can have on content improvement.
First, the quality of the material from an internet search engine may not be as reliable or accurate as that from a vetted academic center's website because webpages can be written by anyone. Furthermore, the reliability of attempting to access information suffers because internet search engine results constantly evolve over time. In addition, results are based on one's search device, web history, geographic location, and search engine data centers, making it difficult to obtain consistent results among individuals. Second, although these studies provide valuable information regarding the materials our patients can access, the effect that healthcare providers can have on improving search engine content is severely diminished because of the many factors that influence search results. Conversely, the online material of academic centers can be developed by knowledgeable providers and can be a stable source of accurate information that all individuals can access independent of their device, location, etc. For this reason, it is important not only to identify the need for improving the content itself but also to establish the importance of endorsing reliable, accurate information to patients.
The most concerning problem with the content of the hip preservation-related educational materials is the low readability level found in our study. Recommendations for improvement include simpler content descriptions with condensed or shorter sentence structure. In addition, illustrations or video content not only make an article easier to comprehend by giving readers an accompanying visual but also give patients with limited literacy the opportunity to glean useful information. In this study, video content showed high quality and usability scores, further supporting this recommendation. The evaluation and improvement of academic centers' online hip preservation-related patient education materials can not only influence a patient's understanding of their condition but may also ultimately affect their clinical outcomes.
In conclusion, an analysis of hip preservation patient education text articles from the top 20 ranked orthopaedic surgery academic medical centers found low readability based on the FK and FRE assessments. A median grade level of 11.00 is substantially higher than the recommended or national average reading level. Furthermore, moderate usability and favorable quality existed. Overall, high ranking was associated with poorer readability, higher quality text content, and lower quality video content. Finally, video content was found to be predominantly of high quality. The clinical relevance of this study is seen in the direct correlation between health literacy (including readability, usability, and quality) and patient outcomes. Therefore, improving the educational accessibility and effect of hip preservation-related topics may result in improved treatment outcomes.