Reliability and validity of an Italian four-level emergency triage system

Objectives To measure the reliability and predictive validity of a four-level triage system (I-4L). Methods This observational study was conducted in an urban hospital. Five nurses were randomly selected to assign a triage level to 246 paper scenarios, using the I-4L model. The I-4L model is a four-level triage system: urgency category (UC) 1 requires immediate response; UCs 2, 3 and 4 require assessment within 20, 60 and 120 min, respectively. Weighted κ statistics were used to measure the inter-rater and intrarater reliability of the triage tool. The validity of the model was assessed based on its accuracy in predicting admission and in predicting a reference standard's triage code. Results The I-4L model's inter-rater reliability was κ=0.73 (95% CI 0.67 to 0.79), and the intrarater reliability was κ=0.82 (95% CI 0.67 to 0.96). Its accuracy of triage rating for admission and for prediction of a reference standard's triage code was good: 79% (95% CI 73% to 86%) and 93% (95% CI 89% to 96%), respectively. The percentages of patients admitted per triage level using the I-4L model were: 100% UC 1; 42% UC 2; 6% UC 3; and 2% UC 4. Conclusions The I-4L triage model shows good inter-rater and intrarater reliability for rating triage acuity and good accuracy in predicting patient admission and a reference standard's triage code.


INTRODUCTION
Triage is the first assessment and sorting process used to prioritise patients arriving in the emergency department (ED). The most common triage systems are traffic director, spot-check and comprehensive triage. 1 Most current triage tools are based on a categorical acuity scale with three, four or five levels. The Cape Triage Score (CTS) 2 is a four-level triage system. The Australasian Triage Scale, 3 the Canadian Triage and Acuity Scale (CTAS), 4 the Manchester Triage System (MTS) 5 and the Emergency Severity Index (ESI) 6–9 are all five-level triage tools. Italian guidelines require a four-level in-hospital triage based on an acuity scale measurement. 10 Consequently, we devised a four-level triage system (I-4L) based on 23 flowcharts (contained in a 70-page manual) depending on the patient's complaint. The I-4L triage system has been used in our ED since 2001 but it had not been validated before our study.
Health measurement tools should be valid and reliable. 11 To our knowledge, there are no data regarding the validity and reliability of Italian four-level triage systems and very few studies assessing these characteristics in other triage tools. 6–8 12–15 The aim of our study was to measure the reliability and predictive validity of the I-4L triage system used in our ED.

METHODS
Study design and setting
This observational study was performed at a large urban medical centre with approximately 65 000 ED visits annually, and an overall ED hospital admission rate of 14%.
In our ED, 15 nurses carry out a comprehensive triage using the I-4L system developed by our Triage Working Group based on Italian guidelines. 10 The I-4L has four urgency categories (UCs): UC 1, immediate response; UCs 2, 3 and 4, assessment within 20, 60 and 120 min, respectively.

Data collection
We created paper triage scenarios with the medical records of patients admitted to our ED during 2 weeks in October 2006. We recorded the following data for 252 patients (18 randomly selected patients each day): demographic and clinical characteristics, nurse triage category, admission status and site, and the data on triage forms completed by the nurse (presenting complaint, mode and time of arrival, past diseases, vital signs and pain score). Each case included the patient's age and gender, presenting complaint, a brief case scenario with mode and time of arrival, past diseases, vital signs and pain score.
Exclusion criteria were: (1) incomplete demographic and clinical data in the triage scenarios (six scenarios were excluded); and (2) absence of code assignment by the nurses (no code assignment was missing). Thus, 246 triage scenarios were included in the final analysis.

Study participants
Five nurses from our ED were assigned to undergo a 5 h refresher course in the I-4L. They were selected by their managers from among nurses willing to participate in the project. In line with previous studies, 8 16 a panel of three triage experts (two nurses and one senior clinician, who had emergency triage teaching certification and >15 years of emergency nursing and care experience) independently assigned triage scores to the 246 scenarios. They used the I-4L method and participated in refresher training. They were also blinded to the triage category assigned by the original triage nurse and by the nurses involved in this study. Their triage scores were the reference standard for the triage level in this study.
The nurses enrolled in the study completed a questionnaire related to their demographics, education and work experience.

Study protocol
After completion of the refresher course, each nurse independently assigned triage scores to the 246 scenarios, at time zero and 6 months later. To prevent communication between participants, the group assigned triage codes on the same day, with each nurse in a different room, and in the presence of the investigators. The triage scenarios were given randomly to the participants. They could consult the I-4L triage methods (the manual for I-4L) and they had a maximum of 3 h for the rating. The assignment of triage codes was repeated 6 months later, in the same way, without a refresher course. The data were collected and entered into a spreadsheet by an investigator who was blind to the aim of the study. The nurse group remained concealed during data entry and analysis.

Data analysis
The triage scores of the panel of triage experts were the reference standard for the triage level in this study. We tested the inter-rater reliability in the panel of triage experts by measuring the weighted κ.
We calculated inter-rater and intrarater reliability in the group of nurses and assessed the validity of the triage model. Reliability was measured with κ by comparing the ratings among the triage nurses (inter-rater) and by comparing each nurse's ratings at time 0 and after 6 months (intrarater). We also measured the inter-rater reliability between the group and its reference standard by calculating the κ value between the mode of the urgency category assigned by the nurses of the group and the mode of the scores assigned to the scenarios by the triage expert panel, our reference standard.
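As an illustration of the statistic used here, the following is a minimal sketch of a weighted κ for two raters on the four ordinal urgency categories. It is not the authors' analysis, which was run in Stata. The function name `weighted_kappa`, the choice of linear disagreement weights and the restriction to two raters are assumptions made for the example; the text does not state which weighting scheme or multi-rater procedure was used.

```python
def weighted_kappa(ratings_a, ratings_b, categories=(1, 2, 3, 4), weights="linear"):
    """Weighted kappa for two raters on an ordinal scale.

    Assumption: disagreement weights are |i - j| / (k - 1) ("linear") or the
    square of that value ("quadratic"); the source does not specify the scheme.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_dis / expected_dis


# Hypothetical ratings of six scenarios by two nurses (not study data).
nurse_1 = [1, 2, 2, 3, 4, 3]
nurse_2 = [1, 2, 3, 3, 4, 4]
kappa = weighted_kappa(nurse_1, nurse_2)
```

The same function could be applied to one nurse's ratings at time 0 and at 6 months to obtain an intrarater value, or to the two modal codes (nurse group and expert panel) to compare the group with the reference standard.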
We also calculated the sensitivity, specificity and accuracy of the I-4L in predicting the reference standard's triage score.
To analyse the predictive validity for patient admission and for the reference standard's triage score, for each scenario we considered the mode of the UC assigned by the nurses and we used this code in all validity calculations. We evaluated the validity of the I-4L triage system by calculating sensitivity and specificity for prediction of patient admission and of the reference standard's triage score, using the following cut-offs: true codes 1 and 2=patient sick and likely to be admitted; true codes 3 and 4=less urgency and patient likely to be discharged.
We calculated sample size according to Worster et al, 13 anticipating a κ value of approximately 0.8 from previous studies and an SE of 0.05. Statistical significance was tested at an α level of 0.05. We used Stata v 9.2 software (StataCorp, College Station, Texas, USA) for statistical analysis.
Being a quality assurance investigation, the study was exempt from formal review. The patients and nurses involved in the study gave permission to access their data.
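The validity calculations described in this section can be sketched as follows: take the modal code of the five nurses for each scenario, dichotomise it at the stated cut-off (codes 1 and 2 versus codes 3 and 4), and compare it with the outcome. This is an illustrative sketch only. The function names are hypothetical, the data are invented, and resolving a tied mode in favour of the more urgent code is an assumption, since the text does not describe how ties were handled.

```python
from statistics import multimode


def modal_code(codes):
    """Mode of the codes given to one scenario.

    Assumption: ties are resolved towards the more urgent (lower) code.
    """
    return min(multimode(codes))


def is_high_acuity(code):
    """Cut-off from the text: codes 1 and 2 versus codes 3 and 4."""
    return code in (1, 2)


def diagnostic_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy from paired boolean lists."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(actual),
    }


# Hypothetical codes from five nurses for four scenarios (not study data).
nurse_codes = [
    [1, 1, 2, 1, 1],
    [2, 3, 2, 2, 3],
    [3, 3, 4, 3, 3],
    [4, 4, 3, 4, 4],
]
admitted = [True, False, False, False]  # hypothetical admission status

predicted = [is_high_acuity(modal_code(codes)) for codes in nurse_codes]
metrics = diagnostic_metrics(predicted, admitted)
```

To assess prediction of the reference standard instead of admission, `admitted` would be replaced by the dichotomised modal code of the expert panel.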

RESULTS
Of the 246 patients included in triage scenarios, 116 (47%) were women and the mean age was 43.7 years (SD ±26.3). The most frequent main symptom was abdominal pain (25/246; 10%). Thirty-seven hospital admissions were recorded: 34 in non-intensive care wards and three in intensive care units. For the participating nurses, the median number of years in nursing practice was 15 (range: 3–15) with a median of 3 years' experience in the ED (range: 1–6) and a median of 3 years' experience in ED triage (range: 1–6).
The UCs assigned to each scenario are shown in figure 1. A complete disagreement (when nurses of the same group assigned to the same scenario triage codes that differed by more than two priority levels) occurred in 3% of scenarios evaluated with I-4L and a complete agreement (when all five nurses assigned the same triage code) occurred in 52%. Complete agreement was higher for the UC 1 (80%) and UC 2 (69%) triage levels than for UC 3 (50%) and UC 4 (31%) (figure 1). Inter-rater reliability among nurses using I-4L was κ=0.73 (95% CI 0.67 to 0.79), and intrarater reliability was κ=0.82 (95% CI 0.67 to 0.96). Inter-rater reliability between the nurses using I-4L and the triage score of their reference standard was κ=0.76 (95% CI 0.63 to 0.89).
Sensitivity, specificity and accuracy in predicting the reference standard's code were good (table 1). There were no in-hospital deaths among the patients used in the triage scenarios. The rate of hospital admission (evaluated using the triage codes) with respect to each level was: 100% for UC 1, 42% for UC 2, 6% for UC 3 and 2% for UC 4.
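The definitions of complete agreement and complete disagreement given above can be expressed as a short classification rule. The sketch below is illustrative only: the function name and the label "partial agreement" for the remaining cases are hypothetical, and the example codes are invented.

```python
def agreement_category(codes):
    """Classify the codes given to one scenario by the group of nurses.

    Complete agreement: every nurse assigned the same code.
    Complete disagreement: codes differ by more than two priority levels.
    Anything else is labelled "partial agreement" (a label chosen for this sketch).
    """
    spread = max(codes) - min(codes)
    if spread == 0:
        return "complete agreement"
    if spread > 2:
        return "complete disagreement"
    return "partial agreement"


# Hypothetical codes from five nurses for three scenarios (not study data).
examples = [
    [2, 2, 2, 2, 2],
    [1, 2, 4, 2, 2],
    [3, 3, 4, 3, 3],
]
labels = [agreement_category(codes) for codes in examples]
share_complete = labels.count("complete agreement") / len(labels)
```

On a four-level scale, a difference of more than two levels can only arise when one nurse assigns UC 1 and another assigns UC 4 to the same scenario.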

DISCUSSION
In this study, our triage system appears to have good inter-rater and intrarater reliability for rating triage acuity and good accuracy in predicting patient admission and a reference standard's triage code.
Many studies have evaluated the reliability and validity of acuity ratings by triage nurses, 6–8 13–20 probably because a triage scale should meet at least these two criteria to perform accurately as intended. 11 21 The inter-rater and intrarater reliability of three-level triage systems has been found to be poor. 12 16 22 However, to our knowledge, there is a lack of data on the reliability and validity of four-level triage systems.
In our study, the high inter-rater reliability score for I-4L (κ=0.73) was similar to the performance of five-level triage systems, namely κ=0.8 for CTAS 19 and κ=0.76 for ESI. 7 The I-4L triage system also has good inter-rater reliability with its reference standard.
To our knowledge, ours is the first study that measures the intrarater reliability of a four-level triage system. The lack of previous data could be caused by the high level of difficulty involved in testing this feature, namely whether the same nurse, over time, will rate the same patient with the same acuity level. This is why we presented the same paper scenarios on two occasions.
Our data also support the validity of our triage score. In fact, the rate of hospital admissions increased in relation to higher acuity ratings (figure 2), and our tool had a high accuracy in predicting hospital admission: positive predictive value 96%. Moreover, the group of nurses who used the I-4L triage system proved accurate in predicting the reference standard's triage code (93%; 95% CI 89% to 96%). Few studies have used a reference standard to test the validity of a triage system.

Figure 1 Urgency category assigned by nurses using the I-4L triage system. True urgency categories indicate the mode of urgency category for each scenario assigned by the group of nurses. Urgency category 1=immediate response; urgency category 2, 3 and 4=response within 20, 60 and 120 min, respectively.

It is difficult to compare our results on validity with previous studies because of the differences in the setting and in the type of triage system (five levels vs four levels). Nevertheless, our results on the validity of the I-4L triage system are similar to previous studies on ESI v4. 7 8 16 Our triage tool has one limitation: it could be difficult to learn, consult and teach. In fact it is complex: it is based on 23 flowcharts (contained in a 70-page manual) depending on the patient's complaint. Our triage course requires 2 days to teach.
The main limitation of our study is that it was conducted with paper scenarios and not with patients. However, this procedure has been used and validated in other studies on inter-rater reliability of triage tools. 13 15 19 20 Another limitation of our study is that we cannot exclude the possibility that the performance of the I-4L method was overestimated because of the nurses' previous experience. Lastly, we evaluated the validity of the triage system based on the accuracy in predicting hospital admission, and hospitalisation rates may vary due to factors other than patients' acuity. The hospital admission rate is not the best outcome to test the predictive validity of triage tools because it is a surrogate outcome and there are many confounding variables that could affect it. 11 However, it is very difficult to establish validity criteria for triage acuity classification in the absence of a clear reference standard. For this reason we tried to develop a surrogate 'gold standard' based on the consensus of a panel of triage experts and we tested the predictive validity of our triage system against this gold standard.
Moreover, we also used the admission rate as an outcome because this criterion has been widely used in previous studies. 7–9 14 23–26

CONCLUSION
To our knowledge, this is the first study that measures the reliability and predictive validity of an Italian triage system in the ED of an urban hospital. It is also one of the few studies that test the intrarater reliability for rating triage acuity in a four-level triage system. Our data suggest that the four-level triage model (I-4L) has good inter-rater and intrarater reliability for rating triage acuity. It is also accurate in predicting patient admission and a reference standard's triage code.

Figure 2 Relationship between triage levels and admission status obtained with I-4L. Triage level 1=immediate response; triage level 2, 3 and 4=response within 20, 60 and 120 min, respectively.