Scientific Article
Reliability, Validity, and Responsiveness of Four Knee Outcome Scales for Athletic Patients
Robert G. Marx, MD, MSc, FRCS(C); Edward C. Jones, MD, MA; Answorth A. Allen, MD; David W. Altchek, MD; Stephen J. O'Brien, MD; Scott A. Rodeo, MD; Riley J. Williams, MD; Russell F. Warren, MD; Thomas L. Wickiewicz, MD
Investigation performed at the Center for Clinical Outcome Research and the Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, NY
Center for Clinical Outcome Research (R.G.M. and E.C.J.) and Sports Medicine and Shoulder Service (R.G.M., A.A.A., D.W.A., S.J.O’B., S.A.R., R.J.W., R.F.W., and T.L.W.), Hospital for Special Surgery, 535 East 70th Street, New York, NY 10021. E-mail address for Dr. Marx: marxr@hss.edu

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study. Dr. Marx was supported by an American Academy of Orthopaedic Surgeons Health Services Research Fellowship and a Royal College of Physicians and Surgeons of Canada Detweiler Travelling Fellowship.

The Journal of Bone & Joint Surgery.  2001; 83:1459-1469 

Abstract

Background: Many patient-based knee-rating scales are available for the evaluation of athletic patients. However, there is little information on the measurement properties of these instruments and therefore no evidence to support the use of one questionnaire rather than another. The goal of the present study was to determine the reliability, validity, and responsiveness of four knee-rating scales commonly used for the evaluation of athletic patients: the Lysholm scale, the subjective components of the Cincinnati knee-rating system, the American Academy of Orthopaedic Surgeons sports knee-rating scale, and the Activities of Daily Living scale of the Knee Outcome Survey.

Methods: All patients in the study had a disorder of the knee and were active in sports (a Tegner score of ≥4 points). Forty-one patients who had a knee disorder that had stabilized and who were not receiving treatment were administered all four questionnaires at baseline and again at a mean of 5.2 days (range, two to fourteen days) later to test reliability. Forty-two patients were administered the scales at baseline and at a minimum of three months after treatment to test responsiveness. The responses of 133 patients at baseline were studied to test construct validity.

Results: The reliability was high for all scales, with the intraclass correlation coefficient ranging from 0.88 to 0.95. As for construct validity, the correlations among the knee scales ranged from 0.70 to 0.85 and those between the knee scales and the physical component scale of the Short Form-36 (SF-36) and the patient and clinician severity ratings ranged from 0.59 to 0.77. Responsiveness, measured with the standardized response mean, ranged from 0.8 for the Cincinnati knee-rating system to 1.1 for the Activities of Daily Living scale.

Conclusions: All four scales satisfied our criteria for reliability, validity, and responsiveness, and all are acceptable for use in clinical research.

    Clinical research studies are placing an increased emphasis on the perspective of the patient with use of health-related quality-of-life instruments. Many patient-based knee-rating scales have been developed for the evaluation of athletic patients1. There is little information on the measurement properties of the existing scales and therefore no evidence to support the use of one questionnaire rather than another. Furthermore, without evidence to support the reliability, validity, and responsiveness of these instruments, the results of clinical studies that are based on these instruments are in question.
    While instruments that have been used to evaluate patients following knee replacement, shoulder surgery, or trauma have been studied in detail2-7, relatively little work has been done to assess or compare knee instruments for the evaluation of athletic patients8.
    Previous work in this area has been cross-sectional in design and therefore has not assessed reliability or how the instruments performed over time (that is, responsiveness)1,8-11. Furthermore, those investigations focused exclusively on patients with anterior cruciate ligament deficiency, substantially limiting the generalizability of the results. The goal of the present study was to determine the reliability, validity, and responsiveness of the scales used to measure disability and symptoms in athletic patients with a wide variety of disorders of the knee.
     
    Fig. 1: The mean Lysholm scores according to the clinician-rated severity. The error bars indicate the standard deviation.

    Fig. 2: The mean scores on the American Academy of Orthopaedic Surgeons (AAOS) sports knee-rating scale according to the clinician-rated severity. The error bars indicate the standard deviation.

    Fig. 3: The mean scores on the Cincinnati knee-rating system according to the clinician-rated severity. The error bars indicate the standard deviation.

    Fig. 4: The mean Activities of Daily Living (ADL) scores according to the clinician-rated severity. The error bars indicate the standard deviation.
     
    TABLE I: Mean Baseline Scores on the Knee-Rating Scales for One Hundred and Thirty-three Patients Included in the Validity Analysis
    Instrument                                                         Mean Score  Standard Deviation  Lowest Score  Highest Score
    Activities of Daily Living scale                                   76.0        18.3                16.3          100.0
    American Academy of Orthopaedic Surgeons sports knee-rating scale  73.5        19.0                26.9          98.3
    Cincinnati knee-rating scale                                       53.2        28.7                0.0           100.0
    Lysholm scale                                                      74.6        18.7                15.0          100.0
    Tegner scale                                                       6.4         1.6                 4.0           10.0
    Short Form-36 subscales
    Physical function                                                  74.6        26.1                5.0           100.0
    Role-physical                                                      61.8        42.9                0.0           100.0
    Bodily pain                                                        65.7        23.0                12.0          100.0
    General health                                                     84.2        14.4                42.0          100.0
    Vitality                                                           64.4        19.6                5.0           100.0
    Social function                                                    83.1        21.1                12.5          100.0
    Role-emotional                                                     84.2        31.1                0.0           100.0
    Mental health                                                      76.4        14.2                24.0          100.0
    Physical component scale                                           46.1        10.2                20.3          61.0
    Mental component scale                                             53.1        8.7                 17.7          67.8
     
    TABLE II: Validity Analysis with Spearman Correlation Matrix for the Four Knee-Rating Scales, the Physical and Mental Component Scales of the Short Form-36 (SF-36), and the Clinician and Patient Ratings of Severity
    *Correlations were significant at p < 0.05. †Correlations were significant at p < 0.01.
    Column abbreviations: PCS = physical component scale of SF-36; MCS = mental component scale of SF-36; AAOS = American Academy of Orthopaedic Surgeons sports knee-rating scale; ADL = Activities of Daily Living scale; Patient and Clinician = patient and clinician ratings of severity.
                                                                       PCS         MCS         AAOS        Lysholm     Cincinnati  ADL         Patient     Clinician
    Physical component scale of SF-36                                              -0.18*      0.67†       0.68†       0.68†       0.77†       0.64†       0.59†
    Mental component scale of SF-36                                    -0.18*                  -0.03       0.05        0.01        -0.05       0.08        0.01
    American Academy of Orthopaedic Surgeons sports knee-rating scale  0.67†       -0.03                   0.70†       0.83†       0.80†       0.74†       0.64†
    Lysholm scale                                                      0.68†       0.05        0.70†                   0.70†       0.85†       0.65†       0.62†
    Cincinnati knee-rating scale                                       0.68†       0.01        0.83†       0.70†                   0.80†       0.67†       0.61†
    Activities of Daily Living scale                                   0.77†       -0.05       0.80†       0.85†       0.80†                   0.73†       0.68†
    Patient rating of severity                                         0.64†       0.08        0.74†       0.65†       0.67†       0.73†                   0.58†
    Clinician rating of severity                                       0.59†       0.01        0.64†       0.62†       0.61†       0.68†       0.58†
     
    TABLE III: Validity Analysis with Spearman Correlations for the Four Knee-Rating Scales and Short Form-36 Subscales
    *Correlations were significant at p < 0.01. †Correlations were significant at p < 0.05.
    Short Form-36 Subscale  Cincinnati Knee-Rating Scale  Lysholm Scale  Activities of Daily Living Scale  American Academy of Orthopaedic Surgeons Sports Knee-Rating Scale
    Physical function       0.68*                         0.66*          0.72*                             0.66*
    Role-physical           0.52*                         0.49*          0.60*                             0.62*
    Bodily pain             0.51*                         0.57*          0.54*                             0.44*
    General health          0.26*                         0.31*          0.28*                             0.17
    Vitality                0.23*                         0.28*          0.26*                             0.19†
    Social function         0.50*                         0.50*          0.48*                             0.48*
    Role-emotional          0.18†                         0.18†          0.14                              0.24*
    Mental health           0.24*                         0.29*          0.21†                             0.19†
     
    TABLE IV: Validity Analysis of Variance Across Levels of Clinician-Rated and Patient-Rated Severity
                                                                       Clinician-Rated Severity       Patient-Rated Severity
    Instrument                                                         F Test Statistic  P Value      F Test Statistic  P Value
    American Academy of Orthopaedic Surgeons sports knee-rating scale  21.1              <0.00000001  27.3              <0.00000001
    Lysholm scale                                                      14.8              <0.00000001  19.5              <0.00000001
    Cincinnati knee-rating scale                                       19.4              <0.00000001  23.8              <0.00000001
    Activities of Daily Living scale                                   20.7              <0.00000001  27.1              <0.00000001
    Ten orthopaedic surgeons who specialized in sports-related problems of the knee were polled to determine which questionnaires were the most sound and widely used and therefore the most appropriate for study. The goal was to study questionnaires that measure disabilities rather than impairments. Disabilities are restrictions or the lack of an ability to perform an activity in the usual manner, such as an inability to run, walk, or play a sport12. Impairments are defined as a loss or abnormality of psychological, physiological, or anatomical structure or function at the organ level, such as a decrease in range of motion or an increase in translation of a joint12. Impairments are important to patients and to surgeons; however, they have less impact on patients’ quality of life than do disabilities.
    The Lysholm scale13, the American Academy of Orthopaedic Surgeons sports knee-rating scale14, the Activities of Daily Living scale of the Knee Outcome Survey15, and the Cincinnati knee-rating system10 were studied. The Cincinnati knee-rating system has been widely used; however, the scale has multiple subscores, including some based purely on impairment10. While all components of this scale are relevant to both clinicians and patients, for the purposes of this research we elected to study only the sections relating to disability. We also administered version 1.0 of the Short Form-36 (SF-36) for the validity analyses. The SF-36 is a thirty-six-item questionnaire that measures general health16-18. Its use has been encouraged in conjunction with knee-specific instruments for studies of patients with an injury of the anterior cruciate ligament19. Both a physical component scale and a mental component scale can be derived from the SF-36. Other instruments were not evaluated because they had been recently developed20 or they focused on impairment21,22.
    The present study was not a direct statistical comparison of the reliability, validity, and responsiveness of the four knee-rating scales. The goal of the study was to determine whether the measurement properties of each were satisfactory for use in clinical research.

    Patient Recruitment

    The study was approved by the Institutional Review Board, and all subjects gave informed consent. Patients in the waiting rooms of orthopaedic surgeons who specialized in disorders of the knee were recruited for the study during a six-month period. The patients completed the questionnaires in the waiting room prior to seeing their physician. Patients who were seeing the physician for an initial consultation because of a knee disorder were entered into the responsiveness arm of the study (that is, the arm measuring sensitivity to clinical change), as described below. These patients were retested at a minimum of three months later, following either operative or nonoperative treatment. Patients who were in a clinically stable state were entered into the reliability arm of the study. These individuals were retested within two days to two weeks23,24.
    A wide variety of diagnoses was sought to test a wide spectrum of severity and to ensure generalizability to the various conditions that affect the knee in athletic patients.
    The inclusion criteria were: (1) an ability to read and write English, and (2) the presence of a primary disorder of the knee, including but not limited to patellofemoral disorders (including chondromalacia, patellar dislocation, and patellar tendinitis), instability (acute ligament injury or chronic instability), meniscal injury, or osteochondritis dissecans.
    The exclusion criteria were: (1) an inflammatory joint disease, tumor, or infection, and (2) a Tegner activity rating (prior to injury if the patient had sustained a recent injury) of ≤3 points25.
    The Tegner scale rates patients from 0 to 10 points on the basis of their activity level and sports participation25. Patients who were not participating in high-demand sports (that is, an activity rating of ≤3 points, indicating that they did not run or participate in sports except for swimming) were excluded25. We did not exclude patients on the basis of age alone, although patients with a Tegner score of ≥4 points tended to belong to the age-group of most patients seen in orthopaedic sports-medicine practices.

    Test-Retest Reliability

    The reliability portion of the study involved patients whose condition was stable and was not expected to change prior to the second administration of the questionnaire. These patients did not receive treatment in the interval (range, two to fourteen days) between the first and the second administration of the questionnaires. Patients were excluded if they had had surgery or a traumatic injury in the preceding three months.
    When the patients completed the questionnaires for the second time, they also completed a transitional index in which they rated the severity of the knee condition at that time compared with the severity when the questionnaires were first administered23. They chose from seven responses: "much worse," "somewhat worse," "a little worse," "no change," "a little better," "somewhat better," and "much better." Only patients who responded "no change" were included in the reliability study. The intraclass correlation coefficient and the limits-of-agreement statistic26-28 were used to compare the scores23. The intraclass correlation coefficient is an index of concordance for dimensional measurements, ranging between 0 and 1, where 0.75 is adequate for patients enrolled in a clinical trial29. The limits-of-agreement statistic was also used as a descriptive measure of agreement. This value is the mean difference between the two tests, plus or minus two standard deviations of the differences27. Ninety-five percent of the differences between the two test administrations are expected to lie within this interval27.
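    The two reliability statistics described above can be illustrated with a short Python sketch. It is not the SPSS analysis used in the study. The text does not state which form of the intraclass correlation coefficient was calculated, so the two-way random-effects, absolute-agreement, single-measurement form is assumed here, and the function names and sample scores are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(test, retest):
    """Intraclass correlation for paired test-retest scores.

    Assumption: two-way random effects, absolute agreement, single
    measurement. The article does not say which ICC form was used.
    """
    n = len(test)   # number of patients
    k = 2           # two administrations per patient
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]

    # Two-way ANOVA sums of squares (patients = rows, administrations = columns).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def limits_of_agreement(test, retest):
    """Mean test-retest difference plus or minus two standard deviations."""
    diffs = [b - a for a, b in zip(test, retest)]
    centre = mean(diffs)
    half_width = 2 * stdev(diffs)  # sample standard deviation of the differences
    return centre - half_width, centre + half_width


# Hypothetical scores for four stable patients on a 100-point scale.
baseline = [60, 70, 80, 90]
follow_up = [62, 68, 82, 88]
print(icc_2_1(baseline, follow_up))
print(limits_of_agreement(baseline, follow_up))
```

    Under these assumptions, an intraclass correlation coefficient near 1 together with narrow limits of agreement would indicate that scores change little between administrations in stable patients.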

    Validity

    Validity is an assessment of whether the instrument actually measures what it is intended to measure. A scale is considered to have face validity when its qualitative attributes are deemed to be adequate by individuals with experience in the field30. Content validity is the appraisal of the underlying components of the scale30. These two forms of validity were assessed by five orthopaedic surgeons with experience in the field of sports medicine. The experts were not asked to quantify these types of validity but rather to ensure that the instruments were valid from these points of view in order to make certain that clinicians would be comfortable using them for the evaluation of athletic patients with disorders of the knee.
    Validation is relatively clear-cut when there is a gold standard against which the results can be compared5. In cases where there is no gold standard, such as quality of life, we are forced to use "construct validation." Construct validity is present when the instrument performs as expected in relation to another measurement. We used the patients’ and clinicians’ opinions of severity as well as the other knee-rating scales and the physical component scale of the Short Form-36 (SF-36) as indices to determine construct validity. The patients were asked to rate their condition on a 5-point scale as either "very mildly bothersome," "mildly bothersome," "moderately bothersome," "severely bothersome," or "very severely bothersome."31 Similarly, clinicians were asked to rate the severity of the patient’s knee problem as "very mild," "mild," "moderate," "severe," or "very severe."
    Six hypotheses were proposed to assess the construct validity of the four knee-specific instruments:
    1. Since the knee scales are more specific for abnormalities of the knee, we hypothesized that these instruments would all correlate better with each other than they would with the physical component scale or the mental component scale (Spearman correlation coefficient) and that the physical component scale would correlate more strongly with the knee scales, because it is more specific for physical function, than would the mental component scale.
    2. Since the knee scales are more specific for disorders involving the knee, we hypothesized that the knee-rating scales would correlate better with each other (Spearman correlation coefficient) than they would with any of the eight SF-36 subscales.
    3. Since certain subscales of the SF-36 are more related to symptoms and disabilities experienced by patients with knee disorders, we hypothesized that the knee scales would correlate better with physical function and role-physical (Spearman correlation coefficient) than they would with vitality or social function and that they would correlate better with general health, bodily pain, vitality, and social function than they would with role-emotional or mental health.
    4. Since patient-rated and clinician-rated severity should approximate knee symptoms and disability, we hypothesized that the knee scales would be significantly correlated with clinician-rated and patient-rated severity (Spearman correlation coefficient ≥ 0.6 and p ≤ 0.05).
    5. Since patient-rated and clinician-rated severity should approximate knee symptoms and disability, we hypothesized that there would be a difference in the mean scores on the knee-specific instruments for patients who had different patient-rated severity scores as well as for those who had different clinician-rated severity scores; we thought that the instruments would differentiate between at least two of the levels of severity2 (determined by analysis of variance and the Tukey post hoc honestly significant difference test).
    6. Since we included a broad range of diagnoses of varying severity, we hypothesized that there would be no ceiling or floor effects. Ceiling and floor effects have been defined as one-third of the patients receiving the highest or lowest possible score, respectively32. For greater sensitivity, we defined ceiling and floor effects as one-third of the patients receiving the highest or lowest 10% of the possible scores.
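    The ceiling and floor criterion in the sixth hypothesis can be sketched in Python as follows. This is an illustration only: the function name, the inclusive cut-offs, and the reading of "one-third" as "at least one-third" are assumptions, and the example scores are invented.

```python
def ceiling_floor_effects(scores, min_score=0.0, max_score=100.0,
                          band=0.10, threshold=1 / 3):
    """Flag ceiling and floor effects on a bounded scale.

    An effect is flagged when at least `threshold` of the patients score
    within the highest (ceiling) or lowest (floor) `band` of the possible
    score range. Inclusive boundaries are an assumption.
    """
    span = max_score - min_score
    floor_cut = min_score + band * span
    ceiling_cut = max_score - band * span
    n = len(scores)
    floor_fraction = sum(s <= floor_cut for s in scores) / n
    ceiling_fraction = sum(s >= ceiling_cut for s in scores) / n
    return {
        "floor": floor_fraction >= threshold,
        "ceiling": ceiling_fraction >= threshold,
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
    }


# Hypothetical example: half of the patients score 90 points or more.
print(ceiling_floor_effects([95, 92, 100, 50, 60, 70]))
```

    Setting band to 0 would correspond to the stricter published definition, in which only the single highest or lowest possible score counts.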

    Responsiveness

    Patients who were expected to have improvement because of the nature of the diagnosis and the proposed treatment were entered into the responsiveness arm of the study33. These patients all had conditions that are known to be successfully treatable in the majority of cases. They were reassessed at a minimum of three months following the initial evaluation. Patients who underwent a reconstructive procedure had a follow-up evaluation at a minimum of six months. Different durations of follow-up were needed because of the different treatments involved. For example, reconstruction of the anterior cruciate ligament requires a longer follow-up to achieve a change in health status than arthroscopic meniscectomy does.
    In order to be able to detect that a true difference had occurred, patients were asked to rate their condition as "much worse," "somewhat worse," "a little worse," "no change," "a little better," "somewhat better," and "much better."34 Patients who responded "a little better," "somewhat better," or "much better" were included in the study of the responsiveness of the instruments, while patients who responded that they were the same or worse were excluded from the responsiveness testing35. We included only patients who stated that their condition had improved in order to be certain that the improvement that we believed that we were measuring had occurred.
    Many statistics are available to determine responsiveness36,37. We elected to use the standardized response mean (the observed change divided by the standard deviation of change) because it has been used widely in previous orthopaedic research5-7 and it incorporates the response variance, allowing statistical testing of the response means38. Standardized response means for validated orthopaedic instruments have ranged from 0.9 to 1.9 [5,6,37].
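    As a worked illustration of the standardized response mean, the following Python sketch divides the mean change by the standard deviation of the change. The function name and the scores are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change in score divided by the standard deviation of the change."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)  # stdev uses the n - 1 denominator


# Hypothetical scores for four patients who reported improvement.
before = [50, 60, 70, 80]
after = [60, 80, 80, 100]
print(round(standardized_response_mean(before, after), 2))
```

    Because the denominator is the variability of the change itself, a scale on which all patients improve by a similar amount yields a larger value than one with the same mean change but more scattered individual changes.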

    The Instruments

    The modified Lysholm scale, as described by Tegner and Lysholm25, is an eight-item questionnaire that was originally designed to evaluate patients following knee ligament surgery. It is scored on a 100-point scale, with 25 points for knee stability, 25 points for pain, 15 points for locking, 10 points each for swelling and stair-climbing, and 5 points each for limp, use of a support, and squatting25. This scale has been used extensively in clinical research studies19,39-41.
    The first version of the Cincinnati knee-rating system was published in 1983, with additional modifications developed for occupational activities, athletic activities, symptoms, and functional limitations in sports and daily activities42,43. The system has eleven components, including physical examination, laxity of the knee based on instrumented testing, and radiographic evidence of degenerative joint disease32. We evaluated the subjective component, which includes pain, swelling, and giving-way, as well as the activity-level component, as these two parts are most related to disability.
    The Activities of Daily Living scale of the Knee Outcome Survey was developed recently and published with an evaluation of its reliability, validity, and responsiveness15. This scale is designed for the evaluation of patients with disorders of the knee ranging from injury of the anterior cruciate ligament to arthrosis. It includes seventeen multiple-choice questions divided into two sections: one for symptoms (seven questions) and one for functional disability (ten questions).
    The American Academy of Orthopaedic Surgeons sports knee-rating scale14 was included in the Musculoskeletal Outcomes Data Evaluation and Management System (MODEMS) for athletic patients with disorders of the knee. This instrument has five parts with a total of twenty-three questions: a core section (seven questions) on stiffness, swelling, pain, and function, and four sections (four questions each) on locking or catching on activity, giving-way on activity, current activity limitations due to the knee, and pain on activity due to the knee. At the present time, we are not aware of any published evidence of the reliability, validity, or responsiveness of this instrument.
    Three of the instruments are 100-point scales. The Cincinnati knee-rating system is a 35-point scale that we converted to a 100-point scale, by dividing the score by thirty-five and then multiplying it by 100, to facilitate comparisons.

    Data Management and Analysis

    The four questionnaires were collated in random order before they were presented to each patient in order to avoid a potential bias due to the sequence in which they were completed. Response forms were completed by the patients, and data entry was accomplished by manually scanning the forms. If the response was not readable by the scanner, the data were entered manually. All analyses were carried out on SPSS software (version 9.0; SPSS Advanced Statistics, Chicago, Illinois) for personal computers.
    As already noted, the scoring system for the American Academy of Orthopaedic Surgeons sports knee instrument has five values, or subscales14. As it was impractical to use five values for each patient for the analysis, we used an unweighted mean of the five subscales to calculate an overall score on this instrument. We calculated the score for a given subscale if one-half or more of the items in that subscale could be scored. If it was possible to calculate a score for three or more of the subscales, we calculated the mean of the available scores. The Lysholm, Cincinnati, and Activities of Daily Living scales were scored as recommended by the originators of each system10,13,15.
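    The missing-data rules described above for the American Academy of Orthopaedic Surgeons instrument can be sketched in Python. This is a hedged illustration, not the study's scoring program: the function names are hypothetical, and the sketch assumes that each item has already been converted to a 0-to-100 value and that a subscale score is the mean of its answered items, neither of which is stated in the text.

```python
from statistics import mean


def subscale_score(items):
    """Score one subscale; None marks an unanswered item.

    Returns None unless one-half or more of the items were answered.
    Assumption: the subscale score is the mean of the answered items.
    """
    answered = [x for x in items if x is not None]
    if not items or 2 * len(answered) < len(items):
        return None
    return mean(answered)


def overall_score(subscales):
    """Unweighted mean of the scorable subscales.

    Returns None unless three or more subscales could be scored.
    """
    scores = [subscale_score(items) for items in subscales]
    available = [s for s in scores if s is not None]
    if len(available) < 3:
        return None
    return mean(available)


# Hypothetical responses: a seven-item core section and four four-item sections.
responses = [
    [100, 80, None, None, None, 60, 80],  # 4 of 7 answered: scorable
    [None, None, None, 50],               # 1 of 4 answered: not scorable
    [60, 60, None, None],                 # 2 of 4 answered: scorable
    [70, 70, 70, 70],                     # fully answered
    [None, None, None, None],             # not scorable
]
print(overall_score(responses))
```

    A questionnaire with fewer than three scorable subscales returns None in this sketch, which corresponds to a questionnaire that could not be scored.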

    Sample Size

    Sample-size calculation indicated that for α = 0.05, β = 0.20, ρ₀ = 0.60, and ρ₁ = 0.85, a sample size of forty-two patients was required for the reliability study44. As far as we know, there are no studies that describe the sample size for a responsiveness study, but previous authors have used the same number of patients as those used for reliability studies36. The baseline questionnaires from both groups of patients were used for the validity analyses, which guaranteed a minimum of eighty patients.

    Patient Demographics

    Forty-one patients were included in the reliability arm of the study. Twenty of the patients were male, and twenty-one were female. The mean age was 32.6 years (range, fifteen to sixty years). The mean Tegner score was 6.3 points (range, 4 to 10 points). The diagnoses included injury of the anterior cruciate ligament in twenty-eight patients; osteochondritis dissecans in three; arthrosis, a meniscal tear, and patellofemoral joint pain in two each; and patellar tendinitis, injury of the posterior cruciate ligament, knee dislocation, and patellar tendon rupture in one each. The mean time between completion of the baseline questionnaire and the follow-up questionnaire was 5.2 days (range, two to fourteen days).
    With the baseline responses from the reliability and responsiveness analyses, a total of 133 patients were included in the validity analysis. There were sixty-nine males and sixty-four females. The mean age was 31.5 years (range, fourteen to sixty-five years). The mean Tegner score was 6.4 points (range, 4 to 10 points). The diagnoses included injury of the anterior cruciate ligament in fifty-seven patients; a patellofemoral disorder in twenty-one; a meniscal tear in seventeen; arthrosis in thirteen; injury of the medial collateral ligament in five; osteochondritis dissecans in four; patellar tendinitis and a patellar tendon ossicle in three each; injury of the posterior cruciate ligament and knee dislocation in two each; and Osgood-Schlatter disease, patellar tendon rupture, symptomatic plica, chondral defect, iliotibial band tendinitis, and quadriceps tendon injury in one each.
    Forty-two patients were involved in the responsiveness arm of the study. The mean age was 30.9 years (range, fifteen to sixty-one years). Nineteen of the patients were male, and twenty-three were female. The mean Tegner score was 6.5 points (range, 4 to 9 points). The diagnoses were varied and included a disorder of the patellofemoral joint in fifteen patients; injury of the anterior cruciate ligament in twelve; arthrosis and a meniscal tear in six each; and Osgood-Schlatter disease, injury of the medial collateral ligament, and a patellar tendon ossicle in one each. Twenty-four patients had nonoperative treatment, nine had reconstruction of the anterior cruciate ligament, four had meniscectomy, two had Synvisc (hylan G-F 20) injection, and one each had arthroscopic débridement, meniscal repair, and microfracture.

    Reliability Results

    The mean baseline scores were 71.6 points for the Cincinnati knee-rating system, 84.1 points for the Lysholm scale, 85.8 points for the Activities of Daily Living scale, and 85.1 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The mean scores on the follow-up questionnaires were 71.1 points for the Cincinnati knee-rating system, 84.0 points for the Lysholm scale, 86.1 points for the Activities of Daily Living scale, and 85.9 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale.
    The intraclass correlation coefficient for these scales was 0.88 for the Cincinnati knee-rating system, 0.95 for the Lysholm scale, 0.93 for the Activities of Daily Living scale, and 0.92 for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The limits of agreement (mean difference and two standard deviations between the tests) were 3.8 ± 8.0 for the Lysholm scale, 7.8 ± 22.5 for the Cincinnati knee-rating scale, 3.5 ± 9.9 for the Activities of Daily Living scale, and 4.1 ± 9.3 for the American Academy of Orthopaedic Surgeons sports knee-rating scale.

    Validity Results

    The orthopaedic surgeons all considered the scales to have face and content validity. All patients who completed the questionnaires at baseline were included in the construct validity portion of the study since this was a cross-sectional analysis. There were three scenarios in which the baseline responses were used for the validity testing but not for the reliability or responsiveness analysis. Baseline responses of patients who were initially entered into the reliability or responsiveness arm of the study but who did not complete the questionnaires a second time were used for the validity analysis. Patients who were initially entered into the reliability arm but who indicated that the status of the knee had changed on the transitional index when they completed the questionnaires at follow-up were excluded from the reliability study although their baseline responses were used for the validity study. Similarly, patients who were initially entered into the responsiveness arm but who did not believe that the status of the knee had improved on the transitional index at follow-up were excluded from the responsiveness study although their baseline responses were used for the validity study. The mean scores for these patients are listed in Table I.
    With regard to validity testing, the first and second hypotheses were confirmed as the knee-specific scales all correlated better with each other than they did with the physical component scale, the mental component scale, or any of the SF-36 subscales. As expected, all scales correlated better with the physical component scale than they did with the mental component scale (Table II).
    The third hypothesis consisted of two parts. The first was that the physical function and role-physical subscales would correlate better with each knee scale than would the vitality or social function subscales, and the second was that the subscales for general health, bodily pain, vitality, and social function would all correlate better with each knee scale than would the role-emotional and mental health subscales. There were a small number of minor discrepancies from these constructs (Table III).
    The fourth hypothesis was confirmed as all knee-rating scales correlated well with both clinician and patient ratings of severity (Table II). The minimum correlation between either of these constructs and one of the knee-rating scales was 0.61, and all correlations were significant (p < 0.01) (Table II). Patient-rated severity is more relevant, and all scales correlated better with patient-rated severity than they did with clinician-rated severity. Analysis of variance demonstrated a significant difference, with regard to the scores on each knee-rating scale, among patients with different clinician-rated and patient-rated severity (p < 0.00000001) (Table IV and Figs. 1, 2, 3, and 4). Post hoc testing demonstrated that the scores on the Activities of Daily Living scale, the American Academy of Orthopaedic Surgeons sports knee-rating scale, and the Cincinnati knee-rating system differed significantly between two adjacent levels of clinician-rated severity, but the scores on the Lysholm scale did not. All four scales demonstrated significant differences between two or more adjacent levels of patient-graded severity.
    No ceiling or floor effects were demonstrated for any of the scales.

    Responsiveness Results

    The mean scores improved from 37.9 points at baseline to 65.0 points at the time of follow-up for the Cincinnati knee-rating system, from 64.5 points to 79.6 points for the Lysholm scale, from 66.1 points to 83.6 points for the Activities of Daily Living scale, and from 60.6 points to 81.8 points for the American Academy of Orthopaedic Surgeons sports knee-rating scale. The standardized response means were 0.8 for the Cincinnati knee-rating system, 0.9 for the Lysholm scale, 1.1 for the Activities of Daily Living scale, and 1.0 for the American Academy of Orthopaedic Surgeons sports knee-rating scale. While the American Academy of Orthopaedic Surgeons sports knee-rating scale was quite responsive, the questionnaires of six patients could not be scored, leaving only thirty-six patients whose questionnaires could be scored at both baseline and follow-up.
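
    The standardized response mean is conventionally computed as the mean change in score divided by the standard deviation of the change scores. The following is a minimal Python sketch of that calculation under that assumption; the function name and the five pairs of scores are hypothetical illustrations and are not data from the present study.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired change scores divided by their sample standard deviation."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("need at least two paired scores")
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("change scores have zero variance")
    return mean(changes) / spread


# Hypothetical scores for five patients (illustration only, not study data).
baseline = [40.0, 55.0, 62.0, 30.0, 48.0]
follow_up = [65.0, 70.0, 66.0, 58.0, 75.0]
print(round(standardized_response_mean(baseline, follow_up), 2))
```

    Only patients with a scorable questionnaire at both time points can contribute a change score, which is why unscorable questionnaires reduce the sample available for this statistic.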

    Discussion

    The results of a clinical research study are of questionable value if the measure used to evaluate the effectiveness of treatment is not known to be reliable, valid, and responsive. In many clinical research studies, questionnaires are used as primary outcome measures because these instruments accurately reflect symptoms and disabilities that are specific and important to patients.
    Anderson et al. compared six knee-ligament rating scales in a study of seventy patients who had had reconstruction of the anterior cruciate ligament five years earlier1. They concluded that the International Knee Documentation Committee (IKDC) scale45 should be used to standardize measurements. However, the authors did not present any data on the reliability, validity, or responsiveness of this scale. In another study, the scores of eight knee-rating scales (including the Lysholm, Hospital for Special Surgery, and IKDC scales) were compared in a group of fifty-six patients who had undergone reconstruction of the anterior cruciate ligament8. The authors encouraged the use of the IKDC scale; however, the measurement properties of these scales were not determined.
    Other investigators have compared the Lysholm scale and Cincinnati knee-rating system in studies of patients who either had an insufficient anterior cruciate ligament9 or had had reconstruction of the anterior cruciate ligament10,11,13,46. They found that patients had higher scores on the Lysholm scale but that there was a linear correlation between the two instruments. These studies did not assess the reliability or responsiveness of the instruments.
    Many scales have been developed without patient input or the formal techniques of item generation and item reduction47. In addition, scales have been developed for a wide variety of purposes and for specific patient populations. As a result, different scales may be used for similar studies, which could be a cause for differing conclusions48,49.
    We evaluated the reliability, validity, and responsiveness to clinical change of four questionnaires that assess disability and symptoms in active patients with disorders of the knee. All of the scales were found to have excellent reliability, with the intraclass correlation coefficient ranging from a low of 0.88 for the Cincinnati knee-rating system to a high of 0.95 for the Lysholm scale. (A coefficient of >0.75 is adequate for patients enrolled in a clinical trial29.) The limits-of-agreement statistic is a measure of reliability that provides additional information to the intraclass correlation coefficient27,28. This statistic is the mean difference (and two standard deviations) between two measures used to evaluate the same subject. The mean difference among the four scales was extremely low (range, 3.5 to 7.8). Three of the four scales had a 95% confidence interval between 8.0 and 9.9, while the Cincinnati knee-rating system had a confidence interval of 22.5, indicating increased measurement variability or decreased reliability.
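
    The limits-of-agreement statistic described above (the mean difference between two administrations, with two standard deviations of the differences on either side) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the multiplier argument `k`, and the test-retest scores are assumptions made for the example and are not taken from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, k=2.0):
    """Return (mean difference, lower limit, upper limit) for paired scores.

    The limits are the mean difference minus and plus k sample standard
    deviations of the paired differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = k * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical test-retest scores for four patients (not study data).
first_visit = [80.0, 70.0, 90.0, 60.0]
second_visit = [78.0, 73.0, 88.0, 61.0]
bias, lower, upper = limits_of_agreement(first_visit, second_visit)
print(f"mean difference {bias:.1f}, limits {lower:.1f} to {upper:.1f}")
```

    A wider interval for one instrument, at a similar mean difference, indicates more within-patient variability between administrations.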
    The knee scales, the physical component scale and mental component scale of the SF-36, and the patient and clinician severity ratings were used as constructs to evaluate validity. All of the scales were thought to have adequate face and content validity by ten orthopaedic surgeons with experience in the field of sports medicine. Our four hypotheses regarding construct validity were confirmed. Such confirmation is important because the use of joint-specific scales could be questioned if the knee-specific instruments did not correlate with each other to a greater degree than they did with the physical component scale, the mental component scale, or the SF-36 subscales.
    Responsiveness is dependent not only on the instrument but also on the magnitude of change actually experienced by the patients. The magnitude of change measured in a cohort of patients is determined by the initial score (lower scores allow more room for improvement), the quality of the intervention, the instrument used to measure the health status, and the statistic used to calculate responsiveness37. The responsiveness of measurement scales in orthopaedics has been measured with use of the standardized response mean. The activities of daily living and symptoms subscales of the Cincinnati knee-rating system had standardized response means of 0.72 and 1.56, respectively, in a study of patients who had undergone anterior cruciate ligament reconstruction32. Two generic health-status instruments had standardized response means of 0.88 and 1.00 in a study of patients who had undergone hip or knee replacement surgery38. The standardized response means for the knee-specific questionnaires in the present study ranged from 0.8 for the Cincinnati knee-rating system to 1.1 for the Activities of Daily Living scale. These values are fairly impressive considering that not all patients underwent surgery and that they were generally not severely disabled at baseline according to their diagnoses and questionnaire scores. Therefore, we concluded that these instruments were all capable of detecting a clinically relevant difference over time.
    The standard deviation of the measure is important for calculating sample size when designing a study to compare two treatments with use of a rating scale as the primary outcome measure50. Standard deviations can be compared in cases where the minimum and maximum scores are the same. In the present study, the standard deviations ranged from 18.3 to 19.0 for three of the scales, while the Cincinnati knee-rating system had a standard deviation of 28.7, again indicating increased measurement variability for the scale. The standard deviations are relatively large, which is possibly due to the heterogeneity of the baseline population. Other instruments have also demonstrated standard deviations in this range when tested in studies of patients with a variety of diagnoses of varying severity. In the initial report that described the Activities of Daily Living scale, a standard deviation of 20.8 was found for the baseline responses of patients with diagnoses that ranged from tendinitis to osteoarthrosis15. Conversely, in a study of a more homogeneous patient group (patients who had recovered from anterior cruciate ligament reconstruction), the Lysholm and Cincinnati scales had lower standard deviations of 8.9 and 10.6, respectively1. In the original study by Lysholm and Gillquist, the standard deviation for the scale was 17.8 for patients with instability of the anterior cruciate ligament but only 10.8 for patients without instability13. Therefore, the standard deviations in the present study are useful for estimating sample size; however, they are likely overestimating the standard deviation that would occur in a more homogeneous patient sample.
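
    To illustrate how the standard deviation drives sample size, the sketch below applies the common normal-approximation formula for comparing two means, n per group = 2[(z(1-alpha/2) + z(power)) x SD / difference]^2. The formula choice, the function name, and the target difference of 10 points are assumptions made for illustration; only the two standard deviations (18.3 and 28.7) come from the present study.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate patients needed per group for a two-sided comparison of two means."""
    if sd <= 0 or difference <= 0:
        raise ValueError("sd and difference must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type-I error
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)


TARGET_DIFFERENCE = 10.0  # hypothetical between-group difference, in points

for name, sd in [
    ("Activities of Daily Living scale", 18.3),
    ("Cincinnati knee-rating system", 28.7),
]:
    print(name, patients_per_group(sd, TARGET_DIFFERENCE))
```

    Because the required number grows with the square of the standard deviation, the larger standard deviation of the Cincinnati knee-rating system implies roughly two and a half times as many patients for the same target difference under these assumptions.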
    The four knee-specific questionnaires varied in length, with eight questions in the Lysholm scale, ten in the Cincinnati knee-rating system, seventeen in the Activities of Daily Living scale, and twenty-three in the American Academy of Orthopaedic Surgeons sports knee-rating scale. While all of these tools are well within the realm of acceptable responder burden, the number of items is important, particularly if the questionnaires are administered in conjunction with a generic health-status measure.
    The scoring was relatively straightforward for three of the scales; however, the scoring manual provided for the American Academy of Orthopaedic Surgeons sports knee-rating scale suggests the calculation of five subscales. While this information is valuable from a clinical perspective, it is too complicated to have five subscales describing each aspect of knee function for each patient in a research initiative. In addition, this scale was the only one of the four to have the response "cannot do for other reasons." The scoring manual states that this item should be "dropped," which we interpreted as "scored as missing." We elected to calculate the mean of the subscales (to arrive at an overall score for the instrument) if it was possible to score three or more of the subscales. The fact that this scale had more items and more complicated scoring and that more questionnaires could not be scored because of missing responses makes its use somewhat more onerous. However, when patients completed the questionnaire sufficiently so that a score could be calculated, the instrument was reliable, valid, and responsive according to our criteria.
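
    The overall-score rule that we describe above (the mean of the subscales, calculated only when three or more subscales can be scored) can be sketched as follows. The function name and the use of `None` to mark an unscorable subscale are hypothetical conventions for this example; the sketch does not reproduce the scoring manual, and it does not address how items are handled within a subscale.

```python
from statistics import mean

MIN_SCORABLE_SUBSCALES = 3  # threshold described in the text


def overall_score(subscale_scores):
    """Mean of the scorable subscales, or None when fewer than three can be scored.

    `subscale_scores` holds one entry per subscale; None marks a subscale
    that could not be scored (an assumed convention for this sketch).
    """
    scorable = [score for score in subscale_scores if score is not None]
    if len(scorable) < MIN_SCORABLE_SUBSCALES:
        return None
    return mean(scorable)


# Hypothetical patients (not study data).
print(overall_score([80.0, None, 70.0, 90.0, None]))  # three subscales scorable
print(overall_score([80.0, None, None, 90.0, None]))  # only two scorable
```

    Returning `None` rather than a partial mean keeps unscorable questionnaires visible in the data set, so that they can be counted and reported.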
    One limitation of the present study is that the results are not generalizable to patients who are not active in sports (that is, those who have a Tegner rating of <4 points). Another potential limitation is that we chose to measure a variety of disorders of the knee. We did so to allow generalizability of the results to a wide variety of patients and treatments. In effect, our cohort of active patients with disorders of the knee is a relatively homogeneous group from the perspective of general orthopaedic and medical health. While it is possible that a given scale would perform differently in a group of patients with a single diagnosis, it is unlikely that this discrepancy would be very large. Lastly, we used only the subjective components of the Cincinnati knee-rating system, and our results do not apply to the other components of this system, which mainly measure impairments.
    The Activities of Daily Living scale was well understood by patients, could be completed in a relatively short time-period, and had slightly better construct validity and responsiveness than the other scales. This finding is possibly due to the clear wording of the instrument or to the fact that it evaluates a very wide variety of symptoms and disabilities compared with the others. The latter allows an investigator to use this instrument for studies involving various knee diagnoses, as was its intended purpose. We recommend this instrument for the study of disorders of the knee in athletic patients.
    The American Academy of Orthopaedic Surgeons, Lysholm, and Cincinnati knee-rating scales also satisfied our criteria for reliability, validity, and responsiveness, and all are acceptable for use in clinical research. The four scales have many areas of overlap, and the development of statistical methods to compare the results from one scale with those from another is an important area for future research. Additional work to evaluate some of the newer, well-designed knee-specific tools, such as the quality-of-life outcome measure for chronic anterior cruciate ligament deficiency51 and the Knee Injury and Osteoarthritis Outcome Score (KOOS)20, is required.

    References

    1. Anderson AA, Federspiel CF, Snyder RB. Evaluation of knee ligament rating systems. Am J Knee Surg. 1993;6:67-73.
    2. Beaton DE, Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am. 1996;78:882-90.
    3. Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J, Coyte P. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care. 1995;33(4 suppl):131-44.
    4. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol. 1995;22:1193-6.
    5. Kirkley A, Griffin S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability. The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med. 1998;26:764-72.
    6. L’Insalata JC, Warren RF, Cohen SB, Altchek DW, Peterson MG. A self-administered questionnaire for assessment of symptoms and function of the shoulder. J Bone Joint Surg Am. 1997;79:738-48.
    7. Martin DP, Engelberg R, Agel J, Swiontkowski MF. Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am. 1997;79:1323-35.
    8. Labs K, Paul B. To compare and contrast the various evaluation scoring systems after anterior cruciate ligament reconstruction. Arch Orthop Trauma Surg. 1997;116:92-6.
    9. Bollen S, Seedhom BB. A comparison of the Lysholm and Cincinnati knee scoring questionnaires. Am J Sports Med. 1991;19:189-90.
    10. Noyes FR, Barber SD, Mooar LA. A rationale for assessing sports activity levels and limitations in knee disorders. Clin Orthop. 1989;246:238-49.
    11. Noyes FR, Barber SD, Mangine RE. Bone-patellar ligament-bone and fascia lata allografts for reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am. 1990;72:1125-36.
    12. Verbrugge LM, Jette AM. The disablement process. Soc Sci Med. 1994;38:1-14.
    13. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10:150-4.
    14. American Academy of Orthopaedic Surgeons. Scoring algorithms for the lower limb outcomes data collection instrument, version 2.0. Rosemont, IL: American Academy of Orthopaedic Surgeons; 1998.
    15. Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH, Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am. 1998;80:1132-45.
    16. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247-63.
    17. McHorney CA, Ware JE Jr, Rogers W, Raczek AE, Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Results from the Medical Outcomes Study. Med Care. 1992;30(5 suppl):253-65.
    18. Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: The Health Institute, New England Medical Center; 1993.
    19. Shapiro ET, Richmond JC, Rockett SE, McGrath MM, Donaldson WR. The use of a generic, patient-based health assessment (SF-36) for evaluation of patients with anterior cruciate ligament injuries. Am J Sports Med. 1996;24:196-200.
    20. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88-96.
    21. Brinker MR, Garcia R, Barrack RL, Timon S, Guinn S, Fong B. An analysis of sports knee evaluation instruments. Am J Knee Surg. 1999;12:15-24.
    22. Marshall JL, Fetto JF, Botero PM. Knee ligament injuries: a standardized evaluation method. Clin Orthop. 1977;123:115-29.
    23. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991;12(4 suppl):142S-58S.
    24. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press; 1989.
    25. Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop. 1985;198:43-9.
    26. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-10.
    27. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med. 1990;20:337-40.
    28. Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet. 1995;346:1085-7.
    29. Rosner B. Fundamentals of biostatistics. 4th ed. Belmont, CA: Duxbury Press; 1995.
    30. Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987.
    31. Juniper EF, Guyatt GH. Development and testing of a new measure of health status for clinical trials in rhinoconjunctivitis. Clin Exp Allergy. 1991;21:77-83.
    32. Barber-Westin SD, Noyes FR, McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med. 1999;27:402-16.
    33. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40:171-8.
    34. Guyatt G, Mitchell A, Irvine EJ, Singer J, Williams N, Goodacre R, Tompkins C. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology. 1989;96:804-10.
    35. Guyatt GH, Eagle DJ, Sackett B, Willan A, Griffith L, McIlroy W, Patterson CJ, Turpie I. Measuring quality of life in the frail elderly. J Clin Epidemiol. 1993;46:1433-44.
    36. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79-93.
    37. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239-46.
    38. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632-42.
    39. Gauffin H, Pettersson G, Tegner Y, Tropp H. Function testing in patients with old rupture of the anterior cruciate ligament. Int J Sports Med. 1990;11:73-7.
    40. Odensten M, Hamberg P, Nordin M, Lysholm J, Gillquist J. Surgical or conservative treatment of the acutely torn anterior cruciate ligament. A randomized study with short-term follow-up observations. Clin Orthop. 1985;198:87-93.
    41. Roberts TS, Drez D Jr, McCarthy W, Paine R. Anterior cruciate ligament reconstruction using freeze-dried, ethylene oxide-sterilized, bone-patellar tendon-bone allografts. Two year results in thirty-six patients. Am J Sports Med. 1991;19:35-41. Erratum, 1991;19:272.
    42. Noyes FR, Mooar PA, Matthews DS, Butler DL. The symptomatic anterior cruciate-deficient knee. Part I: The long-term functional disability in athletically active individuals. J Bone Joint Surg Am. 1983;65:154-62.
    43. Noyes FR, Matthews DS, Mooar PA, Grood ES. The symptomatic anterior cruciate-deficient knee. Part II: The results of rehabilitation, activity modification, and counseling on functional disability. J Bone Joint Surg Am. 1983;65:163-74.
    44. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441-8.
    45. Hefti F, Muller W. Current state of evaluation of knee ligament lesions. The new IKDC knee evaluation form. Orthopäde. 1993;22:351-62. German.
    46. Sgaglione NA, Del Pizzo W, Fox JM, Friedman MJ. Critical analysis of knee ligament rating systems. Am J Sports Med. 1995;23:660-7.
    47. Wright JG. Quality of life in orthopaedics. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. 2nd ed. Philadelphia: Lippincott-Raven; 1996. p 1039-44.
    48. Andersson G. Hip assessment: a comparison of nine different methods. J Bone Joint Surg Br. 1972;54:621-5.
    49. Callaghan JJ, Dysart SH, Savory CF, Hopkinson WJ. Assessing the results of hip replacement. A comparison of five different rating systems. J Bone Joint Surg Br. 1990;72:1008-9.
    50. Freedman KB, Bernstein J. Sample size and statistical power in clinical orthopaedic research. J Bone Joint Surg Am. 1999;81:1454-60.
    51. Mohtadi N. Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency. Am J Sports Med. 1998.

    Fig. 1: The mean Lysholm scores according to the clinician-rated severity. The error bars indicate the standard deviation.
    Fig. 2: The mean scores on the American Academy of Orthopaedic Surgeons (AAOS) sports knee-rating scale according to the clinician-rated severity. The error bars indicate the standard deviation.
    Fig. 3: The mean scores on the Cincinnati knee-rating system according to the clinician-rated severity. The error bars indicate the standard deviation.
    Fig. 4: The mean Activities of Daily Living (ADL) scores according to the clinician-rated severity. The error bars indicate the standard deviation.

    TABLE I: Mean Baseline Scores on the Knee-Rating Scales for One Hundred and Thirty-three Patients Included in the Validity Analysis

    | Instrument | Mean Score | Standard Deviation | Lowest Score | Highest Score |
    |---|---|---|---|---|
    | Activities of Daily Living scale | 76.0 | 18.3 | 16.3 | 100.0 |
    | American Academy of Orthopaedic Surgeons sports knee-rating scale | 73.5 | 19.0 | 26.9 | 98.3 |
    | Cincinnati knee-rating scale | 53.2 | 28.7 | 0.0 | 100.0 |
    | Lysholm scale | 74.6 | 18.7 | 15.0 | 100.0 |
    | Tegner scale | 6.4 | 1.6 | 4.0 | 10.0 |
    | Short Form-36 subscales | | | | |
    | Physical function | 74.6 | 26.1 | 5.0 | 100.0 |
    | Role-physical | 61.8 | 42.9 | 0.0 | 100.0 |
    | Bodily pain | 65.7 | 23.0 | 12.0 | 100.0 |
    | General health | 84.2 | 14.4 | 42.0 | 100.0 |
    | Vitality | 64.4 | 19.6 | 5.0 | 100.0 |
    | Social function | 83.1 | 21.1 | 12.5 | 100.0 |
    | Role-emotional | 84.2 | 31.1 | 0.0 | 100.0 |
    | Mental health | 76.4 | 14.2 | 24.0 | 100.0 |
    | Physical component scale | 46.1 | 10.2 | 20.3 | 61.0 |
    | Mental component scale | 53.1 | 8.7 | 17.7 | 67.8 |

    TABLE II: Validity Analysis with Spearman Correlation Matrix for the Four Knee-Rating Scales, the Physical and Mental Component Scales of the Short Form-36 (SF-36), and the Clinician and Patient Ratings of Severity
    *Correlations were significant at p < 0.05. †Correlations were significant at p < 0.01.

    | | Physical Component Scale of SF-36 | Mental Component Scale of SF-36 | American Academy of Orthopaedic Surgeons Sports Knee-Rating Scale | Lysholm Scale | Cincinnati Knee-Rating Scale | Activities of Daily Living Scale | Patient Rating of Severity | Clinician Rating of Severity |
    |---|---|---|---|---|---|---|---|---|
    | Physical component scale of SF-36 | | -0.18* | 0.67† | 0.68† | 0.68† | 0.77† | 0.64† | 0.59† |
    | Mental component scale of SF-36 | -0.18* | | -0.03 | 0.05 | 0.01 | -0.05 | 0.08 | 0.01 |
    | American Academy of Orthopaedic Surgeons sports knee-rating scale | 0.67† | -0.03 | | 0.70† | 0.83† | 0.80† | 0.74† | 0.64† |
    | Lysholm scale | 0.68† | 0.05 | 0.70† | | 0.70† | 0.85† | 0.65† | 0.62† |
    | Cincinnati knee-rating scale | 0.68† | 0.01 | 0.83† | 0.70† | | 0.80† | 0.67† | 0.61† |
    | Activities of Daily Living scale | 0.77† | -0.05 | 0.80† | 0.85† | 0.80† | | 0.73† | 0.68† |
    | Patient rating of severity | 0.64† | 0.08 | 0.74† | 0.65† | 0.67† | 0.73† | | 0.58† |
    | Clinician rating of severity | 0.59† | 0.01 | 0.64† | 0.62† | 0.61† | 0.68† | 0.58† | |

    TABLE III: Validity Analysis with Spearman Correlations for the Four Knee-Rating Scales and Short Form-36 Subscales
    *Correlations were significant at p < 0.01. †Correlations were significant at p < 0.05.

    | Short Form-36 Subscale | Cincinnati Knee-Rating Scale | Lysholm Scale | Activities of Daily Living Scale | American Academy of Orthopaedic Surgeons Sports Knee-Rating Scale |
    |---|---|---|---|---|
    | Physical function | 0.68* | 0.66* | 0.72* | 0.66* |
    | Role-physical | 0.52* | 0.49* | 0.60* | 0.62* |
    | Bodily pain | 0.51* | 0.57* | 0.54* | 0.44* |
    | General health | 0.26* | 0.31* | 0.28* | 0.17 |
    | Vitality | 0.23* | 0.28* | 0.26* | 0.19† |
    | Social function | 0.50* | 0.50* | 0.48* | 0.48* |
    | Role-emotional | 0.18† | 0.18† | 0.14 | 0.24* |
    | Mental health | 0.24* | 0.29* | 0.21† | 0.19† |

    TABLE IV: Validity Analysis of Variance Across Levels of Clinician-Rated and Patient-Rated Severity

    | Instrument | Clinician-Rated Severity: F Test Statistic | Clinician-Rated Severity: P Value | Patient-Rated Severity: F Test Statistic | Patient-Rated Severity: P Value |
    |---|---|---|---|---|
    | American Academy of Orthopaedic Surgeons sports knee-rating scale | 21.1 | <0.00000001 | 27.3 | <0.00000001 |
    | Lysholm scale | 14.8 | <0.00000001 | 19.5 | <0.00000001 |
    | Cincinnati knee-rating scale | 19.4 | <0.00000001 | 23.8 | <0.00000001 |
    | Activities of Daily Living scale | 20.7 | <0.00000001 | 27.1 | <0.00000001 |
    Anderson AA, Federspiel CF,Snyder RB. Evaluation of knee ligament rating systems. Am J Knee Surg,1993;6: 67-73. 667  1993 
     
    Beaton DE,Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am,1996;7: 882-90.. 7882  1996 
     
    Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J,Coyte P. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care,1995;33(4 suppl): 131-44. 33(4 suppl)131  1995 
     
    Hawker G, Melfi C, Paul J, Green R,Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol,1995;22: 1193-6. 221193  1995  [PubMed]
     
    Kirkley A, Griffin S, McLintock H,Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability. The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med,1998;26: 764-72. 26764  1998  [PubMed]
     
    L’Insalata JC, Warren RF, Cohen SB, Altchek DW,Peterson MG. A self-administered questionnaire for assessment of symptoms and function of the shoulder. J Bone Joint Surg Am,1997;79: 738-48. 79738  1997  [PubMed]
     
    Martin DP, Engelberg R, Agel J,Swiontkowski MF. Comparison of the Musculoskeletal Function Assessment questionnaire with the Short Form-36, the Western Ontario and McMaster Universities Osteoarthritis Index, and the Sickness Impact Profile health-status measures. J Bone Joint Surg Am,1997;79: 1323-35. 791323  1997  [PubMed]
     
    Labs K,Paul B. To compare and contrast the various evaluation scoring systems after anterior cruciate ligament reconstruction. Arch Orthop Trauma Surg,1997;116: 92-6. 11692  1997  [PubMed]
     
    Bollen S,Seedhom BB. A comparison of the Lysholm and Cincinnati knee scoring questionnaires. Am J Sports Med,1991;19: 189-90. 19189  1991  [PubMed]
     
    Noyes FR, Barber SD,Mooar LA. A rationale for assessing sports activity levels and limitations in knee disorders. Clin Orthop,1989;246: 238-49. 246238  1989  [PubMed]
     
    Noyes FR, Barber SD,Mangine RE. Bone-patellar ligament-bone and fascia lata allografts for reconstruction of the anterior cruciate ligament. J Bone Joint Surg Am,1990;72: 1125-36. 721125  1990  [PubMed]
     
    Verbrugge LM,Jette AM. The disablement process. Soc Sci Med,1994;38: 1-14. 381  1994  [PubMed]
     
    Lysholm J,Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med,1982;10: 150-4. 10150  1982  [PubMed]
     
    American Academy of Orthopaedic Surgeons. Scoring algorithms for the lower limb outcomes data collection instrument, version 2.0. Rosemont, IL: American Academy of Orthopaedic Surgeons; 1998. 
     
    Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH,Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am,1998;80: 1132-45. 801132  1998  [PubMed]
     
    McHorney CA, Ware JE Jr,Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care,1993;31: 247-63. 31247  1993  [PubMed]
     
    McHorney CA, Ware JE Jr, Rogers W, Raczek AE,Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts. Results from the Medical Outcomes Study. Med Care,1992;30(5 suppl): 253-65. 30(5 suppl)253  1992 
     
    Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: The Health Institute, New England Medical Center; 1993 
     
    Shapiro ET, Richmond JC, Rockett SE, McGrath MM,Donaldson WR. The use of a generic, patient-based health assessment (SF-36) for evaluation of patients with anterior cruciate ligament injuries. Am J Sports Med,1996;24: 196-200. 24196  1996  [PubMed]
     
    Roos EM, Roos HP, Lohmander LS, Ekdahl C,Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS)—development of a self-administered outcome measure. J Orthop Sports Phys Ther,1998;28: 88-96. 2888  1998  [PubMed]
     
    Brinker MR, Garcia R, Barrack RL, Timon S, Guinn S,Fong B. An analysis of sports knee evaluation instruments. Am J Knee Surg,1999;12: 15-24. 1215  1999  [PubMed]
     
    Marshall JL, Fetto JF,Botero PM. Knee ligament injuries: a standardized evaluation method. Clin Orthop,1977;123: 115-29. 123115  1977  [PubMed]
     
    Deyo RA, Diehr P,Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials,1991;12(4 suppl): 142S-58S. 12(4 suppl)142  1991 
     
    Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press; 1989 
     
    Tegner Y,Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop,1985;198: 43-9. 19843  1985  [PubMed]
     
    Bland JM,Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet,1986;1: 307-10. 1307  1986  [PubMed]
     
    Bland JM,Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med,1990;20: 337-40. 20337  1990  [PubMed]
     
    Bland JM,Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet,1995;346: 1085-7. 3461085  1995  [PubMed]
     
    Rosner B.Fundamentals of biostatistics. 4th ed. Belmont, CA: Duxbury Press; 1995 
     
    Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987 
     
    Juniper EF,Guyatt GH. Development and testing of a new measure of health status for clinical trials in rhinoconjunctivitis. Clin Exp Allergy, 1991;21: 77-83. 2177  1991  [PubMed]
     
    Barber-Westin SD, Noyes FR,McCloskey JW. Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati knee rating system in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med,1999;27: 402-16. 27402  1999  [PubMed]
     
    Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40:171-8.
     
    Guyatt G, Mitchell A, Irvine EJ, Singer J, Williams N, Goodacre R, Tompkins C. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology. 1989;96:804-10.
     
    Guyatt GH, Eagle DJ, Sackett B, Willan A, Griffith L, McIlroy W, Patterson CJ, Turpie I. Measuring quality of life in the frail elderly. J Clin Epidemiol. 1993;46:1433-44.
     
    Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79-93.
     
    Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239-46.
     
    Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632-42.
     
    Gauffin H, Pettersson G, Tegner Y, Tropp H. Function testing in patients with old rupture of the anterior cruciate ligament. Int J Sports Med. 1990;11:73-7.
     
    Odensten M, Hamberg P, Nordin M, Lysholm J, Gillquist J. Surgical or conservative treatment of the acutely torn anterior cruciate ligament. A randomized study with short-term follow-up observations. Clin Orthop. 1985;198:87-93.
     
    Roberts TS, Drez D Jr, McCarthy W, Paine R. Anterior cruciate ligament reconstruction using freeze-dried, ethylene oxide-sterilized, bone-patellar tendon-bone allografts. Two year results in thirty-six patients. Am J Sports Med. 1991;19:35-41. Erratum: 1991;19:272.
     
    Noyes FR, Mooar PA, Matthews DS, Butler DL. The symptomatic anterior cruciate-deficient knee. Part I: The long-term functional disability in athletically active individuals. J Bone Joint Surg Am. 1983;65:154-62.
     
    Noyes FR, Matthews DS, Mooar PA, Grood ES. The symptomatic anterior cruciate-deficient knee. Part II: The results of rehabilitation, activity modification, and counseling on functional disability. J Bone Joint Surg Am. 1983;65:163-74.
     
    Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441-8.
     
    Hefti F, Muller W. Current state of evaluation of knee ligament lesions. The new IKDC knee evaluation form. Orthopäde. 1993;22:351-62. German.
     
    Sgaglione NA, Del Pizzo W, Fox JM, Friedman MJ. Critical analysis of knee ligament rating systems. Am J Sports Med. 1995;23:660-7.
     
    Wright JG. Quality of life in orthopaedics. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. 2nd ed. Philadelphia: Lippincott-Raven; 1996. p 1039-44.
     
    Andersson G. Hip assessment: a comparison of nine different methods. J Bone Joint Surg Br. 1972;54:621-5.
     
    Callaghan JJ, Dysart SH, Savory CF, Hopkinson WJ. Assessing the results of hip replacement. A comparison of five different rating systems. J Bone Joint Surg Br. 1990;72:1008-9.
     
    Freedman KB, Bernstein J. Sample size and statistical power in clinical orthopaedic research. J Bone Joint Surg Am. 1999;81:1454-60.
     
    Mohtadi N. Development and validation of the quality of life outcome measure (questionnaire) for chronic anterior cruciate ligament deficiency. Am J Sports Med. 1998.
     