Design
Data for this analysis were obtained as part of the Kinemax Outcomes
Study, which is a prospective observational study of primary total
knee arthroplasties for the treatment of osteoarthritis at twelve
centers: four in the United States, six in the United Kingdom, and
two in Australia. Independent research assistants at the participating
sites recruited patients from September 1997 to December 1998.
Patients
Patients were included if they had a primary diagnosis of osteoarthritis
and no history of knee implant surgery. They were excluded if they
had a history of knee joint infection or were unable to complete
the questionnaires because of cognitive or language difficulties.
Patients who had had a second, contralateral total knee arthroplasty
within the previous twelve months were also excluded, to ensure
that the twelve-month results reflected the outcome of the index
operation and not a subsequent operation.
Data Collection Procedures
Independent research assistants working at the various sites
recruited eligible patients. All of the research assistants were
trained by the lead author and used a standardized method of collecting
the physical examination data according to a written protocol to
minimize interobserver variability as much as possible. The protocol ensured
that all patients were positioned in the same way and examined with the same
techniques. Written guidelines on how
to rate the patient’s pain level were also given to all
of the evaluators. The independent trained evaluators included physician
assistants, physical therapists, orthopaedic nurses, and clinical
research staff. They recorded a clinical history, carried out a
physical examination of the patients, and administered patient questionnaires.
The questionnaires were bound into booklets so that all patients completed
the questions in the same order; the WOMAC pain and function scales
were followed by the SF-36. The evaluators did not see the results
of the questionnaires prior to carrying out the physical examination. Preoperative
data were collected within six weeks prior to the total knee arthroplasty,
and follow-up data were collected at twelve months following surgery.
Data were entered into a single database at the coordinating center.
Data Elements
Information about the clinical history, including previous orthopaedic
surgery on a lower limb and the current level of pain, was collected.
The Knee Society questionnaire included items on functional ability,
such as walking distance, stair-climbing ability, and use of walking aids.
The physical examination included assessments of the range of motion,
stability, alignment, and muscle power of the knee. These data were
used to calculate a Knee Society Clinical Rating score, which consists
of two scores, a knee score and a function score, ranging from 0
to 100 points (with 100 points being the best score). The function
score allocates points for walking distance and stair-climbing ability
and makes deductions for the use of a walking aid; 100 represents
unlimited walking distance and normal stair-climbing without the
use of an aid.
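The function score described above is a simple additive rule: walking points plus stair-climbing points, minus a deduction for any walking aid. The sketch below illustrates that arithmetic. The point tables are assumptions taken from the commonly published Knee Society function score and are not reported in this study; the function and dictionary names are hypothetical.

```python
# Hypothetical point tables: values follow the commonly published Knee Society
# function score and are assumptions, not data from this study.
WALKING_POINTS = {
    "unlimited": 50,
    ">10 blocks": 40,
    "5-10 blocks": 30,
    "<5 blocks": 20,
    "housebound": 10,
    "unable": 0,
}
STAIR_POINTS = {
    "normal up and down": 50,
    "normal up; down with rail": 40,
    "up and down with rail": 30,
    "up with rail; unable down": 15,
    "unable": 0,
}
AID_DEDUCTIONS = {"none": 0, "cane": 5, "two canes": 10, "crutches or walker": 20}


def knee_society_function_score(walking: str, stairs: str, aid: str = "none") -> int:
    """Walking points + stair points - walking-aid deduction, floored at 0."""
    score = WALKING_POINTS[walking] + STAIR_POINTS[stairs] - AID_DEDUCTIONS[aid]
    return max(score, 0)


# Unlimited walking and normal stairs without an aid gives the maximum of 100.
print(knee_society_function_score("unlimited", "normal up and down"))
```

Under these assumed tables, a patient who walks 5 to 10 blocks, uses a rail on stairs, and walks with a cane would score 30 + 30 - 5 = 55 points.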
Fifty of the 100 points in the knee score reflect pain assessment
(a score of 50 points represents no pain). The Knee Society pain
component requires the evaluator to rate the patient’s
knee pain with one question that combines the frequency and severity
of pain and has seven ordinal responses. All evaluators in this
study were given guidelines to follow when rating the patient’s pain,
which should have improved the consistency of ratings both within and
between evaluators. The other 50 points reflect the clinical
assessment of range of motion, stability, alignment, and muscle
power; 50 points represents at least 0° to 125° of knee flexion
with no active lag, no instability, and normal alignment.
The WOMAC (a disease-specific measure of pain, stiffness, and
function) and the SF-36 (a generic health status measure) were administered
to the patients at each evaluation. We transformed WOMAC scores
to a 0 to 100-point scale for each domain (with 100 points being the
best score). The standardized method of calculating the SF-36 domains
was used so that each of the eight subscales had a score of 0 to
100 points (with 100 points being the best score)18.
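Both transformations above amount to a linear rescaling of a raw domain sum onto 0 to 100 points, reversed when a higher raw score means a worse state. A minimal sketch, assuming the Likert version of the WOMAC (five pain items scored 0 to 4, so a raw pain range of 0 to 20, higher being worse) and the standard SF-36 physical functioning items (ten items scored 1 to 3, so a raw range of 10 to 30, higher being better); these item counts and ranges are assumptions, and the function name is hypothetical.

```python
def transform_0_100(raw: float, lowest: float, highest: float,
                    higher_is_better: bool = True) -> float:
    """Linearly rescale a raw domain score to 0-100, with 100 as the best score."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    pct = (raw - lowest) / (highest - lowest) * 100.0
    # Reverse the scale when a higher raw score indicates a worse state (e.g. WOMAC).
    return pct if higher_is_better else 100.0 - pct


# Assumed WOMAC pain (raw 0-20, higher = worse) and SF-36 physical functioning (raw 10-30).
print(transform_0_100(10, 0, 20, higher_is_better=False))
print(transform_0_100(20, 10, 30))
```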
The SF-36 also has a question that asks patients to rate their current
general health status compared with their status twelve months ago.
Patients answer on a scale with five responses ranging from "much
better" to "much worse."
At the time of follow-up, patients were asked to answer four
questions on patient satisfaction with one of four responses ranging
from "very satisfied" to "very dissatisfied." A
satisfaction score was calculated as the mean of the responses to
these four questions, transformed to a 0 to 100-point scale (with
100 points being the best score). This scale has been validated
for use for patients treated with total knee arthroplasty19. The quality-of-life question asks
patients how much the quality of their life has changed since their
total knee arthroplasty. Patients answer with one of seven responses
ranging from "the quality of my life is worse" to "more
improvement than I ever dreamed possible."
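The satisfaction score described above is the mean of four item responses rescaled to 0 to 100 points. The sketch below shows one way to do this. The source names only the end-point responses, so the two middle labels and the 1 to 4 coding are assumptions, as is the function name.

```python
from statistics import mean

# Hypothetical coding: only "very satisfied" and "very dissatisfied" are named
# in the text; the middle labels and the 1-4 codes are assumptions.
RESPONSE_CODES = {
    "very dissatisfied": 1,
    "somewhat dissatisfied": 2,
    "somewhat satisfied": 3,
    "very satisfied": 4,
}


def satisfaction_score(responses: list[str]) -> float:
    """Mean of the four item codes, rescaled so 100 = very satisfied on every item."""
    if len(responses) != 4:
        raise ValueError("expected answers to the four satisfaction questions")
    item_mean = mean(RESPONSE_CODES[r] for r in responses)
    return (item_mean - 1) / 3 * 100


print(satisfaction_score(["very satisfied"] * 4))
```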
Demographic data were obtained with a questionnaire at the preoperative
assessment.
Analysis
Statistical analyses were performed with use of the SAS (Statistical
Analysis System) statistical package20. The Knee Society scores
are derived by a clinical scoring algorithm and contain positively
and negatively scored items. Item analysis was carried out by calculating
the correlations between items in each of the scales, which were
reported as Pearson correlation coefficients.
Validity reflects the extent to which the instrument measures
what it is purported to measure. We hypothesized that there would
be a moderately strong correlation (Pearson correlation coefficient,
r > 0.50) between the Knee Society pain component and the
WOMAC pain and SF-36 bodily pain domains as well as between the
Knee Society function score and the WOMAC function and SF-36 physical
functioning domains. All correlations were expressed with use of
the Pearson correlation coefficient. The strength of correlations
was compared with use of the test of equality of two non-independent correlations21.
All p values are two-tailed.
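Comparing two correlations that share a variable (for example, the Knee Society pain component against both the WOMAC pain and the SF-36 bodily pain scores in the same patients) needs a test that accounts for the correlation between the two comparison measures. The sketch below implements one widely used option, the Meng, Rosenthal, and Rubin Z test on Fisher-transformed correlations; this is an assumption, and reference 21 may describe a different form of the test. The function name is hypothetical, and the value of `r12` in the example (the correlation between the WOMAC and SF-36 pain scores) is invented for illustration.

```python
from math import atanh, erfc, sqrt


def compare_dependent_correlations(r1: float, r2: float, r12: float,
                                   n: int) -> tuple[float, float]:
    """Meng-Rosenthal-Rubin Z test for two correlations sharing one variable.

    r1: corr(X, Y1); r2: corr(X, Y2); r12: corr(Y1, Y2); n: patients.
    Returns (Z, two-tailed p value).
    """
    z1, z2 = atanh(r1), atanh(r2)  # Fisher z-transformation
    r2_bar = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - r2_bar)), 1.0)
    h = (1 - f * r2_bar) / (1 - r2_bar)
    z = (z1 - z2) * sqrt((n - 3) / (2 * (1 - r12) * h))
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# r = 0.68 versus 0.35 at twelve months; r12 = 0.60 is a hypothetical value.
z, p = compare_dependent_correlations(0.68, 0.35, r12=0.60, n=697)
print(round(z, 2), p)
```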
Responsiveness was assessed by determining whether changes in
Knee Society scores correlated with other indicators of change in
the patients’ clinical status; higher correlations indicate
greater responsiveness22. Correlation with patient-assessed indicators
of improvement indicates whether the scales are capturing change that is meaningful to patients.
We hypothesized that the changes in the Knee Society scores would
have moderately strong correlations with the patient satisfaction
scale, the question regarding the perceived improvement in quality
of life, and the SF-36 question regarding the change in health status
over the past twelve months.
Responsiveness was also assessed with use of the standardized
response mean, calculated as the mean change between the preoperative
and twelve-month scores divided by the standard deviation of the
change in score23. Because this
parameter assesses the extent of improvement, we excluded patients
from these analyses if they said that their quality of life was
the same or worse since their total knee arthroplasty. We hypothesized
that there would be large standardized response means (that is, >1.0)
for all Knee Society scores because total knee arthroplasty has
dramatic effects on pain and function. Confidence intervals (95%)
were calculated for this statistic under the assumption that the
differences between the preoperative and twelve-month scores followed
a normal distribution and therefore the standardized response mean
distribution could be approximated by a normal distribution with
mean zero and standard deviation of one over the square root of
the sample size24.
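The standardized response mean and its approximate confidence interval can be computed directly from paired scores. A minimal sketch, assuming the interval is centred on the observed standardized response mean with a standard error of one over the square root of the sample size, as described above; the function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(pre: list[float],
                               post: list[float]) -> tuple[float, tuple[float, float]]:
    """SRM = mean change / SD of change, with an approximate 95% CI of SRM +/- 1.96/sqrt(n)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [b - a for a, b in zip(pre, post)]
    srm = mean(changes) / stdev(changes)  # sample SD of the change scores
    half_width = 1.96 / sqrt(len(changes))  # normal approximation, SE = 1/sqrt(n)
    return srm, (srm - half_width, srm + half_width)


# Hypothetical preoperative and twelve-month scores for five patients.
srm, ci = standardized_response_mean([40, 35, 50, 45, 30], [80, 85, 70, 90, 75])
print(round(srm, 2), tuple(round(x, 2) for x in ci))
```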
Recruitment
Between September 1997 and December 1998, 1100 patients were
recruited. After exclusion of 238 patients who had had a second,
contralateral total knee arthroplasty within the previous twelve
months, the final sample for this analysis consisted of 862 patients.
Of these, 697 (80.9%) completed the questionnaires and
had a physical examination at twelve months, forty-six (5.3%) only
completed the questionnaires, nineteen (2.2%) only had
a physical examination, and 100 (11.6%) had no available
twelve-month data. No twelve-month data were available for these
100 patients because eighteen (2.1% of 862) had died, twelve
(1.4%) were unable to continue to participate in the study
because of another illness, four (0.5%) had had a revision
of the knee replacement, thirty-two (3.7%) had withdrawn
from the study, four (0.5%) had moved away and were unable
to continue to return for follow-up, sixteen (1.9%) were unable
to attend the twelve-month review for other reasons but remained
in the study, and fourteen (1.6%) were lost to follow-up.
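The follow-up accounting above can be checked arithmetically: the four follow-up groups should sum to the final sample, the reasons for missing data should sum to the 100 patients without twelve-month data, and each percentage is taken over the 862 patients. A short sketch using only the counts reported in this section; the dictionary keys and the function name are illustrative labels.

```python
RECRUITED, EXCLUDED = 1100, 238
FINAL_SAMPLE = 862
FOLLOW_UP = {
    "questionnaires and examination": 697,
    "questionnaires only": 46,
    "examination only": 19,
    "no twelve-month data": 100,
}
NO_DATA_REASONS = {
    "died": 18,
    "another illness": 12,
    "revision": 4,
    "withdrew": 32,
    "moved away": 4,
    "missed review but remained in study": 16,
    "lost to follow-up": 14,
}


def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Each count as a percentage of total, rounded to one decimal place."""
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Every patient in the final sample should be accounted for.
assert RECRUITED - EXCLUDED == FINAL_SAMPLE
assert sum(FOLLOW_UP.values()) == FINAL_SAMPLE
assert sum(NO_DATA_REASONS.values()) == FOLLOW_UP["no twelve-month data"]
print(percentages(FOLLOW_UP, FINAL_SAMPLE))
```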
We limited our data set to the 697 patients for whom we had complete
preoperative and twelve-month data. No significant differences in
the demographics were found between the group for whom complete
data were available and the group for whom complete data were not
available.
Patient Features and Outcomes
The mean age of the 697 patients was seventy years (range, thirty-eight
to ninety years), and the majority of patients (411; 58.9%)
were female. Three hundred and twenty-eight patients (47.0%)
were from the United Kingdom; 213 (30.7%), from the United
States; and 156 (22.3%), from Australia. Table I presents the
mean preoperative and twelve-month follow-up Knee Society knee,
pain, and function scores, WOMAC pain and function scores, and SF-36
bodily pain and physical functioning scores. Patients had significant
improvements on all measures (p < 0.001) at the twelve-month
follow-up review.
Item Analysis
There were consistently low correlations among the individual
items of the Knee Society knee score at both the preoperative assessment
(range of Pearson correlation coefficients, r = −0.03
to 0.51) and the twelve-month assessment (r = 0.001 to
0.45) (Table II).
The correlations among the items of the function score were also
low at both assessment times (r = 0.24 to 0.30 at the preoperative
evaluation, and r = 0.30 to 0.46 at twelve months) (Table III).
Validity
The correlations among the Knee Society knee and function scores
and the WOMAC and SF-36 scores are presented in Table IV. The pain
component of the Knee Society knee score is included as a separate
variable. As there is no validated scale of clinical measurements
(such as range of motion, alignment, and stability), we were unable
to test for convergent construct validity of this part of the scale.
The correlation between the Knee Society pain component and the
WOMAC pain score was significantly stronger than that between the
Knee Society pain component and the SF-36 bodily pain score at both
the preoperative and the twelve-month review (r = 0.44
versus 0.31 at the preoperative assessment, and r = 0.68
versus 0.35 at the twelve-month assessment; p value for comparison
of the strength of correlations, <0.001 at both evaluation times).
The Knee Society function score had a significantly stronger
correlation with the SF-36 physical functioning score than it had
with the WOMAC function score at both the preoperative and the twelve-month
review (r = 0.63 versus 0.46 at the preoperative review,
and r = 0.72 versus 0.58 at the twelve-month review; p
value for comparison of the strength of correlations, <0.001
at both evaluation times). The correlations between the Knee Society
scores and the WOMAC or SF-36 scores were as strong as, and in some
cases stronger than, the correlations between the WOMAC and SF-36
scores (Table IV).
Responsiveness
Changes in the Knee Society knee and function scores, WOMAC pain
and function scores, and SF-36 bodily pain and physical functioning
scores at the twelve-month review were correlated with satisfaction,
perceived improvement in quality of life, and perceived change in
health status (Tables V and VI). There were moderate correlations
between the changes in the Knee Society knee score (r = 0.23
to 0.30) and pain score (r = 0.23 to 0.32) and these measures,
but the change in the WOMAC pain score demonstrated significantly
stronger correlations with all of the measures (r = 0.34
to 0.44; p value for comparison of the strength of correlations, <0.001)
(Table V). The
change in the SF-36 bodily pain score also had stronger correlations
with the perceived improvement in quality of life and the perceived
change in general health status than did the change in the Knee
Society knee or pain scores (p value for comparison of the strength
of correlations, <0.001) (Table V). The change in the Knee Society
function score had moderate correlations with the measures (r = 0.23
to 0.24), but both the WOMAC function and the SF-36 physical functioning
scores demonstrated significantly stronger correlations (r = 0.37
to 0.45; p value for comparison of the strength of correlations, <0.001)
(Table VI).
Standardized response means were calculated for each of the outcome
measures (Table I),
after exclusion of thirty-seven patients who had reported that their
quality of life was the same or worse following the total knee arthroplasty.
According to this analysis, the Knee Society knee score was the
most responsive (standardized response mean = 2.2 compared
with 2.0 for the WOMAC pain score and 1.0 for the SF-36 bodily pain
score) and the Knee Society function score was the least responsive
(standardized response mean = 0.8 compared with 1.4 for
the WOMAC function score and 1.1 for the SF-36 physical functioning
score).
We stratified the analysis to examine the influence on our results
of a disorder of the contralateral knee (such as osteoarthritis
or a condition leading to ligament surgery, meniscal surgery, or
total knee replacement performed more than twelve months previously).
All of the preoperative scores for the patients who had a contralateral
knee disorder were worse (by an average of 4 to 8 points) than those
for the patients who did not have a contralateral knee disorder.
At twelve months, the two groups (those with and those without a
contralateral knee disorder) had similar scores on all of the pain scales
except the SF-36, but the function scores were an average of 5,
6, or 7 points lower for the group with a contralateral knee disorder.
The standardized response means were therefore slightly better for the
WOMAC and Knee Society pain scales for patients with a contralateral
knee problem but were unchanged for the function scores. The stratified
analysis of the comparison of correlations among the different scoring
systems for construct validity showed the same relationships that were
found when we pooled the data, although the strength of some of the
correlations varied. Similarly, the stratified analysis
comparing correlations between patient satisfaction or the patient’s
perceptions of changes in quality of life and changes in the different scoring
systems showed, like the pooled analysis, that changes in the WOMAC
and SF-36 scores had significantly stronger correlations with these
measures than did changes in the Knee Society scores.
The Knee Society Clinical Rating System has commonly been used
in studies of the outcomes of total knee arthroplasty5 and has also been employed to evaluate
the construct validity of other outcome scales25.
The use of a dual rating system that provides separate knee and
function scores was proposed as a means of solving the problem that
arises when deterioration of a patient’s general health
or other comorbid conditions influence their functional status while
the state of the knee following surgery remains excellent26. However, the Knee Society Clinical
Rating System has not been validated. In our study, we attempted
to remove observer bias by using independent trained evaluators to
collect the data, and we aimed to validate the Knee Society system
by comparing it with outcome measures (the WOMAC8,14,27 and
the SF-3611-13) that had undergone
rigorous psychometric validation. We found poor correlation among
the items of the Knee Society knee score, but the knee score had
good convergent construct validity with the WOMAC pain score and was
a responsive outcome measure for patients undergoing total knee
arthroplasty. There was also poor correlation among the items of
the Knee Society function score, but the function score had good
convergent construct validity with the SF-36 physical functioning
score. It was less responsive than the WOMAC function or SF-36 physical
functioning scores for measuring functional outcomes following total
knee arthroplasty.
The items for the Knee Society knee and function scales, and the decision
to report the knee and function ratings separately, were determined by
consensus of the Knee Society. There are
no available data indicating how the individual items were developed
and reduced. The face validity of the Knee Society Clinical Rating
System is questionable, as patients were not included in the item-selection
process and the number of selected items is limited. Item analysis
demonstrated poor correlation among the individual items of the
knee score, suggesting that a good score on one part of the assessment
may not reflect a good score on others (Table II). This makes interpretation of
the final score difficult. For example, a knee score of 80 points
may be given to a patient who has no pain, 0° to 25° of knee flexion,
normal alignment, and no instability, or it may be given to a patient
who has mild or occasional pain on walking and stair-climbing, 0°
to 130° of knee flexion, normal alignment, and no instability. Clearly,
these patients had very different results. In addition, we found
that, although the correlations among the items of the function
score were slightly stronger, they were still weak (Table III).
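The example of two very different knees that both score 80 points can be reproduced with a simple additive model of the knee score: pain points plus range-of-motion points plus stability points, minus deductions. In the sketch below, the rule of 1 point per 5° of flexion capped at 25 points (so that 125° earns the full 25) and the value of 30 pain points for mild or occasional pain on walking and stair-climbing are assumptions based on the commonly published scoring sheet, not details given in this study; the function name is hypothetical.

```python
def knee_society_knee_score(pain_points: int, flexion_deg: float,
                            stability_points: int, deductions: int = 0) -> int:
    """Pain (0-50) + range of motion + stability - deductions, limited to 0-100.

    Assumption: range of motion earns 1 point per 5 degrees of flexion, capped
    at 25 points; stability_points (0-25) and deductions (flexion contracture,
    extension lag, malalignment) are supplied by the evaluator.
    """
    rom_points = min(int(flexion_deg // 5), 25)
    score = pain_points + rom_points + stability_points - deductions
    return max(0, min(score, 100))


# Patient A: no pain (50), 0-25 degrees of flexion, stable, normal alignment.
patient_a = knee_society_knee_score(50, 25, 25)
# Patient B: assumed 30 pain points, 0-130 degrees of flexion, stable, normal alignment.
patient_b = knee_society_knee_score(30, 130, 25)
print(patient_a, patient_b)
```

Both patients receive the same total, which illustrates why a single knee score is hard to interpret when its components are poorly correlated.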
As the Knee Society scores are calculated with use of a clinical
scoring algorithm that includes both positively and negatively scored
items, it is inappropriate to test for internal consistency of these
scores. In comparison, the WOMAC and SF-36 scores are easier to
interpret because they have high internal consistency and strong correlations
among their items. Therefore, a patient with a WOMAC pain score of 50
points can be assumed to have, on average, moderate pain with activities.
Similarly, a patient with an SF-36 physical functioning score of
50 points can be assumed to have, on average, little limitation
with most activities.
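Internal consistency of summative scales such as the WOMAC and SF-36 domains is usually summarized with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The study does not state which statistic underlies the internal-consistency claims, so the choice of alpha here is an assumption; the function name and example data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][j] is the score of patient j on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per patient
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical scores on three items for four patients.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 4], [1, 3, 3, 4]]), 2))
```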
Construct validity indicates whether the instrument correlates
with other measures or attributes that have an established relationship
with the domain of interest28.
The convergent construct validity of the Knee Society pain and function
scores was established by the finding that they had modest correlations
with the analogous domains of the WOMAC and SF-36 scales (Table IV). Convergent
construct validity is demonstrated if the correlation between the
scores of two different instruments measuring the same health dimensions
is positive and appreciably greater than zero29.
McDowell and Newell30 reported,
in a review of rating scales and questionnaires, that correlation
coefficients for convergent construct validity often fall between
0.20 and 0.60 and are only rarely greater than 0.70. As there is
no gold-standard outcomes measure for total knee arthroplasty, the convergent
construct validity of the Knee Society system must be tested against
other validated instruments, such as the WOMAC and SF-36. The strengths
of the correlations that we report here are within the range outlined by
McDowell and Newell and are also similar to those reported in other
studies validating outcome measures for hip and knee arthritis31,32.
The pain component of the Knee Society knee score was included
as a separate variable to assess its association with the WOMAC
and SF-36 pain scores. At both evaluation times, the correlation
between the Knee Society and WOMAC pain scores (r = 0.44
preoperatively and r = 0.68 at twelve months) was stronger
than that between the Knee Society and SF-36 pain scores (r = 0.31
preoperatively and r = 0.35 at twelve months) (Table IV). This was
expected because the Knee Society pain scale was designed for patients
undergoing total knee arthroplasty and the WOMAC pain scale was designed
for patients with osteoarthritis of the hip or knee, which was the
underlying joint disease in all the patients in our cohort. Conversely,
the SF-36 bodily pain scale is a much more general measure of pain.
The correlation between the Knee Society function and SF-36 physical
functioning scores (r = 0.63 preoperatively and r = 0.72
at twelve months) was stronger than that between the Knee Society
function and WOMAC function scores (r = 0.46 preoperatively
and r = 0.58 at twelve months) at both assessment times
(Table IV).
One reason for this finding may be the items selected for these
scores. The questions on the Knee Society function scale ask only
about walking distance, stair-climbing ability, and use of a walking
aid. Half of the items on the SF-36 physical functioning score are
devoted to walking distance and stair-climbing ability, whereas
the WOMAC function score has more varied items.
Using standardized response means to assess responsiveness, we
found that the Knee Society knee score and the WOMAC pain score
were more sensitive for detecting change over time than was the
SF-36 bodily pain score (Table I). We restricted these analyses
to patients who reported improvement to ensure that the changes
were clinically meaningful. The lower responsiveness of the SF-36 bodily
pain score may be attributed to the fact that it is too generic
to be sensitive enough to detect change due to total knee arthroplasty.
Kreibich et al.7, in their study
of different measures of outcome after total knee arthroplasty,
found that the WOMAC and Knee Society scores were more responsive
with regard to detecting change in patient status at three and twelve months
than were the SF-36 scores and simple functional tests such as the
six-minute walk and thirty-second stair climb.
Responsiveness was also assessed in terms of patient satisfaction
and perceived improvement in quality of life and change in general
health status (Tables V and VI). Correlations between changes
in the scores of the different systems and these items allowed us
to ascertain which scales best capture patient-centered measures
of improvement. In our analysis, the WOMAC pain and function and
SF-36 physical functioning scores were significantly more responsive
(p value for the comparison of the strength of correlations, <0.001)
than the Knee Society scores in terms of patient satisfaction and perceived
improvement in quality of life, suggesting that these
scores better reflect changes that are most important to patients.
Our study had several strengths. It was performed at multiple
sites in the United Kingdom, United States, and Australia, which
included both orthopaedic departments in large academic referral
centers and smaller private clinics, enhancing the generalizability
of the results. Each site had an independent trained evaluator to
carry out the physical examinations and to administer the questionnaires,
which minimized observer bias. A high proportion of eligible patients
were recruited, so there was a large sample for the preoperative
assessment, and 80.9% were evaluated at twelve months.
One limitation of the study is that, because of the geographic dispersion of
the sites, we did not test the interrater reliability of the Knee
Society Clinical Rating System. Another possible limitation is that the fixed
order in which the questionnaires were administered may have influenced
the responses.
In conclusion, the use of patient-reported outcome measures for
assessing the outcomes of total knee arthroplasty has been emphasized
in the orthopaedic literature over the past ten years. However,
previous research relied on knee rating systems such as the Knee
Society Clinical Rating System, even in the absence of validation
studies. Our results showed low correlations among the items of
both the knee and the function score of the Knee Society Clinical
Rating System, making interpretation of overall scores difficult.
The knee and function scores demonstrated convergent construct validity
with the analogous domains of the WOMAC and SF-36. The Knee Society
knee score was a responsive instrument for assessing the outcomes
of total knee arthroplasty, but the Knee Society function score
was not. The WOMAC and SF-36 have high internal consistency and
are more responsive measures of the outcomes of total knee arthroplasty.
We conclude that, because they are less labor-intensive for researchers
to use and remove observer bias from the study design,
they are preferable measures for outcome studies of knee arthroplasty.