Abstract
Background: The purpose of this study was
to determine whether currently published outcome measures of physical
function would be suitable for use in older adults with a hip fracture.
The measures that were considered were the Musculoskeletal Function
Assessment (MFA) Instrument, the Older Americans' Resources and
Services (OARS) Multidimensional Functional Assessment Questionnaire
physical function subscale, the Toronto Extremity Salvage Score
(TESS), and the Short Form-36 (SF-36). Following suggestions by
an expert panel and patient interviews, the MFA was not tested further.
The TESS was modified and renamed the Lower Extremity Measure (LEM).
Methods: Forty-three community-dwelling patients
with a hip fracture completed the LEM, OARS, and SF-36 in the hospital
so that the prefracture status could be obtained; they were then
followed prospectively at six weeks and at six months. All patients
were interviewed twice in the hospital to assess the reliability
of the LEM (intraclass correlation coefficient = 0.85). To establish
criterion validity, the measures were compared with the Timed Up
and Go (TUG) test at six weeks. We tested a number of hypotheses
to determine construct validity.
Results: Only the LEM scores were significantly
correlated with the TUG scores (r = -0.53, p = 0.03). The LEM scores
were significantly correlated with the SF-36 subscale scores and
the OARS scores. Patients with at least one comorbidity had a lower mean
prefracture LEM score (90.0 ± 9.7) than patients
with no comorbidity (96.9 ± 8.1) (p = 0.02).
Patients who had used no walking aids before the fracture had a
higher mean prefracture LEM score than those who had used a cane
(95.5 ± 5.8 compared with 85.5 ± 12.7;
p = 0.0007). Both the LEM and the SF-36 scores changed significantly
between all of the time-periods (p < 0.05). Measures of responsiveness
indicated that the LEM was the best measure for detecting changes
in physical function.
Conclusions: The LEM can detect clinically important
changes in physical function over time in patients with a hip fracture
and would be most useful for clinical trials or cohort studies.
Orthopaedists who are currently utilizing the SF-36 can be reassured
that the physical function subscale is a valid measure for patients
with a hip fracture.
Hip fractures in the elderly are associated with considerable
morbidity and high economic cost. Physical function is one of the
most important outcomes following treatment of a hip fracture. A large
proportion of patients with a hip fracture do not return to prefracture
levels of physical function, and many are unable to return to community-dwelling [15]. Outcome assessment after treatment
of a hip fracture has traditionally focused on clinical or surgeon-defined
measures of technical success rather than on functional outcome.
As evidence-based medicine and cost containment become increasingly
important, reliable and valid measures to assess the effectiveness
of treatment and rehabilitation from the viewpoint of physical function
become necessary. There is a wide array of available scales for
measuring health outcomes; they may be disease-specific or generic.
Disease or domain-specific measures focus on the specific complaints
that are attributable to a specific diagnosis and therefore are
more likely to reflect clinical changes [2].
For example, the Western Ontario and McMaster Universities Osteoarthritis
Index (WOMAC) was developed to evaluate patients with osteoarthritis of
the knee or hip [1]. It consists
of three sections: one deals with pain (five questions); one, with
stiffness (two questions); and one, with physical function (seventeen questions).
In contrast, generic measures consider various domains of health,
including physical, mental, and psychosocial, and tend to provide
a broad picture of health across a range of conditions, disease
severities, interventions, and patient characteristics [20,21].
A review of the literature reveals a plethora of instruments
to assess the ability to perform activities of daily living (for
example, eating, dressing, bathing, transferring, walking, and shopping) [19]. All of these instruments are based
on the innovative work of Katz et al. [12] and
Lawton and Brody [16]. Examination
of these scales reveals differences in the time that it takes to
complete them, the activities that are listed, the types of response
options (indicating the level of difficulty or the amount of help
required to do the activity), the scoring procedures (a scale of
0 to 2 or 1 to 5), and the populations studied (stroke patients,
community-dwelling elderly patients, and so on). None of these scales
were specifically developed to assess the physical function of patients
with a hip fracture. In our search for an instrument with which
to evaluate patients with a hip fracture, we were looking for one
that could be easily completed by patients during a telephone interview
and therefore would not require a clinic visit. Such an instrument would
serve as the basis for measuring physical function in outcome studies
involving patients with a hip fracture.
The disease-specific measures that were selected for consideration
in this study were the Musculoskeletal Function Assessment (MFA)
Instrument [24]; the Older Americans'
Resources and Services (OARS) Multidimensional Functional Assessment Questionnaire
physical function subscale [6]; and
the Toronto Extremity Salvage Score (TESS) [4],
which is a newly validated measure for assessing physical function
in patients with bone sarcoma. We also evaluated a generic health-status
measure, the Short Form-36 (SF-36) [20,21].
Each time that a measure is used for a patient group that differs
from the one for which it was created, it is necessary to reestablish
its psychometric properties of reliability and validity [28]. Reliability refers to the ability
of an instrument to produce consistent results each time that it
is used. Validity is defined as the extent to which an outcome measure
assesses what it claims to measure (for example, physical function) [29]. Validity can be categorized as
face, criterion, or construct.
The appropriateness of the items within the measure as assessed
by the patient and by experts is often referred to as face or content
validity. Face validity indicates whether the measure appears to be
assessing the desired qualities by determining if the instrument
samples all of the relevant or important content areas [28]. For example, one would expect that
a scale measuring physical function of the lower extremity would
include questions on walking ability.
To establish criterion validity, one must be able to demonstrate
that the scores obtained with use of the measure are systematically
related to one or more outcome criteria. These outcome criteria
are "gold standards" for the concept being measured. If the findings
obtained with a new instrument correlate strongly with the gold
standard, the new instrument has criterion validity [28]. One of the problems with using
questionnaires is that it is difficult to obtain true gold standards
for comparison.
To establish construct validity, the outcome measure must relate
to other data or health-status measures in ways that are consistent
with plausible hypotheses.
When an outcome measure is to be used to detect changes over
time, responsiveness becomes important. Responsiveness refers to
the ability of an instrument to measure clinical change [8,17]. In this paper, we report on the
reliability, validity, and responsiveness of a disease-specific
measure of physical function for patients with a hip fracture.
The Measures
The Musculoskeletal Function Assessment (MFA) Instrument is a
health-status instrument with 100 self-reported items; it is designed
to detect small differences in function among patients with musculoskeletal
disorders of the extremities that are commonly seen in clinical
practice [24]. There are ten categories
of questions that ask about self-care, sleep/rest, hand/fine-motor
coordination, mobility, housework, employment/work, leisure/recreation,
family relationships, cognition/thinking, and emotional adjustment/coping/adaptation.
All questions require a yes, no, or don't-know response. Each category
score and the total score are standardized on a scale of 0 to 100,
with 0 representing minimum dysfunction and 100 representing maximum
dysfunction.
The Older Americans' Resources and Services (OARS) Multidimensional
Functional Assessment Questionnaire physical function subscale was
developed for population studies of a variety of conditions affecting
elderly adults [7]. It contains fifteen
questions, thirteen of which begin with "Can you" (use the telephone,
get to places out of walking distance, go shopping for groceries
or clothes, prepare your own meals, do your own housework, take
your own medicine, handle your own money, eat, dress and undress yourself,
take care of your appearance, walk, get in and out of bed, and take
a bath or shower). The response choices are given a score of 2 (without help),
1 (with some help), or 0 (completely unable to do). The last two
questions are "Do you ever have trouble getting to the bathroom
on time?", with responses given a score of 2 (no) or 0 (yes), and
"Is there someone who helps you with such things as shopping, housework,
bathing, dressing, and getting around?", with responses given a
score of 1 (yes) or 0 (no). The overall summary physical function
score [6] is calculated as: (total
score/29) × 100.
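As an illustration only, the OARS summary score described above can be sketched in Python. The function name and argument layout are hypothetical and are not taken from the OARS documentation; only the item scoring and the divisor of 29 come from the text.

```python
# Hypothetical helper for illustration; names are assumptions, not OARS terminology.
OARS_MAX_TOTAL = 29  # 13 "Can you" items x 2 points, plus 2 (bathroom) and 1 (helper)


def oars_physical_function_score(can_you_items, bathroom_item, helper_item):
    """Return the summary score, (total score / 29) x 100."""
    if len(can_you_items) != 13:
        raise ValueError("expected 13 'Can you' item scores")
    if any(score not in (0, 1, 2) for score in can_you_items):
        raise ValueError("'Can you' items are scored 0, 1, or 2")
    if bathroom_item not in (0, 2):
        raise ValueError("bathroom item is scored 0 or 2")
    if helper_item not in (0, 1):
        raise ValueError("helper item is scored 0 or 1")
    total = sum(can_you_items) + bathroom_item + helper_item
    return total / OARS_MAX_TOTAL * 100


# A respondent with the maximum score on every item reaches 100.
print(oars_physical_function_score([2] * 13, 2, 1))
```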
The Toronto Extremity Salvage Score (TESS) is a disease-specific
measure that was developed for patients undergoing limb preservation
surgery for a tumor of an extremity. The intent of the TESS is to
evaluate a single domain, physical disability, on the basis of patients'
reports of their function [4]. The
TESS is self-administered, and the questions are framed to ask about
the difficulty experienced when performing an activity. We chose
to evaluate the lower extremity TESS because it focused on the section
of the body most affected by a hip fracture and it was developed
to monitor and evaluate clinically important changes in physical
function over time [4]. The lower
extremity TESS consists of thirty questions related to physical
function - for example, "Putting on shoes is:" with responses ranging from
"impossible to do (score = 1)" to "not at all difficult (score =
5)." The overall or summary score is calculated as: ([total raw
score - lowest possible total raw score]/raw score range) × 100, where
the raw score is the sum of reported difficulty. If the activity
is not part of the patient's normal activities, the patient is asked
to mark it as not applicable. An example of this is gardening, which may
not be applicable for patients living in an apartment building.
If the task is not applicable, it does not contribute to the overall
score. The lowest possible summary score is 0, and the maximum possible score
is 100. Higher scores indicate higher levels of function.
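The summary-score formula can be sketched as follows. This is a minimal illustration, not the authors' scoring program. It assumes that items marked not applicable (coded 9 here, as in the modified instrument) are dropped before both the lowest possible raw score and the raw score range are computed; the text does not spell out that detail, and the function name is hypothetical.

```python
NOT_APPLICABLE = 9  # code for "task not applicable" (assumed to match the item coding)


def summary_score(ratings):
    """([raw total - lowest possible raw total] / raw score range) x 100.

    Each applicable item is rated 1 (impossible) to 5 (not at all difficult).
    Assumption: not-applicable items are excluded from the total, the
    lowest possible total, and the range.
    """
    applicable = [r for r in ratings if r != NOT_APPLICABLE]
    if not applicable:
        raise ValueError("no applicable items to score")
    if any(r not in (1, 2, 3, 4, 5) for r in applicable):
        raise ValueError("ratings must be 1 to 5, or 9 for not applicable")
    raw_total = sum(applicable)
    lowest_possible = 1 * len(applicable)
    raw_range = (5 - 1) * len(applicable)
    return (raw_total - lowest_possible) / raw_range * 100


# Two applicable items rated 5 and 3, one item not applicable.
print(summary_score([5, 3, NOT_APPLICABLE]))
```

Under this assumption a patient who skips an item such as gardening is scored only against the activities that he or she actually performs.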
The Short Form-36 (SF-36) is a widely used validated generic
health-status measure with eight subscales: physical function, role
limitations due to physical health, role limitations due to emotional
problems, vitality, mental health, social function, bodily pain,
and general health [20,21]. Responses
vary from dichotomous (yes/no) to a 6-point verbal rating scale.
The SF-36 has no global score, and results are presented as a profile
of scores from each subscale. When the SF-36 physical function subscale
(eight questions) is coded, summed, and transformed, it provides
a value from 0 to 100, with higher scores indicating better function.
Face Validity
The face validity of the TESS, OARS, and MFA was assessed by
convening an expert panel that consisted of three orthopaedic surgeons,
a geriatrician, two physiotherapists, two nurses, and a social worker
who were actively involved in the care of patients with a hip fracture.
We also administered the TESS, the OARS physical function subscale, and
the MFA to fifteen community-dwelling patients with a hip fracture,
ten of whom were returning for a visit to the fracture clinic and
five of whom were on the orthopaedic ward. The purpose of this exercise
was to eliminate any instruments that were difficult or too long
for patients to complete, or that patients did not understand, from
additional validation. The SF-36 has previously been shown to be
applicable for use in elderly populations and thus was not further
assessed for face validity [18].
The expert panel did not suggest that any additional activities
be added to the TESS. On the recommendation of the pilot subjects,
"getting on and off the toilet" was added to the list of items and
the question regarding "participating in my usual sporting activities"
was dropped. The pilot subjects thought that this question should
be deleted because they were not engaging in sports and it was
covered by "usual leisure activities." The item "participating in
sexual activity" was considered to be too personal by these elderly
patients and thus was dropped as well. The wording of some of the
questions was also changed to improve clarity. For example, "shopping"
became "food shopping." The items in the modified TESS, now named
the Lower Extremity Measure (LEM), included: getting out of bed,
getting in/out of bathtub, getting on/off toilet, showering, putting
on pants, putting on stockings, putting on shoes, rising from chair,
standing upright, kneeling, getting up from kneeling, bending to
pick something up off of floor, sitting, walking upstairs, walking downstairs,
walking inside, walking outside, walking up/down ramp, getting in/out
of automobile, using public transportation, socializing with friends/family,
doing usual daily activities, gardening/yardwork, preparing own
meals, finishing usual daily activities, spending usual amount of time
doing daily activities, doing light housework, doing heavy housework,
and food shopping. These items were rated as 1 (impossible [completely
unable to do]), 2 (extremely difficult), 3 (moderately difficult),
4 (a little bit difficult), 5 (not at all difficult), or 9 (task
not applicable).
The elderly patients who were interviewed expressed a number
of concerns about the MFA instrument. They found it to be too long,
and they had difficulty with the limitation of the response items
to yes or no. A particularly difficult issue for these elderly patients
had to do with the phrase "because of your injury." If the question
was not true or was not related to their injury, then the appropriate
response was "no." Since these were elderly patients with all types
of comorbidities, they did not perform many of the activities because
of other health problems. Therefore, they had difficulty attributing
a particular limitation only to their hip fracture. This problem
could potentially be avoided by asking about physical function prior to
the fracture. The resulting baseline or preinjury score would then
reflect the effect of preexisting comorbidities. The major limitation
noted with the MFA was the inability to obtain information on prefracture
health status. The questions could not be reworded to collect this
information. Knowledge of the prefracture health status is extremely important
in an elderly population because underlying comorbidity is one of
the best predictors of recovery [3,14,22,23,25].
This is not an issue in studies of younger trauma patients because
they are less likely to have had functional limitations prior to
the injury.
The two questions in the mobility section that the patients had
difficulty understanding were "Do you pivot?" and "Do you always
walk with a limp?" The section on work activities was not relevant
to this population. The patients also found some of the questions
(for example, "Do you have less enjoyment in sex?") to be too personal
or inappropriate. Because of these concerns, the MFA was not considered
to be a useful measure for patients with a hip fracture. Therefore,
on the basis of the comments by the expert panel and the patients with
a hip fracture, the OARS and the LEM underwent further validation.
The physical function subscale of the SF-36 was evaluated further
as well. Floor and ceiling effects were also calculated as measures
of face validity. A floor effect is measured as the proportion of
patients who obtain the lowest possible score, and the ceiling effect
is the proportion who obtain the highest possible score [28].
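The floor and ceiling definitions above translate directly into a short calculation. The sketch below is illustrative; the function name, the default bounds of 0 and 100, and the example scores are assumptions and are not study data.

```python
def floor_ceiling_effects(scores, lowest=0, highest=100):
    """Return (floor, ceiling) as proportions of patients at each extreme."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor = sum(1 for s in scores if s == lowest) / n
    ceiling = sum(1 for s in scores if s == highest) / n
    return floor, ceiling


# Invented scores for five hypothetical patients.
floor, ceiling = floor_ceiling_effects([100, 100, 58, 90, 0])
print(floor, ceiling)
```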
Reliability
The OARS and the SF-36 had both previously been subjected to
reliability testing in elderly populations [6,20];
therefore, they were not assessed again for reliability in this
study. In order to determine test-retest reliability of the LEM,
all patients were interviewed a second time within forty-eight hours after
the baseline interview. Forty-eight hours was chosen because we
wanted to ensure that the patient would not be discharged from hospital
before the second interview was conducted. The intraclass correlation
coefficient was calculated to assess reliability. The TESS, from which
the LEM was derived, has previously
been shown to have high internal consistency, with a Cronbach alpha
of 0.94 and an intraclass correlation coefficient of greater than
0.87 at multiple time-points for test-retest reliability [4].
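The text does not state which form of the intraclass correlation coefficient was used. As a hedged illustration, the sketch below computes the one-way random-effects form, often written ICC(1,1), from repeated interviews; the choice of this form, the function name, and the example data are all assumptions.

```python
from statistics import mean


def icc_one_way(ratings_per_subject):
    """One-way random-effects ICC(1,1).

    ratings_per_subject: one tuple per patient holding k repeated scores
    (k = 2 for a test and a retest interview).
    Note: if every score is identical the ratio is undefined.
    """
    n = len(ratings_per_subject)
    k = len(ratings_per_subject[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings_per_subject):
        raise ValueError("need at least 2 subjects, each with the same k >= 2 ratings")
    grand_mean = mean(x for r in ratings_per_subject for x in r)
    subject_means = [mean(r) for r in ratings_per_subject]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings_per_subject, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented test-retest scores for three hypothetical patients.
print(round(icc_one_way([(90, 92), (80, 78), (100, 100)]), 3))
```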
Criterion Validity
The definition of criterion validity is the correlation of a
scale with some other measure of the trait or disorder under study
that has been used and accepted as the "gold standard" in the field [28]. Correlating one measure with another
is not straightforward because the new measure was not designed
to precisely replicate existing measures. There are no gold standards
for the outcome measures being assessed in the current study. However,
the accepted theoretical premise underlying functional status is
that it is related to the ability to perform activities of daily
living [26]. Given that reported
physical function is associated with functional mobility and that
both of these parameters are related to actual physical function, one
approach to validating reported physical function is to correlate
it with functional mobility. We therefore compared the LEM, the
OARS, and the physical function subscale of the SF-36 with an objective
measure of functional mobility, the Timed Up and Go (TUG) test [27]. This is a quick and practical method
for testing basic mobility maneuvers. It has been used extensively
for frail elderly individuals with a wide variety of physical conditions,
including hip fracture, stroke, Parkinson disease, and cerebellar
degeneration. The test consists of one multiphase task. The patient
begins seated in an armchair, with his or her walking aid if necessary.
He or she is asked to rise from the chair, stand momentarily, walk
to a line on the floor three meters away, turn, return, and turn
around to sit down again. The observer, using a stopwatch, begins
timing on the word "go." The total time in seconds that it takes the
patient to complete the entire task is the outcome measure. The
test-retest reliability is extremely high (intraclass correlation
coefficient = 0.99) as is the inter-rater reliability (Kendall coefficient
W = 0.85) [27]. The time that the
patient requires to do the test is categorized into one of four
groups that reflect mobility skills. Those who complete the test
in less than ten seconds are freely mobile. Those who complete it
in ten to nineteen seconds can carry out basic transfers independently
(and many can transfer to and from the tub or shower independently) and
can climb stairs and go outside alone. The majority of those who
complete the test in twenty to twenty-nine seconds use a cane and
have some difficulty with stairs. Those who take thirty seconds or
more to complete the test need help with getting in and out of a
chair or have difficulty going outside alone.
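The four time bands can be expressed as a simple lookup. The sketch below is illustrative only: the function name and the short group labels are hypothetical, and treating fractional times such as 19.5 seconds as belonging to the lower band is an assumption, since the text gives whole-second ranges.

```python
def tug_mobility_group(seconds):
    """Map a Timed Up and Go result (in seconds) to one of the four groups."""
    if seconds < 0:
        raise ValueError("time cannot be negative")
    if seconds < 10:
        return "freely mobile"
    if seconds < 20:
        return "independent in basic transfers, stairs, and going outside"
    if seconds < 30:
        return "mostly uses a cane; some difficulty with stairs"
    return "needs help with chair transfers or has difficulty going outside alone"


for t in (8.2, 14.0, 25.5, 41.0):
    print(t, "->", tug_mobility_group(t))
```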
The LEM measures functional status, whereas the TUG measures
functional mobility. Since functional mobility does not encompass
all aspects of physical function, the correlation is not expected to
be high. Correlations between 0.2 and 0.6 are typically observed
when there is no true gold standard [19].
For the present study, we hypothesized that a correlation of -0.4
between the LEM and the TUG would be sufficient to demonstrate criterion
validity. Patients' scores at six weeks were compared with the time
in seconds on the TUG with use of the Pearson correlation coefficient.
It was calculated that a sample of forty patients was needed to produce
a 95 percent confidence interval of 0.10 to 0.63, assuming a correlation
of 0.4 with an alpha of 0.05 and a beta of 0.2 [15].
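The text does not say how the confidence interval was derived. One common approach, the Fisher z transformation, is sketched below as an assumption; with a correlation of 0.4 and forty patients it gives an interval of approximately 0.10 to 0.63, which matches the figures quoted above. The function name is hypothetical.

```python
import math


def pearson_confidence_interval(r, n, z_crit=1.959964):
    """Approximate CI for a Pearson correlation via the Fisher z transformation.

    z_crit is the two-sided standard normal critical value (1.96 for 95 percent).
    """
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    z = math.atanh(r)                   # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)         # approximate standard error of z
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


low, high = pearson_confidence_interval(0.4, 40)
print(round(low, 2), round(high, 2))
```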
Construct Validity
Construct validity is the extent to which an instrument corresponds
to the theoretical concepts (physical function) under study. Construct
validity was assessed with use of four hypotheses: (1) patients
who had comorbidities would have lower prefracture physical function
scores; (2) patients who were able to walk without aids prior to
the fracture would have higher prefracture scores; (3) compared
with the other seven SF-36 subscales, the physical function subscale
would have the highest correlation with the LEM, and the LEM would
also be correlated with the OARS; and (4) the prefracture scores
would be highest, and the six-month scores would be higher than
the six-week scores but lower than the prefracture scores. For the
first, second, and fourth hypotheses, t tests were used to evaluate
differences in scores and a p value of less than 0.05 was set for
the level of significance. For the third hypothesis, we assumed that
a correlation coefficient between 0.40 and 0.60 would be observed
when the physical function measures were compared. It has been suggested
that this level should be expected when scales that measure the
same construct are compared [28].
Responsiveness
Responsiveness was quantified with use of the effect size and
the standardized response mean. Effect size is a method of standardizing
the extent of change measured by an instrument to allow comparisons
between instruments. It is calculated as the difference between
the mean prefracture score and the postoperative score, divided
by the standard deviation of the prefracture score. An effect size
of 1.0 is equivalent to a change of one standard deviation in the
sample. Effect sizes of 0.2, 0.5, and 0.8 are typically regarded
as indicating small, medium, and large degrees of change, respectively [13]. The standardized response mean
is the mean change score divided by the standard deviation of the
change score [11]. The effect size
may be less accurate because it uses the standard deviation of the
prefracture score, which is a cross-sectional figure, and hence does
not contain information on the accuracy of the instrument in detecting
change over time [5]. We hypothesized
that, since the items on the OARS concentrate on the more basic
activities of daily living, our study subjects would achieve the maximum
OARS scores early in their recovery and therefore the OARS would
be less responsive to change than the LEM is. For the LEM, the OARS,
and the physical function subscale of the SF-36, t tests were used
to compare change scores between the prefracture and the six-week,
the prefracture and six-month, and the six-week and six-month time-periods.
The level of significance was specified at p < 0.05.
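The two responsiveness statistics defined above can be sketched as follows. The function names and the example scores are invented for illustration and are not study data; the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def effect_size(prefracture, followup):
    """(mean prefracture score - mean follow-up score) / SD of prefracture scores."""
    return (mean(prefracture) - mean(followup)) / stdev(prefracture)


def standardized_response_mean(prefracture, followup):
    """Mean change score / SD of the change scores (paired by patient)."""
    if len(prefracture) != len(followup):
        raise ValueError("scores must be paired by patient")
    changes = [pre - post for pre, post in zip(prefracture, followup)]
    return mean(changes) / stdev(changes)


# Invented paired scores for three hypothetical patients.
pre = [90, 100, 80]
post = [70, 90, 50]
print(effect_size(pre, post), standardized_response_mean(pre, post))
```

The two statistics share a numerator when every patient has both scores; they differ only in the standard deviation used to scale it.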
Study Population
The study population was drawn from all patients with a hip fracture
who were treated at a large teaching hospital between July 1, 1995,
and April 30, 1996. The study was approved by the Institutional
Review Board. One hundred and eighty patients are treated at this
institution annually, with more than half coming from long-term-care
facilities. Excluded from the study were patients with diagnosed
cognitive impairment, those who had sustained the hip fracture in
a long-term-care facility, and those who had previously dwelt in
the community but were being discharged to a long-term-care facility
with no possibility of returning to the community. The institution
at which the study was performed has a multidisciplinary hip-fracture-care
team, including a geriatrician, that meets weekly to plan the discharge
of patients. All patients with a hip fracture who have possible
cognitive impairment have the diagnosis confirmed by the geriatrician.
The research assistant identified all eligible patients on the
orthopaedic ward. The hospital chart was examined for exclusion
criteria and to obtain information on comorbid illness, complications, medications,
duration of hospital stay, treatment, discharge destination, social
services ordered (for example, home care), and ambulatory status.
Prior to discharge, informed consent was obtained from eligible
patients.
Forty-three community-dwelling patients with a hip fracture were
interviewed in the hospital to determine their prefracture physical
function and general health status and were followed prospectively
at six weeks and six months to assess criterion and construct validity.
At the six-week follow-up evaluation, the LEM, OARS, and SF-36 were
completed and the TUG was performed in the fracture clinic; at six
months, the LEM, OARS, and SF-36 were completed by telephone interview.
We did not follow patients for one year. Investigators who have
examined physical function following hip fracture have noted that
function improves quickly up until six weeks, improves at a lower
rate between six weeks and six months, and levels off such that
nearly maximum function is usually attained at six months [23,25].
Baseline Characteristics
The mean age (and standard deviation) of the forty-three patients
was 80.9 ± 8.3 years, and the mean hospital
stay was 13.9 ± 6.2 days. Thirty-five (81
percent) of the patients were female, twenty-two (51 percent) had
an intertrochanteric fracture, nineteen (44 percent) lived alone,
forty (93 percent) owned their own home or apartment, fifteen (35
percent) were married, fifteen (35 percent) were widowed, five (12
percent) were divorced, eight (19 percent) had never been married,
thirty (70 percent) had used no walking aids prior to the fracture,
and eleven (26 percent) had used a cane outdoors.
Thirteen of the forty-three patients did not return to the fracture
clinic for the six-week follow-up visit. The main reasons were that
these patients were still in a rehabilitation hospital
or had been admitted to a long-term-care facility following
their discharge to home. There were no significant differences between
those lost to follow-up and the respondents with respect to age,
proportion that was female, duration of stay in the hospital, living
alone, marital status, type of housing, use of walking aids before
the fracture, comorbidity, or fracture type. Also, those lost to
follow-up did not have significantly worse prefracture physical
function (mean LEM score, 90.0) than the respondents (mean LEM score,
93.4). There were, however, significant differences at baseline
with regard to some of the scores on the SF-36 subscales. The largest
difference was in the scores for prefracture pain (96.7 for the
respondents compared with 79.0 for the nonrespondents, p = 0.01).
The scores on the general health perception (p = 0.04), vitality
(p = 0.05), and mental health (p = 0.01) subscales were also significantly
different between the respondents and those lost to follow-up (Table I).
Face Validity
The LEM showed no floor effect, with no patients achieving
a score of 0. Prefracture, 30 percent of the patients scored 100,
indicating that they had no difficulty performing any of the activities included
in the LEM. The range of prefracture scores was 58 to 100. At six
weeks and six months, none of the patients scored 100.
Reliability
In our sample at baseline the intraclass correlation coefficient
was 0.85, demonstrating a high level of test-retest reliability
for the LEM.
Criterion Validity
Only the LEM was significantly correlated with the TUG (r = -0.53,
p = 0.03). The correlations of the TUG with the OARS and the SF-36
physical function subscale were -0.35 and -0.26, respectively, and
were not significant.
Construct Validity
All of the tested hypotheses supported the construct validity
of the LEM. Patients who had at least one comorbidity had a lower
mean prefracture LEM score (90.0 ± 9.7) than
those with no comorbidity (96.9 ± 8.1) (p = 0.02).
Patients who had walked without aids prior to the fracture
scored higher (95.5 ± 5.8) than those who
had used a cane (85.5 ± 12.7) (p = 0.0007).
The prefracture LEM scores showed the highest correlation with
the physical function subscale of the SF-36 (r = 0.78) and lower correlations
with the other SF-36 subscales. The lowest correlations (r = 0.004
to 0.27) were with the role function-emotional, pain, and mental health
subscales. Table II shows
the mean prefracture and postoperative scores on the LEM, the OARS
physical function subscale, and the different dimensions of the
SF-36. The prefracture scores were the highest, followed by the
six-month and six-week scores, on all three scales.
Responsiveness
The changes in the scores between prefracture and six weeks,
prefracture and six months, and six weeks and six months were significant
(p < 0.05) for the LEM and the physical function subscale of the
SF-36 (Table II).
The extents of the changes in physical function between prefracture
and six weeks, prefracture and six months, and six weeks and six
months were compared by evaluating the effect sizes and the standardized
response means (Table III). As expected, the physical function
scores changed the most between prefracture and six weeks. The results
suggest that the LEM is more responsive or sensitive to changes
in physical function than is the physical function subscale of the
SF-36 or the OARS, as shown by the larger effect sizes and standardized
response means observed across each of the three time comparisons.
Clinically Meaningful Difference
The above results indicate that the LEM is a valid and responsive
measure of physical function. In order to provide information on
what the score means in terms of a clinically meaningful difference,
we reviewed the types and levels of difficulty associated with mean
scores in 10-point increments ranging from approximately 55 to 85. We
wished to determine if the mean scores reflected a decrease in physical
function or a change in the level of independence. We also examined
the mean difficulty rating of each activity across all respondents.
We found that activities that placed considerable stress on the
lower extremity, such as kneeling and walking up and down stairs,
were the first to become difficult. Patients who had difficulty
with these activities scored approximately 75, whereas those who
had such difficulties and also found it extremely difficult to use
public transportation, to shop, or to garden scored approximately 65.
The patients who scored less than 60 had difficulty with every activity
of daily living (Table IV). The two items with a large proportion
of "not applicable" responses were gardening and use of public transportation
because many patients did not engage in these activities on a regular
basis.
We examined three disease-specific instruments, the MFA, the
OARS physical function subscale, and the LEM, and we also examined
the physical function subscale of the SF-36, to determine if they
would be useful outcome measures for assessing the physical function
of patients with a hip fracture. Because the MFA was not practical
and could not be used to obtain baseline measurements, we did not
evaluate that instrument further in our study of community-dwelling
patients with a hip fracture. All of the other measures were quick
and simple to administer and were therefore tested for validity
for use for patients with a hip fracture. We hypothesized that,
although the OARS physical function subscale was easy to administer,
it would not be as responsive to changes in physical function as
the LEM and the SF-36 physical function subscale because the majority
of activities assessed in the OARS centered on basic activities
of daily living, such as bathing and dressing. Responses to functional
items that do not necessarily depend on the lower extremity, such as
using the telephone and taking medicine, would be expected to be the
same before and after a fracture.
One of the hypotheses that we used to assess construct validity
was that the LEM would be correlated with the physical function
subscale of the SF-36 in the 0.40 to 0.60 range. However, the correlation was
much stronger (r = 0.78). The reason for this is that five of the
ten questions on the SF-36 subscale relate exclusively to function
of the lower extremity. This then raises the question of why one should
use the LEM for assessing physical function if the two instruments
are providing similar results. It seems that one would have to administer only
the SF-36, and not a disease-specific measure, to assess the physical
function of patients with a hip fracture. However, in addition to
validity, two other factors influence the decision to use an outcome
measure in research studies: whether the measure is responsive and
whether it is relevant.
Our data showed that the OARS subscale was the least responsive
to changes in physical function, followed by the physical function
subscale of the SF-36 and then the LEM. This finding has a number
of important implications for outcome studies. One is from the perspective
of study design. When two groups are being compared, the more responsive
outcome measure reduces the number of patients needed to demonstrate
a clinically relevant difference between the groups. The greater
effect size calculated for the LEM indicates that the LEM requires
a smaller sample size than does the physical function subscale of
the SF-36 to detect the same difference. Our results also suggest
that a 10-point difference in the LEM score may indicate a clinically
meaningful difference in physical function after treatment of a
hip fracture. This finding needs to be tested in additional prospective
studies. Furthermore, because the LEM is a disease-specific instrument,
it can provide useful information about an individual patient's
function⁴. This information could
be used to provide patients with reasonable expectations about their functional
status at, for example, six weeks and six months following surgery.
It could also be used by clinicians to monitor an individual patient's
difficulty with specific activities, and that difficulty could then
be addressed in the rehabilitation process. This may facilitate
improved function at an earlier stage or result in an improvement
in patient outcomes.
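The relationship between responsiveness and sample size described above can be illustrated with a minimal sketch. It assumes that the effect size is calculated as the mean change divided by the standard deviation of the baseline scores, and that the number of patients per group is approximated with the usual normal-approximation formula for comparing two means, with the effect size treated as the standardized difference between groups. The function names, the two-sided significance level of 0.05, the power of 80 percent, and the example effect sizes of 0.5 and 1.0 are illustrative assumptions and are not values from this study.

```python
import math
from statistics import NormalDist, mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores.

    Assumes paired scores (same patients, same order) and at least
    two patients, so that the baseline SD is defined.
    """
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided comparison of two means.

    d is the standardized difference (difference in means / SD).
    Uses the normal approximation n = 2 * (z_alpha + z_beta)^2 / d^2.
    """
    if d == 0:
        raise ValueError("standardized difference must be nonzero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


if __name__ == "__main__":
    # Hypothetical standardized effects, not results from this study.
    for d in (0.5, 1.0):
        print(d, sample_size_per_group(d))
```

Under these assumptions, a standardized effect of 0.5 requires 63 patients per group, whereas a standardized effect of 1.0 requires 16, which is why a more responsive measure can reduce the size of a comparative study.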
One of the limitations of this study was the loss to follow-up
of patients who did not return to the fracture clinic at six weeks
after treatment of the hip fracture. The population that we studied
is a difficult one to track because a hip fracture has a tremendous
impact on an individual's level of physical function and hence on
independence. The main reasons for loss to follow-up were the patient's
inability to cope at home or the transfer of the patient to a rehabilitation
hospital after an initial discharge to home. Many elderly patients who have
had a hip fracture move in with a family member or to a long-term-care
facility permanently and thus become untraceable with use of hospital
records. This loss to follow-up limits the generalizability of the
study findings to the functional recovery of patients with a hip
fracture who are expected to return to community living.
Another explanation for the loss to follow-up is that these patients
have other comorbidities that further limit their ability to recover
from the hip fracture. If those lost to follow-up had lower physical
function scores at six weeks, then differences between our results
at six weeks and at six months and our results at baseline would
have been even greater. When we compared the mean prefracture LEM
scores for respondents with those for patients who had been lost
to follow-up, we found no difference, suggesting that the two groups
had similar prefracture function. However, we did note differences in general
health perception, vitality, pain, and mental health as measured
by the SF-36. This finding suggests that recovery from the hip
fracture may have been more difficult for those with a less positive outlook,
who in turn may have been less able to cope with the
accompanying loss of function.
The small sample size adds another limitation to the study,
namely, the inability to confirm construct validity with use of factor
analysis. Also, the LEM has not been validated for use in patients
with a hip fracture and cognitive impairment, who represent a sizeable
proportion of this patient population. Use of the LEM for these
patients would require asking family members and caregivers to comment
on the physical function of the patient with the hip fracture. How
the caregiver perceives the patient's level of function may be different from
how the patient perceives it.
Discharge patterns for patients with a hip fracture in Canada
differ substantially from those in the United States. The average
duration of hospitalization in the United States is 4.8 days less
than that in Canada⁹. The United
States Congress Office of Technology Assessment reported that up
to half of American female patients with a hip fracture may spend some
time in a nursing home and 14 percent are still there at one year²⁹.
In contrast, in the province of
Ontario, which has approximately ten million residents, the average hospital
stay for a community-dwelling patient with a hip fracture in fiscal
year 1996 was 19.1 days¹⁰. Ten
percent were discharged to another hospital, and 17 percent were
discharged to a long-term-care facility without the possibility
of returning home because they required 1.5 to three hours of professional
nursing care per day⁹. The other
73 percent were discharged to home with home-care services or to
a rehabilitation hospital with the expectation that they would return home.
The latter group is representative of the patients enrolled in the
present study. The LEM may not be applicable to the 14 percent of
hip-fracture patients in the United States who are still in a nursing
home one year after the fracture.
Our study should reassure clinicians that even though the SF-36
is a generic measure and, on the surface, may appear to be less
relevant to patients with a hip fracture, the physical function
subscale is a good measure of lower extremity function in these
patients. One of the main advantages of the LEM compared with the
SF-36 physical function subscale is related to rehabilitation. With
the LEM, the clinician has a better sense of what the score means.
The LEM can provide more detailed information about specific aspects
of a patient's function because the clinician knows exactly which
activities the patient is having difficulty with; thus, use of the
LEM may improve the recovery process. Therefore, it is more useful
at the individual patient level, which is a characteristic of disease-specific
measures. Another advantage of the LEM is its practicality. It has
a reasonable length, and it takes about five minutes to administer.
The language is simple, and it is worded to obtain prefracture functional
status. The LEM would be particularly useful in the research setting.
Because it results in larger effect sizes, it requires a smaller
sample size than, for example, the SF-36, to show the same effect.
In summary, this study supports use of the LEM as a disease-specific
instrument for assessing, and monitoring over time, the physical
function of community-dwelling patients who have undergone surgical
and rehabilitation interventions for a hip fracture. This study
also suggests that an orthopaedist who is currently using the SF-36
in his or her practice can assume that its physical function subscale
is a valid measure of the physical function of patients with a hip
fracture.
Note: The authors thank Dr. Aileen Davis for allowing them to
modify her instrument, the Toronto Extremity Salvage Score.
Bellamy, N.: Pain assessment in osteoarthritis: experience with the
WOMAC osteoarthritis index. Sem. Arthrit. and Rheumat., 18 (Supplement 2): 14-17, 1989.
Bombardier, C.; Melfi, C. A.; Paul, J.; Green, R.; Hawker, G.; Wright, J.; and Coyte, P.: Comparison of a generic and a disease-specific measure
of pain and physical function after knee replacement surgery. Med. Care, 33 (Supplement 4): AS131-AS144, 1995.
Borgquist, L.; Ceder, L.; and Thorngren, K. G.: Function and social status 10 years after hip fracture.
Prospective follow-up of 103 patients. Acta Orthop. Scandinavica, 61: 404-410, 1990.
Davis, A. M.; Wright, J. G.; Williams, J. I.; Bombardier, C.; Griffin, A.; and Bell, R. S.: Development of a measure of physical function for patients
with bone and soft tissue sarcoma. Qual. Life Res., 5: 508-516, 1996.
de Bruin, A. F.; Diederiks, J. P.; de Witte, L. P.; Stevens, F. C.; and Philipsen, H.: Assessing the responsiveness of a functional status measure:
the Sickness Impact Profile versus the SIP68. J. Clin. Epidemiol., 50: 529-540, 1997.
Duke University Center for the Study
of Aging and Human Development: Multidimensional Functional Assessment:
The OARS Methodology, a Manual. Ed. 2, pp. 61-90. Durham, North
Carolina, Duke University, 1978.
Fillenbaum, G. G., and Smyer, M. A.: The development, validity, and reliability of the OARS
Multidimensional Functional Assessment Questionnaire. J. Gerontol., 36: 428-434, 1981.
Guyatt, G.; Walter, S.; and Norman, G.: Measuring change over time: assessing the usefulness of
evaluative instruments. J. Chronic Dis., 40: 171-178, 1987.
Iglehart, J. K.: The American health care system: community hospitals. New England J. Med., 329: 372-376, 1993.
Jaglal, S.: Osteoporotic fractures:
incidence and impact. In Patterns of Health Care in Ontario: Arthritis
and Related Conditions, pp. 143-156. Edited by J. I. Williams and
E. M. Badley. Toronto, Institute for Clinical Evaluative Sciences,
1998.
Katz, J. N.; Larson, M. G.; Phillips, C. B.; Fossel, A. H.; and Liang, M. H.: Comparative measurement sensitivity of short and longer
health status instruments. Med. Care, 30: 917-925, 1992.
Katz, S.; Ford, A. B.; Moskowitz, R. W.; Jackson, B. A.; and Jaffe, M. W.: Studies of illness in the aged. The index of ADL: a standardized
measure of biological and psychosocial function. J. Am. Med. Assn., 185: 914-919, 1963.
Kazis, L. E.; Anderson, J. J.; and Meenan, R. F.: Effect sizes for interpreting changes in health status. Med. Care, 27 (Supplement 3): S178-S189, 1989.
Koval, K. J., and Zuckerman, J. D.: Current concepts review. Functional recovery after fracture
of the hip. J. Bone and Joint Surg., 76-A: 751-758, May 1994.
Lachin, J. M.: Introduction to sample size determination and power analysis
for clinical trials. Controlled Clin. Trials, 2: 93-113, 1981.
Lawton, M. P., and Brody, E. M.: Assessment of older people: self-maintaining and instrumental
activities of daily living. Gerontologist, 9: 179-186, 1969.
Liang, M. H.: Evaluating measurement responsiveness. J. Rheumatol., 22: 1191-1192, 1995.
Lyons, R. A.; Perry, H. M.; and Littlepage, B. N.: Evidence for the validity of the Short-Form 36 questionnaire
(SF-36) in an elderly population. Age and Ageing, 23: 182-184, 1994.
McDowell, I., and Newell, C.: Measuring Health:
A Guide to Rating Scales and Questionnaires, pp. 88-91. New York,
Oxford University Press, 1987.
McHorney, C. A.; Ware, J. E., Jr.; Rogers, W.; Raczek, A. E.; and Lu, J. F.: The validity and relative precision of MOS short- and
long-form health status scales and Dartmouth COOP charts. Results
from the Medical Outcomes Study. Med. Care, 30 (Supplement 5): MS253-MS265, 1992.
McHorney, C. A.; Ware, J. E., Jr.; and Raczek, A. E.: The MOS 36-item Short-Form Health Survey (SF-36): II.
Psychometric and clinical tests of validity in measuring physical
and mental health constructs. Med. Care, 31: 247-263, 1993.
Magaziner, J.; Simonsick, E. M.; Kashner, T. M.; Hebel, J. R.; and Kenzora, J. E.: Survival experience of aged patients with a hip fracture. Am. J. Pub. Health, 79: 274-278, 1989.
Marottoli, R. A.; Berkman, L. F.; and Cooney, L. M., Jr.: Decline in physical function following hip fracture. J. Am. Geriat. Soc., 40: 861-866, 1992.
Martin, D. P.; Engelberg, R.; Agel, J.; Snapp, D.; and Swiontkowski, M. F.: Development of a musculoskeletal extremity health status
instrument: the Musculoskeletal Function Assessment Instrument. J. Orthop. Res., 14: 173-181, 1996.
Mossey, J. M.; Mutran, E.; Knott, K.; and Craik, R.: Determinants of recovery 12 months after hip fracture:
the importance of psychosocial factors. Am. J. Pub. Health, 79: 279-286, 1989.
Myers, A. M.: The clinical Swiss army knife. Empirical evidence on the
validity of IADL functional status measures. Med. Care, 30 (Supplement 5): MS96-MS111, 1992.
Podsiadlo, D., and Richardson, S.: The timed "Up and Go": a test of basic functional mobility
for frail elderly persons. J. Am. Geriat. Soc., 39: 142-148, 1991.
Streiner, D. L., and Norman, G. R.:
Health Measurement Scales: A Practical Guide to Their Development
and Use, pp. 106-125. New York, Oxford University Press, 1989.
United States Congress Office of Technology Assessment:
Hip Fracture Outcomes in People Age Fifty and Over - Background
paper OTA-BPH-H-120-P. Washington, D.C., United States Government
Printing Office, July 1994.