Pitfalls of Using Patient Recall to Derive Preoperative Status in Outcome Studies of Total Knee Arthroplasty
Elizabeth A. Lingard, BPhty, MPhil, MPH; Elizabeth A. Wright, PhD; Clement B. Sledge, MD
Investigation performed at Brigham and Women’s Hospital, Boston, Massachusetts
Elizabeth A. Lingard, BPhty, MPhil, MPH
Department of Trauma and Orthopaedic Surgery, The Medical School, University of Newcastle upon Tyne, Newcastle upon Tyne NE2 4HH, United Kingdom. E-mail address: lizlingard@aol.com

Elizabeth A. Wright, PhD
Clement B. Sledge, MD
Robert Brigham Multipurpose Arthritis and Musculoskeletal Diseases Center (E.A.W.) and Department of Orthopedic Research (C.B.S.), Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02115

One or more of the authors has received or will receive benefits for personal or professional use from a commercial party related directly or indirectly to the subject of this article. In addition, benefits have been or will be directed to a research fund, foundation, educational institution, or other nonprofit organization with which one or more of the authors is associated. Funds were received in total or partial support of the research or clinical study presented in this article. The funding sources were Stryker/Howmedica, Rutherford, NJ, and Limerick, Ireland.

The Journal of Bone & Joint Surgery.  2001; 83:1149-1156 

Abstract

Background: It is essential to adjust for the level of preoperative pain and functional status when measuring the outcome of total knee arthroplasty. Some study designs rely on postoperative patient recall to derive preoperative status. In this study, we compared prospectively collected preoperative data with data derived from patient recall of preoperative status three months after total knee arthroplasty.

Methods: Patients were recruited as part of a prospective observational study of the outcome of primary total knee arthroplasty for osteoarthritis at four centers in the United States, six centers in the United Kingdom, and two centers in Australia. Independent research assistants recruited patients and collected data with use of a uniform documentation system preoperatively and three months postoperatively. Preoperative data included the findings of a clinical history and physical examination, demographic information, socioeconomic status, and scores from two health-status instruments: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC) and the Medical Outcomes Study Short Form-36 Health Survey (SF-36). Postoperative data included the WOMAC and SF-36 scores and patient recall of preoperative status on selected items from these health-status measures.

Results: A total of 862 patients were recruited, and recall data were available for 770 patients (89%). The mean age was seventy years (range, thirty-eight to ninety years), and 59% of the patients were women. Comparisons of prospective and recall data on individual pain and function items showed poor-to-fair agreement (weighted kappa, 0.20 to 0.41). Patients recalled significantly more pain than they had reported preoperatively (p < 0.001), but there were random recollection errors for the function items. There was only moderate correlation between the prospective and recalled summary scores for pain (Spearman r = 0.53) and function (Spearman r = 0.48). In addition, 61% of the recalled pain scores and 50% of the recalled function scores differed from the prospective scores by more than 10 points (10% of the total range).

Conclusions: Patients’ recall of preoperative pain and functional status three months after total knee arthroplasty demonstrated only moderate agreement with what the patients had reported prospectively. Researchers who use recall data to derive preoperative status must recognize these limitations when drawing conclusions about the effectiveness of total knee arthroplasty.

    Preoperative pain and function are important predictors of outcome after total hip and knee arthroplasty1. Because there are no recognized levels of pain or functional deterioration that are used as precise indications for total knee arthroplasty, patients undergo the operation at varying levels of disease severity2-6. Failure to adjust for preoperative status when evaluating the outcome of total knee arthroplasty may lead to overestimation or underestimation of the effects of the operation7.
    Some study designs, such as cross-sectional and retrospective designs, do not include the collection of preoperative data and often rely on patients’ recall of their preoperative status8. Mancuso and Charlson7 analyzed the accuracy of patient recall of preoperative status after total hip arthroplasty. The patients in that study were recruited preoperatively and then were evaluated at a mean of 2.5 years later, at which time they were asked to recall their preoperative status. The patients were found to have poor recall of pain, function, and impact on health but moderate recall of walking ability. They tended to recall more pain and better function, but the direction and magnitude of recollection error varied for major subgroups of patients.
    In the present study, we aimed to analyze patient recall of preoperative status by surveying patients three months after total knee arthroplasty, with no other interventions performed during the time between the two assessments. We also evaluated whether there was a systematic bias that could be used to adjust recall data in order to accurately derive preoperative pain and functional status. We hypothesized that the patients’ recall of preoperative pain and functional status at three months after the procedure would have weak agreement with the pain and functional status that they had reported preoperatively.
    Fig. 1: Graph showing the percentage agreement between prospective and recalled pain ratings. The asterisk indicates that patients recalled significantly more pain than they had reported preoperatively (p < 0.001, McNemar test).

    Fig. 2: Graph showing the percentage agreement between prospective and recalled functional ratings. The single asterisk indicates that patients recalled significantly less limitation than they had reported preoperatively (p < 0.001, McNemar test). The double asterisk indicates that patients recalled significantly more limitation than they had reported preoperatively (p = 0.009, McNemar test).

    Fig. 3: Graph showing the difference in improvement as reported on the prospective pain and function data compared with that from pain and function data recalled three months after total knee arthroplasty.

    TABLE I: Prospective and Recall Ratings of Preoperative Pain While Walking on a Flat Surface*
    *The values represent the number of patients, with the percentage given in parentheses (n = 763).

    Prospective Rating   Recall Rating of Pain
    of Pain              None       Mild       Moderate   Severe       Extreme
    None                 0 (0)      0 (0)      0 (0)      2 (0.3)      0 (0)
    Mild                 6 (0.8)    9 (1.2)    21 (2.8)   10 (1.3)     5 (0.7)
    Moderate             4 (0.5)    17 (2.2)   60 (7.9)   106 (13.9)   16 (2.1)
    Severe               1 (0.1)    3 (0.4)    36 (4.7)   242 (31.7)   79 (10.4)
    Extreme              0 (0)      1 (0.1)    5 (0.7)    55 (7.2)     85 (11.1)
     
    TABLE II: Prospective and Recall Ratings of Preoperative Limitation in Walking More Than One Mile*†
    *The values represent the number of patients, with the percentage given in parentheses (n = 741). †1 mi = 1.6 km.

    Prospective Rating   Recall Rating of Limitation
    of Limitation        Not Limited   Limited a Little   Limited a Lot
    Not limited          12 (1.6)      14 (1.9)           12 (1.6)
    Limited a little     11 (1.5)      36 (4.9)           45 (6.1)
    Limited a lot        14 (1.9)      88 (11.9)          509 (68.7)
     
    TABLE III: Agreement Between Prospective and Recalled Responses on Individual Pain and Function Items
    *CI = confidence interval. †The values are given as percentages, with the number of responses that agreed in parentheses. ‡The values are given as percentages, with the number of responses that varied by more than one category in parentheses. §1 mi = 1.6 km. **100 yd = 91.4 m.

    Item                                     Kappa (95% CI*)    Weighted Kappa (95% CI*)   Percentage Agreement†   Percentage of Responses That Varied by >1 Category‡
    Pain
      Walking (n = 763)                      0.26 (0.21-0.31)   0.37 (0.32-0.42)           51.9 (396)              6.2 (47)
      Stair-climbing (n = 746)               0.29 (0.24-0.35)   0.39 (0.34-0.44)           53.8 (401)              6.0 (45)
    Function
      Vigorous activities (n = 706)          0.21 (0.11-0.31)   0.20 (0.10-0.31)           86.5 (611)              4.0 (28)
      Moderate activities (n = 733)          0.29 (0.23-0.35)   0.31 (0.25-0.37)           60.8 (446)              3.7 (27)
      Climbing 1 flight of stairs (n = 733)  0.27 (0.21-0.33)   0.30 (0.24-0.36)           58.9 (432)              3.7 (27)
      Walking >1 mi§ (n = 741)               0.28 (0.21-0.35)   0.33 (0.26-0.41)           75.2 (557)              3.5 (26)
      Walking 100 yd** (n = 730)             0.28 (0.22-0.34)   0.35 (0.30-0.41)           54.4 (397)              3.7 (27)
      Bathing and dressing (n = 754)         0.36 (0.31-0.42)   0.41 (0.35-0.46)           61.3 (462)              5.3 (40)
     
    TABLE IV: Spearman Rank Correlations Between Data Collected Preoperatively and the Prospective and Recall Summary Scores for Pain and Function
    *The correlations were significantly different (p < 0.001).

                                       Pain                    Function
                                       Prospective   Recall    Prospective   Recall
    Knee Society pain score            0.40*         0.29*
    Knee Society function score                                0.58*         0.43*
    Activity level                                             0.36*         0.26*
    Walking distance with support                              0.49*         0.38*
    Walking distance without support                           0.56*         0.43*
     
    TABLE V: Spearman Rank Correlation Between Prospective and Recall Summary Scores for Pain and Function, and Percentage of Scores That Differed by More than 10% of the Total Range
    *The correlations were significantly different (p < 0.05).

                                        Pain                                 Function
                                        Spearman r   Differed by >10% (%)   Spearman r   Differed by >10% (%)
    All (n = 862)                       0.53         61                     0.48         50
    Gender
      Men (n = 351)                     0.51         63                     0.47         50
      Women (n = 511)                   0.54         60                     0.46         51
    Country
      United Kingdom (n = 429)          0.55         56                     0.45         53
      United States (n = 263)           0.49         67                     0.46         50
      Australia (n = 170)               0.56         66                     0.51         46
    Age
      <75 yr (n = 577)                  0.57*        61                     0.52*        49
      ≥75 yr (n = 285)                  0.47*        63                     0.41*        52
    Education
      Less than high school (n = 459)   0.58*        57                     0.46         50
      High school or more (n = 375)     0.48*        68                     0.49         50
    SF-36 mental health score at 3 mo
      <60 points (n = 128)              0.49         60                     0.36*        59
      ≥60 points (n = 638)              0.54         62                     0.51*        49
    WOMAC function at 3 mo
      Worse (n = 89)                    0.48         78                     0.29*        60
      Same or better (n = 683)          0.53         59                     0.49*        49

    Materials and Methods

    Design

    Data for this analysis were obtained as part of the Kinemax Outcomes study, a prospective observational study of the outcome of total knee arthroplasty conducted at four centers in the United States, six centers in the United Kingdom, and two centers in Australia. The appropriate institutional review board or ethical committee approved the study at each of the participating centers. Independent research assistants at the participating sites recruited patients from September 1997 to December 1998.

    Patients

    The patients who were included in the study were scheduled to undergo a primary unilateral total knee arthroplasty for the treatment of osteoarthritis. Patients were excluded if they had had bilateral total knee arthroplasty within the previous twelve months, if they were unable to complete the questionnaire, if they had a history of knee joint infection, or if they had undergone prior knee reconstructive surgery.

    Data Collection

    Independent research assistants recruited eligible patients and used a uniform documentation system to collect data on the clinical history and the results of a physical examination. Patient questionnaires were administered preoperatively (within six weeks before the operation) and three months postoperatively. The lead author (E.A.L.) trained all research assistants to standardize data collection. Data were entered into a single database at the coordinating center.

    Data Elements

    The clinical history was specifically reviewed with regard to previous orthopaedic surgical procedures on the lower limb, pain, and functional ability. The patient questionnaire included specific questions on function, including walking distance and stair-climbing ability. From these data, the Knee Society clinical rating system was used to derive a pain score and a function score9,10. To derive the pain score, the evaluator rates the patient’s knee pain on a single seven-category scale that corresponds to a score between 0 points (severe pain) and 50 points (no pain). The function score allots 50 points for walking distance and 50 points for stair-climbing ability. A score of 100 points indicates that the subject is able to climb stairs normally and able to walk an unlimited distance without an aid. Points are deducted if the patient uses a walking aid.
    The questionnaire also included questions about demographic characteristics, such as age, gender, and race, and socioeconomic variables, such as income and education. Two health-status instruments were used: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC)11, which is a disease-specific health-status instrument that was designed for patients with osteoarthritis of the hip and knee, and the Medical Outcomes Study Short Form-36 Health Survey (SF-36)12-14, which is a general health-status instrument that assesses both the mental and the physical domain of health in several contrasting ways. WOMAC scores were transformed to a scale of 0 to 100 points for each domain (best score, 100 points). Different official versions of the WOMAC were used for patients in Australia, the United States, and the United Kingdom15. The standardized method of calculating the SF-36 domains was used so that each of the eight subscales had a score of 0 to 100 points (best score, 100 points). The SF-36 was first developed from responses of a patient cohort in the United States and has been subsequently validated for use in populations of patients in Australia and the United Kingdom16,17.
    Three months after the knee arthroplasty, the WOMAC and SF-36 were administered again and the patients were also asked to recall their preoperative status on selected items from the WOMAC pain scale and the SF-36 physical function scale. The two items from the WOMAC scale asked the patients to recall how much pain they had had while (1) walking on a flat surface and (2) going up or down stairs. Patients rated their pain with use of a Likert-type scale with five responses: none, mild, moderate, severe, or extreme. The six items from the SF-36 scale asked patients to recall how much limitation they had had during the following activities: (1) vigorous activities, (2) moderate activities, (3) climbing one flight of stairs, (4) walking >1 mi (1.6 km), (5) walking 100 yd (91.4 m), and (6) bathing and dressing. Patients rated their function with use of a Likert-type scale with three responses: not limited at all, limited a little, or limited a lot.

    Analysis of the Data

    Statistical analyses were performed with use of the SAS statistical package (SAS Institute, Cary, North Carolina)18. The kappa statistic was used to measure agreement between the prospective and recalled ratings of individual items. This test evaluates whether the amount of agreement is greater than that expected by chance alone. A kappa coefficient of 1 indicates perfect agreement, and a coefficient of 0 indicates that the responses are completely independent (equal to the agreement expected by chance alone)19. Because disagreements in patients’ recall varied in magnitude and the sample size was sufficient, we also calculated a weighted kappa, which gives partial credit for discrepant ratings20. A weighted kappa of <0.4 indicates poor agreement; 0.4 to 0.75, moderate-to-good agreement; and >0.75, excellent agreement19.
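    As an illustration only (this is not the SAS code used in the study), the kappa and weighted kappa coefficients can be sketched in a few lines of Python. The function name `cohen_kappa` is hypothetical, and the linear weights (1 minus the category distance divided by the number of categories minus 1) are an assumption, since the weighting scheme is not stated in the text. With that assumption, the counts in Table I give values that round to the coefficients listed for pain while walking in Table III.

```python
def cohen_kappa(table, weighted=False):
    """Kappa for a square cross-tabulation (rows: prospective, columns: recall).

    With weighted=True, linear agreement weights are used (an assumption;
    the source does not state which weights were applied).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)   # partial credit for near misses
        return 1.0 if i == j else 0.0          # exact agreement only

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i, j in cells) / n ** 2
    return (observed - expected) / (1 - expected)


# Counts from Table I (pain while walking on a flat surface, n = 763).
TABLE_I = [
    [0, 0, 0, 2, 0],       # prospective: none
    [6, 9, 21, 10, 5],     # mild
    [4, 17, 60, 106, 16],  # moderate
    [1, 3, 36, 242, 79],   # severe
    [0, 1, 5, 55, 85],     # extreme
]

print(round(cohen_kappa(TABLE_I), 2))                 # 0.26
print(round(cohen_kappa(TABLE_I, weighted=True), 2))  # 0.37
```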
    The McNemar test of symmetry was used to measure whether disagreement in one direction was equal to disagreement in the other direction; that is, whether patients whose postoperative rating disagreed with their preoperative rating tended to recall more or less pain and/or more or less limitation in functional activities three months after the operation than they had reported preoperatively21.
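    The idea behind the test of symmetry can be sketched as follows. This is a simplified illustration, not the procedure run in SAS: it collapses every discordant response into "recalled more" or "recalled less" and applies the two-category McNemar statistic, whereas for tables larger than 2 × 2 the usual generalization is Bowker's test of symmetry. The function name `collapsed_symmetry_test` is hypothetical, and its p value need not match the published values exactly.

```python
import math


def collapsed_symmetry_test(table):
    """McNemar-type test on discordant responses collapsed into two groups.

    Rows are prospective ratings and columns are recall ratings, ordered
    from least to most severe. Returns (more, less, statistic, p_value).
    """
    k = len(table)
    more = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    less = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    statistic = (more - less) ** 2 / (more + less)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return more, less, statistic, p_value


# Counts from Table I (pain while walking on a flat surface).
TABLE_I = [
    [0, 0, 0, 2, 0],
    [6, 9, 21, 10, 5],
    [4, 17, 60, 106, 16],
    [1, 3, 36, 242, 79],
    [0, 1, 5, 55, 85],
]

more, less, statistic, p_value = collapsed_symmetry_test(TABLE_I)
print(more, less)          # 239 128
print(p_value < 0.001)     # True
```

    If recall errors were symmetric, the two discordant counts would be roughly equal; here, many more patients recalled more pain than recalled less.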
    In the clinical research literature, WOMAC and SF-36 scores are most commonly reported in the form of summary scales. We therefore sought to determine the extent to which patient recall influences these summary scores. Recalled pain and function summary scores were defined by calculating the means of the recall responses for pain and function items and then transforming them to a 0 to 100-point scale (best score, 100 points). Prospective pain and function summary scores were defined by calculating the means of the preoperative ratings of pain and function for the same items and then transforming them with use of the same algorithm. Summary scores were calculated only if the patient had answered all of the pain items and at least four of the function items. The recalled and prospective summary scores were then correlated and expressed with use of the nonparametric Spearman coefficient. All p values were two-tailed.
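    A minimal sketch of this scoring rule is given below. The numeric item codes and the function names are assumptions made for illustration (the source states only that item means were transformed to a 0 to 100-point scale with 100 as the best score); the completeness rules follow the text.

```python
# Assumed item codes (hypothetical): higher pain code = worse,
# higher function code = better.
PAIN_CODES = {"none": 0, "mild": 1, "moderate": 2, "severe": 3, "extreme": 4}
FUNCTION_CODES = {"limited a lot": 0, "limited a little": 1,
                  "not limited at all": 2}


def pain_summary(responses):
    """Summary score for the two pain items; None marks a missing answer.

    Returns None unless every pain item was answered.
    """
    if any(r is None for r in responses):
        return None
    mean = sum(PAIN_CODES[r] for r in responses) / len(responses)
    return 100 * (1 - mean / 4)   # 0 = extreme pain, 100 = no pain


def function_summary(responses, min_answered=4):
    """Summary score for the six function items.

    Returns None if fewer than min_answered items were answered.
    """
    answered = [FUNCTION_CODES[r] for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    mean = sum(answered) / len(answered)
    return 100 * mean / 2         # 0 = limited a lot, 100 = not limited


print(pain_summary(["severe", "extreme"]))   # 12.5
print(function_summary(["limited a lot", "limited a lot", "limited a lot",
                        "limited a little", None, None]))  # 12.5
```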
    We further analyzed subgroups of patients by age, gender, country, educational level (dichotomized as less than high school or as high school or more), SF-36 mental health scores, and whether patients were better or worse in terms of the WOMAC function score three months after the operation. The WOMAC function score has been shown to be a highly responsive measure for detecting change in functional status after total knee arthroplasty, and the relative efficiency of that measure is greater than that of several traditional measures of surgical outcome1,11. In these subgroups, the strength of the correlations between prospective and recalled scores was compared with use of the Fisher test of equality of two correlations for different samples.
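    The comparison of two correlations from independent samples can be sketched with the Fisher z transformation. The function name `compare_correlations` is hypothetical, and the inputs below are the rounded coefficients and nominal subgroup sizes from Table V, so the result is only approximate and need not reproduce the published p values (the number of patients with complete recall data in each subgroup is not given). Applying the transformation to Spearman coefficients is itself an approximation.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for equality of two correlations from independent samples.

    Returns the z statistic and the two-tailed p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))       # two-tailed normal p
    return z, p_value


# Pain summary scores by age group (rounded values from Table V).
z, p_value = compare_correlations(0.57, 577, 0.47, 285)
print(round(z, 1))   # 1.9
```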
    The proportion of scores that varied by more than 10 points (10% of the total range), equivalent to half of one standard deviation, was calculated for all subgroups. Effect size is calculated as the change (or difference) in a score divided by the standard deviation of the score. A difference of half of one standard deviation corresponds to an effect size of 0.5 and is considered to be a large difference22. For the outcome measures used here, the standard deviation was approximately 20 points (possible score range, 0 to 100 points), and a difference of 10 points between the prospective and recalled scores was approximately equivalent to half of one standard deviation. That is, if we were to look for the impact made by an independent variable, we would expect a difference of 10 points to demonstrate a significant difference between groups due to this variable. Additionally, it has been shown that changes of 9 to 12 points on WOMAC scales are perceptible to patients with knee osteoarthritis23. Therefore, scores that differ by this range or more represent a clinically important difference. The prospective and recalled summary scores also were compared in terms of the strength of their correlations with other data that were collected preoperatively, including the Knee Society pain and function scores and the patients’ reports of how far they were able to walk with and without support.
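    The arithmetic in this paragraph can be sketched as follows. The function names are hypothetical and the four score pairs are made-up values used only to show the calculation; they are not study data.

```python
def effect_size(difference, standard_deviation):
    """Difference in a score divided by the standard deviation of the score."""
    return difference / standard_deviation


def proportion_differing(prospective, recalled, threshold=10):
    """Proportion of paired scores that differ by more than `threshold` points."""
    pairs = list(zip(prospective, recalled))
    differing = sum(1 for p, r in pairs if abs(p - r) > threshold)
    return differing / len(pairs)


# A 10-point difference with a standard deviation of about 20 points.
print(effect_size(10, 20))   # 0.5

# Hypothetical summary scores for four patients (illustration only).
prospective = [25.0, 37.5, 50.0, 12.5]
recalled = [12.5, 37.5, 25.0, 25.0]
print(proportion_differing(prospective, recalled))   # 0.75
```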
    Results

    A total of 862 patients were recruited, and recall data regarding preoperative pain and function were collected from 770 patients (89%) three months after the operation. Of the ninety-two patients who did not complete the questionnaire, twelve had died, five were unable to continue in the study due to other illness, two had had a revision of the knee, eighteen had withdrawn from the study, one had moved away, two were lost to follow-up, ten were unable to attend the three-month assessment, and forty-two attended the assessment and had a clinical examination but did not complete the patient recall questions. The mean age of those who completed the study was seventy years (range, thirty-eight to ninety years), and 59% of the patients were female. Half of the patients were from the United Kingdom, 30% were from the United States, and 20% were from Australia. Fifty-five percent of the patients had less than a high-school education. Only 13% of the patients were still working, and most (70%) were retired. Most (61%) of the patients were married.
    Table I represents the cross-tabulation of the patients’ prospective and recalled ratings of pain while walking on a flat surface. The highlighted diagonal region indicates the proportion of patients whose preoperative and postoperative responses agreed (51.9%), the area above the diagonal indicates the proportion of patients who recalled more pain (31.3%), and the area below the diagonal indicates the proportion of patients who recalled less pain (16.8%).
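    The three regions of the cross-tabulation can be checked directly from the counts in Table I. The short sketch below (the function name `recall_direction` is hypothetical) sums the diagonal, the cells above it, and the cells below it.

```python
def recall_direction(table):
    """Percentages of responses on, above, and below the diagonal.

    Rows are prospective ratings and columns are recall ratings, both
    ordered from least to most severe, so cells above the diagonal are
    patients who recalled a more severe rating than they had reported.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(k))
    more = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    less = n - agree - more
    return tuple(round(100 * count / n, 1) for count in (agree, more, less))


# Counts from Table I (pain while walking on a flat surface, n = 763).
TABLE_I = [
    [0, 0, 0, 2, 0],
    [6, 9, 21, 10, 5],
    [4, 17, 60, 106, 16],
    [1, 3, 36, 242, 79],
    [0, 1, 5, 55, 85],
]

print(recall_direction(TABLE_I))   # (51.9, 31.3, 16.8)
```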
    Table II represents the cross-tabulation of the patients’ prospective and recalled ratings of function while walking for a distance of >1 mi. The highlighted diagonal region again represents the proportion of patient responses that agreed (75.2%). It is important to note that because the responses are clustered in the "limited a lot" category, the data in the table are highly unbalanced.
    Overall, there was poor agreement between the prospective and recalled ratings for most items (weighted kappa, 0.20 to 0.39; 95% confidence interval, 0.10 to 0.44); only one item, functional limitation during bathing and dressing, demonstrated fair agreement (weighted kappa, 0.41; 95% confidence interval, 0.35 to 0.46) (Table III). The weighted kappa values for the pain-related items were 0.37 and 0.39, although only 6.2% and 6.0% of the responses varied by more than one category when recalled ratings were compared with prospective ratings. The weighted kappa values for the functional items also indicated poor-to-fair agreement (range, 0.20 to 0.41), and only 3.5% to 5.3% of the responses varied by more than one category when recalled ratings were compared with prospective ratings. For two of the functional items, vigorous activities and walking >1 mi, the percentage agreement was high (86.5% and 75.2%, respectively), even though the weighted kappa value was low (0.20 and 0.33, respectively). This paradox is due to the fact that the data in these tables are highly unbalanced, and therefore the kappa statistic is not the most informative method of analysis.
    Patients whose postoperative rating disagreed with their prospective rating tended to recall more pain than they had reported preoperatively (Fig. 1). The McNemar test indicated that patients recalled significantly more pain for walking on a flat surface (p < 0.001). The recall errors for the functional items were more random, with patients tending to recall less limitation for vigorous and moderate activities and walking >1 mi but more limitation for climbing one flight of stairs and walking 100 yd (Fig. 2). The McNemar test indicated that patients recalled significantly less limitation for walking >1 mi (p < 0.001) but significantly more limitation for walking 100 yd (p = 0.009).
    We explored various reasons for the paradoxical finding that patients recalled less limitation for walking >1 mi and more limitation for walking 100 yd. The preoperative data on walking distance with and without support were correlated much more strongly with the prospective function summary scores than with the recalled function summary scores (Table IV). Preoperatively, only about 15% of the patients could walk >1 mi but 65% could walk >100 yd. Three months postoperatively, 45% of the patients were able to walk >1 mi and 85% were able to walk >100 yd. We also looked at whether their walking distance improved, stayed the same, or got worse and whether this change correlated with changes in their perception of how limited they were preoperatively. This analysis did not enable us to draw conclusions as to why patients’ perceptions of walking limitation were recalled in such a varied fashion.
    These pain and function-related items are rarely reported individually and are more commonly reported as summary scales. Therefore, summary scales were derived for pain and function, and then the prospective summary scores were correlated with the recalled summary scores. There was only moderate correlation between these scores, as indicated by a Spearman correlation coefficient of 0.53 for pain and 0.48 for function (Table V). In addition, 61% of the pain scores and 50% of the function scores varied by >10 points (10% of total range) when the prospective and recalled scores were compared.
    Subgroup analysis demonstrated that patients whose WOMAC function scores had deteriorated at three months after the operation (eighty-nine patients; 10% of those for whom such data were available) had significantly poorer recall of function (p = 0.02). In this subgroup, 78% of the recalled pain scores and 60% of the recalled function scores varied by >10 points from the prospective scores. Patients who were seventy-five years of age or older (285 patients; 33% of those for whom such data were available) had significantly poorer recall of both pain (p = 0.04) and function (p = 0.038). Patients whose educational level was defined as high school or more (375 patients; 44% of those for whom such data were available) also had significantly poorer recall of pain (p = 0.03) (Table V).
    Patients were grouped as having either high or low mental health on the basis of their three-month SF-36 mental health score. A score of ≤60 points was used to indicate poor mental health; this score is equivalent to the seventy-fifth percentile of a population-based group of patients with a diagnosis of clinical depression24. The mean SF-36 mental health score (and standard deviation) at three months after the operation was 75.6 ± 17.4 points, and 16.7% of the patients had a score of <60 points. The high-mental-health and low-mental-health groups had similar recall of pain, but the low-mental-health group had significantly worse recall of function (p = 0.04). There were no significant differences in the strength of correlations due to gender or country (Table V).
    The pain and function summary scores were also correlated with pain and function data that had been collected preoperatively (Table IV). The Knee Society pain score (the preoperative pain rating assigned by the research assistants) had a significantly stronger correlation with the prospective pain score than with the recalled pain score (p < 0.001). Functional items that were assessed preoperatively, such as walking distance and the Knee Society function score, had significantly stronger correlations with the prospective function score than with the recalled function score (p < 0.001 for all comparisons).
    Changes in functional status (calculated as the three-month score minus the preoperative score) were examined with use of either the prospective or the recalled preoperative status. A higher proportion of patients made greater improvements in both pain and function when the recalled preoperative scores were used (Fig. 3).
    Discussion

    The present study demonstrated poor-to-fair weighted kappa values (≤0.41) for all of the individual pain and function items. There was a significant trend for patients to recall more pain, but there was random recollection error for the function items. This result is similar to the findings of Mancuso and Charlson7. The usefulness of the weighted kappa statistic was limited because the data on most of the items that were analyzed in this study were highly unbalanced, leading to a paradox between a high percentage agreement but a low kappa value. This was especially evident in the items concerning vigorous activities and walking >1 mi, for which the percentage agreement was 86.5% and 75.2%, respectively, and the weighted kappa value was 0.20 and 0.33, respectively.
    The advantage of using a kappa statistic to evaluate agreement between prospective and recall data rather than just reporting the percentage agreement (the proportion of values that are an exact match) is that the kappa coefficient adjusts for the amount of agreement expected to occur by chance alone. Patients’ ratings of their preoperative status will always tend to be clustered at the more severe end of the pain scale and the more limited end of the function scale. Therefore, cross-tabulation of prospective and recalled responses of preoperative status will have marginal totals (that is, row and column totals) that are much larger at the more severe end of the table. Consequently, these data will always be unbalanced, making interpretation of the kappa score difficult because the marginal totals have such a strong influence on how this statistic is calculated25. Analysis of the summary scales is more useful for researchers who rely on the use of recall data to derive preoperative status.
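    The influence of the marginal totals can be seen by working through the counts in Table II. In the sketch below (an illustration with a hypothetical function name, not the original analysis), most responses fall in the "limited a lot" row and column, so the agreement expected by chance is already about two-thirds and the unweighted kappa is low despite the high percentage agreement.

```python
def agreement_summary(table):
    """Observed agreement, chance-expected agreement, and unweighted kappa."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement depends only on the marginal (row and column) totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Counts from Table II (limitation in walking more than one mile, n = 741).
TABLE_II = [
    [12, 14, 12],    # prospective: not limited
    [11, 36, 45],    # limited a little
    [14, 88, 509],   # limited a lot
]

observed, expected, kappa = agreement_summary(TABLE_II)
print(round(observed, 2), round(expected, 2), round(kappa, 2))  # 0.75 0.66 0.28
```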
    Correlations between prospective and recalled summary scores were only moderate, and a large proportion of scores differed by >10 points (equivalent to more than half of one standard deviation). Additionally, patients who had comparatively worse WOMAC function scores three months after the operation had significantly poorer recall of function. Older patients (seventy-five years of age or older) and patients with low mental health (an SF-36 mental health score of <60 points) also had poorer recall of function. Patients who had completed high school or had more education and older patients had significantly poorer recall of pain.
    When the prospective and recalled preoperative scores were compared with other scores that had been collected preoperatively by the research assistants and from the patients’ self-reports, we found that the prospective scores consistently demonstrated stronger correlation than the recalled scores did. Thus, using retrospective recall of preoperative status to calculate a patient’s change in symptoms or health status is not as accurate as using the differences recorded in a prospective study and is at best only a surrogate for the actual measurement of symptoms before and after treatment.
    Fortin et al.1 found that preoperative pain and function as measured with the WOMAC and SF-36 were strong predictors of outcome after total knee arthroplasty when these instruments were administered again six months after surgery. The respective preoperative scores alone explained 25% of the variation in WOMAC pain scores at six months and 21% of the variation in SF-36 physical function scores at six months. Because of the strong influence of preoperative status on outcome, it is essential to use caution when interpreting the results of studies that rely on recall data to derive preoperative status. Some allowances must be made for the high level of variation to ensure that the results reported are not overestimating or underestimating the effectiveness of total knee arthroplasty.
    Psychometric research has consistently shown that the retrospective recall of a change in symptoms or health status is not as accurate as the difference recorded in a prospective study8. The subject’s memory of his or her health status is required, and this memory is flawed by experiences that have occurred in the interval. However, retrospective recall is a very useful measure when the goal is to assess what the subject believes about the effects of treatment. Patients rate their level of pain and functional disability according to their internal standard. A response shift may occur over time due to a variety of experiences that the patient has had during this interval, and the internal standard that the patient has used previously may be recalibrated26,27. Depending on the research question being studied, recall information can be the most appropriate data on which to base conclusions about treatments.
    Our study had several strengths. The recruitment of a large study cohort enabled us to exclude patients who had had bilateral surgery within the previous twelve months, ensuring that patients’ attention would be focused on the index knee. The large study cohort also enabled us to analyze the different subgroups of patients without loss of statistical power. Patients were reviewed only three months after total knee arthroplasty, which markedly decreased the intervening health issues that might have influenced their responses if they had been examined a year or more later.
    Our study also had some limitations. We asked only limited recall questions at three months after the procedure as we did not wish to overburden patients. Therefore, we lost some of the scaling properties for our summary scores. Also, we did not ask patients how accurately they believed that they could recall their preoperative symptoms. It has been shown that when recall is used it is essential to ascertain how accurately the patients believe that they have recalled their preoperative status8.
    Researchers who use recall data to derive preoperative status must take into account the fact that this is not a direct substitute for prospectively collected data. Because of the variability in recall data, there is the possibility that the effectiveness of total knee arthroplasty may be overestimated or underestimated. In our study, use of the kappa statistic to evaluate the level of agreement between prospectively collected data and recalled preoperative status was of little value because the data were highly unbalanced. To make valid conclusions from their analysis, researchers using the kappa statistic to report on agreement must ensure that the data are balanced.
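    The behaviour of kappa on unbalanced data can be checked directly from the counts in Table II (limitation in walking more than one mile). The following is a minimal Python sketch, not the analysis code used in the study; the function name `cohen_kappa` and the variable names are illustrative assumptions. It computes the observed agreement, the agreement expected by chance from the marginal totals, and the unweighted kappa.

```python
# Counts from Table II: rows = prospective rating, columns = recalled rating.
# Category order: not limited, limited a little, limited a lot.
TABLE_II = [
    [12, 14, 12],
    [11, 36, 45],
    [14, 88, 509],
]


def cohen_kappa(table):
    """Return (observed agreement, chance agreement, unweighted kappa)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: share of patients on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the matching marginal proportions.
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


p_obs, p_exp, kappa = cohen_kappa(TABLE_II)
print(f"observed agreement = {p_obs:.3f}")
print(f"chance agreement   = {p_exp:.3f}")
print(f"kappa              = {kappa:.2f}")
```

    With these counts, about 69% of patients fall in a single cell (limited a lot on both ratings), so the agreement expected by chance is already about 0.66. The observed agreement of 75.2% therefore yields a kappa of only about 0.28, which is the value reported for this item in Table III.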
    Note: The Kinemax Outcomes Group includes William Gillespie, Colin Howie, Ian Annan, Judith Lane (Princess Margaret Rose Hospital, Edinburgh, Scotland); Ian Pinder, David Weir, Karen Bettinson (Freeman Hospital, Newcastle upon Tyne, England); Maurice Needhoff, Roz Jackson (King’s Mill Clinic, Mansfield, England); Tim Wilton, Peter Howard (Derbyshire Royal Infirmary, Derby, England); Ian Forster, Paul Szyprt, Chris Moran, David Whitaker, Mike Bullock, Zena Hinchcliffe (Queen’s Medical Centre, Nottingham, England); Ian Learmonth, John Newman, Chris Ackroyd, George Langkamer, Robert Spencer, Mark Shannon, Evert Smith, John Dixon, Sarah Whitehouse (Avon Orthopedic Centre, Bristol, England); Clement Sledge, Frederick Ewald, Robert Poss, John Wright, Scott Martin, John Kwon, Yvette Valderrama (Brigham and Women’s Hospital, Boston, MA); Steven Harwin, Michael Lichardi (Beth Israel Medical Center, New York, NY); Mark Mehlhoff, Linda Weiler, Tom Cahalan (Iowa Medical Clinic, Cedar Rapids, IA); Richard Cronk, Allyson Sandago (Neuromuscular and Joint Center, Corvallis, OR); Stephen Rackemann, Emma McLaughlin (The Knee Centre, Gold Coast, Australia); and Peter Lewis, Robert Bauze, Jane Clasohm (Queen Elizabeth Hospital, Adelaide, Australia).
    1. Fortin PR, Clarke AE, Joseph L, Liang MH, Tanzer M, Ferland D, Phillips C, Partridge AJ, Belisle P, Fossel AH, Mahomed N, Sledge CB, Katz JN. Outcomes of total hip and knee replacement: preoperative functional status predicts outcomes at six months after surgery. Arthritis Rheum. 1999;42:1722-8.
    2. Coyte PC, Hawker G, Croxford R, Attard C, Wright JG. Variation in rheumatologists’ and family physicians’ perceptions of the indications for and outcomes of knee replacement surgery. J Rheumatol. 1996;23:730-8.
    3. Katz JN, Wright EA, Guadagnoli E, Liang MH, Karlson EW, Cleary PD. Differences between men and women undergoing major orthopedic surgery for degenerative arthritis. Arthritis Rheum. 1994;37:687-94.
    4. Mancuso CA, Ranawat CS, Esdaile JM, Johanson NA, Charlson ME. Indications for total hip and total knee arthroplasties. Results of orthopaedic surveys. J Arthroplasty. 1996;11:34-46.
    5. van Walraven CV, Peterson JM, Kapral M, Chan B, Bell M, Hawker G, Gollish J, Schatzker J, Williams JI, Naylor CD. Appropriateness of primary total hip and knee replacements in regions of Ontario with high and low utilization rates. CMAJ. 1996;155:697-706.
    6. Wright JG, Coyte P, Hawker G, Bombardier C, Cooke D, Heck D, Dittus R, Freund D. Variation in orthopedic surgeons’ perceptions of the indications for and outcomes of knee replacement. CMAJ. 1995;152:687-97.
    7. Mancuso CA, Charlson ME. Does recollection error threaten the validity of cross-sectional studies of effectiveness? Med Care. 1995;33(4 Suppl):77-88.
    8. Herrmann D. Reporting current, past, and changed health status. What we know about distortion. Med Care. 1995;33(4 Suppl):89-94.
    9. Insall JN, Dorr LD, Scott RD, Scott WN. Rationale of the Knee Society clinical rating system. Clin Orthop. 1989;248:13-4.
    10. Lingard E, Katz J, Wright E, Wright R, Sledge CB, Kinemax Outcomes Group. Validity and responsiveness of the Knee Society clinical rating system [abstract]. Arthritis Rheum. 2000;43(Suppl 9):217.
    11. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt L. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15:1833-40.
    12. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247-63.
    13. McHorney CA, Ware JE Jr, Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32:40-66.
    14. Ware JE, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473-83.
    15. Bellamy N. WOMAC osteoarthritis index: a user’s guide. London, ON; 1995.
    16. Gandek B, Ware JE Jr, Aaronson NK, Alonso J, Apolone G, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplege A, Sullivan M. Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: results from the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:1149-58.
    17. Gandek B, Ware JE Jr. Methods for validating and norming translations of health status questionnaires: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:953-9.
    18. SAS/STAT user’s guide, release 6.03. Cary, NC: SAS Institute; 1988.
    19. Fleiss JL. The measurement of interrater agreement. In: Statistical methods for rates and proportions. 2nd ed. New York: John Wiley; 1981. p 212-36.
    20. Soeken KL, Prescott PA. Issues in the use of kappa to estimate reliability. Med Care. 1986;24:733-41.
    21. BMDP statistical software manual. Vol 10. Berkeley: University of California Press; 1988. p 267-9.
    22. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum; 1988.
    23. Ehrich EW, Davies GM, Watson DJ, Bolognese JA, Seidenberg BC, Bellamy N. Minimal perceptible clinical improvement with the Western Ontario and McMaster Universities osteoarthritis index questionnaire and global assessments in patients with osteoarthritis. J Rheumatol. 2000;27:2635-41.
    24. Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey: manual and interpretation guide. 2nd ed. Boston: The Health Institute, New England Medical Center; 1997. p 10-26.
    25. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol. 1990;43:543-9.
    26. Howard GS, Dailey PR. Response-shift bias: a source of contamination of self-report measures. J Appl Psychol. 1979;64:144-50.
    27. Sprangers M, Hoogstraten J. Pretesting effects in retrospective pretest-posttest designs. J Appl Psychol. 1989;74:265-72.

    Fig. 1: Graph showing the percentage agreement between prospective and recalled pain ratings. The asterisk indicates that patients recalled significantly more pain than they had reported preoperatively (p < 0.001, McNemar test).

    Fig. 2: Graph showing the percentage agreement between prospective and recalled functional ratings. The single asterisk indicates that patients recalled significantly less limitation than they had reported preoperatively (p < 0.001, McNemar test). The double asterisk indicates that patients recalled significantly more limitation than they had reported preoperatively (p = 0.009, McNemar test).

    Fig. 3: Graph showing the difference in improvement as reported on the prospective pain and function data compared with that from pain and function data recalled three months after total knee arthroplasty.

    TABLE I: Prospective and Recall Ratings of Preoperative Pain While Walking on a Flat Surface*

    | Prospective Rating of Pain | Recall: None | Recall: Mild | Recall: Moderate | Recall: Severe | Recall: Extreme |
    |---|---|---|---|---|---|
    | None | 0 (0) | 0 (0) | 0 (0) | 2 (0.3) | 0 (0) |
    | Mild | 6 (0.8) | 9 (1.2) | 21 (2.8) | 10 (1.3) | 5 (0.7) |
    | Moderate | 4 (0.5) | 17 (2.2) | 60 (7.9) | 106 (13.9) | 16 (2.1) |
    | Severe | 1 (0.1) | 3 (0.4) | 36 (4.7) | 242 (31.7) | 79 (10.4) |
    | Extreme | 0 (0) | 1 (0.1) | 5 (0.7) | 55 (7.2) | 85 (11.1) |

    *The values represent the number of patients, with the percentage given in parentheses (n = 763).

    TABLE II: Prospective and Recall Ratings of Preoperative Limitation in Walking More Than One Mile*†

    | Prospective Rating of Limitation | Recall: Not Limited | Recall: Limited a Little | Recall: Limited a Lot |
    |---|---|---|---|
    | Not limited | 12 (1.6) | 14 (1.9) | 12 (1.6) |
    | Limited a little | 11 (1.5) | 36 (4.9) | 45 (6.1) |
    | Limited a lot | 14 (1.9) | 88 (11.9) | 509 (68.7) |

    *The values represent the number of patients, with the percentage given in parentheses (n = 741). †1 mi = 1.6 km.

    TABLE III: Agreement Between Prospective and Recalled Responses on Individual Pain and Function Items

    | Item | Kappa (95% CI*) | Weighted Kappa (95% CI*) | Percentage Agreement† | Percentage of Responses That Varied by >1 Category‡ |
    |---|---|---|---|---|
    | Pain | | | | |
    | Walking (n = 763) | 0.26 (0.21-0.31) | 0.37 (0.32-0.42) | 51.9 (396) | 6.2 (47) |
    | Stair-climbing (n = 746) | 0.29 (0.24-0.35) | 0.39 (0.34-0.44) | 53.8 (401) | 6.0 (45) |
    | Function | | | | |
    | Vigorous activities (n = 706) | 0.21 (0.11-0.31) | 0.20 (0.10-0.31) | 86.5 (611) | 4.0 (28) |
    | Moderate activities (n = 733) | 0.29 (0.23-0.35) | 0.31 (0.25-0.37) | 60.8 (446) | 3.7 (27) |
    | Climbing 1 flight of stairs (n = 733) | 0.27 (0.21-0.33) | 0.30 (0.24-0.36) | 58.9 (432) | 3.7 (27) |
    | Walking >1 mi§ (n = 741) | 0.28 (0.21-0.35) | 0.33 (0.26-0.41) | 75.2 (557) | 3.5 (26) |
    | Walking 100 yd** (n = 730) | 0.28 (0.22-0.34) | 0.35 (0.30-0.41) | 54.4 (397) | 3.7 (27) |
    | Bathing and dressing (n = 754) | 0.36 (0.31-0.42) | 0.41 (0.35-0.46) | 61.3 (462) | 5.3 (40) |

    *CI = confidence interval. †The values are given as percentages, with the number of responses that agreed in parentheses. ‡The values are given as percentages, with the number of responses that varied by more than one category in parentheses. §1 mi = 1.6 km. **100 yd = 91.4 m.
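
    The weighted kappa and the share of responses that varied by more than one category can be recomputed from the counts in Table I (pain while walking). The sketch below is illustrative only: the weighting scheme is not stated here, so linear agreement weights are an assumption, and the function names `weighted_kappa` and `share_off_by_more_than_one` are hypothetical.

```python
# Counts from Table I: rows = prospective rating, columns = recalled rating.
# Category order: none, mild, moderate, severe, extreme.
TABLE_I = [
    [0, 0, 0, 2, 0],
    [6, 9, 21, 10, 5],
    [4, 17, 60, 106, 16],
    [1, 3, 36, 242, 79],
    [0, 1, 5, 55, 85],
]


def weighted_kappa(table):
    """Weighted kappa with linear agreement weights (an assumed scheme)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = 0.0
    p_exp = 0.0
    for i in range(k):
        for j in range(k):
            # Full credit on the diagonal, partial credit for near misses.
            weight = 1 - abs(i - j) / (k - 1)
            p_obs += weight * table[i][j] / n
            p_exp += weight * row_totals[i] * col_totals[j] / n**2
    return (p_obs - p_exp) / (1 - p_exp)


def share_off_by_more_than_one(table):
    """Proportion of responses that differ by more than one category."""
    k = len(table)
    n = sum(sum(row) for row in table)
    far = sum(table[i][j] for i in range(k) for j in range(k) if abs(i - j) > 1)
    return far / n


kappa_w = weighted_kappa(TABLE_I)
far_share = share_off_by_more_than_one(TABLE_I)
print(f"weighted kappa = {kappa_w:.2f}")
print(f"responses off by >1 category = {100 * far_share:.1f}%")
```

    Under the linear-weight assumption, the Table I counts give a weighted kappa of about 0.37, with 47 of 763 responses (6.2%) off by more than one category; both figures agree with the walking pain row of Table III. Quadratic weights would give a noticeably higher value, so the choice of weights should be reported alongside any weighted kappa.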

    TABLE IV: Spearman Rank Correlations Between Data Collected Preoperatively and the Prospective and Recall Summary Scores for Pain and Function

    | Preoperative Measure | Pain: Prospective | Pain: Recall | Function: Prospective | Function: Recall |
    |---|---|---|---|---|
    | Knee Society pain score | 0.40* | 0.29* | | |
    | Knee Society function score | | | 0.58* | 0.43* |
    | Activity level | | | 0.36* | 0.26* |
    | Walking distance with support | | | 0.49* | 0.38* |
    | Walking distance without support | | | 0.56* | 0.43* |

    *The correlations were significantly different (p < 0.001).

    TABLE V: Spearman Rank Correlation Between Prospective and Recall Summary Scores for Pain and Function, and Percentage of Scores That Differed by More than 10% of the Total Range

    | Group | Pain: Spearman R | Pain: Percentage of Scores That Differed by >10% | Function: Spearman R | Function: Percentage of Scores That Differed by >10% |
    |---|---|---|---|---|
    | All (n = 862) | 0.53 | 61 | 0.48 | 50 |
    | Gender | | | | |
    | Men (n = 351) | 0.51 | 63 | 0.47 | 50 |
    | Women (n = 511) | 0.54 | 60 | 0.46 | 51 |
    | Country | | | | |
    | United Kingdom (n = 429) | 0.55 | 56 | 0.45 | 53 |
    | United States (n = 263) | 0.49 | 67 | 0.46 | 50 |
    | Australia (n = 170) | 0.56 | 66 | 0.51 | 46 |
    | Age | | | | |
    | <75 yr (n = 577) | 0.57* | 61 | 0.52* | 49 |
    | ≥75 yr (n = 285) | 0.47* | 63 | 0.41* | 52 |
    | Education | | | | |
    | Less than high school (n = 459) | 0.58* | 57 | 0.46 | 50 |
    | High school or more (n = 375) | 0.48* | 68 | 0.49 | 50 |
    | SF-36 mental health score at 3 mo | | | | |
    | <60 points (n = 128) | 0.49 | 60 | 0.36* | 59 |
    | ≥60 points (n = 638) | 0.54 | 62 | 0.51* | 49 |
    | WOMAC function at 3 mo | | | | |
    | Worse (n = 89) | 0.48 | 78 | 0.29* | 60 |
    | Same or better (n = 683) | 0.53 | 59 | 0.49* | 49 |

    *The correlations were significantly different (p < 0.05).