Scientific Article
The American Academy of Orthopaedic Surgeons Outcomes Instruments Normative Values from the General Population
Frank G. Hunsaker, PhD; Dominic A. Cioffi, MA; Peter C. Amadio, MD; James G. Wright, MD; Beth Caughlin, MS
Investigation performed at the National Research Corporation, Lincoln, Nebraska

Frank G. Hunsaker, PhD
National Research Corporation, 1245 Q Street, Suite 400, Lincoln, NE 68508. E-mail address: fhunsaker@nationalresearch.com.

Dominic A. Cioffi, MA
Cooper Research, 8150 Corporate Park Drive, Cincinnati, OH 45242

Peter C. Amadio, MD
Departments of Health Science Research and Orthopedic Surgery, Mayo Clinic, 200 First Street S.W., Rochester, MN 55905

James G. Wright, MD
Division of Orthopedic Surgery, Hospital for Sick Children, 555 University Avenue, Room S-107, Toronto, ON M5G 1X8, Canada

Beth Caughlin, MS
Experian, 901 West Bond Street, Lincoln, NE 68521-3694

In support of their research or preparation of this manuscript, one of the authors received a grant or outside funding from the Canadian Institutes of Health Research. None of the authors received payments or other benefits or a commitment or agreement to provide such benefits from a commercial entity. No commercial entity paid or directed, or agreed to pay or direct, any benefits to any research fund, foundation, educational institution, or other charitable or nonprofit organization with which the authors are affiliated or associated.

The Journal of Bone & Joint Surgery. 2002;84:208-215.

Abstract

Background: The collection of population-based normative data is a necessary step in the process of standardization of eleven American Academy of Orthopaedic Surgeons (AAOS) musculoskeletal outcomes measures. These data serve as comparative normative scores with which to assess the effectiveness of treatment regimens in clinical practice settings and to study the clinical outcomes of treatment in musculoskeletal research.

Methods: With use of a panel mail methodology, self-reported data on the eleven AAOS musculoskeletal outcomes measures were collected from the general population of the United States.

Results: The overall response rate of 67.4% for the various surveys met study expectations. For the eleven measures, the confidence intervals for the surveys ranged from ±1.6% to ±2.3%, within the ±3% criterion set a priori. With use of the Multitrait/Multi-Item Analysis Program, all of the scales within each of the eleven measures exhibited high internal reliability as well as discriminant and convergent validity. Items within each of the scales contributed roughly equal proportions of information to the total scale scores.

Conclusions: All eleven instruments met study expectations for providing reliable and valid normative data for use in clinical and research settings.

    The American Academy of Orthopaedic Surgeons (AAOS) Outcomes Studies Committee, in collaboration with the Council of Musculoskeletal Specialty Societies and the Council of Spine Societies, developed and pretested eleven functional outcomes assessment measurement instruments1-11. These AAOS measurement instruments are directed at patient populations composed of individuals with a diagnosis of a specific musculoskeletal disorder or disorders. The instruments were designed to collect patient-based data for use in clinical practice settings to assess the effectiveness of treatment regimens and in musculoskeletal research to study the clinical outcomes of treatment.
    Although there is some evidence demonstrating the responsiveness and validity of some of these instruments, such data have allowed individual or group scores to be compared only in relative terms6,7,9,10. The challenge for clinicians is to interpret what these scores mean. A reference point is needed to place scores, changes in scores, and scores for different ages and genders within a clinical context. That context is provided by normal values, collated according to age and gender groupings.
    Reflecting the desire of medical practitioners for more rigorous verification of the effects of various interventions and treatments, normative data comparisons have been incorporated into a variety of treatment-outcome studies6,12-15. A comparison of pretreatment and posttreatment clinical data, with normative data from the general population serving as a reference group, allows medical practitioners to assess whether the findings in individuals before treatment differ from population norms and whether they have returned to normal values after treatment. In considering the clinical importance of treatment-outcome studies, Kendall et al.13 raised two basic questions. First, is the pretreatment-posttreatment change large enough to be considered relevant? Second, can treated individuals be distinguished from "normal individuals" serving as a reference group? (In this context, the operational definition of "normal individuals" is the random sample of individuals from the general population of the United States from whom scale scores are derived.) Clinical measures address the first question, whereas normative comparisons address the second. Specifically, normative comparison addresses the severity of a patient’s condition, compared with data from a normal, age- and gender-adjusted population, and whether patients treated for specific conditions have returned to, or at least come closer to, normative ranges of functioning.
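    As a minimal sketch of how a normative comparison might be expressed numerically, the Python example below converts a scale score into a z-score relative to a normative mean and standard deviation. The use of z-scores is an assumption for illustration; the article does not prescribe a particular statistic, and the function name is hypothetical. The normative values are those reported for the lower limb core scale in Table I (mean 90.52, standard deviation 13.78); the patient scores of 55 and 85 are hypothetical.

```python
def z_score(patient_score: float, norm_mean: float, norm_sd: float,
            higher_is_better: bool = True) -> float:
    """Standardized distance of a scale score from a normative mean.

    The sign is oriented so that a positive value always means better than
    the normative mean, whichever direction the scale is scored in.
    """
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    z = (patient_score - norm_mean) / norm_sd
    # DASH and SMFA scores rise with disability, so flip the sign for them.
    return z if higher_is_better else -z


# Normative values from Table I (lower limb core scale); the patient scores
# are hypothetical pretreatment and posttreatment values.
NORM_MEAN, NORM_SD = 90.52, 13.78
print(f"pretreatment z = {z_score(55.0, NORM_MEAN, NORM_SD):.2f}")
print(f"posttreatment z = {z_score(85.0, NORM_MEAN, NORM_SD):.2f}")
```

    In this hypothetical case the patient moves from roughly 2.6 standard deviations below the normative mean to roughly 0.4 below it, which bears on Kendall et al.’s second question rather than the first.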
    The collection of normative data is thus a necessary step in the process of standardization of the AAOS outcome measures. The present study was conducted to collect such normative data from the general population of the United States. These data are intended to establish a national database of orthopaedic normative outcomes for the eleven musculoskeletal functional outcomes assessment instruments developed through the AAOS.
     
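    TABLE I: Lower Extremities Scale Scores
    Measure | Scale | No. of Responses | Mean | Standard Deviation
    Foot and Ankle | Global | 1755 | 93.19 | 12.33
    Foot and Ankle | Comfort | 1588 | 73.87 | 29.51
    Hip and Knee | Core | 1900 | 91.02 | 14.35
    Hip and Knee | Right Hip Pain | 1785 | 95.58 | 12.34
    Hip and Knee | Left Hip Pain | 1779 | 96.09 | 12.08
    Hip and Knee | Right Knee Pain | 1802 | 94.24 | 13.33
    Hip and Knee | Left Knee Pain | 1799 | 94.68 | 13.16
    Lower Limb | Core | 1890 | 90.52 | 13.78
    Sports Knee | Core | 1827 | 92.78 | 12.35
    Sports Knee | Giving-Way of Knee | 1532 | 95.65 | 15.05
    Sports Knee | Preop. Limitations | 678 | 93.80 | 17.98
    Sports Knee | Current Limitations | 642 | 82.01 | 26.38
    Sports Knee | Pain | 616 | 83.66 | 25.09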
     
    TABLE II: Upper Extremities Scale Scores
    Measure | Upper Limb: Function | Upper Limb: Sports | Upper Limb: Work
    No. of responses | 1706 | 1113 | 1610
    Mean | 10.10 | 9.75 | 8.81
    Standard deviation | 14.68 | 22.72 | 18.37
     
    TABLE III: Pediatric Scale Scores
    Measure | Upper Extremity | Mobility | Sports/Physical Functioning | Pain | Happiness | Global Function
    Parent-Child
    No. of responses | 1745 | 1748 | 1762 | 1762 | 1617 | 1716
    Mean | 91.97 | 98.35 | 90.22 | 92.43 | 89.80 | 93.31
    Standard deviation | 11.49 | 5.68 | 12.32 | 13.75 | 14.10 | 7.77
    Parent-Adolescent
    No. of responses | 3538 | 3499 | 3537 | 3524 | 3518 | 3482
    Mean | 98.82 | 99.22 | 93.66 | 88.96 | 81.47 | 95.15
    Standard deviation | 5.08 | 4.56 | 10.99 | 16.67 | 18.01 | 7.24
    Adolescent
    No. of responses | 3534 | 3481 | 3460 | [1043506102: values for Pain, Happiness, and Global Function run together and cannot be separated]
    Mean | 98.71 | 99.05 | 95.51 | 89.31 | 81.83 | 95.88
    Standard deviation | 4.93 | 4.70 | 9.74 | 14.79 | 17.59 | 5.38
     
    TABLE IV: Spine Scale Scores
    Measure | Cervical Spine: Neurological Symptoms | Cervical Spine: Pain | Lumbar Spine: Neurological Symptoms | Lumbar Spine: Pain
    No. of responses | 1790 | 1790 | 1780 | 1786
    Mean | 89.35 | 89.06 | 85.70 | 86.74
    Standard deviation | 18.44 | 15.48 | 22.40 | 17.17
     
    TABLE V: SMFA* (General Musculoskeletal) Scale Scores
    Measure | Daily Activities | Emotion | Arm-Hand | Mobility | Function Index | Bother Index
    No. of responses | 1891 | 1885 | 1890 | 1888 | 1871 | 1734
    Mean | 11.85 | 20.54 | 6.02 | 13.61 | 12.70 | 13.77
    Standard deviation | 19.20 | 18.38 | 12.26 | 18.31 | 15.59 | 18.59
    *SMFA = Short Musculoskeletal Function Assessment.
     
    TABLE VI: Multitrait Analysis Summary Scores
    AAOS Questionnaire* | Number of Scales | Range of Cronbach’s Alpha | Range of Inter-item Correlations† | Item Internal Consistency‡ (%) | Percentage of Item-to-Scale Correlations with 2 Standard Errors
    Cervical spine | 3 | 0.90 to 0.94 | 0.45 to 0.71 | 100 | 98
    Foot and ankle | 3 | 0.81 to 0.96 | 0.46 to 0.56 | 98 | 99
    Hip and knee | 6 | 0.89 to 0.93 | 0.47 to 0.77 | 100 | 99
    Lower limb | 2 | 0.89 to 0.94 | 0.49 to 0.54 | 100 | 100
    Lumbar spine | 3 | 0.91 to 0.94 | 0.49 to 0.72 | 100 | 98
    SMFA (general musculoskeletal) | 6 | 0.88 to 0.98 | 0.50 to 0.63 | 100 | 85
    Adolescent | 5 | 0.81 to 0.91 | 0.26 to 0.60 | 92 | 95
    Sports knee | 6 | 0.81 to 0.96 | 0.44 to 0.85 | 100 | 93
    Upper limb (DASH) | 4 | 0.94 to 0.98 | 0.49 to 0.87 | 100 | 97
    Parent-adolescent | 5 | 0.81 to 0.89 | 0.26 to 0.53 | 92 | 95
    Parent-child | 5 | 0.78 to 0.89 | 0.26 to 0.56 | 90 | 95
    *AAOS = American Academy of Orthopaedic Surgeons, SMFA = Short Musculoskeletal Function Assessment, and DASH = Disabilities of the Arm, Shoulder and Hand questionnaire.
    †All item-to-scale correlations (corrected) were significant (p < 0.05, with use of two standard errors for testing significance).
    ‡Percentage of item-scale correlations of ≥0.40.
    The eleven AAOS instruments are designed to assess the degree to which a patient’s condition or conditions affect his or her physical and emotional functioning, self-image, and symptom status. These measures are self-reported and cover five general areas of musculoskeletal function: the lower extremity, the upper extremity, pediatric musculoskeletal function, the spine, and general musculoskeletal function. Each of the instruments also includes a three-response-option comorbidity checklist of fourteen conditions or disorders16-18. In addition, seven of the eleven instruments include the Short Form-36 (SF-36) Health Survey questionnaire19. In the current study, normative data were collected for the lower limb, sports knee, foot and ankle, hip and knee, upper extremity (Disabilities of the Arm, Shoulder and Hand [DASH] questionnaire), cervical spine, lumbar spine, and general musculoskeletal function (Short Musculoskeletal Function Assessment [SMFA]) measures. In addition to the adult surveys, three different pediatric surveys were administered. These consisted of a survey for children (completed by parents of children who were between two and ten years old) and two surveys for adolescents (one completed by adolescents between the ages of eleven and eighteen and one, by a parent of that adolescent). Parents who were surveyed were instructed to respond by proxy about a specified child or adolescent. Adolescents who were surveyed were matched to parents receiving proxy surveys and were instructed to respond for themselves. As an inducement to complete the survey, adolescents received a $5.00 payment.
    The study was completed in four major stages: (1) the development of a sampling design and sampling strategy, (2) the design and reconfiguration of the clinical measures for the general population, (3) sample selection and data collection, and (4) data cleaning and data analyses, including application of scoring algorithms provided by the AAOS to generate scaled scores, reliability and validity tests of the data, and application of data reduction techniques to produce standardized scaled scores within specified age, gender, ethnicity, and comorbidity strata.

    Sampling Design

    The sampling methodology for this project was designed to garner data representative of the general population of the United States stratified by the following demographic markers: gender, comorbid conditions, ethnicity, and specific age-groups. To meet this requirement, a panel methodology was selected20,21. The panel was a group of households recruited by National Family Opinion.
    Specifically, the National Family Opinion household panel is a reliable sample of more than 475,000 households (individuals and the family members with whom they reside) that is representative of the United States population. Respondents are matched to United States Census data with respect to geographical region, age, income, and household size. The panel is managed to maintain additional demographic information, such as the gender of household members. Each year, approximately one-third of the National Family Opinion panel is rotated out of participation, and new panel participants are recruited to replace those members. Panel households are balanced demographically not only in the four Census regions and nine Census divisions but also in correct proportion by state within each division. Panel members are not compensated for participation, and no inducement was offered in this study with the exception of the $5.00 given to the adolescent cohort. Households are identified by frequently used geographic classifications to provide complete flexibility in sampling and identification. This approach to sampling, with use of a single-wave mail questionnaire, was deemed appropriate for a number of reasons.
    First, it was assumed that the information provided by the respondents would be valid and reliable. Each of the musculoskeletal measures required that respondents provide information about their physical, emotional, and social functioning capabilities as well as symptom status. No items that were intended to elicit highly sensitive or personal information, such as self-disclosure about alcohol or drug misuse, feelings or beliefs about other people, or attitudes about controversial social issues, were included. However, the DASH measure does include one item on sexual function.
    Previous research has shown that including items that ask subjects to disclose information that they would not typically discuss in casual conversation not only reduces response rates dramatically but also calls into question the validity of the responses22. Requiring respondents to reveal this type of information can trigger a "social desirability effect," whereby individuals tend to respond to personal questions with answers that they believe are socially acceptable22. Because no such items are included in the AAOS measures, it was assumed that responses to questions included in those measures would be both candid and truthful.
    Second, the size and scope of the study required a cost-efficient and expeditious methodology. Compared with random samples generated with use of census tract data or other methods, which typically produce low response rates in the 20% to 25% range with use of a multiple-wave mailing strategy, panel studies have been shown to yield response rates of 60% or higher with a single-wave mailing20,23. Additionally, with the exception of comorbid conditions, the demographic markers (age, gender, etc.) required for post-stratification were known for the panel before selection. This information, along with the high response rate associated with panel studies, facilitated sampling, permitted careful targeting of respondents to increase the likelihood that the margin of error set a priori for each measure (±3 points on a 100-point metric) would be met, and ensured acceptable sample representation within strata.
    Finally, by monitoring response rates within strata, decisions regarding additional sampling could be made and executed promptly. This shortened the time required to complete the data collection phase of the study. Data were collected over a six-week period in the spring of 1999.

    Survey Design

    Of primary concern in the survey design process was the reconfiguration of the eleven condition-specific clinical questionnaires into measures appropriate for the general population. Questionnaires were modified to increase the validity, reliability, and utility of the data gathered for this project. For example, the condition-specific sections of the clinical questionnaires were all prefaced with comments that asked patients to answer items in reference to a particular orthopaedic condition for which they were currently being treated or for which they were receiving follow-up care (e.g., "Please answer questions about your foot/ankle which is being treated."). These types of instructions are clearly inappropriate for the general population. Therefore, a number of simple, yet necessary, changes were made in survey layout and wording.
    First, instructions preceding each of the subsections of the clinical measures were amended to elicit general evaluations of the respondent’s hip, knee, ankle, etc. For example, for the lower limb measure, the banner was edited to read: "Please answer the following questions about your lower limb, which includes the hip, leg, foot, and toes. These questions are about how you have felt on average. It is very important that you fill out each item." To increase the likelihood that respondents would respond to all of the items, a motivational sentence stressing the importance of completing each item was included.
    Second, a series of screening and skip questions was added to allow clinicians to identify and select into subcategories those individuals within the sample who had (1) never had a problem with, e.g., a lower limb, (2) had a problem with a lower limb but never sought medical treatment, or (3) had received medical treatment for a lower limb. All scores reported in the present study were derived from the total samples and were not broken out on the basis of the responses to the screening questions.
    A number of skip pattern instructions were included in the screening question described above. The first skip pattern instructed respondents who had indicated that they had received medical treatment to indicate whether they had had surgery for the problem. The respondents who answered that they had a problem but had not sought medical treatment were instructed to mark "All That Apply" to a list of reasons why they had not sought treatment (e.g., "My lower limb problem/injury was not serious enough to seek treatment."). Respondents who had never had a problem were instructed to skip both of those items.
    Taken together, the changes in survey format were relatively minor, and all items included in the clinical measures were retained. The reconfigured questionnaires, formatted for the general population, are essentially isomorphic to the clinical measures.

    Sample Selection and Data Collection

    Selection of the samples for the eleven measures was completed with use of random-sample-selection algorithms. To ensure that the requirements of the study would be met post-stratification, demographic frequency counts were conducted on each sample drawn to determine that adequate representation criteria within strata were met.
    Personalized survey questionnaires were sent by direct mail to sampled panel participants. For each of the eleven measurement instruments, an initial 2920 surveys were mailed. On the basis of previous experiences with the National Family Opinion panel, this mailing was expected to yield a 65% response rate that would render approximately 1898 usable surveys for each of the eleven conditions.
    The decision to set response rate expectations at 65% was based on a number of factors. First, a review of the survey research literature suggested that response bias effects are greatly minimized as response rates exceed 50%20,23. Second, the confidence intervals around survey values are very small given the raw number of returns that were anticipated. Last, previous experience with the National Family Opinion panel suggested that additional surveying of nonresponders would not yield a sufficient number of returns to justify the additional cost.

    Data Analyses

    The first phase of data manipulation involved running frequencies on all data points. This was done in order to identify the percentages of missing or out-of-range values. All out-of-range values were assigned to a missing-value category. Scoring algorithms and validity tests on all scaled items required item completion criteria to be met before scoring could be completed. Each of the AAOS instruments contains items that are scored individually. However, the majority of the questionnaire items are aggregated into conceptually distinct scales designed to measure physical and mental functioning of the patient and symptom status. With the exception of the comorbidities scale (described below) and single-item measures, each scale is composed of the summated mean scores from related items.
    Summative scale scores were calculated only for individuals who answered at least half of the items of a scale (or half plus one for scales with an odd number of items). With the exception of the hip and knee function and limitation scale, the global foot and ankle scale, and the DASH module, missing items were handled by calculating the scale score as the mean of the items that the respondent completed. For the DASH instrument, if >10% of the items of any scale were missing, that individual’s scale scores were treated as missing values. If <10% of the items were missing, the completed items were scored and averaged, that mean score was imputed to the missing items and rounded to the nearest integer, and the scale score was then calculated. Each scale was calibrated to a 100-point metric scored from 0 to 100. The DASH and SMFA were scored so that 0 represented the least disability or best health and 100, the most disability or worst health. All other scales were scored so that 0 represented the worst health and 100, the best health. Calibrating scores to this metric allowed direct comparison with SF-36 scores and made the scores generally easier for diverse audiences to interpret19. Tables I through V present the scoring for each of the eleven measures, including sample sizes, mean scores, and standard deviations.
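    The article does not reproduce the AAOS scoring algorithms, so the following Python sketch is only an illustration under stated assumptions: items are coded 1 to k, the completion rule is read as "at least half of the items, or half rounded down plus one for an odd number of items", and the 0-to-100 calibration is taken to be the linear transformation (item mean − 1)/(k − 1) × 100. For five-option items this reduces to the published DASH formula, [(sum of n responses/n) − 1] × 25. The function names are hypothetical, and the special rules for the DASH (including rounding of imputed values), the hip and knee function and limitation scale, and the global foot and ankle scale are not implemented.

```python
from statistics import mean
from typing import Optional, Sequence


def min_items_required(n_items: int) -> int:
    """Completion rule, read as: half of the items for an even-length scale,
    half rounded down plus one for an odd-length scale (an assumption)."""
    return n_items // 2 if n_items % 2 == 0 else n_items // 2 + 1


def scale_score(responses: Sequence[Optional[int]], n_options: int,
                reverse: bool = False) -> Optional[float]:
    """Return a 0-100 scale score, or None when too few items were answered.

    responses: answers coded 1..n_options, with None for a missing item.
    reverse:   flip the result when the item coding runs opposite to the
               direction in which the scale is reported.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_items_required(len(responses)):
        return None
    # Filling each missing item with the respondent's own item mean leaves the
    # average unchanged, so the mean of the answered items is used directly.
    item_mean = mean(answered)
    # Linear transformation of the 1..n_options range onto 0..100.
    score = (item_mean - 1) / (n_options - 1) * 100
    return 100 - score if reverse else score


print(scale_score([1, 2, None, 1, 2], n_options=5))        # 12.5
print(scale_score([None, None, None, 1, 2], n_options=5))  # None
```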

    Comorbidity Checklist Scoring

    The comorbidity checklist18 component of the surveys required respondents to provide, for each comorbid condition listed, a yes-or-no response to three questions: (1) "Do you have the problem?", (2) "Do you receive treatment for it?", and (3) "Does it limit your activity?" These responses were then used to calculate a general comorbidity index and three subscales composed of scores from related items. The comorbidity index is calculated as the sum of "yes" responses (x) across all response options divided by the total number of possible "yes" responses (fourteen conditions × three questions = 42), or comorbidity index = (x/42) × 100.
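    The index arithmetic can be sketched in Python as follows. The sketch assumes that each of the fourteen conditions is recorded as three yes-or-no answers; the function name and the condition names are hypothetical placeholders, and the three subscales are not computed because their composition is not given here.

```python
from typing import Dict, Tuple

# (has the problem, receives treatment for it, it limits activity)
Answers = Tuple[bool, bool, bool]

N_CONDITIONS = 14
N_QUESTIONS = 3  # 14 x 3 = 42 possible "yes" responses


def comorbidity_index(checklist: Dict[str, Answers]) -> float:
    """Comorbidity index = x / 42 * 100, where x counts all "yes" answers."""
    if len(checklist) != N_CONDITIONS:
        raise ValueError(f"expected {N_CONDITIONS} conditions, got {len(checklist)}")
    if any(len(a) != N_QUESTIONS for a in checklist.values()):
        raise ValueError("each condition needs exactly three answers")
    x = sum(sum(answers) for answers in checklist.values())  # True counts as 1
    return x / (N_CONDITIONS * N_QUESTIONS) * 100


# Hypothetical respondent: one condition present and treated, one present
# only, and twelve absent (placeholder condition names).
example = {f"condition_{i:02d}": (False, False, False) for i in range(1, 15)}
example["condition_01"] = (True, True, False)
example["condition_02"] = (True, False, False)
print(round(comorbidity_index(example), 2))  # 3 of 42 -> 7.14
```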

    Reliability and Validity Analyses

    Multitrait-scaling techniques were used to assess the reliability and validity of the eleven reconfigured AAOS measures. The Multitrait/Multi-Item Analysis Program (MAP) is a straightforward methodology for scale analysis24,25. In multitrait scaling, scale items are evaluated in terms of four scaling criteria: (1) convergent validity expressed in terms of internal consistency, (2) item discriminant validity, (3) tests for equal item-total correlations, and (4) equal variance test of scale items.
    Multitrait scaling involves examination of item frequencies, item and scale descriptive statistics (e.g., mean, standard deviation, and variance), scale internal consistency estimates, item-scale correlations (corrected for overlap), and correlations among scales. Multitrait scaling goes beyond traditional tests of internal consistency primarily because it tests item discrimination across scales. Thus, items are evaluated with respect to how well they represent a particular construct relative to other constructs.
    In multitrait scaling analysis, related scale items within a measurement instrument are summated. These summated rating scales are then statistically compared with each other in order to test assumptions of validity and reliability within the instrument. Questions are grouped into conceptually related scales on the basis of the underlying concept that they are theoretically intended to measure. In order to preserve as much of the sample for analysis as possible, mean replacement of missing data was performed on a case-by-case basis. If an individual respondent was missing data for less than half of the items within a given scale, that person’s mean score for the items to which he or she responded was substituted for all missing data points within the scale. For individuals for whom more than half of the scale items were missing, the items that were not missing were assigned to a missing-value category and thus were excluded from the analysis. This mean replacement approach was used solely for reliability and validity testing.
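    A minimal Python sketch of the person-mean replacement described above, applied to one scale stored as a respondents-by-items table. The source does not say how a respondent missing exactly half of the items was handled; this sketch excludes that respondent, which is an assumption, as is the function name.

```python
from typing import List, Optional

Row = List[Optional[float]]


def impute_person_means(rows: List[Row]) -> List[Row]:
    """Person-mean substitution for one scale (respondents x items).

    A respondent missing fewer than half of the scale's items gets the mean
    of his or her answered items in place of each missing item.  Otherwise
    every item is set to None, dropping the respondent from this scale's
    reliability and validity analysis (exactly half missing is also dropped,
    an assumption of this sketch).
    """
    out: List[Row] = []
    for row in rows:
        answered = [v for v in row if v is not None]
        n_missing = len(row) - len(answered)
        if n_missing * 2 < len(row):
            fill = sum(answered) / len(answered)
            out.append([fill if v is None else v for v in row])
        else:
            out.append([None] * len(row))
    return out


# Hypothetical four-item scale, three respondents.
print(impute_person_means([[4, None, 2, 3], [None, None, 5, 1], [1, 2, 3, 4]]))
```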
    Multitrait scaling analyses were then performed on the eleven survey instruments on the basis of three conceptual models in order to assess (1) item internal consistency validity, (2) item discriminant validity, and (3) internal consistency reliability of the AAOS measures.

    Response Rates

    The confidence intervals for the eleven instruments ranged from ±1.6% to ±2.3%, which met the confidence interval criterion of ±3% established a priori. The overall response rate for the eleven measures was 67.4%, which exceeded the expected 65%. The response rates for the adult stand-alone surveys were similarly higher than the expected 65%. More surveys were mailed for the parent-adolescent and adolescent surveys than for the stand-alone surveys because the mailings for these two groups were matched. However, the anticipated 65% response rate was not obtained for the parent-child surveys (61.4%) or the parent-adolescent-matched surveys (62.9%). Examination of the response frequencies for these surveys for possible response bias effects due to differences in the characteristics of responders and nonresponders did not reveal evidence of systematic bias. Therefore, no additional surveys were mailed to nonresponders, on the assumption that an additional mailing to parents would not yield meaningful gains in statistical power or precision, because the margin of error for the parent studies would be reduced by <1% (~0.007).
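    The article does not state how its confidence intervals were computed. One convention that yields values in the reported ±1.6% to ±2.3% range is the worst-case (p = 0.5) 95% margin of error for a proportion, z × √(0.25/n) with z = 1.96. The Python sketch below applies that formula to response counts taken from Tables I and III; both the formula and the choice of n are assumptions for illustration, and the function name is hypothetical.

```python
import math


def conservative_margin(n: int, z: float = 1.96) -> float:
    """Worst-case (p = 0.5) margin of error for a proportion, in percent."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(0.25 / n) * 100


# Response counts from Tables I and III (foot and ankle global, hip and knee
# core, parent-adolescent upper extremity).
for n in (1755, 1900, 3538):
    print(n, round(conservative_margin(n), 1))
```

    With these counts the sketch gives about ±2.3% for 1755 responses and ±1.6% for 3538 responses, which is consistent with, but does not confirm, the method used in the study.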

    Scale Scores

    Tables I through V present the sample sizes, mean converted scaled scores, and standard deviations for each of the eleven AAOS measures. Norm-based scores within age, gender, ethnicity, and other relevant demographic markers are available through the AAOS.

    Reliability and Validity Tests

    Alpha Reliability and Item-to-Scale Pearson’s Product-Moment Correlation Coefficients

    It would be very difficult, at best, to report all of the statistics calculated to test the reliability and validity of eleven separate multiple-scale instruments in a traditional results section format. Summary descriptions and statistics for the analyses conducted are described below. Table VI displays the number of scales within each measure, the range of Cronbach’s alpha coefficients for each summated scale within each measure, and the range of Pearson’s product-moment item-to-scale correlation coefficients corrected for overlap.
    Cronbach’s alpha, expressed as a coefficient between 0 and 1, is a measure of the degree to which a set of items (e.g., the items in a scale) measures a single unidimensional latent construct, such as stress or pain26. When data have a multidimensional structure, Cronbach’s alpha usually is low. When scale items together tap a unidimensional construct, Cronbach’s alpha is high, reflecting high internal consistency among the scale items. As shown in Table VI, all of the AAOS instruments exhibited high alpha reliability: their alpha coefficients all exceeded 0.80 with the exception of one scale in the parent-child cluster.
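    For reference, Cronbach’s alpha for a scale of k items is k/(k − 1) × (1 − sum of the item variances / variance of the summed scale score). A minimal Python sketch of that formula follows, assuming complete data (for example, after any mean replacement); the function name and the responses are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).

    items: one sequence per item, each listing that item's scores for the
    same respondents in the same order (complete data assumed).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs a score for the same >= 2 respondents")
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("scale totals have zero variance")
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses from four respondents to a two-item scale.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 3))  # 0.75
```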
    The correlation between two variables reflects the degree to which the variables are related, that is, the extent to which they covary23. The most common measure of correlation is the Pearson product-moment correlation, usually designated by the letter "r" and sometimes called "Pearson’s r." Pearson’s correlation values range from −1 to +1 and reflect the degree of the linear relationship between two variables. A correlation of +1 means that there is a perfect positive linear relationship between the variables, a correlation of −1 means a perfect negative linear relationship, and a correlation of 0 means no linear relationship.
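    The Pearson correlation and the overlap correction mentioned above can be sketched in Python as follows. "Corrected for overlap" is taken here in its usual sense: the item is correlated with the sum of the other items in its scale, so that the item is not correlated with itself. The function names and data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_r(items: Sequence[Sequence[float]], j: int) -> float:
    """Correlate item j with the sum of the *other* items in its scale."""
    rest = [sum(col[i] for k, col in enumerate(items) if k != j)
            for i in range(len(items[j]))]
    return pearson_r(items[j], rest)


# Hypothetical three-item scale answered by four respondents.
items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
print(round(corrected_item_scale_r(items, 0), 3))
```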

    Item Discriminant Validity Tests

    In MAP (Multitrait/Multi-Item Analysis Program) scaling, discriminant validity assesses the extent to which the correlations of items with their own scales are higher than their correlations with other scales24. In MAP scaling, item internal consistency is considered satisfactory when at least 90% of the items meet the 0.40 standard for correlation with their own scale. Scaling success is achieved when at least 80% of the item-to-scale correlations, in the total data set and within each individual scale, exceed the corresponding correlations with competing scales by more than two standard errors (Table VI).
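    The scaling-success count can be sketched in Python as follows, under two assumptions: the correlations passed in are already corrected for overlap, and the standard error of a correlation is approximated as 1/√n. The function names, scale names, and values are hypothetical. A scale, or the data set as a whole, would meet the criterion above when the success rate is at least 80%.

```python
from math import sqrt
from typing import Dict, List


def scaling_successes(own_r: float, other_r: Dict[str, float],
                      n: int) -> Dict[str, bool]:
    """Compare one item's own-scale correlation with each competing scale.

    A comparison succeeds when the own-scale correlation exceeds the
    competing-scale correlation by more than two standard errors, with the
    standard error approximated as 1/sqrt(n) (an assumption of this sketch).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    two_se = 2 / sqrt(n)
    return {scale: own_r - r > two_se for scale, r in other_r.items()}


def success_rate(per_item: List[Dict[str, bool]]) -> float:
    """Percentage of all item-by-competing-scale comparisons that succeed."""
    flags = [ok for item in per_item for ok in item.values()]
    if not flags:
        raise ValueError("no comparisons supplied")
    return 100 * sum(flags) / len(flags)


# One hypothetical item, n = 1800 respondents, two competing scales.
result = scaling_successes(0.70, {"scale_B": 0.45, "scale_C": 0.68}, n=1800)
print(result, success_rate([result]))
```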
    For this normative data-collection project, eleven musculoskeletal functional outcomes assessment instruments developed through the AAOS were modified for use in the general population of the United States. As anticipated, each of the AAOS scale scores was uniformly high and skewed toward values representing good health. Given that the intent of this project was to collect data from the general population, this outcome is wholly in line with project objectives. However, an examination of the mean scores reveals meaningful variability, increasing their utility as comparative measures. When the scores are examined by quartiles and as percentages of the scores at the floor and ceiling ranges, it is evident that the normative scores will be most useful for the assessment of populations whose baseline scores are meaningfully lower (i.e., representing poorer health). In other words, important changes will be more difficult to detect in individuals undergoing treatment or follow-up whose scores lie toward the end of a given scale that represents better health.
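    The quartile and floor/ceiling summaries referred to above can be computed as in this Python sketch, which assumes scores on the 0-to-100 metric; the function name and the scores are hypothetical. A high ceiling percentage on a scale where 100 represents the best health reflects the limitation just described: respondents already at the ceiling cannot register further improvement.

```python
from statistics import quantiles
from typing import Dict, List, Sequence, Union


def score_distribution(scores: Sequence[float], lo: float = 0.0,
                       hi: float = 100.0) -> Dict[str, Union[float, List[float]]]:
    """Quartile cut points and percentages of scores at the floor and ceiling."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    return {
        "quartile_cuts": quantiles(scores, n=4),
        "floor_pct": 100 * sum(1 for s in scores if s <= lo) / n,
        "ceiling_pct": 100 * sum(1 for s in scores if s >= hi) / n,
    }


# Hypothetical scale scores skewed toward 100 (best health).
print(score_distribution([100, 100, 100, 95, 90, 85, 70, 40]))
```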
    A review of the tests reported in the present study indicated that, without exception, the scales in the surveys used in the normative data project met assumptions for reliability and validity8. Additionally, standard deviations of the scores for scale items were roughly equivalent within the scales (i.e., approximately 1.0 for items with five response options), precluding the need to standardize individual items before scaling. Mean scores showed greater variability, which is to be expected for items measuring physical activities (from self-care to strenuous activities) because most populations vary in their underlying ability to perform these activities.
    The item-to-scale scores across all eleven surveys revealed that all but four of the scaled items had higher correlations with their hypothesized scales than they did with competing scales—that is, they demonstrated acceptable discriminant validity scores. The four items that did not have higher correlations with their own scales were on the parent-child survey. Three items asked the parents to rate the degree of ease or difficulty for their child to (1) use a fork and spoon, (2) put on socks, and (3) turn a doorknob. The fourth asked how often the child used an assistive device for walking or climbing.
    All but four items met the 0.40 standard for internal consistency. The four items, also on the parent-child survey, yielded lower-than-desired scores on tests of item internal consistency. Two of the items listed above were again identified as exhibiting less-than-desirable scaling scores, along with two others. One of the other items required parents to estimate how frequently in the past week their child got together with friends to do things, while the other asked them to estimate how often their child participated in gym or recess in the past week.
    All of the items that failed to meet minimum scaling standards are found on one instrument, the pediatric parent-child survey. This is noteworthy because a parent completed this survey for a child two to ten years of age. One might anticipate greater variability for responses that estimate another person’s functional status, since parents may overestimate or underestimate their child’s physical functioning capabilities for any number of reasons. Additionally, it is reasonable to expect that young children will vary greatly in their ability to perform even very simple tasks. However, despite these scaling failures, none of the items substantially reduced the reliability or validity scores for the scales in which they were embedded. Each scale in the parent-child survey still met minimum standards for reliability and validity despite the scaling failure of these few items.
    Although several differences were found among the response-rate categories examined, there was no clear evidence of systematic response bias between responders and nonresponders.
    In summary, the present study describes valid normative data for a series of questionnaires that address issues of musculoskeletal health. Such data provide a valuable baseline and permit age and gender adjustments of data collected with use of these questionnaires. The results of studies with use of these questionnaires can then be placed in a firmer context, which should, in turn, allow more clinically relevant conclusions to be drawn.
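    As an illustration of how the normative values can place a study's results in context, the sketch below converts a patient's scale score into a z-score relative to the general-population mean and standard deviation. It is a hedged example, not a procedure from the study: the `Norm` record and `normative_z` function are hypothetical, the values are the overall (unstratified) means and standard deviations for the Foot and Ankle Global scale (Table I) and the SMFA Function Index (Table V), and the score orientation (higher is better for the AAOS scale, higher means more dysfunction for the SMFA) is an assumption stated in the code. Age- and gender-specific norms would be substituted in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """General-population mean and SD for one scale (hypothetical record type)."""
    mean: float
    sd: float
    higher_is_better: bool


# Overall normative values from Table I and Table V of this article.
# Assumed orientation: AAOS scales read higher = better; SMFA reads
# higher = more dysfunction.
NORMS = {
    "Foot and Ankle Global": Norm(mean=93.19, sd=12.33, higher_is_better=True),
    "SMFA Function Index": Norm(mean=12.70, sd=15.59, higher_is_better=False),
}


def normative_z(scale, score, norms=NORMS):
    """z-score against the norm, signed so that negative means worse than the norm."""
    norm = norms[scale]
    z = (score - norm.mean) / norm.sd
    return z if norm.higher_is_better else -z
```

    For example, an SMFA Function Index of 43.88 lies two normative standard deviations on the worse side of the general-population mean, so `normative_z("SMFA Function Index", 43.88)` returns about -2.0.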
    Daltroy LH, Johanson MD, Fossel AH, Liang MH. American Academy of Orthopaedic Surgeons lower limb outcome scales: validity, reliability, and sensitivity to change. Unpublished data.

    Daltroy LH, Johanson MD, Fossel AH, Liang MH. American Academy of Orthopaedic Surgeons lumbar and cervical spine outcomes scales: validity, reliability, and sensitivity to change. Unpublished data.

    Daltroy LH, Liang MH, Fossel AH, Goldberg MJ. The POSNA pediatric musculoskeletal functional health questionnaire: report on reliability, validity, and sensitivity to change. Pediatric Outcomes Instrument Development Group. Pediatric Orthopaedic Society of North America. J Pediatr Orthop. 1998;18:561-71.

    Davis AM, Beaton DE, Hudak P, Amadio P, Bombardier C, Cole D, Hawker G, Katz JN, Makela M, Marx RG, Punnett L, Wright JG. Measuring disability of the upper extremity: a rationale supporting the use of a regional outcome measure. J Hand Ther. 1999;12:269-74.

    Engelberg R, Martin DP, Agel J, Obremsky W, Coronado G, Swiontkowski MF. Musculoskeletal Function Assessment instrument: criterion and construct validity. J Orthop Res. 1996;14:182-92.

    Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med. 1996;29:602-8. Erratum: 1996;30:372.

    Martin DP, Engelberg R, Agel J, Snapp D, Swiontkowski MF. Development of a musculoskeletal extremity health status instrument: the Musculoskeletal Function Assessment instrument. J Orthop Res. 1996;14:173-81.

    McKee MD, Kim J, Kebaish K, Stephen DJ, Kreder HJ, Schemitsch EH. Functional outcome after open supracondylar fractures of the humerus. The effect of the surgical approach. J Bone Joint Surg Br. 2000;82:646-51.

    Navsarikar A, Gladman DD, Husted JA, Cook RJ. Validity assessment of the disabilities of arm, shoulder, and hand questionnaire (DASH) for patients with psoriatic arthritis. J Rheumatol. 1999;26:2191-4.

    Swiontkowski MF, Engelberg R, Martin DP, Agel J. Short musculoskeletal function assessment questionnaire: validity, reliability, and responsiveness. J Bone Joint Surg Am. 1999;81:1245-60.

    Swiontkowski MF, Buckwalter JA, Keller RB, Haralson R. The outcomes movement in orthopaedic surgery: where we are and where we should go. J Bone Joint Surg Am. 1999;81:732-40.

    Blyth FM, Lazarus R, Ross D, Price M, Cheuk G, Leeder SR. Burden and outcome of hospitalisation for congestive heart failure. Med J Aust. 1997;167:67-70.

    Kendall PC, Marrs-Garcia A, Nath SR, Sheldrick RC. Normative comparisons for the evaluation of clinical significance. J Consult Clin Psychol. 1999;67:285-99.

    Revicki DA, Wood M, Maton PN, Sorensen S. The impact of gastroesophageal reflux disease on health-related quality of life. Am J Med. 1998;104:252-8.

    Temple PC, Travis B, Sachs L, Strasser S, Choban P, Flancbaum L. Functioning and well-being of patients before and after elective surgical procedures. J Am Coll Surg. 1995;181:17-25.

    Katz JN, Chang LC, Sangha O, Fossel AH, Bates DW. Can comorbidity be measured by questionnaire rather than medical record review? Med Care. 1996;34:73-84.

    Rochon PA, Katz JN, Morrow LA, McGlinchey-Berroth R, Ahlquist MM, Sarkarati M, Minaker KL. Comorbid illness is associated with survival and length of hospital stay in patients with chronic disability. A prospective comparison of three comorbidity indices. Med Care. 1996;34:1093-101.

    Sangha O, Stucki G, Fossel AH, Katz JN. A simplified method to assess comorbidity in clinical and health services research of rheumatic diseases. Unpublished data.

    Ware JE Jr, Snow KK, Kosinski M, Gandek B. SF-36 health survey: manual and interpretation guide. Boston: The Health Institute, New England Medical Center; 1993.

    Dillman DA. Mail and telephone surveys: the total design method. New York: Wiley; 1978.

    Frankfort-Nachmias C, Nachmias D. Research methods in the social sciences. 5th ed. London: Edward Arnold; 1996.

    Crowne DP, Marlowe D. The approval motive: studies in evaluative dependence. New York: Wiley; 1964.

    Rea LM, Parker RA. Designing and conducting survey research: a comprehensive guide. 2nd ed. San Francisco: Jossey-Bass; 1997.

    Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull. 1959;56(2):81-105.

    Hays RD, Hayashi T, Carson S, Ware JE. User’s guide for the multitrait analysis program (MAP). Santa Monica, CA: RAND, N-2786-RC; 1988.

    Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297-334.
     

    TABLE I: Lower Extremities Scale Scores
    Instruments (column groups, in order): Foot and Ankle | Hip and Knee | Lower Limb | Sports Knee
    Measure: Global | Comfort | Core | Right Hip Pain | Left Hip Pain | Right Knee Pain | Left Knee Pain | Core | Core | Giving-Way of Knee | Preop. Limitations | Current Limitations | Pain
    No. of responses: 1755 | 1588 | 1900 | 1785 | 1779 | 1802 | 1799 | 1890 | 1827 | 1532 | 678 | 642 | 616
    Mean: 93.19 | 73.87 | 91.02 | 95.58 | 96.09 | 94.24 | 94.68 | 90.52 | 92.78 | 95.65 | 93.80 | 82.01 | 83.66
    Standard deviation: 12.33 | 29.51 | 14.35 | 12.34 | 12.08 | 13.33 | 13.16 | 13.78 | 12.35 | 15.05 | 17.98 | 26.38 | 25.09

    TABLE II: Upper Extremities Scale Scores
    Measure (Upper Limb): Function | Sports | Work
    No. of responses: 1706 | 1113 | 1610
    Mean: 10.10 | 9.75 | 8.81
    Standard deviation: 14.68 | 22.72 | 18.37

    TABLE III: Pediatric Scale Scores
    Measure: Upper Extremity | Mobility | Sports/Physical Functioning | Pain | Happiness | Global Function
    Parent-Child
    No. of responses: 1745 | 1748 | 1762 | 1762 | 1617 | 1716
    Mean: 91.97 | 98.35 | 90.22 | 92.43 | 89.80 | 93.31
    Standard deviation: 11.49 | 5.68 | 12.32 | 13.75 | 14.10 | 7.77
    Parent-Adolescent
    No. of responses: 3538 | 3499 | 3537 | 3524 | 3518 | 3482
    Mean: 98.82 | 99.22 | 93.66 | 88.96 | 81.47 | 95.15
    Standard deviation: 5.08 | 4.56 | 10.99 | 16.67 | 18.01 | 7.24
    Adolescent
    No. of responses: 3534348134601043506102
    Mean: 98.71 | 99.05 | 95.51 | 89.31 | 81.83 | 95.88
    Standard deviation: 4.93 | 4.70 | 9.74 | 14.79 | 17.59 | 5.38

    TABLE IV: Spine Scale Scores
    Measure: Cervical Spine, Neurological Symptoms | Cervical Spine, Pain | Lumbar Spine, Neurological Symptoms | Lumbar Spine, Pain
    No. of responses: 1790 | 1790 | 1780 | 1786
    Mean: 89.35 | 89.06 | 85.70 | 86.74
    Standard deviation: 18.44 | 15.48 | 22.40 | 17.17

    TABLE V: SMFA* (General Musculoskeletal) Scale Scores
    Measure: Daily Activities | Emotion | Arm-Hand | Mobility | Function Index | Bother Index
    No. of responses: 1891 | 1885 | 1890 | 1888 | 1871 | 1734
    Mean: 11.85 | 20.54 | 6.02 | 13.61 | 12.70 | 13.77
    Standard deviation: 19.20 | 18.38 | 12.26 | 18.31 | 15.59 | 18.59
    *SMFA = Short Musculoskeletal Function Assessment.

    TABLE VI: Multitrait Analysis Summary Scores
    AAOS Questionnaire* | Number of Scales | Range of Cronbach’s Alpha | Range of Inter-item Correlations† | Item Internal Consistency‡ (%) | Percentage of Item-to-Scale Correlations with 2 Standard Errors
    Cervical spine | 3 | 0.90 to 0.94 | 0.45 to 0.71 | 100 | 98
    Foot and ankle | 3 | 0.81 to 0.96 | 0.46 to 0.56 | 98 | 99
    Hip and knee | 6 | 0.89 to 0.93 | 0.47 to 0.77 | 100 | 99
    Lower limb | 2 | 0.89 to 0.94 | 0.49 to 0.54 | 100 | 100
    Lumbar spine | 3 | 0.91 to 0.94 | 0.49 to 0.72 | 100 | 98
    SMFA (general musculoskeletal) | 6 | 0.88 to 0.98 | 0.50 to 0.63 | 100 | 85
    Adolescent | 5 | 0.81 to 0.91 | 0.26 to 0.60 | 92 | 95
    Sports knee | 6 | 0.81 to 0.96 | 0.44 to 0.85 | 100 | 93
    Upper limb (DASH) | 4 | 0.94 to 0.98 | 0.49 to 0.87 | 100 | 97
    Parent-adolescent | 5 | 0.81 to 0.89 | 0.26 to 0.53 | 92 | 95
    Parent-child | 5 | 0.78 to 0.89 | 0.26 to 0.56 | 90 | 95
    *AAOS = American Academy of Orthopaedic Surgeons, SMFA = Short Musculoskeletal Function Assessment, and DASH = Disabilities of the Arm, Shoulder and Hand questionnaire.
    †All item-to-scale correlations (corrected) were significant (p < 0.05, with use of two standard errors for testing significance).
    ‡Percentage of item-scale correlations of at least 0.40.
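
    Table VI summarizes reliability as a range of Cronbach’s alpha across each questionnaire’s scales. For reference, the following is a minimal sketch of the standard coefficient alpha formula (Cronbach, 1951), alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The function name and input layout (one list of responses per item) are assumptions for illustration; this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's coefficient alpha.

    items is a list of item-response lists, one list per item, each covering
    the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

    Identical items give an alpha of 1, and uncorrelated items give an alpha near 0, which is why the values of 0.78 to 0.98 in Table VI indicate strong internal consistency.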