To The Editor:
We read with interest the article "Stulberg Classification System
for Evaluation of Legg-Calvé-Perthes Disease: Intra-Rater and Inter-Rater
Reliability" (81-A: 1209-1216, Sept. 1999), by Neyt et al. However,
we were concerned about the authors' interpretation of
the classification and the effect that it may have had on the
reliability coefficients obtained.
The Stulberg classification system² characterizes
the changes present in the affected hip at skeletal maturity and
also defines the prevalence of these changes in relation to the
severity of disease involvement and age. The authors state that,
in the Stulberg classification, there is no definition given for
a flat femoral head, an abnormally steep acetabulum, or a short femoral
neck. While we agree with this criticism regarding the flat femoral
head, we find the authors' own definition difficult to understand, and
they give no evidence to support the percentage of the femoral head
that they take to be flat or to relate it to long-term degenerative changes.
However, Stulberg et al.² were
clear in their definitions of the other two changes. They described
the presence of a shorter-than-normal femoral neck and an abnormally steep
acetabulum in comparison with the normal side. These are not the
definitions that Neyt et al. used, and their algorithm depends on
these findings to distinguish between class-IV and class-V outcomes.
Hence, the most common misinterpretations reported by the authors
are, not surprisingly, between classes IV and V, and this has implications
for the reported reliability coefficients.
Finally, the authors state that there is considerable variability
in reported series regarding the prevalence of class-IV and class-V
outcomes, and they consider that this is due not only to treatment
but also to the classification system. They make no mention of the
ages of the populations studied. Older populations have a greater prevalence
of class-V outcomes since there is less time for acetabular remodeling.
The converse is true in younger populations, where class-IV outcomes
are more common. This feature was also clearly shown in the work
of Stulberg et al.
J. Chell, F.R.C.S.
M. J. Flowers, F.R.C.S.
Corresponding author: J. Chell, F.R.C.S.
Department of Orthopaedics, Sheffield Children's Hospital
Western Bank, Sheffield S10 2TH, United Kingdom
S. L. Weinstein, L. A. Dolan, and K. F. Spratt reply:
We would like to thank Mr. Chell and Mr. Flowers for their comments
on our article concerning the reliability of classification with
use of the Stulberg system. In response, we would like to reemphasize
that it was not our intent to propose new definitions for general
use or to suggest definitions based on their value as long-term predictors,
but only to demonstrate that, in the face of nonexistent or unclear
definitions, clinicians and researchers will rely on their own nonstandardized
impressions of these components. This creates a source of uncontrollable
variance in the system. In order to quantify the effect of this
variance, we evaluated the reliability of classifications under
two conditions: (1) where physicians were left to use their own
interpretations of the system (the current state of practice), and
(2) where a concerted effort was made to standardize the evaluations,
leading to a final classification. We found the classifications
to be less reliable than desired under both conditions.
Chell and Flowers suggest that Stulberg et al.² provide
clear definitions for the evaluation of the neck and the acetabulum.
We disagree with this characterization. Stulberg et al. only suggest how
to measure these components; they do not provide evaluative criteria,
nor do they suggest comparison with the contralateral side to determine
normality except in the case of coxa magna. Taking acetabular
slope as an example, the article references the acetabular angle
of Sharp¹ but does not
provide the critical value separating normal from abnormal.
Sharp defines angles greater than 42 degrees as beyond the upper range of normality,
but Stulberg et al. do not reference this definition directly. Their
article uses vague language, such as "tended to be abnormally steep"
(page 1099) and "the lateral lip of the acetabulum became flattened
or vertically inclined" (page 1101). Likewise, their depiction of femoral
neck length as quadrants suggests measurement but not evaluation
and provides no standard criteria for future users of the system.
In regard to Chell and Flowers' comments concerning comparison
of outcomes across studies, we do not dispute that variables such
as age at onset influence long-term outcomes or that comparisons
between studies should keep these factors in mind. We do wish to
reemphasize, however, that in our study, where age at onset, treatment
type, skill of the practitioner, patient compliance, amount of head
involvement, extent of the fracture line, time in the fragmentation
and reossification stages, and the raters themselves were held constant
across the repeated measurement of a single sample, we still found
substantial differences in the classifications across readings, both
when physicians used their own interpretations of the system of Stulberg
et al.² and when they used our
consensus definitions. We suggest that, even after controlling for
physician, treatment, and patient variables, the unreliability of
the classifications themselves cannot be ruled out as a contributor
to the range of outcomes found in the literature.
Stuart L. Weinstein, M.D.
Lori A. Dolan, R.N., M.A.
Kevin F. Spratt, Ph.D.
Corresponding author: Stuart L. Weinstein, M.D.
Department of Orthopaedic Surgery
University of Iowa Hospitals and Clinics
200 Hawkins Drive
Iowa City, Iowa 52242-1009