Background: Accurate and reliable radiographic classifications of
the relative severity and outcome of Legg-Calvé-Perthes disease are
essential in the study of that disease. As part of a prospective multicenter
study*, we sought to define more clearly the lateral pillar classification of
severity and the Stulberg
classification of outcome; we sought especially to define the borderlines
between classification groups.
Methods: We performed interobserver and intraobserver trials of the
lateral pillar and Stulberg classifications using sets of twenty radiographs
chosen from a prospective study of 345 hips. To establish reliable definitions
of the lateral pillar classification, we added a new, intermediate group
termed the B/C border group, which includes femoral heads with a thin
or poorly ossified lateral pillar and those with a loss of exactly 50% of the
original height of the lateral pillar. The resulting classification consists
of four groups: A, B, B/C border, and C. In our application of the
classification system of Stulberg et al., we defined a class-II femoral head
as round and fitting within 2 mm of a circle on both anteroposterior and
frog-leg lateral radiographs. We defined a Stulberg class-III femoral head as
out of round by more than 2 mm on either view and a Stulberg class-IV femoral
head as one with at least 1 cm of flattening of the weight-bearing articular
surface. To assess interobserver and intraobserver agreement, we performed two
trials of each classification with six orthopaedic surgeons reviewing twenty
radiographs or pairs of radiographs.
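
The Stulberg thresholds stated above can be sketched as a simple decision rule. This is an illustrative sketch only, not the authors' measurement protocol. The function name, the parameter names, and the choice to check the class-IV criterion before the class-III criterion are assumptions. Classes I and V are not defined in this abstract and are not handled, so a class-I head would also be returned as "II" here.

```python
def stulberg_class_sketch(dev_ap_mm, dev_lateral_mm, flattening_mm=0.0):
    """Return 'II', 'III', or 'IV' from the thresholds quoted in the Methods.

    dev_ap_mm      -- deviation from a circle on the anteroposterior view (mm)
    dev_lateral_mm -- deviation from a circle on the frog-leg lateral view (mm)
    flattening_mm  -- length of flattening of the weight-bearing surface (mm)

    All three parameter names are hypothetical; the abstract does not say how
    the measurements are recorded.
    """
    if min(dev_ap_mm, dev_lateral_mm, flattening_mm) < 0:
        raise ValueError("measurements must be non-negative")

    # Class IV: at least 1 cm (10 mm) of flattening.
    # Assumption: this criterion takes precedence over the roundness check.
    if flattening_mm >= 10.0:
        return "IV"

    # Class III: out of round by more than 2 mm on either view.
    if max(dev_ap_mm, dev_lateral_mm) > 2.0:
        return "III"

    # Class II: within 2 mm of a circle on both views.
    return "II"
```

For example, `stulberg_class_sketch(1.0, 2.5)` returns "III" because the lateral view alone exceeds 2 mm.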
Results: In the first trial of the lateral pillar classification,
there was 81% agreement per radiograph and the average weighted kappa was
0.71. In the second trial, there was 85% agreement per radiograph and the
weighted kappa averaged 0.79. Intraobserver reliability testing showed a 77%
match between Trials 1 and 2, an average weighted kappa of 0.81, and an
average generalizability coefficient of 0.91. In Trial 1 of the Stulberg
classification, there was 91% agreement per radiograph and an average weighted
kappa of 0.82. In Trial 2, there was 92% agreement per radiograph and an
average weighted kappa of 0.82. Intraobserver reliability testing showed an
89% match between Trials 1 and 2, an average weighted kappa value of 0.88, and
an average generalizability coefficient of 0.92.
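
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of a weighted kappa for two observers rating the same radiographs on an ordered scale. It assumes linear disagreement weights and assumes that the "average" is taken over all pairs of observers; the abstract specifies neither. The function names and any ratings passed to them are hypothetical and are not study data.

```python
from collections import Counter
from itertools import combinations

# Ordered lateral pillar groups as listed in the Methods.
CATEGORIES = ["A", "B", "B/C border", "C"]


def weighted_kappa(rater1, rater2, categories=CATEGORIES):
    """Linear-weighted Cohen's kappa for two equal-length rating lists."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("rating lists must be non-empty and of equal length")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    k = len(categories)

    # Joint counts and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(rater1, rater2))
    marg1 = Counter(index[a] for a in rater1)
    marg2 = Counter(index[b] for b in rater2)

    # Disagreement weight |i - j| (assumed linear weighting).
    observed = sum(abs(i - j) * joint[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * marg1[i] * marg2[j] / (n * n)
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


def average_pairwise_kappa(ratings_by_observer):
    """Mean weighted kappa over all observer pairs (an assumed averaging scheme)."""
    pairs = list(combinations(ratings_by_observer, 2))
    return sum(weighted_kappa(a, b) for a, b in pairs) / len(pairs)
```

Under this weighting, a disagreement between groups A and C counts three times as heavily as one between adjacent groups, which is why a weighted kappa suits an ordered classification better than simple percent agreement.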
Conclusions: The interobserver and intraobserver trials of these
classifications produced kappa values and generalizability coefficients in the
excellent range. The modified lateral pillar classification and the redefined
Stulberg classification are sufficiently reliable and accurate for use in
studies of Legg-Calvé-Perthes disease.