Adolescent idiopathic scoliosis (AIS) is a three-dimensional deformity of the spine, defined by a Cobb angle of at least 10 degrees. The prevalence of AIS is 1-3%, and it is more common in female patients (Hammond et al., 2011). Studies have shown that AIS is a progressive disease with significant health impacts, such as pain, musculoskeletal problems and psychosocial issues, owing to its occurrence in adolescence (Barton & Weinstein, 2018; Goldberg et al., 1994; Mayo et al., 1994). Management of AIS is mainly guided by the assessment of bone maturity, since patients with significant growth potential have a greater risk of curve progression (Weinstein et al., 2008). The Risser stage is the most commonly used indicator of bone maturity in AIS. In 1958, Risser introduced a method for grading the ossification of the iliac crest on conventional radiographs. Since then, two main classification systems have emerged: the American and the French classifications. In the American classification, the progression of ossification is divided into six stages, where stage 0 is a non-ossified iliac crest and stage 5 is complete fusion of the crest to the iliac bone. The French classification divides the progression of iliac crest ossification into thirds, corresponding to stages 1 to 3 (stage 0 being no ossification); stage 4 corresponds to the beginning of the crest's fusion to the iliac bone, and stage 5 to complete fusion of the two bones (Hacquebord & Leopold, 2012). Today, the Risser stage is widely accepted for assessing bone maturity and the progressive potential of AIS.
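Because the two systems use different cut-offs, the same radiograph can receive different stage numbers. The following is a simplified sketch of that divergence, not a clinical grading rule. It assumes that ossification can be summarized as a fraction of the crest excursion (0 to 1) plus a fusion state, and that American stages 1 to 4 cover successive quarters of the excursion; both assumptions, and the function names `american_risser` and `french_risser`, are illustrative only.

```python
import math

FUSION_STATES = {"none", "partial", "complete"}


def _check(excursion: float, fusion: str) -> None:
    # Hypothetical input model: excursion fraction in [0, 1] and a coarse fusion state
    if not 0.0 <= excursion <= 1.0:
        raise ValueError("excursion must be in [0, 1]")
    if fusion not in FUSION_STATES:
        raise ValueError(f"fusion must be one of {sorted(FUSION_STATES)}")


def american_risser(excursion: float, fusion: str) -> int:
    """American-style stage (0-5); stages 1-4 assumed to be quarters of the excursion."""
    _check(excursion, fusion)
    if fusion == "complete":
        return 5  # crest fully fused to the iliac bone
    if fusion == "partial":
        return 4  # assumption: fusion under way still counts as stage 4 here
    if excursion == 0.0:
        return 0  # non-ossified iliac crest
    return min(4, math.ceil(excursion * 4))


def french_risser(excursion: float, fusion: str) -> int:
    """French-style stage (0-5): thirds for stages 1-3, then start and end of fusion."""
    _check(excursion, fusion)
    if fusion == "complete":
        return 5  # complete fusion of the two bones
    if fusion == "partial":
        return 4  # beginning of the crest's fusion to the iliac bone
    if excursion == 0.0:
        return 0
    return min(3, math.ceil(excursion * 3))
```

For example, a crest that is 60% ossified and not yet fused maps to stage 3 in the American sketch but stage 2 in the French one, and a fully ossified, unfused crest maps to 4 versus 3. This is one reason a study must state which classification its labels follow.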
Rater variability in the assessment of the Risser stage
Even with a clear clinical definition of the Risser stages, interpreting plain radiographs is challenging because of (a) differences in image quality between acquisitions, (b) variability between radiographic systems, and (c) severe deformities in which a strictly frontal projection is no longer achieved. Because of pelvic rotation in AIS and the subjectivity of visual grading, inter-observer and intra-observer variability has been demonstrated and is accepted in clinical practice. However, studies disagree on the magnitude of this variability. Regarding inter-observer variability, Goldberg et al. (1988) reported a kappa of 0.80, and Dhar et al. (1993) reported an agreement of 89.2%. In contrast, more recent studies reported 50% agreement across all stages, while Hammond et al., in line with Shuren et al. (1992), found moderate agreement between orthopedic surgeons and radiologists, with discrepancies of up to three stages between raters. Regarding accuracy, Izumi suggested that anteroposterior radiographs show more of the iliac capping than posteroanterior ones, leading to inaccuracies in Risser staging, while Reem et al. found the radiographic assessment to be an acceptable measure of the Risser stage (Hammond et al., 2011; Izumi, 1995; Reem et al., 2009; Sabour, 2018; Yang et al., 2014).

This variability can strongly affect treatment decisions and outcomes. Moreover, a patient's successive radiographs are sometimes graded by different observers, causing inconsistencies within the patient record. The controversy over the accuracy and reliability of the Risser stage could be addressed by a tool that provides a reliable and reproducible assessment, free of intra-rater variability, and that serves as a consistent second opinion. We propose such a computerized tool based on deep learning methods (LeCun et al., 2015).
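The agreement figures quoted above (a kappa coefficient, a percentage agreement, and a maximum stage discrepancy) can be computed from two raters' stage assignments. The sketch below shows unweighted Cohen's kappa, assuming each rater assigns one integer Risser stage per radiograph; the ratings are hypothetical and the function names are illustrative.

```python
from collections import Counter


def percent_agreement(a: list[int], b: list[int]) -> float:
    """Fraction of radiographs given the same stage by both raters."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement estimated from each rater's marginal stage frequencies
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical stage")
    return (p_o - p_e) / (1.0 - p_e)


def max_discrepancy(a: list[int], b: list[int]) -> int:
    """Largest stage difference between the two raters on any radiograph."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical stages assigned by two raters to the same eight radiographs
rater_a = [0, 1, 2, 3, 4, 5, 4, 3]
rater_b = [0, 1, 3, 3, 4, 5, 2, 3]
agreement = percent_agreement(rater_a, rater_b)  # 6 of 8 identical
kappa = cohen_kappa(rater_a, rater_b)
worst = max_discrepancy(rater_a, rater_b)
```

Here the raters agree on 75% of cases, yet kappa is lower (9/13, about 0.69) because part of that agreement is expected by chance. Since the Risser stage is ordinal, a weighted kappa that penalizes a three-stage disagreement more than a one-stage one is often preferred; the unweighted form is shown for brevity.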
Related work
Deep learning is a subset of machine learning, itself a branch of artificial intelligence, in which a computer learns to detect patterns and make predictions from example data. Deep learning removes the need for humans to explicitly specify key features or regions. Instead, the most predictive features are learned from labelled examples as a hierarchy of concepts, which is reflected in the architecture: deep networks are stacks of simple modules in which more abstract representations are computed from less abstract ones. During training, the network adjusts its internal parameters to improve its predictions, using gradients of the prediction error computed by backpropagation (LeCun et al., 2015; Abdolmanafi et al., 2017). Deep learning methods have recently been applied to segmentation, detection and classification tasks. Major advances in machine learning and computer vision, together with greater computing power, have motivated a shift from conventional algorithms towards deep learning approaches.
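As a minimal illustration of stacked modules trained by backpropagation, the following pure-Python sketch builds a two-module network (affine map plus tanh, then affine map plus sigmoid), computes gradients with the chain rule, and updates parameters by plain gradient descent. It is not the method of this work: the toy XOR data, layer sizes, learning rate and all function names are arbitrary, hypothetical choices made for illustration.

```python
import math
import random


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> dict:
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p: dict, x: list[float]):
    # Module 1: affine map + tanh gives a hidden, more abstract representation h
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    # Module 2: affine map + sigmoid gives the probability of the positive class
    z = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))


def loss(p: dict, data) -> float:
    # Mean binary cross-entropy over the dataset
    total = 0.0
    for x, t in data:
        _, y = forward(p, x)
        total -= t * math.log(y) + (1 - t) * math.log(1 - y)
    return total / len(data)


def backprop(p: dict, data) -> dict:
    # Gradient of loss(p, data) for every parameter, obtained with the chain rule
    g = {"W1": [[0.0] * len(row) for row in p["W1"]],
         "b1": [0.0] * len(p["b1"]), "w2": [0.0] * len(p["w2"]), "b2": 0.0}
    n = len(data)
    for x, t in data:
        h, y = forward(p, x)
        dz = (y - t) / n  # dLoss/dz for a sigmoid output with cross-entropy
        g["b2"] += dz
        for j, hj in enumerate(h):
            g["w2"][j] += dz * hj
            # Back through w2 and tanh: d tanh(a)/da = 1 - tanh(a)**2
            da = dz * p["w2"][j] * (1.0 - hj * hj)
            g["b1"][j] += da
            for i, xi in enumerate(x):
                g["W1"][j][i] += da * xi
    return g


def gradient_descent_step(p: dict, g: dict, lr: float) -> None:
    # Move every parameter a small step against its gradient
    for j, row in enumerate(p["W1"]):
        for i in range(len(row)):
            row[i] -= lr * g["W1"][j][i]
        p["b1"][j] -= lr * g["b1"][j]
        p["w2"][j] -= lr * g["w2"][j]
    p["b2"] -= lr * g["b2"]


XOR = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
params = init_params(n_in=2, n_hidden=3)
loss_before = loss(params, XOR)
for _ in range(2000):
    gradient_descent_step(params, backprop(params, XOR), lr=0.5)
loss_after = loss(params, XOR)
```

Backpropagation only supplies the gradients; the update rule (here the simplest gradient descent) is a separate choice, and practical networks typically use stochastic mini-batch variants.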
Deep learning for skeletal maturity
Skeletal maturity evaluation is an integral part of pediatric practice, and is especially important in endocrinology, radiology and orthopedics. However, manually grading a large number of radiographs is time-consuming, and obtaining a second opinion to reduce its variability is impractical in clinical settings. Previous studies have proposed automatic assessment of skeletal maturity, focusing mainly on hand-wrist radiographs (carpograms). For instance, Thodberg et al. presented BoneXpert, a four-step algorithm that generates bone models and compares the output to a reference (Thodberg et al., 2009). Such algorithms are interpretable and easy to understand, but they require high-quality images and rely on heuristics that may fail on borderline cases (Spampinato et al., 2017). Deep learning has recently been introduced for the radiographic assessment of skeletal maturity and has shown promising results. Spampinato et al. introduced an automatic bone age assessment on carpograms using a five-layer convolutional neural network (Spampinato et al., 2017). Analysis of the regions driving the network's predictions suggested that some carpal regions used by clinicians might not be relevant, while some new regions should be considered. Recent deep learning bone age assessment models report performance scores of 61% to 79% (Spampinato et al., 2017; Lee et al., 2017; Torres et al., 2017). To the best of our knowledge, deep learning has not yet been applied to the assessment of the Risser stage on radiographs. Hence, the goal of this study is to propose a new deep learning technique for the automatic assessment of the Risser stage. We validate our method against human observers by evaluating intra-observer and inter-observer variability.
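To make the idea of a convolutional classifier concrete, here is a toy forward pass that maps an image patch to a probability for each of the six Risser stages. It is neither the network of Spampinato et al. nor the method proposed in this study: the 16x16 random patch (a stand-in for a pelvis crop), the four 3x3 filters, the untrained random weights and the function names are all hypothetical. Training such weights would use backpropagation as described in the related work above.

```python
import math
import random

NUM_STAGES = 6  # Risser stages 0 to 5


def conv2d_valid(img, kernel):
    """'Valid' 2D cross-correlation, which deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[u][v] * img[i + u][j + v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def global_average_pool(fmap) -> float:
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stage_probabilities(img, kernels, W, b):
    # Feature extraction: conv + ReLU + pooling gives one scalar feature per filter
    feats = [global_average_pool(relu(conv2d_valid(img, k))) for k in kernels]
    # Classification head: linear map from features to one logit per stage
    logits = [sum(w * f for w, f in zip(row, feats)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


rng = random.Random(0)
patch = [[rng.random() for _ in range(16)] for _ in range(16)]
kernels = [[[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
W = [[rng.gauss(0, 0.5) for _ in range(len(kernels))] for _ in range(NUM_STAGES)]
b = [0.0] * NUM_STAGES
probs = stage_probabilities(patch, kernels, W, b)
predicted_stage = max(range(NUM_STAGES), key=probs.__getitem__)
```

With random weights the prediction is meaningless; the sketch only shows the data flow from pixels to a distribution over stages. Real networks stack many such layers and learn the filters from labelled radiographs.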