Representative examples of misclassifications by Xp-Bodypart-Checker and CXp-Projection-Rotation-Checker. A) Chest x-ray of a woman in her 50s that was misclassified as “Abdomen” by Xp-Bodypart-Checker. B) PA chest radiograph of a woman in her 40s that was misclassified as “lateral” by CXp-Projection-Rotation-Checker. C) AP chest radiograph of a man in his 50s that was misclassified as “PA” by CXp-Projection-Rotation-Checker. D) PA chest radiograph of a woman in her 80s that was misclassified as “PA” by CXp-Projection-Rotation-Checker. AP, anteroposterior; PA, posteroanterior; AUC, area under the receiver operating characteristic curve.
Image source: Mitsuyama Y, Takita H, Walston SL et al., European Radiology 2025 (CC BY 4.0)
This is further complicated by variation in projection and orientation. An x-ray can be taken from front to back (anteroposterior) or from back to front (posteroanterior), it can be taken from the side (lateral), and the stored image can be inverted or rotated. In large imaging archives, these minor errors quickly add up to hundreds or even thousands of mislabeled images.
A research team from the Osaka Metropolitan University Graduate School of Medicine, including graduate student Yasuhito Mitsuyama and Professor Daiju Ueda, aimed to improve the detection of mislabeled data by automatically identifying errors before they affect the input data of deep learning models. The group developed two models: Xp-Bodypart-Checker, which classifies x-rays by body part, and CXp-Projection-Rotation-Checker, which detects the projection and rotation of chest x-rays.
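The general idea of catching a mislabel by comparing a stored label with a classifier's prediction can be illustrated with a minimal Python sketch. It is not the team's implementation: the `Radiograph` record, the `flag_suspected_mislabels` function, the 0.9 confidence threshold and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Radiograph:
    # All fields are hypothetical, for illustration only.
    image_id: str
    recorded_label: str   # label stored in the archive metadata
    predicted_label: str  # label a classifier assigned from the image
    confidence: float     # classifier's probability for predicted_label


def flag_suspected_mislabels(records, min_confidence=0.9):
    """Return records whose stored label disagrees with a confident prediction."""
    return [
        r for r in records
        if r.predicted_label != r.recorded_label
        and r.confidence >= min_confidence
    ]


records = [
    Radiograph("img-001", "chest", "chest", 0.99),    # labels agree
    Radiograph("img-002", "chest", "abdomen", 0.97),  # confident disagreement
    Radiograph("img-003", "PA", "AP", 0.55),          # low-confidence disagreement
]

flagged = flag_suspected_mislabels(records)
print([r.image_id for r in flagged])
```

In a sketch like this, flagged images would go to a human reviewer rather than being relabeled automatically, since the classifier itself can be wrong, as the misclassified examples in the figure show.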
Xp-Bodypart-Checker achieved an accuracy of 98.5%, and CXp-Projection-Rotation-Checker achieved accuracies of 98.5% for projection and 99.3% for rotation. The researchers are optimistic that integrating the two into a single model would achieve breakthrough performance in clinical settings.
Although the results are exceptional, the team hopes to further refine the method for clinical use. “We plan to retrain the model on radiographs that were flagged when they were correctly labeled, as well as those that were not flagged but were actually mislabeled, to achieve even greater accuracy,” Mitsuyama said.
Source: Osaka Metropolitan University