January 2025
Journal of Intelligent Manufacturing
Deep learning-based classification models show high potential for automating optical quality monitoring tasks. However, their performance strongly depends on the availability of comprehensive training datasets. If changes in the manufacturing process or the environment lead to defect patterns that are not represented in the training data (a situation known as data drift), a model’s performance can decrease significantly. Unfortunately, assessing the reliability of model predictions usually requires substantial manual labeling effort to generate annotated test data. Therefore, this study investigates the potential of intrinsic confidence calibration approaches (i.e., last-layer dropout, correctness ranking loss, and weight-averaged sharpness-aware minimization (WASAM)) for automatically detecting false model predictions based on their confidence scores. This task, also known as model failure prediction, strongly depends on meaningful confidence estimates. First, the data drift robustness of these calibration methods combined with three different model architectures is evaluated. Two datasets from the friction stir welding domain containing realistic forms of data drift are introduced for this benchmark. Afterward, the methods’ impact on model failure prediction performance is assessed. The findings confirm the positive influence of well-calibrated models on model failure prediction, highlighting the need to look beyond classification accuracy during model selection. Moreover, transformer-based models and the WASAM technique were found to improve robustness to data drift, both in terms of classification performance and the usefulness of the resulting confidence estimates.
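
To illustrate the confidence-based failure prediction setup described in the abstract, the following minimal sketch uses last-layer Monte Carlo dropout to obtain confidence scores and flags low-confidence predictions as likely failures. The model layout, number of stochastic samples, and confidence threshold are illustrative assumptions and do not reflect the paper's actual configuration.

    # Minimal sketch: last-layer MC dropout confidence scores for model failure prediction.
    # All architecture details, sample counts, and thresholds are assumptions for illustration.
    import torch
    import torch.nn as nn

    class LastLayerDropoutClassifier(nn.Module):
        def __init__(self, num_features: int = 64, num_classes: int = 4, p: float = 0.3):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(num_features, 128), nn.ReLU())
            self.dropout = nn.Dropout(p)           # kept stochastic at inference for MC sampling
            self.head = nn.Linear(128, num_classes)

        def forward(self, x):
            return self.head(self.dropout(self.backbone(x)))

    @torch.no_grad()
    def predict_with_confidence(model, x, mc_samples: int = 20):
        """Average softmax outputs over stochastic forward passes; the maximum
        probability of the averaged prediction serves as the confidence score."""
        model.eval()
        model.dropout.train()                      # enable dropout only in the last layer
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(mc_samples)]
        )
        mean_probs = probs.mean(dim=0)
        confidence, prediction = mean_probs.max(dim=-1)
        return prediction, confidence

    if __name__ == "__main__":
        model = LastLayerDropoutClassifier()
        x = torch.randn(8, 64)                     # stand-in for extracted image features
        pred, conf = predict_with_confidence(model, x)
        threshold = 0.7                            # assumed operating point
        likely_failures = conf < threshold         # flag low-confidence predictions for review
        print(pred, conf, likely_failures)

In this sketch, model failure prediction reduces to thresholding the calibrated confidence score: predictions below the threshold are routed to manual inspection instead of being trusted automatically, which is why well-calibrated confidence estimates matter beyond raw classification accuracy.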