Detection of Failure Analysis Methods
with Image Classification
Selene Lobnig1,3, Christian Burmer2, Konstantin Schekotihin3
1Infineon Technologies, Siemensstrasse 2, 9500 Villach, Austria
2Infineon Technologies, Am Campeon 1-15, 85579 Neubiberg, Germany
3University Klagenfurt, Universitätstr. 65-67, 9020 Klagenfurt, Austria
Email: SeleneElise.Lobnig@infineon.com Christian.Burmer@infineon.com Konstantin.Schekotihin@aau.at
Abstract—Failure analysis (FA) in semiconductors is an
error-prone and knowledge-intensive activity. Therefore, timely
support of engineers with information about past analyses, best
practices, or technical data is crucial for successful FA
operations. Unfortunately, in many cases the application of modern Artificial Intelligence (AI) methods is limited since most of the data is stored in human-readable formats only, thus making its automatic processing impossible. In this paper, we consider the problem of method detection from images made by different
tools used in FA. We show that the proposed deep learning
technique can successfully recognize methods from various
images made in an FA lab with an accuracy of 91%. In addition,
we investigate the transferability of our results to images of
other labs. The obtained results show a slight drop in accuracy to 82%, which can be improved by fine-tuning the model on data from other labs.
Keywords—Method Detection, Neural Networks, Semantic
Annotation
I. INTRODUCTION
Finding failures in semiconductors is an essential part of
the product lifecycle aiming at the prevention of different
malfunctions that might lead to severe incidents causing
substantial costs and potentially threatening human lives.
However, finding an explanation for an observed abnormal
behavior is a complex task demanding much technical
knowledge about electrical engineering and, in extreme cases,
about the application domain as well. In addition, engineers
should perform their investigations within a given time to
ensure timely feedback to customers. Semantic systems based
on AI techniques can support engineers in various activities
by providing them with information about similar cases
archived in the databases, best practices in analyzing failures
of related products, or technical support in applying different
methods. However, an essential prerequisite for creating such
semantic systems is the availability of data in a format that
machines can read and interpret in the same way as engineers
working in a lab. Unfortunately, most of the textual or image
data stored in databases, wikis, or file shares can only be used
by humans.
Recent advances in the application of AI methods in FA
showed that the creation of semantic vocabularies—
ontologies [1]—and the development of supervised
classification techniques for semantic annotation of
documents [2] [3] could be utilized to represent vast amounts
of data in a form that machines can read and interpret.
Nevertheless, much work must still be done to create an AI
toolbox for semantic annotation of different data formats used
in FA due to a large number of different methods applied in
the process. As a result, the data output by these methods is very heterogeneous and, therefore, the creation of a single machine learning model is problematic. Moreover, in many cases, the
storage of this data is regulated only by administrative
instructions, e.g., in which directory a file must be stored. As
a result, a significant part of the data is labeled erroneously,
e.g., if it is stored in the wrong directory, or is not labeled at
all, e.g., when an engineer forgets to move the file.
This paper suggests a deep learning approach to detect an
FA method whose application yielded an image. In the
considered application scenario, the photos made by FA
experts during the failure analysis process are saved in a
database but often with little information about their purpose,
depicted issues, or meta-data about the applied method. The
latter is essential for further image processing since quality,
colors, and other features of an image may vary significantly
depending on the method, see sample images in Table 1.
Table 1: Example images for the different methods (Optical Microscopy, Scanning Electron Microscopy, Scanning Acoustic Microscopy, Pull-Shear, X-ray Tomography, Focused Ion Beam, Electrical Measurement, Emission Microscopy, Laser Localization, IR LockIn Thermography, Auger, Interferometry, Scanning Probe Microscopy, EDX, Layout Analysis, Transmission Electron Microscopy)
Contributions of this paper can be summarized as follows:
• We collect and analyze the data available in the
databases of three FA labs, including images and
various meta-data about images and depicted samples.
The created training set included images as well as
selected features like format, modus, and size.
• We develop a model that combines the selected
features with ResNet-18, a convolutional neural
network pre-trained on a large set of general images, to
yield a classification model to annotate images with a
method used to create them.
• The conducted evaluation indicates that the suggested approach can identify the method with a total accuracy of up to 0.96. In addition, we analyzed the possibilities of transferring classifiers among the labs. The obtained results show that a direct transfer causes the accuracy to drop to 0.82, which can be raised to the previous level by fine-tuning the model on the data of the new FA lab.
The paper is organized as follows. Section II provides the
background and state-of-the-art methods relevant to the
studied classification problem. We discuss the data
preprocessing and describe the classification model in Section
III. Finally, we present the evaluation methodology and results
in Sections IV and V and give final remarks and outlook in
Section VI.
II. BACKGROUND
Classification is an essential supervised machine learning
task, which is widely applied to solve different problems by
training models on available sets of labeled examples. In the
context of computer vision, classification aims to select a
correct label from a set of labels for an input image. This
section introduces basic concepts of image classification and
provides an overview of related work.
A. Terminology
Modern image classification methods use Convolutional
Neural Networks (CNNs) [4] to establish models able to
assign a label to a given image with high accuracy
outperforming human experts [5] [6] [7] [8]. Convolution is
an operation that came to NNs from signal processing, where
it is often used to extract a relevant signal from a noisy one.
When applied to images, convolutions are viewed as a
composition of many frames that can also overlap [9]. As a
result, this operation considers pixels placed close to each
other in an image as related. This assumption helps to simplify
the learning process significantly by allowing a network to
focus on small portions of the input image. A convolutional
layer applies filters (kernels) to each channel of an image to
create activation maps that highlight different aspects of an
image, e.g., contours, contrast, sharpening edges, etc.
Activation maps can then be provided to another convolutional layer, whose kernels might extract more complex features like corners or parts of objects. Thus,
training a convolutional layer aims to learn kernels from input
labeled images, which results in the extraction of features
providing the most information about depicted objects.
Extracted features are provided to classification layers of the
network, which yield a probability for each label to be
assigned. The label with the highest probability is selected as
the classification result and assigned to an input image.
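As a minimal illustration (our own example, not part of the paper's pipeline), the following PyTorch snippet applies one convolutional layer with 16 kernels to an RGB image tensor and yields 16 activation maps:

```python
import torch
import torch.nn as nn

# A single convolutional layer: 3 input channels (RGB), 16 learned 3x3 kernels,
# each producing one activation map; padding=1 keeps the spatial size unchanged.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# A batch with one 224x224 RGB image (random values stand in for a real photo).
image = torch.rand(1, 3, 224, 224)

activation_maps = conv(image)
print(activation_maps.shape)  # torch.Size([1, 16, 224, 224])
```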
Over the last decade, researchers suggested a vast number
of CNN architectures. In most cases, these approaches were
evaluated on a well-known image classification benchmark,
ImageNet [10]. Results obtained on this dataset allow one to
compare the architectures and select the most appropriate one
for the classification task at hand. In our case, we selected the
ResNet architecture [3], which won first place in the ImageNet
Large Scale Visual Recognition Challenge 2015. It uses
specific shortcut connections to overcome the learning
degradation problem when deep neural networks stagnate and
cannot learn required concepts with required accuracy even
over a very long training time. Shortcut connections allow the
network to skip one or more layers and thus help deeper
convolutional networks to learn their tasks more efficiently.
ResNet showed a good balance between training complexity
and performance, e.g., [11] compared 14 CNN architectures
on their accuracy, inference time, power consumption,
memory usage, operations count, and the number of
parameters.
B. Related work
The importance of FA automation has recently been recognized by research and application communities, who have started
developing and applying AI techniques, like image detection
and classification, in this area. Thus, Lin et al. [12] created a
framework that automates the processing of Scanning
Electron Microscopy images for integrated circuits. The
framework uses Faster R-CNN [13], ResNet-50 [3], FCN
[14], and VGG-16 [6] for (i) misalignment detection after the
image stitching, (ii) standard cell detection, via/contact
segmentation and metal line segmentation after feature
extraction, and (iii) stack movement regression after image
stacking. The authors apply various augmentation techniques
to Scanning Electron Microscopy images, such as
automated cropping and shifting for misalignment and
movement detection, as well as semi-supervised segmentation
of vias and metal lines with manual error correction.
Nagamura et al. [15] presented a CNN to split segments of
large-scale integrated circuit (LSI) layouts into risk and non-
risk classes. The authors tried two different models, and a
modified version of VGG-16 [6] reached good results. To
obtain the training and test data, LSI layout images were cut
into a grid, and all segments with defects were labeled as risk
segments.
Furthermore, several studies focused on the classification
and detection of wafer map patterns. Kyeong and Kim [16] trained a hierarchy of CNN networks, where leaf networks
aim to recognize base patterns. Their results are forwarded to
higher-level networks to detect the composite patterns. The
authors generated simulated wafer bin maps (WBM) training
data with various distributions to model the different mixture
patterns. Nakazawa and Kulkarni [17] use a CNN to extract
features from a wafer map image and compute the Hamming
distance to other known wafer maps in their database to
predict a pattern. They also use simulated wafer maps for
training and testing because of the highly imbalanced dataset.
Kim et al. [18] introduce a modified version of VGG-16 [6] to
classify a WBM image into 13 different pattern classes or the
out-of-distribution class. The authors train their model
only on the real data, preprocessed using a template matching
technique to ensure that the main part of the wafer is always
on the upper side of the image.
Although tools already exist to support engineers in the
failure analysis process, nearly all papers observe that it is
difficult to get enough training data. Therefore, our method
can help to annotate images in the database such that images
suitable for the training of problem-specific networks can be identified more easily.
III. APPROACH
The proposed CNN-based approach to FA method
recognition consists of two main steps: data preprocessing and
network training; see Figure 1 for an overview. Thus, we
extract meta-data from an input image and then forward both
to the preprocessing module. The results are provided to the CNN, which yields the final classification. Other FA support systems can use the labels to filter possible training images for different FA analysis tasks like hot plate tests, corn size classification, crack detection, or mold void detection in ScanningAcousticMicroscopy images. In this section, we
provide details of the data preprocessing and training of the
network and present the architecture of our neural network.
Figure 1: Workflow of an image classified by our classifier and how it can be further used in FA analysis tasks
A. Image Labels – FA Methods
The analyzed labs store images in a database with a lot of
free text information. The database comprises images taken at
three labs yielded by various methods using different tools.
Due to various issues, the text fields of many images are
empty and/or filled out incorrectly. Nevertheless, the amount
of labeled images taken in 2019 and 2020 is sufficient to
define three datasets D1, D2, and D3. The dataset D1 comprises images of 13 different FA methods, D2 of 14, and D3 of 8. Out of the 18 different methods observed in these datasets, only six are shared among all locations and five occur only in D1 and D2. All other methods are unique to a single location: two for D1, three for D2, and two for D3.
To create a training dataset, we retrieved images from the
internal databases and removed all records referring to
missing or inaccessible data or comprising an incomplete set
of labels. Next, we unified labels of the remaining images
since most of them were defined by engineers in free text
fields leading to situations when the same method was
denoted differently among the locations. Some engineers used long names, others abbreviations or only "essential" parts of names. In addition, some words were substituted by
synonyms, misspelled, or even written in languages other than
English. The unification process used the FA ontology [1] as
the main source of labels. The taxonomy of methods, defined
in the ontology, provides keywords, unique identifiers for
each method, as well as relations between them, indicating
single methods, their groups, and families.
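As an illustration of this unification step, the sketch below maps free-text method names to canonical labels via a small, purely hypothetical synonym dictionary; in the actual pipeline, the keywords and identifiers come from the method taxonomy of the FA ontology [1]:

```python
from typing import Optional

# Hypothetical synonym map; the real keywords and canonical labels are taken
# from the FA ontology, not from this hard-coded dictionary.
SYNONYMS = {
    "scanning electron": "ScanningElectronMicroscopy",
    "sem": "ScanningElectronMicroscopy",
    "rasterelektronenmikroskop": "ScanningElectronMicroscopy",  # non-English label
    "acoustic": "ScanningAcousticMicroscopy",
    "lock-in": "IRLockInThermography",
    "lockin": "IRLockInThermography",
    "emmi": "EmissionMicroscopy",
}

def unify_label(free_text: str) -> Optional[str]:
    """Map a free-text method name to a canonical label, if a keyword matches.

    Simple substring matching is used here for brevity; the actual unification
    is more elaborate and handles misspellings and abbreviations.
    """
    key = free_text.strip().lower()
    for keyword, canonical in SYNONYMS.items():
        if keyword in key:
            return canonical
    return None  # unresolved labels need manual review

print(unify_label("SEM picture of the bond wire"))  # ScanningElectronMicroscopy
```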
However, after the unification, it turned out that some
classes had only a few images, which led to a significant
imbalance in the training set. This situation is problematic for
supervised learning techniques since obtained models often
get a strong bias towards overrepresented classes and ignore
underrepresented ones. To mitigate this issue, we used the
method taxonomy to merge images of different methods that
fit logically together into one class. As a side-effect of the data
preparation, we also identified a number of wrongly labeled
images, with a total estimate of approximately 2% of the training set. Finding all such images is tedious, so we decided to keep them in the training set because their share is rather small and should not mislead the training much. Nevertheless, their
existence should be kept in mind when looking at the
evaluation results.
B. Data preparation
The data preparation step of our approach has two major
subprocedures: preparation of the input image and encoding
of the image meta-data.
Image. In our approach, every image is a three-dimensional
tensor, where one dimension encodes channels of an image,
i.e., red, green, and blue (RGB), and the other two provide
information about the color intensity for each pixel with
values between 0 and 1. For better training performance, we
converted every image into an RGB format and rescaled the
color intensity values to be equally distributed around zero.
The rescaling procedure computed the standard scores $z_{i,c}$ as follows:

$$z_{i,c} = \frac{x_{i,c} - \mu_c}{\sigma_c}$$

where $x_{i,c}$ is the value of an image pixel $i$ from the color channel $c$, and $\mu_c$ and $\sigma_c$ denote the mean and the standard deviation of the intensities in channel $c$.
Moreover, we used image augmentation to increase the
number of training samples and avoid overfitting. The training
images were first resized to 224 pixels on the smaller side and then randomly cropped along the longer side to a size of 224x224 pixels. Next, they are randomly flipped horizontally and vertically. The cropped size of 224x224 is selected
because it is the standard size of images expected by many
popular CNN architectures trained with images from
ImageNet [10].
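A minimal sketch of this preprocessing pipeline using torchvision is shown below; the per-channel normalization statistics are filled in with the standard ImageNet values as an assumption, since the concrete means and standard deviations are not reported:

```python
from torchvision import transforms

# Assumed per-channel mean/std (standard ImageNet statistics); the paper
# computes standard scores per color channel.
MEAN, STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Lambda(lambda img: img.convert("RGB")),  # unify P/L/... modes to RGB
    transforms.Resize(224),               # smaller side -> 224 pixels
    transforms.RandomCrop(224),           # random 224x224 crop along the longer side
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),                # PIL image -> CxHxW tensor with values in [0, 1]
    transforms.Normalize(mean=MEAN, std=STD),
])

test_transform = transforms.Compose([
    transforms.Lambda(lambda img: img.convert("RGB")),
    transforms.Resize(224),
    transforms.CenterCrop(224),           # test images are cropped in the center
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN, std=STD),
])
```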
Meta-data. Our approach extracts the meta-data comprising
width, height, modus, and format for every image, where the
modus describes which color information is stored in an
image, and the format defines its compression method. Most
images use the RGB mode, with three color channels
providing three color intensity values for each pixel. However,
our dataset also included images with color information in
other modes, like P, where each color is defined in a specific
palette, or L, which stores only luminance in one channel. In
the case of format, most images are saved in JPEG or PNG
format, with only a few in TIFF or MPO. Since both the
modus and format represent categorical features, we used one-hot encoding to represent their values in a form that can be
provided to a neural network. The numerical features—width
and height—are rescaled using standard scores with the mean
and standard deviation of each feature computed over all
images. As a result, each observation vector with image meta-
data comprises 84 components.
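As an illustration, the following scikit-learn sketch encodes such meta-data records; the column names and encoder configuration are our own assumptions, only the idea of one-hot encoding the categorical features and standard-scaling width and height is taken from the text:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Example meta-data records; modus and format are categorical, width/height numeric.
meta = pd.DataFrame({
    "width":  [1024, 2048, 640],
    "height": [768, 1536, 480],
    "modus":  ["RGB", "L", "P"],
    "format": ["JPEG", "PNG", "TIFF"],
})

encoder = ColumnTransformer([
    ("num", StandardScaler(), ["width", "height"]),            # standard scores
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["modus", "format"]),
])

meta_vectors = encoder.fit_transform(meta)
print(meta_vectors.shape)  # (3, number_of_features); 84 components in the paper's dataset
```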
C. Model architecture
The model suggested in this paper contains two main parts
shown in Figure 2. The first part depicted in green represents
convolutional layers of the ResNet-18 network pre-trained on
the ImageNet dataset. As a result, the layers already have
filters able to recognize various features of images. Therefore,
when we provide a resized RGB image from an FA lab, the
feature extraction layers do not start from scratch but can
immediately retrieve useful activation maps.
The activation map with 512 values extracted by
convolutional layers of ResNet-18 is concatenated with the
meta-data vector. The result is then forwarded to the
classification part, shown in blue, comprising layers of two
types. The dropout layer helps the network to avoid overfitting
by hiding values of the input vector with a probability of 0.5.
Therefore, to get good performance, a network cannot
concentrate on some specific part of the concatenated vector
and must search for more general regularities. Linear layers
with rectified linear (ReLU) activations are used to learn
weights that express non-linear dependencies between the
input and output vectors, which contain the logic of the
classifier. The dimensions of vectors yielded by each layer are
also presented in Figure 2. The final linear layer has no
activation function, and its output is used to generate the labels
of the method that created the input image. Note that in
different experiments presented in this paper, the number of
neurons in the output layer may vary depending on the number
of methods occurring in a training dataset.
Figure 2: Model Architecture
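A minimal PyTorch sketch of such a model is given below; the widths of the hidden linear layers are assumptions for illustration, the actual dimensions are those shown in Figure 2:

```python
import torch
import torch.nn as nn
from torchvision import models

class MethodClassifier(nn.Module):
    def __init__(self, num_metadata: int = 84, num_classes: int = 13):
        super().__init__()
        backbone = models.resnet18(pretrained=True)      # ImageNet weights
        # Keep only the convolutional feature extractor (drop the final FC head).
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.classifier = nn.Sequential(                 # assumed hidden sizes
            nn.Dropout(0.5),
            nn.Linear(512 + num_metadata, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),                 # no activation; raw logits
        )

    def forward(self, image: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        x = self.features(image).flatten(1)              # 512-dim activation vector
        x = torch.cat([x, metadata], dim=1)              # concatenate with meta-data
        return self.classifier(x)

model = MethodClassifier()
logits = model(torch.rand(2, 3, 224, 224), torch.rand(2, 84))
print(logits.shape)  # torch.Size([2, 13])
```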
IV. EVALUATION
The model was trained in an OpenShift cloud with an AMD EPYC 7542 CPU, 26 GB RAM, and an NVIDIA GPU. The experiments were done on Ubuntu 18.04, with CUDA 11.2 and Python 3.6. In our experiments, we considered the three datasets D1, D2, and D3, collected as described in the previous section from three different labs. All layers of the model were trained, i.e., we fine-tuned the network without freezing any layers, because the FA images look very different from the ones ResNet was originally trained on. Nevertheless, starting the training from the pretrained ResNet was beneficial: a good initialization of the network layers allowed us to save much training time compared to training from scratch.
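A minimal sketch of this fine-tuning setup, with a plain torchvision ResNet-18 standing in for the full model of Figure 2 and the learning rate taken from Experiment 1:

```python
import torch
from torchvision import models

# Pretrained initialization, but no frozen layers: every parameter stays
# trainable and is passed to the optimizer.
model = models.resnet18(pretrained=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# For comparison, a fixed-feature-extractor setup would freeze the backbone:
# for name, p in model.named_parameters():
#     if not name.startswith("fc"):
#         p.requires_grad = False
# optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
```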
A. Experiments leading to the proposed architecture
Different experiments led to the selection of the proposed
architecture. First, we focused on the choice of the backbone
architecture for feature extraction from preprocessed images.
We trained the vanilla ResNet-34 architecture on 190 images from the D1 dataset for ten epochs, once with randomly initialized weights and once with pretrained ones. The accuracy of the first network on the test set did not
improve during the training and stayed at about 0.02.
However, the accuracy of the pretrained network increased
from 0.15 to 0.63 when applied to the test set. Further
experiments with the pretrained network and larger subsets of D1 indicate that the imbalance of the dataset causes a rather
low recall for almost half of all classes. Using a weighted
variant of the Cross-Entropy loss function and a learning rate
of 0.01, we improved the test accuracy to 0.93 and got an
average recall of 0.93 and about 0.9 for major classes. In an
additional experiment with fixed convolutional layers of
ResNet-34, we could not reach the performance above and only got a best test accuracy of 0.91.
To further improve the classification accuracy, we
experimented with features comprising image meta-data.
Thus, we trained a simple network comprising two linear layers with ReLU activation on the meta-data of 20952 randomly selected images, with a learning rate of 0.001, a batch size of 100, and 50 epochs. The obtained model showed a test accuracy of
0.86, indicating a significant potential for using meta-data in
our classifier. In the next experiment, the output of the
pretrained ResNet-34 network was concatenated with the
preprocessed meta-data, and the test accuracy improved to 0.96. Note that the training was quite unstable in all
experiments with raw meta-data, and significant loss
fluctuations were observed over the epochs. This shows the
importance of the right data preprocessing.
In the next step, a smaller network was tested to see if it
would still be able to classify the images as precisely as ResNet-
34. Therefore, the same experiment was executed with
ResNet-18 instead of ResNet-34. The training and test
accuracy did not change and stayed at 0.96. Therefore, the
smaller architecture was used. To see if using another network
architecture would improve the accuracy, we also tested the
classification with training on EfficientNet, but the accuracy
stayed the same, so we kept ResNet-18.
Finally, we optimized the architecture with respect to
transfer learning applications, when a model trained on the
data from one lab can easily be adopted in another lab.
Therefore, we trained a classifier on D1 and tested it on D2,
which resulted in an accuracy of 0.63. To improve this and get
better generalization, we extended the last classification layer,
which was only one linear layer, to 3 linear layers with ReLU
activation and dropout layers in between. In addition, we
added image augmentation described in the previous section.
The new architecture resulted in a test accuracy of 0.94 on the D1 dataset and an accuracy of 0.66 on D2. Although the improvement on D2 does not seem large, the recall of seven classes improved and slightly decreased only for three classes, with the maximum drop of approx. 0.1 for the class ElectricalMeasurement.
Additional experiments were executed to check if other
image augmentation methods, like squeezing and random
rotation, would be more beneficial than cropping alone. However,
these modifications did not improve the accuracy and were
therefore discarded.
B. Experiments evaluating the proposed architecture
Various experiments were carried out to test the performance
of the classifier.
Experiment 1. To train the model in this experiment, we used
10348 labeled images from D1, which were split into 8278 (80%) training images and 2070 (20%) test images stratified by the image method label. The validation and test images are cropped in the center without any added image augmentation. A weighted loss was used to learn the features of classes with a small number of images; the weight per class is inversely proportional to the number of images of each class. Cross-Entropy loss was used with Stochastic Gradient Descent as the optimizer. The learning rate was set to 0.01, the batch size was 100 images, and the model was trained for 20 epochs, where every epoch corresponds to a complete training run of the neural network over all training images.
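The following sketch illustrates this training setup with weighted Cross-Entropy loss and SGD; the tiny random dataset and the linear placeholder model only stand in for the real D1 split and the model of Figure 2:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model so the sketch runs stand-alone; in the real setup
# the features come from images plus meta-data and the model is that of Figure 2.
num_classes = 13
model = nn.Linear(84, num_classes)
data = TensorDataset(torch.rand(500, 84), torch.randint(0, num_classes, (500,)))
train_loader = DataLoader(data, batch_size=100, shuffle=True)

# Class weights inversely proportional to the number of images per class.
class_counts = torch.bincount(data.tensors[1], minlength=num_classes).float()
class_weights = class_counts.sum() / (class_counts + 1e-6)

criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(20):                      # 20 epochs as in Experiment 1
    for features, labels in train_loader:    # batch size of 100
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
```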
Experiment 2. To test the generalizability of the model, we
trained the model first only on the images from location D1 (Experiment 1) and then evaluated the generalizability of this model on the images of D2.
Experiment 3. The model was trained again with the same
parameters as for Experiment 1, but this time with 13827
images from D1 and D2.
Experiment 4. The generalizability was tested again with the
trained model from Experiment 3 on images from D3.
Experiment 5. We trained the model with the same parameters
as in Experiment 1, but this time for 30 epochs and with 18656 images from D1, D2, and D3.
V. RESULTS
To evaluate the results, we used the accuracy and recall
measures. Accuracy describes the ratio of correctly classified images to all images in the dataset.
The recall represents the ratio of correctly classified images to
all images belonging to a class. The value of both measures is
between 0 and 1, where 0 indicates the worst performance
with respect to a measure and 1 the best one.
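Formally, with $N$ the total number of images, $N_{\mathrm{correct}}$ the number of correctly classified images, and $TP_c$, $FN_c$ the true positives and false negatives of a class $c$:

$$\mathrm{accuracy} = \frac{N_{\mathrm{correct}}}{N}, \qquad \mathrm{recall}_c = \frac{TP_c}{TP_c + FN_c}$$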
The diagrams with the training loss, training accuracy, and test accuracy are shown in Figure 6.
Experiment 1. The result looks good for our use case, with
0.96 for training accuracy and 0.94 for test accuracy. The
training loss declines, and the accuracy for training and test
datasets increases till the end, where they stagnate.
Experiment 2. This experiment shows that 66% of the
images were classified correctly. The classes which often
confuse the classifier are EmissionMicroscopy, LaserLocalization, and IRLockInThermography. We observe this result because, in general, the images look very similar.
The background is grayscale, and the differences are the
colored spots shown on the image. Some images even contain no spots at all, for example, when no error is found with this method. In these cases, even domain experts cannot decide which method was used to create the image. The spots on the images from D1 and D2 are slightly different, which is not a problem for experts in most cases, but the AI has difficulties classifying them correctly. These three classes and the classes
ElectricalMeasurement, and ScanningProbeMicroscopy
only have a recall of around 0.5 or below. The class
ScanningProbeMicroscopy consists of many different kinds of images, which also differ between D1 and D2, resulting in the poor classification of unseen images from a different location. The class ElectricalMeasurement
consists of different kinds of diagrams, but because diagrams
are also often found in the classes Auger and
ScanningProbeMicroscopy, the classifier confuses some
images of ElectricalMeasurement and predicts the wrong
class.
Experiment 3. After training the model on both image datasets
(D1 and D2), the test accuracy is 0.91, and the training accuracy is 0.93. All classes again have a recall equal to or higher than 0.75. The classifier still mixes up the predictions for the classes EmissionMicroscopy, ElectricalMeasurement, and ScanningProbeMicroscopy, but less often than without training on the images of D2.
Experiment 4. The classifier from Experiment 3 was tested on
images of D3. The confusion matrix, presented in Figure 5, shows that the classifier correctly detects 82% of the images. IRLockInThermography is poorly recognized, with only 0.3 recall. This is because, for this class, the images from D1 and D2 are quite similar but have more intense colors for the spots than the images from D3. Figure 3 shows images from the class IRLockInThermography from the (a) D1 and (b) D2 datasets the classifier was trained on. Figure 4 shows images produced by the method IRLockInThermography but classified as (a) OpticalMicroscopy or (b) XrayTomography. The issue observed in Figure 4 might also be due to an overlay of IRLockInThermography images with images of OpticalMicroscopy or X-ray microscopes.
Figure 3: IRLockInThermography images from the (a) D1 and (b) D2 datasets
Figure 4: Images from the class IRLockInThermography classified as (a) OpticalMicroscopy and (b) XrayTomography
Experiment 5. For this experiment, more epochs than in the
previous experiments were necessary to fit the data. After 30
epochs, the training accuracy was 0.97 and the test accuracy
0.93. The confusion matrix for this model is shown in Figure
7. The classes EmissionMicroscopy, ElectricalMeasurement, and ScanningProbeMicroscopy sometimes get mixed up, as in the previous experiments. The
class IR-Microscope has a low recall rate of 0.57, which is
because it has only a few labeled images to train on. More
images would be necessary for a better recall for this class. All
other classes, except these four, have a recall of 0.94 or higher.
These experiments show that the model fits the data well,
and high accuracy can be achieved. Additionally, for images
from different labs that use different tools for the same
method, the classifier can still achieve an accuracy of 0.83. The accuracy improves the more images from different locations the model is trained on.
Figure 5: Confusion Matrix for the classifier trained on D1 and D2 data and tested on D3 data. The x-axis shows the predicted class and the y-axis shows the actual class
Figure 6: Training loss, training accuracy, and test accuracy of the three training experiments. Experiment 1 was trained on D1 data, Experiment 3 on D1 and D2 data, and Experiment 5 on D1, D2, and D3 data. The x-axis shows the number of images the model was trained on; a point was added for every minibatch (training loss and training accuracy) and for every epoch (test accuracy).
Figure 7: Confusion Matrix for the model trained on D1, D2, and D3
VI. CONCLUSION & FUTURE WORK
This paper proposes a tool for method detection from
images made by different tools used in FA. This is the first
step to automating parts of the failure analysis process to
support experts analyzing the images. The suggested method uses the pre-trained ResNet-18 architecture with additional meta-information about the images and reached an accuracy of 0.91. We also showed the transferability of this method to images from other labs, where it achieved an accuracy of 0.83. When the model is fine-tuned on the other lab's new images, the accuracy can be improved further.
In the future, further image processing will be done on the
classified images from this tool to support engineers in various
activities by providing them with information about similar
cases archived in the databases, best practices in analyzing
failures of related products, or technical support in applying
different methods.
REFERENCES
[1] A. Safont-Andreu, C. Burmer and K. Schekotihin, "Using Ontologies in Failure Analysis," ISTFA 2021: Conference Proceedings from the 47th International Symposium for Testing and Failure Analysis, pp. 23-28, 2021.
[2] F. Platter, A. Safont-Andreu, C. Burmer and K. Schekotihin, "Report Classification for Semiconductor Failure Analysis," ISTFA 2021: Conference Proceedings from the 47th International Symposium for Testing and Failure Analysis, pp. 1-5, 2021.
[3] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385, 2015.
[4] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86(11), pp. 2278-2324, 1998.
[5] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
[6] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556, 2015.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going deeper with convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[8] K. He, X. Zhang, S. Ren and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026-1034, 2015.
[9] A. Zhang, Z. C. Lipton, M. Li and A. J. Smola, "Dive into Deep Learning," arXiv:2106.11342, 2021.
[10] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115(3), pp. 211-252, 2015.
[11] A. Canziani, A. Paszke and E. Culurciello, "An Analysis of Deep Neural Network Models for Practical Applications," CoRR, vol. abs/1605.07678, 2016.
[12] T. Lin, Y. Shi, N. Shu, D. Cheng, X. Hong, J. Song and B. H. Gwee, "Deep Learning-Based Image Analysis Framework for Hardware Assurance of Digital Integrated Circuits," 2020 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), 2020.
[13] S. Ren, K. He, R. B. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39(6), pp. 1137-1149, 2017.
[14] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431-3440, 2015.
[15] Y. Nagamura, T. Ide, M. Arai and S. Fukumoto, "CNN-Based Layout Segment Classification for Analysis of Layout-Induced Failures," IEEE Transactions on Semiconductor Manufacturing, vol. 33(4), pp. 597-605, 2020.
[16] K. Kyeong and H. Kim, "Classification of Mixed-Type Defect Patterns in Wafer Bin Maps Using Convolutional Neural Networks," IEEE Transactions on Semiconductor Manufacturing, vol. 31(3), pp. 395-402, 2018.
[17] T. Nakazawa and D. V. Kulkarni, "Wafer Map Defect Pattern Classification and Image Retrieval Using Convolutional Neural Network," IEEE Transactions on Semiconductor Manufacturing, vol. 31(2), pp. 309-314, 2018.
[18] Y. Kim, D. Cho and J.-H. Lee, "Wafer Map Classifier using Deep Learning for Detecting Out-of-Distribution Failure Patterns," 2020 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 1-5, 2020.
APPENDIX
This appendix provides additional information about the results of our experiments. In the camera-ready version of the paper,
we will make this data available online on an accompanying website of our project.
Figure 8: Confusion Matrix of Experiment 1. The x-axis shows the predicted class, the y-axis shows the real class
Figure 9: Confusion Matrix for Experiment 2. Images of D2 are classified with the model from Experiment 1, trained on images of D1. The x-axis shows the predicted class, the y-axis shows the real class
Figure 10: Confusion Matrix of Experiment 3. The x-axis shows the predicted class, the y-axis shows the real class
Figure 11: Confusion Matrix for Experiment 4. Images of D3 are classified with the model from Experiment 3, trained on images of D1 and D2. The x-axis shows the predicted class, the y-axis shows the real class
Figure 12: Confusion Matrix of Experiment 5. The x-axis shows the predicted class, the y-axis shows the real class