Classification of welding flaw types with fuzzy expert systems
T.W. Liao*
Industrial and Manufacturing Systems Engineering Department, Louisiana State University, 3128 CEBA, Baton Rouge, LA 70803, USA
Abstract
The fuzzy expert system approach is proposed for the classification of different types of welding flaws. The fuzzy rules are generated from
available examples using two different methods. The classification accuracy of fuzzy expert systems using fuzzy rules generated by the two
methods is evaluated and compared. In addition, the fuzzy expert system approach is also compared with two other approaches: the fuzzy k-
nearest neighbors algorithm and multi-layer perceptron neural networks, based on the bootstrap method. The results indicate that the fuzzy
expert system approach outperforms all others in terms of classification accuracy.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Fuzzy expert systems; Genetic algorithms; Welding flaws; Classification; Radiography
1. Introduction
Radiography testing is one of several major nondestruc-
tive testing (NDT) methods to examine welds for subsurface
flaws. Real-time digital radiography is now available, but
not as affordable as film radiography. In most applications
today, a radiographic weld image is produced by permitting
an X- or γ-ray source to penetrate the welded component
and expose a photographic film, which is then inspected by a
certified inspector using a view box. The inspection of welds
is important not only to ensure the integrity of the welded
engineering artifacts but also to improve the fabrication
process. However, the manual interpretation process is
subjective, inconsistent, labor intensive, and sometimes
biased.
A few attempts have been made to develop computer-aided interpretation systems for identifying anomalies in welds and classifying their type. A computer-aided weld quality interpretation system generally has three major functions: segmenting welds from the background, detecting flaws in the weld, and classifying the types of detected flaws. Past research published in the open literature in each of these three areas is first listed below; a more detailed review follows the list.
• Segmentation of welds: Liao and Ni (1996), Liao and Tang (1997) and Liao, Li, and Li (2000).
• Detection of welding flaws: Daum, Rose, Heidt, and Builtjes (1987), Gayer, Saya, and Shiloh (1990), Hyatt, Kechter, and Nagashima (1996), Liao and Li (1998), Liao, Li, and Li (1999) and Murakami (1990).
• Classification of flaw types: Aoki and Suga (1999), Kato et al. (1992), Murakami (1990) and Wang and Liao (2002).
Liao and Ni (1996) proposed a methodology for the extraction of linear welds, based on the observation that the pixel intensities in a transversely sectioned weld are distributed more like a Gaussian than those of other objects in the image. Liao and Tang (1997) applied a multi-layered perceptron (MLP) neural network (Rumelhart, Hinton, & Williams, 1986) to segment a single linear or curved weld at a time. Liao et al. (2000) employed fuzzy classifiers, specifically fuzzy k-nearest neighbors (KNN) (Keller, Gray, & Givens, 1985) and fuzzy c-means (FCM) (Bezdek, 1987), to segment multiple curved welds in one radiographic image.
Daum et al. (1987) proposed a defect segmentation algorithm based on background subtraction, which proved effective regardless of the defect type but had difficulty detecting small defect regions 4 to 6 pixels in size. Murakami (1990) proposed a simple algorithm that applies a local arithmetic operation to a limited region, followed by a thresholding operation. Gayer et al. (1990) described a two-step process for the automatic recognition of welding defects, in which a fast search for defect regions was followed by identifying and locating defects. This method tried to imitate the way a human inspector examines radiographs: first, a general glance at coarse resolution, followed by fine focusing on defective regions. Hyatt et al. (1996) presented a multiscale method designed to remove the overall background structure while preserving the defect details. Liao and Li (1998) proposed a method based on the observation that welding flaws usually result in distortions in the overall weld profile. Their method comprises four steps: preprocessing, curve fitting, profile-anomaly detection, and postprocessing. Liao et al. (1999) applied FCM and fuzzy KNN to detect welding flaws using twenty-five features extracted from each line of the radiographic image.

0957-4174/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0957-4174(03)00010-1
Expert Systems with Applications 25 (2003) 101–111
www.elsevier.com/locate/eswa
* Tel.: +1-225-578-5365; fax: +1-225-578-5109.
E-mail address: ieliao@lsu.edu (T.W. Liao).
Research on the classification of welding flaw types is scarce. Murakami (1990) classified defect types with an expert system using information such as the shape, position and intensity level of the defect pattern. The system could easily detect blowholes, but not cracks. Kato et al. (1992) also used the expert system approach for identifying different types of welding defects. The identification rules were acquired from expert inspectors through interviews. Six features covering shape, intensity and location information were extracted from each welding defect. Aoki and Suga (1999) used a three-layer artificial neural network to identify defect types based on ten discriminative features, which were automatically generated from each defect by image processing techniques. Most recently, Wang and Liao (2002) applied the fuzzy KNN algorithm and MLP neural networks to classify six types of welding flaws using twelve features.
This study continues our work in the area of computer-assisted interpretation of welding flaws in radiographic images. The fuzzy expert system approach is taken and shown to produce better classification accuracy than the fuzzy KNN algorithm and MLP neural networks employed in the previous study (Wang & Liao, 2002). Moreover, unlike in the other approaches, the knowledge used in the fuzzy expert system takes the form of fuzzy rules and is therefore more understandable.
The paper is organized as follows: Section 2 presents the
details of the fuzzy expert systems, including two methods
for the generation of fuzzy rules. The second method
improves on the first method by using a simple genetic
algorithm to determine the optimal partitions for each fuzzy
attribute. Test results are presented in Section 3. A
comparison with other techniques including fuzzy KNN
and MLP neural networks is discussed in Section 4,
followed by the conclusions.
2. Fuzzy expert systems
A fuzzy expert system is an expert system that uses a collection of fuzzy rules, instead of Boolean logic, to reason about data. Roughly speaking, there are two major forms of fuzzy rules: the Mamdani-like form (Mamdani & Assilian, 1975) and the Takagi–Sugeno–Kang (TSK) form (Takagi & Sugeno, 1985; Sugeno & Kang, 1986). The Mamdani-like rules have the following form:

$$\text{IF } (x_1 \text{ is } A^1_{i_1}) \text{ and } (x_2 \text{ is } A^2_{i_2}) \text{ and } \dots \text{ and } (x_n \text{ is } A^n_{i_n}) \text{ THEN } y \text{ is } B_k,$$

where $x_j$ is the $j$th input ($j = 1,\dots,n$); $A^j_{i_j}$ is the $i_j$th linguistic term, defined as a fuzzy membership function on $x_j$; and $B_k$ is the $k$th linguistic term defined on the output $y$. For classification problems, $B_k$ is simply a singleton. The TSK rules can be described as:

$$\text{IF } (x_1 \text{ is } A^1_{i_1}) \text{ and } (x_2 \text{ is } A^2_{i_2}) \text{ and } \dots \text{ and } (x_n \text{ is } A^n_{i_n}) \text{ THEN } y = f(x_j),$$

in which the mapping function, $f$, could be linear, nonlinear, or simply a real number. Of the two, the Mamdani-like rules are better suited for the discovery of human-understandable knowledge from real-world data. Both forms have been successfully used in fuzzy control.

The general fuzzy inference process proceeds in four (or five) steps, as described below; the fifth step, defuzzification, is optional.
1. Fuzzification. The membership functions defined on the
input variables are applied to their actual values, to
determine the degree of truth (or the matching degree)
for each rule premise.
2. Aggregation. If more than one rule premise is ‘anded’ together, the matching degrees of all rule premises are normally combined with a t-norm operator (e.g. MIN, PRODUCT).
3. Inference. Once the truth-value of each rule is
computed, it is then applied to the conclusion part of
each rule. This results in one fuzzy subset to be
assigned to each output variable for each rule. Usually
either the MIN or PRODUCT operator is used. In
MIN inferencing (fuzzy logic AND), the output
membership function is clipped off at a height
corresponding to the rule premise’s computed degree
of truth. In PRODUCT inferencing, the output
membership function is scaled by the rule premise’s
computed degree of truth.
4. Composition. All of the fuzzy subsets assigned to each
output variable are combined together to form a single
fuzzy subset for each output variable. Usually either
the MAX or SUM operator is used. In MAX
composition (fuzzy logic OR), the combined output
fuzzy subset is constructed by taking the pointwise
maximum over all of the fuzzy subsets assigned to
the variable by the inference rule. In SUM compo-
sition, the combined output fuzzy subset is constructed
by taking the pointwise sum over all of the fuzzy
subsets assigned to the output variable by the
inference rule.
5. Defuzzification. This step is optional and is used when it is useful to convert the fuzzy output set to a crisp number. Many defuzzification methods exist; two of the most common are the CENTROID and MAXIMUM methods. In the CENTROID method, the crisp value of the output variable is the variable value at the center of gravity of the membership function for the fuzzy value. In the MAXIMUM method, one of the variable values at which the fuzzy subset has its maximum truth value is chosen as the crisp value for the output variable.
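To make the five steps concrete, the following Python sketch runs one Mamdani-style inference pass. It assumes two inputs, three triangular terms per variable on a hypothetical [0, 10] universe, MIN aggregation, MIN (clipping) inference, MAX composition, and CENTROID defuzzification on a discretized output grid. The term names, parameters, and rules are illustrative only; this is not the specific inference method used later in this study.

```python
import numpy as np

def tri(x, l, c, r):
    """Triangular membership (l, c, r); l == c or c == r gives a shoulder."""
    x = np.asarray(x, dtype=float)
    up = (x - l) / (c - l) if c > l else np.ones_like(x)
    down = (r - x) / (r - c) if r > c else np.ones_like(x)
    return np.clip(np.minimum(up, down), 0.0, 1.0)

# Hypothetical linguistic terms on two inputs and one output, all on [0, 10].
IN_TERMS = {"low": (0, 0, 5), "mid": (0, 5, 10), "high": (5, 10, 10)}
OUT_TERMS = {"small": (0, 0, 5), "medium": (0, 5, 10), "large": (5, 10, 10)}
# Hypothetical Mamdani-like rules: ((term for x1, term for x2), output term).
RULES = [(("low", "low"), "small"),
         (("mid", "mid"), "medium"),
         (("high", "high"), "large")]

def infer(x1, x2, y_grid=np.linspace(0, 10, 1001)):
    combined = np.zeros_like(y_grid)
    for (t1, t2), out in RULES:
        # Steps 1-2: fuzzify each input, then combine the premises with MIN.
        truth = min(float(tri(x1, *IN_TERMS[t1])), float(tri(x2, *IN_TERMS[t2])))
        # Step 3: MIN inference clips the output term at the premise truth.
        clipped = np.minimum(tri(y_grid, *OUT_TERMS[out]), truth)
        # Step 4: MAX composition across all rules.
        combined = np.maximum(combined, clipped)
    # Step 5: CENTROID defuzzification over the discretized output universe.
    if combined.sum() == 0:
        return None  # no rule fired
    return float((y_grid * combined).sum() / combined.sum())
```

With these three rules, inputs at the middle of both universes fire only the "mid" rule, so the centroid lands at the center of the "medium" term.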
The specific fuzzy inference method used in this study is
the one used by Wang and Mendel (1992), to be presented in
Section 2.1.5.
The most important step in developing a fuzzy expert
system is to acquire fuzzy rules. This step is often called
knowledge acquisition. This study will use the machine
learning approach for knowledge acquisition, which can
automatically generate fuzzy rules from examples. In
particular, we will employ the method proposed by Wang
and Mendel (1992), called the WM method (see Section
2.1), and a proposed improvement, called the SGA-WM
method (see Section 2.2).
2.1. The WM method
The WM method is a well-known fuzzy modeling
method. It consists of five steps, as summarized below.
2.1.1. Step 1—Divide the input and output spaces into fuzzy
regions
Given a set of examples with multiple inputs ($m$) and a single output, denoted as $(x^k_j, y^k)$, where $j = 1,\dots,m$ and $k = 1,\dots,n$, define the universe of discourse of each input variable as $[x^-_j, x^+_j]$ and that of the output variable as $[y^-, y^+]$, then divide each universe of discourse into $N$ regions. Although Wang and Mendel indicated that $N$ could be different for different variables, they did not offer any specific method to choose it and assumed equal lengths for all divided regions throughout their study. The membership function associated with each region, which defines a fuzzy term, was assumed to be triangular, denoted as $(l, c, r)$ for (left bound, center, right bound). Two special properties of fuzzy terms so defined are: (1) adjacent terms have 1/2 overlap; and (2) for the middle terms, the left bound of term $i$ is the center of term $i-1$ and the right bound of term $i$ is the center of term $i+1$. Therefore, knowing the term centers is sufficient to determine all the triangular fuzzy terms.

The minimal and maximal values of each variable are often used to define its universe of discourse, that is, $[x^-_j, x^+_j] = [\min(x_j), \max(x_j)]$. They are also taken as the centers of the left end term and the right end term, respectively, that is, $c_{1j} = \min(x_j)$ and $c_{Nj} = \max(x_j)$. Accordingly, the other term centers, $c_{ij}$, can be computed as follows:

$$c_{ij} = \min(x_j) + \frac{(i-1)\,\bigl(\max(x_j) - \min(x_j)\bigr)}{N-1}, \quad i = 2,\dots,N-1. \tag{1}$$
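As a sketch of Step 1 (assuming NumPy, and that each variable has at least two distinct values), the term centers can be generated with equal spacing, with the end centers at the minimum and maximum of the variable. Because adjacent triangular terms overlap by 1/2 and each term's bounds are its neighbours' centers, every term is the piecewise-linear "hat" function that equals 1 at its own center and 0 at all other centers, so `np.interp` can evaluate all of them. Treating the two end terms as shoulders outside [min, max] is an assumption of this sketch; the function names are hypothetical.

```python
import numpy as np

def term_centers(values, n_terms):
    """Equally spaced term centers c_1..c_N with c_1 = min and c_N = max."""
    return np.linspace(np.min(values), np.max(values), n_terms)

def memberships(x, centers):
    """Degree of scalar x in every triangular term built on `centers`.

    Each term is the hat function that is 1 at its own center and 0 at the
    other centers. np.interp holds the end values constant outside
    [c_1, c_N], so the two end terms act as shoulders (an assumption).
    """
    eye = np.eye(len(centers))
    return np.array([np.interp(x, centers, row) for row in eye])
```

For a variable ranging over [0, 10] with five terms, the centers are 0, 2.5, 5, 7.5 and 10, and a value of 3.75 belongs to the second and third terms with degree 0.5 each; within the universe the degrees always sum to 1.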
2.1.2. Step 2—Generate fuzzy rules from given examples
First, determine the membership degree of each example in each fuzzy term defined for each region, variable by variable (including the output variable). Second, associate each example with the term having the highest membership degree, variable by variable; this degree is denoted as $md_j$. Finally, obtain one rule for each example using the terms selected in the previous step. The rules so generated are ‘and’ rules: the antecedents of the IF part of each rule must be met simultaneously for the consequent of the rule to occur. Letting $T_{x_j}$ be the term selected for variable $x_j$ of an example, a rule could look like:

$$\text{IF } x_1 \text{ is } T_{x_1} \text{ (with } md_1) \text{ and } x_2 \text{ is } T_{x_2} \text{ (with } md_2) \text{ and } \dots \text{ and } x_m \text{ is } T_{x_m} \text{ (with } md_m) \text{ THEN } y \text{ is } T_y \text{ (with } md_y). \tag{2}$$
2.1.3. Step 3—Assign a degree to each rule
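The rule degree is computed as the product of the membership degrees of all variables, inputs and output alike. Let $D_k$ be the degree of the rule generated by example $k$, and let $md^k_j$ and $md^k_y$ be the membership degrees selected for that example. Mathematically,

$$D_k = md^k_y \prod_{j=1}^{m} md^k_j. \tag{3}$$

The degree of a rule generated by an example indicates our belief in its usefulness.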
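Steps 2 and 3 can be sketched as follows. The sketch assumes the same equal-spacing triangular partition for every input and, for simplicity, for the output variable as well (with class labels, one singleton term per class would be a natural alternative). The function name `generate_rules` is hypothetical; each rule is returned as (antecedent term indices, consequent term index, degree $D_k$), and ties in the highest membership degree go to the lower-indexed term.

```python
import numpy as np

def term_centers(values, n_terms):
    # Equally spaced centers from min to max (Step 1).
    return np.linspace(np.min(values), np.max(values), n_terms)

def memberships(x, centers):
    # Hat functions on the centers = triangular terms with 1/2 overlap.
    eye = np.eye(len(centers))
    return np.array([np.interp(x, centers, row) for row in eye])

def generate_rules(X, y, n_terms):
    """WM Steps 2-3: one rule per example, together with its degree D_k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    in_centers = [term_centers(X[:, j], n_terms) for j in range(X.shape[1])]
    out_centers = term_centers(y, n_terms)
    rules = []
    for xk, yk in zip(X, y):
        antecedent, degree = [], 1.0
        for j, xj in enumerate(xk):
            mu = memberships(xj, in_centers[j])
            i = int(np.argmax(mu))       # term with the highest degree md_j
            antecedent.append(i)
            degree *= mu[i]
        mu_y = memberships(yk, out_centers)
        consequent = int(np.argmax(mu_y))
        degree *= mu_y[consequent]       # product over inputs and output
        rules.append((tuple(antecedent), consequent, float(degree)))
    return rules, in_centers, out_centers
```

For example, with three terms on [0, 10], an example (4, 5) with output 5 yields the rule "x1 is term 2 and x2 is term 2 THEN y is term 2" with degree 0.8 × 1 × 1 = 0.8.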
2.1.4. Step 4—Create a combined fuzzy rule base
When the number of examples is large, it is quite possible that the same rule is generated for more than one example. Such rules are redundant. In addition, rules with the same IF part but a different THEN part can also be generated; such rules are conflicting. The redundant and conflicting rules must be removed to maintain the integrity of the rule base. This is achieved by keeping, for each fuzzy region (IF part), only the rule with the highest degree, which is deemed the most useful.

At this point, the fuzzy rule base is complete. Next, the usefulness of the rule base must be shown using some fuzzy inference method.
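A minimal sketch of Step 4, assuming rules are stored as (antecedent, consequent, degree) tuples (a hypothetical representation used only for illustration): for each distinct IF part only the highest-degree rule survives, which removes redundant and conflicting rules in a single pass. Ties keep the first rule encountered, a choice the text does not specify.

```python
def combine_rules(rules):
    """Keep, for each IF part, the rule with the highest degree."""
    best = {}  # antecedent -> (consequent, degree)
    for antecedent, consequent, degree in rules:
        if antecedent not in best or degree > best[antecedent][1]:
            best[antecedent] = (consequent, degree)
    return [(a, c, d) for a, (c, d) in best.items()]
```

Here three rules sharing the IF part (0, 1) collapse into the one with degree 0.9, whatever their THEN parts.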
2.1.5. Step 5—Determine a mapping based on the combined
fuzzy rule base
To predict the output of an unseen example, denoted as $\mathbf{x}$, the centroid defuzzification formula is used. Accordingly, the predicted output, $\hat{y}$, is computed as

$$\hat{y} = \frac{\sum_{r=1}^{R} amd_r\, c_r}{\sum_{r=1}^{R} amd_r}, \tag{4}$$

where $amd_r = \prod_{j=1}^{m} md^r_j$, with $md^r_j$ the membership degree of $x_j$ in the $j$th antecedent term of rule $r$; $c_r$ is the center value of the consequent term of rule $r$; and $R$ denotes the total number of combined rules.
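The mapping of Step 5 can be sketched as below. It assumes rules in an illustrative (antecedent term indices, consequent term index, degree) form and the same equal-spacing triangular terms; `predict` is a hypothetical name, and it returns `None` when no rule covers the input. Turning the real-valued prediction into a flaw class (for example, by rounding to the nearest class code) is an assumption not described in the text.

```python
import numpy as np

def memberships(x, centers):
    # Hat functions on the centers = triangular terms with 1/2 overlap.
    eye = np.eye(len(centers))
    return np.array([np.interp(x, centers, row) for row in eye])

def predict(x, rules, in_centers, out_centers):
    """Weighted average of consequent centers c_r, weighted by amd_r."""
    num = den = 0.0
    for antecedent, consequent, _degree in rules:
        amd = 1.0
        for j, i in enumerate(antecedent):
            amd *= memberships(x[j], in_centers[j])[i]  # md_j^r
        num += amd * out_centers[consequent]
        den += amd
    return num / den if den > 0 else None
```

With three diagonal rules on [0, 10], the input (2.5, 2.5) fires the first two rules equally (0.25 each), so the prediction is halfway between their consequent centers 0 and 5, namely 2.5.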
2.2. Determine the best combination of fuzzy terms by
genetic algorithms
To allow for a varying number of partitions of each universe of discourse, a genetic algorithm is introduced to modify the first step of the WM method. The genetic algorithm used in this study is a simple GA (Goldberg, 1989) with a specially tailored chromosome representation, decoding scheme, and fitness function; the new method is therefore called the SGA-WM method.
Binary strings are used to represent chromosomes. The length of each binary string is dictated by the number of variables and the maximum allowed number of fuzzy terms for each variable. In other words, each binary string is a concatenation of the binary-coded numbers of fuzzy terms for the first variable, the second variable, …, and the last variable. For the sake of simplicity, the maximum allowed number of fuzzy terms is assumed to be the same for all variables, so each variable occupies a segment of the same size in a binary string. This constraint can be relaxed if there is prior knowledge that the assumption is invalid.
The decoding scheme is a simple binary-to-decimal
conversion. Each decoded chromosome corresponds to a
possible combination of fuzzy terms for all variables
involved.
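As an illustration of the representation and decoding, the sketch below assumes hypothetical bounds Min N = 2 and Max N = 9, so that 3 bits per variable encode exactly the eight possible term counts. Offsetting the decoded value by Min N and clipping it at Max N are assumptions of this sketch, since the text only states that decoding is a binary-to-decimal conversion.

```python
MIN_N, MAX_N = 2, 9  # hypothetical bounds on the number of fuzzy terms
BITS = 3             # 2**3 == MAX_N - MIN_N + 1 values per variable

def decode(chromosome, n_vars, bits=BITS, min_n=MIN_N, max_n=MAX_N):
    """Split a bit list into equal segments and map each to a term count."""
    if len(chromosome) != n_vars * bits:
        raise ValueError("chromosome length must equal n_vars * bits")
    counts = []
    for v in range(n_vars):
        segment = chromosome[v * bits:(v + 1) * bits]
        value = int("".join(str(b) for b in segment), 2)  # binary-to-decimal
        counts.append(min(min_n + value, max_n))           # offset into range
    return counts
```

For three variables, the string 000 111 011 decodes to 2, 9 and 5 fuzzy terms, respectively.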
The fitness evaluation consists of the entire process of building the fuzzy model with the WM method for the decoded combination of fuzzy terms and testing it with the inference procedure summarized in Section 2.1.5. The fitness value, $F$, is derived from two values produced by the testing process: the accuracy rate and the model size (number of rules). First, the model size, $R$, is normalized by the maximal possible size, which equals the total number of possible fuzzy terms raised to the power of the number of input variables, $m$. Let Max N and Min N denote the maximum and minimum number of fuzzy terms, respectively; the total number of possible fuzzy terms is thus $(\text{Max } N - \text{Min } N + 1)$. A weighted combination of the two values is then used to calculate the fitness value. The accuracy rate, AR, is considered far more important than the model size in this study, so the two are weighted in the ratio 9:1. Mathematically,

$$F = 0.9\,AR - 0.1\,\frac{R}{(\text{Max } N - \text{Min } N + 1)^m} \tag{5}$$
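The fitness formula translates directly into code. In this sketch the accuracy rate is assumed to be a fraction in [0, 1], and obtaining it (building the WM rule base with the decoded term counts and testing it) is left to the caller; the function name is hypothetical.

```python
def fitness(accuracy, n_rules, n_inputs, min_n, max_n):
    """Reward accuracy (weight 0.9) and penalize normalized model size (0.1)."""
    max_rules = (max_n - min_n + 1) ** n_inputs
    return 0.9 * accuracy - 0.1 * n_rules / max_rules
```

For example, 90% accuracy with 40 rules, two inputs and 2 to 9 terms per variable gives 0.9 × 0.9 − 0.1 × 40/64 = 0.7475.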
The mechanics of the GA are standard: point mutation, single-point crossover, and roulette wheel selection. Standard roulette wheel selection is used to reproduce offspring for the next generation: each string in the current population has a roulette wheel slot sized in proportion to its fitness, so strings with higher fitness are more likely to contribute offspring. The probability of crossover is 0.6 per pair of selected chromosomes, and the per-bit probability of mutation is set at 0.1.
The population is randomly initialized, and the GA is run with a population of fixed size. Population size was found to be a significant factor in another study of ours; therefore, three different population sizes (30, 300, and 3000) were used to investigate its effect. The process stops when the maximum number of generations, set to 20, is reached.
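The GA mechanics described above can be sketched with Python's standard library. Assumptions of this sketch: fitness values are shifted to be positive before roulette wheel selection (Eq. (5) can be slightly negative), the best chromosome seen in any generation is returned, and `run_sga` is a hypothetical name. In SGA-WM the fitness callback would decode the chromosome, build and test a WM rule base, and apply the fitness formula.

```python
import random

def run_sga(fitness, n_bits, pop_size=30, generations=20,
            p_cross=0.6, p_mut=0.1, seed=None):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")

    def track(population):
        # Record the best chromosome seen so far; return all fitness values.
        nonlocal best, best_fit
        fits = [fitness(ch) for ch in population]
        for ch, f in zip(population, fits):
            if f > best_fit:
                best, best_fit = ch[:], f
        return fits

    for _ in range(generations):
        fits = track(pop)
        # Roulette wheel: slot size proportional to (shifted) fitness.
        low = min(fits)
        weights = [f - low + 1e-9 for f in fits]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            nxt.extend([a, b])
        if len(nxt) < pop_size:  # odd population size
            nxt.append(parents[-1][:])
        for ch in nxt:  # per-bit point mutation
            for i in range(n_bits):
                if rng.random() < p_mut:
                    ch[i] ^= 1
        pop = nxt
    track(pop)  # evaluate the final generation as well
    return best, best_fit
```

For instance, `run_sga(sum, 12, seed=1)` uses the number of ones in a 12-bit string as a toy fitness.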
3. Test results
The data used in this study were taken from Wang and Liao (2002) and were extracted from radiographic images of industrial welds. The original data set has 147 tuples, each having 12 numeric attributes (or features). The class (or pattern) value of each record, which indicates the welding flaw type, is known (ground truth). Six commonly seen welding flaws are captured in this data set: porosity, lack of penetration, cracks, lack of fusion, gas holes, and inclusions. Refer to Wang and Liao (2002) for a more detailed description of these attributes and the extraction procedure.
The 147 tuples were randomly sampled three times for each of two sample sizes, 100 and 50. The resulting data sets are labeled 100a, 100b, 100c, 50a, 50b, and 50c. The fuzzy if–then rules were generated from each sampled data set using either the WM method or the SGA-WM method. The accuracy of the fuzzy expert system implementing each rule base so generated is tested on all 147 tuples.
3.1. Feature selection
Feature selection refers to the selection of an optimum
subset of features derived from input variables. Many
feature selection methods exist and they are generally
classified into two categories. Methods assessing the quality
of feature subsets according to the prediction error of a
classifier are called wrapper methods. Those using criteria
such as correlation coefficients that do not involve the
classifier are called filter methods. The latter approach is used
here, and its outcome is confirmed by performance evaluation.
The major objective of this study is to show the effectiveness of the fuzzy expert system approach compared with other techniques, so all features could well be used. However, the interpretability of fuzzy if–then rules tends to decrease as the number of dimensions increases. Following the 7 ± 2 rule, which bounds the number of items considered manageable, we would like to use a feature set with fewer than 12 dimensions. Of course, one should guard against degradation in classification accuracy as a result of the reduced dimensionality.
To this end, a relatively simple feature selection approach is taken. First, the correlation between each independent variable and the dependent variable is computed. Then, starting with all features, the independent variables with the lowest correlation with the dependent variable are successively excluded from the feature set. In this manner, four feature sets were created with 12 (all), 9, 7, and 5 features. To compare the performance of different feature sets, the data set of 100 samples is used to generate rule sets by the original WM method, assuming that each feature has five fuzzy terms. These rule sets are then used to build fuzzy expert systems for testing. Fig. 1 shows that the accuracy decreases as the number of features is reduced. Therefore, the nine-feature and seven-feature sets, which meet the 7 ± 2 rule without too much loss of accuracy, are chosen for the subsequent study.
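A minimal sketch of this filter approach follows. It assumes the flaw types are coded as numbers so that a Pearson correlation with each feature can be computed, and it ranks features by the absolute value of that correlation (the text does not say whether the sign is ignored). Because each feature's correlation with the dependent variable does not depend on the other features, repeatedly dropping the least-correlated feature is equivalent to keeping the top-k features of a single ranking. Function names are hypothetical.

```python
import numpy as np

def correlation_ranking(X, y):
    """Absolute Pearson correlation of each feature with the coded class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])

def nested_feature_sets(X, y, sizes=(12, 9, 7, 5)):
    """Column indices kept at each size, dropping least-correlated first."""
    order = np.argsort(correlation_ranking(X, y))[::-1]  # most correlated first
    return {k: sorted(order[:k].tolist()) for k in sizes}
```

The resulting sets are nested, so the 7-feature set is always a subset of the 9-feature set, matching the elimination procedure described above.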
3.2. Systems using fuzzy if–then rules generated by the
original WM method
For each sampled data set and each feature set, the WM method was used to generate the fuzzy if–then rules. Since the original WM method does not provide a way to determine the number of fuzzy terms for each variable, three values were chosen to cover a wide range of possible numbers of terms: 3, 6, and 9 (the maximum of the 7 ± 2 rule). For each chosen number of fuzzy terms, a different set of fuzzy if–then rules is obtained. Note that, for the sake of simplicity, the number of fuzzy terms is assumed to be the same for each variable.
Figs. 2 and 3 show the classification accuracy of each
sampled data set as well as the average performance of three
sampled data sets with the same size, for the 9-feature sets
and the 7-feature sets, respectively. The results indicate that:
1. The accuracy depends greatly upon the number of fuzzy
terms used. Among all tested, five terms per variable is
the best. Choosing the correct number of fuzzy terms is
thus important.
2. Generally, the smaller the sample size, the lower the classification accuracy and the higher the coefficient of variation. However, when very few fuzzy terms are used (i.e. 3), the smaller the sample size, the higher the accuracy.
3. The average accuracy either stays the same or decreases as the sample size is reduced, except in the case of 6 terms.
Therefore, using the original WM method to generate the
rule base for a fuzzy expert system has to address the issue
of choosing the right combination of fuzzy terms. Since the
possible number of combinations is quite high, an
Fig. 1. Accuracy versus number of features, with 5 terms per feature.
Fig. 2. Performance of fuzzy expert systems using 9-feature rules generated by the WM method.