
int. j. prod. res., 2003, vol. 41, no. 7, 1587–1603
Improved SPC chart pattern recognition using statistical features
A. HASSAN†*, M. SHARIFF NABI BAKSH†,
A. M. SHAHAROUN† and H. JAMALUDDIN†
Increasingly rapid changes and highly precise manufacturing environments
require timely monitoring and intervention when deemed necessary. Traditional
Statistical Process Control (SPC) charting, a popular monitoring and diagnosis
tool, is being improved to be more sensitive to small changes and to include more
intelligence to handle dynamic process information. Artificial neural network-based SPC chart pattern recognition schemes have been introduced by several
researchers. These schemes need further improvement in terms of generalization
and recognition performance. One possible approach is through improvement in
data representation using features extracted from raw data. Most of the previous
work in intelligent SPC used raw data as input vector representation. The literature reports limited work dealing with features, but it lacks extensive comparative
studies to assess the relative performance between the two approaches. The objective of this study was to evaluate the relative performance of a feature-based SPC
recognizer compared with the raw data-based recognizer. Extensive simulations
were conducted using synthetic data sets. The study focused on recognition of six
commonly researched SPC patterns plotted on the Shewhart X-bar chart. The
ANN-based SPC pattern recognizer trained using the six selected statistical features resulted in significantly better performance and generalization compared
with the raw data-based recognizer. Findings from this study can be used as
guidelines in developing better SPC recognition systems.
1. Introduction
The increase in demand for faster delivery, small order quantity and highly
precise products has led manufacturing systems to move towards becoming more
flexible, integrated and intelligent. This requires that in monitoring critical processes,
process information should be analysed rapidly, in a timely fashion and continuously
for decision-making. Advances in manufacturing and measurement technology have
enabled real-time, rapid and integrated gauging and measurement of process and
product quality. Unfortunately, traditional Statistical Process Control (SPC)
monitoring and diagnosis approaches are insufficient to cope with these new
developments. Generally, enhancement is needed for this tool to be more sensitive
to small changes, to be adaptable to a dynamic process environment, to enable rapid
analysis and to become more informative and intelligent. Figure 1 shows the
relationship between advances in manufacturing technology and the need for
improvement in process monitoring and diagnosis.
SPC charts are widely used for monitoring manufacturing process and product
variability and are useful for ‘listening to the voice of the process’ (Oakland 1996).
Revision received August 2002.
† Faculty of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai,
Johor Bahru, Malaysia.
* To whom correspondence should be addressed. e-mail: [email protected]
DOI: 10.1080/0020754021000049844
[Figure 1. Interaction between advances in manufacturing technology, quality monitoring and diagnosis — block diagram linking Manufacturing Technology (automated, flexible, integrated and intelligent systems; rapid and short runs; high precision) and Measurement Technology (real-time/on-line data acquisition; rapid gauging and sensing; abundant data) through Product and Process Data to Monitoring and Diagnosis (monitoring and diagnosis tools need to be more sensitive to small changes, adaptable, rapid, informative and intelligent; traditional SPC monitoring tools need to be enhanced).]
Properly implemented SPC charting techniques can identify when a particular
process is operating within a statistically in-control (stable) state or statistically
out-of-control (unstable) state. Further, analysis of the observations plotted on the
SPC charts provides process information, which can be useful for diagnostic
purposes. Unstable processes may produce time series patterns such as cyclic,
linear trend-up, linear trend-down, sudden shift-up, sudden shift-down, mixtures,
stratification and systematic when plotted on a Shewhart X-bar chart. Identification
of these patterns coupled with engineering knowledge of the process leads to a more
focused diagnosis. This significantly minimizes efforts for troubleshooting.
Traditionally, SPC chart patterns have been analysed and interpreted manually.
Towards the end of the 1980s, several researchers such as Swift (1987) and Cheng
(1989) proposed the use of expert systems for SPC chart analysis and interpretation,
as manual methods were no longer sufficient for the situation described above.
Developments in computing technology have motivated researchers to explore the
use of artificial neural networks (ANN) for SPC chart pattern recognition (Hwarng
and Hubele 1991, Hwarng and Hubele 1993, Pham and Oztemel 1993). The use of
neural network technology has overcome some of the drawbacks in the traditional
expert system approaches. Neural networks offer useful properties and capabilities
such as non-linearity, input-output mapping, adaptability and fault tolerance,
among others (Haykin 1999). Since then, several other researchers have proposed
various ANN-based SPC chart pattern recognizers.
There are many factors that influence the performance of ANN-based pattern
recognizers. Among these are the design of the network itself (micro- and macro-levels), the selection of training algorithms and training strategies, and the representation of input data for training and testing.
Most of the existing SPC pattern recognition schemes in the literature use normalized raw data as the input vector to the recognizer. These data representations
normally produce large ANN structures and are not very effective and efficient for
complicated recognition problems. A smaller ANN size can lead to faster training
and generally more effective and efficient recognition. This limitation can be overcome with the use of features for representing data as demonstrated in pattern
recognition applications for handwritten features (Zeki and Zakaria 2000), characters (Amin 2000) and grain grading (Utku 2000), among others.
The common motivation for using features extracted from raw data is dimensionality reduction (Pandya and Macy 1996), which would significantly reduce the
size of the input vector. It was hypothesized that a smaller network size using the
feature-based SPC pattern recognizer would perform and generalize better than the
raw data-based recognizer. Generalization here means the ability of a recognizer to
recognize correctly a pattern it has not been trained on.
Very limited work has been reported on the use of features extracted from SPC
chart signals as the input vectors into ANN-based SPC pattern recognizers. Pham
and Wani (1997) introduced feature-based control chart pattern recognition. Nine
geometric features were proposed: slope, number of mean crossings, number of least-square line crossings, cyclic membership, average slope of the line segments, slope
difference and three different measures for area. The scheme was aimed at improving
the performance of the pattern recognizer by presenting a smaller input vector
(features).
Tontini (1996, 1998) developed an online learning pattern recognizer for SPC
chart pattern classification based on Radial Basis Functions Fuzzy-Artmap Neural
Network. His input vector consisted of combinations of 60 individual raw observation data, mean and standard deviation of 15 statistical windows, 10 lags of autocorrelation, results of the computational Cusum chart and chi-square statistic. It
would appear that combining all these simultaneously would result in a large input
vector. No comparison between raw data against features set as the input vector
representation was reported. The focus of his study was on developing a recognizer
for online incremental learning.
Pham and Wani (1997), Wani and Pham (1999) and Tontini (1996, 1998) did not
report on the relative merits of raw data and feature set as the input vector
representation. Our extensive literature review of major international journals only
found that Anagun (1998) conducted a rather limited comparative study between the
effectiveness of direct representation (raw data) against feature-based representation.
He used a set of frequency counts as the features. The robustness of his feature set
seems to be rather limited since it loses the information on the order of the data. A
more extensive investigation is needed to identify the relative merits of the two
approaches for input vector representation. Thus, the purpose of this current
study is to fill this gap through investigating the classification performance when
using a set of statistical features compared with the raw data as input representation.
The paper is organized as follows. Section 2 presents the patterns used and their
generation, while section 3 discusses the statistical features investigated. Section 4
discusses the design of the pattern recognizers followed by the experimental procedures in section 5. Section 6 provides the results and discussion on the comparison
between the two types of recognizers. Section 7 presents some conclusions.
2. Sample patterns
Fully developed patterns were investigated. Ideally, sample patterns should be
developed from a real process. Since a large amount of samples was required for the
recognizers’ training and they were not economically available, simulated data were
used. This is a common approach adopted by other researchers as mentioned above.
This study adopted Swift’s (1987) methodology to simulate individual process data
since the methodology has been widely accepted by other researchers. In this study,
each sample pattern consisted of 20 subgroup averages of time sequence data with a
sample size of five. The parameters used for simulating the six commonly researched
control chart patterns are given in table 1. The values of these parameters were
varied randomly in a uniform manner between the limits shown. Random noise of
1/3σ was added to all unstable patterns.
These parameters were chosen to keep the patterns within the control limits since,
for preventive purposes, the status of a process should be identified while it is
operating within these limits. It may be too late for preventive action if the
‘alarm’ is only generated after hitting the control limits. The minimum parameter
values were chosen such that the patterns were sufficiently differentiable even after
being contaminated by random variation. It was assumed that only one fundamental
period existed for cyclic patterns, and a sudden shift only appeared in the middle of
an observation window. A total of 2160 and 3600 sample patterns were used in the
training and recall phases, respectively.
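To make the simulation scheme concrete, the sketch below generates one fully developed sample pattern of each type using the parameter ranges of table 1. It is a minimal illustrative reconstruction in Python, not the authors' MATLAB code; the function name generate_pattern and the exact noise model (normal noise with standard deviation σ/3 added to the unstable patterns) are our assumptions based on the description above.

```python
import numpy as np

def generate_pattern(kind, n=20, rng=None):
    """Sketch: one fully developed pattern of n standardized subgroup means.

    Disturbance parameters are drawn uniformly from the ranges of table 1
    (in units of sigma); noise of roughly 1/3 sigma is added to the
    unstable patterns (our reading of the text).
    """
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(n)
    noise = rng.normal(0.0, 1.0 / 3.0, n)          # assumed 1/3-sigma noise
    if kind == "stable":
        return rng.normal(0.0, 1.0, n)              # in-control N(0, 1) stream
    if kind == "trend_up":
        return noise + rng.uniform(0.015, 0.025) * t
    if kind == "trend_down":
        return noise - rng.uniform(0.015, 0.025) * t
    if kind == "shift_up":
        return noise + rng.uniform(0.7, 2.5) * (t >= n // 2)   # shift in mid-window
    if kind == "shift_down":
        return noise - rng.uniform(0.7, 2.5) * (t >= n // 2)
    if kind == "cyclic":
        return noise + rng.uniform(0.5, 2.5) * np.sin(2 * np.pi * t / 10)
    raise ValueError(f"unknown pattern type: {kind}")

samples = {k: generate_pattern(k) for k in
           ("stable", "trend_up", "trend_down", "shift_up", "shift_down", "cyclic")}
```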
3. Statistical features
The choice of statistical features to be extracted from the raw data to be presented as the input vector into the recognizer is very important. Battiti (1994) noted
that the mixture content of a feature set to represent the original signal has impact on
learning and generalization performance of recognizers. The presence of too many
input features can burden the training process and lead to inefficient recognizers.
Features low in information content or redundant should be eliminated whenever
possible. Redundant here refers to features with marginal contribution given that
other features are present.
Table 1. Parameters for simulating SPC chart patterns.

Pattern type          Parameters (in terms of σ)
Linear trend-up       gradient: 0.015 to 0.025
Linear trend-down     gradient: −0.025 to −0.015
Sudden shift-up       shift magnitude: 0.7 to 2.5
Sudden shift-down     shift magnitude: −2.5 to −0.7
Cyclic                amplitude: 0.5 to 2.5; period = 10
Stable process        mean = 0, SD = 1
Table 2. Selected and omitted features (Hassan et al. 2002).

Selected features     Omitted features
Mean                  Median
SD                    Range
Skewness              Kurtosis
Mean-square value     Slope
Autocorrelation
Cusum
A two-level resolution IV fractional factorial experimental design, 2^{10−5}_{IV} (Montgomery 2001b), was used for screening and selecting a minimal set of representative statistical features from a list of 10 possible candidate features. Detailed
discussion on this can be found in Hassan (2002). Table 2 summarizes the selected
and omitted features. The mathematical expressions for these statistical features are
provided in the appendix.
Most of the above features are self-evident as they refer to commonly used
summary statistics. The mean square value is the ‘average power’ of the signal
(Brook and Wynne 1988). The Cusum statistic incorporates all information from a sequence of sample values by accumulating the sums of deviations of the sample values from a target value (Montgomery 2001a). The last values of the algorithmic (tabular) Cusum were used. The slope was estimated using the least-squares method (Neter et al. 1996). The average of the autocorrelations at lags 1 and 2 was used for the autocorrelation feature. The six features recommended above were then used to represent the input data
for training and testing the feature-based recognizers.
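As an illustration of how such a feature vector can be computed from one pattern, the sketch below returns the six selected features of table 2 in Python. The function name is ours, and two details are assumptions on our part: the use of the final upper tabular Cusum value as the Cusum feature, and the simple unnormalised autocorrelation estimator; the [−1, 1] normalisation described in section 5 is not shown.

```python
import numpy as np

def extract_features(x, k_ref=0.5):
    """Sketch: mean, SD, skewness, mean-square value, average autocorrelation
    (lags 1 and 2) and final Cusum statistic for one observation window x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    sd = x.std(ddof=1)
    skewness = np.mean((x - mean) ** 3) / sd ** 3     # shape of the distribution
    mean_square = np.mean(x ** 2)                      # 'average power' of the signal

    # average of the autocorrelations at lags 1 and 2 (cf. equation (A9))
    acf = [np.sum(x[:n - k] * x[k:]) / (n - k) for k in (1, 2)]
    avg_autocorr = float(np.mean(acf))

    # final one-sided upper tabular Cusum, target 0, reference value K (assumed C+)
    c_plus = 0.0
    for xi in x:
        c_plus = max(0.0, xi - k_ref + c_plus)

    return np.array([mean, sd, skewness, mean_square, avg_autocorr, c_plus])
```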
4. Pattern recognizer design
The recognizer was developed based on multilayer perceptrons (MLPs) architecture, as it has been applied successfully to solve some difficult and diverse problems
such as in modelling, prediction and pattern classification (Haykin 1999). Its basic
structure comprises an input layer, one or more hidden layer(s) and an output layer.
Figure 2 shows an MLP neural network structure comprising these layers and their
respective weight connections, w^1_{ji} and w^2_{kj}. Details of design procedures for such
MLP are widely available, for example in Patterson (1996) and Haykin (1999).
Before this recognizer can be put into application, it needs to be trained and
tested. In the supervised training approach, sets of training data comprising input
and target vectors are presented to the MLP. The learning process takes place
through adjustment of weight connections between the input and hidden layers
(w^1_{ji}) and between the hidden and output layers (w^2_{kj}). These weight connections
are adjusted according to the specified performance and learning functions.
The number of input nodes at the input layer was set according to the actual
number of statistical features used. In this study, the size of input vector was six
corresponding to the selected feature set given in table 2. When the raw data were
used, the input node size was equal to the size of the observation window, i.e. 20. The
number of output nodes in this study was set corresponding to the number of pattern
classes, i.e. six. The number of nodes in the hidden layer was chosen based on trial
[Figure 2. MLP neural network structure — input layer (nodes i), hidden layer (nodes j) and output layer (nodes k), connected by the weights w^1_{ji} and w^2_{kj}, with outputs O_1, O_2, ..., O_M.]
and error. Thus, the ANN structures were 20 × 6 × 6 and 6 × 6 × 6 for the
recognizers using raw data and statistical features as the inputs, respectively.
Since this study used the supervised training approach, each pattern presentation
was tagged with its respective label. The labels, shown in table 3, are the targeted
values for the recognizers’ output nodes. The maximum value in each row (0.9)
identifies the corresponding node expected to secure the highest output for a pattern
to be considered correctly classified. The output values are denoted as O_1, O_2, ..., O_M in figure 2.
Preliminary investigations were conducted to choose a suitable training algorithm. Three types of back propagation training algorithms, namely gradient descent
with momentum and adaptive learning rate (traingdx), BFGS quasi-Newton
(trainbfg), and Levenberg-Marquardt (trainlm) algorithms (Demuth and Beale
1998), were evaluated. The traingdx was adopted here since it provided reasonably
good performance and more consistent results. It was also more memory-efficient
Table 3. Targeted recognizer outputs.

                                          Targeted recognizer outputs (node)
Pattern class   Description            1     2     3     4     5     6
1               Random                 0.9   0.1   0.1   0.1   0.1   0.1
2               Linear trend-up        0.1   0.9   0.1   0.1   0.1   0.1
3               Linear trend-down      0.1   0.1   0.9   0.1   0.1   0.1
4               Sudden shift-up        0.1   0.1   0.1   0.9   0.1   0.1
5               Sudden shift-down      0.1   0.1   0.1   0.1   0.9   0.1
6               Cyclic                 0.1   0.1   0.1   0.1   0.1   0.9
compared with the trainlm. Trainlm gave the fastest convergence with the least
epochs but it required too much memory. Trainbfg gave much faster convergence
compared with traingdx, but the results were relatively less consistent. The network
performance was measured using the mean squared error (MSE). The activation
functions used were hyperbolic tangent (tansig) for the hidden layer and sigmoid
(logsig) for the output layer. The hyperbolic tangent function is given by
f(x) = (e^x − e^{−x})/(e^x + e^{−x}), with output range from −1 to +1. The sigmoid function is given by f(x) = 1/(1 + e^{−x}), with output varying monotonically from 0 to 1. The
variable x is the net input of a certain processing element (Hush and Horne 1993,
Patterson 1996).
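To make the recognizer structure and activations concrete, the following is a minimal forward-pass sketch of the network described above in NumPy. It is illustrative only: the study used the MATLAB Neural Network Toolbox, and the class name, the weight initialisation and the omission of the traingdx training step are our simplifications.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent activation, output in (-1, +1)."""
    return np.tanh(x)

def logsig(x):
    """Sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

class MLPRecognizer:
    """Sketch of the MLP recognizer: input -> tansig hidden layer -> logsig output layer."""

    def __init__(self, n_in=6, n_hidden=6, n_out=6, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))    # w1_ji, input-to-hidden
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))   # w2_kj, hidden-to-output
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        hidden = tansig(self.w1 @ x + self.b1)
        return logsig(self.w2 @ hidden + self.b2)             # outputs O_1 ... O_6

    def classify(self, x):
        # the pattern class is the output node with the maximum value (cf. table 3)
        return int(np.argmax(self.forward(x))) + 1

feature_net = MLPRecognizer(n_in=6)    # 6 x 6 x 6 feature-based recognizer
raw_net = MLPRecognizer(n_in=20)       # 20 x 6 x 6 raw data-based recognizer
```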
5. Experimental procedure
Two types of ANN recognizers were developed: one used raw data as the input
vector while the other used statistical features extracted from the data as the input
vector. Before the relative merit between the use of raw data and the use of features
as input vector representation could be evaluated, the recognizers had to be properly
trained and tested. This section discusses the procedures for the training and recall
(recognition) phases of the recognizers. The recognition task was limited to the six
previously mentioned common SPC chart patterns. All the procedures were coded in
MATLAB® using its ANN toolbox.
5.1. Training phase
Figure 3 shows the training procedure for the recognizers and table 4 provides
the details of training specifications.
The overall procedure began with the generation and presentation of process
data to the observation window. All patterns were fully developed when they
appeared within the recognition window. For raw data as the input vector, the
pre-processing stage involved only basic transformation into standardized Normal,
N(0, 1) values. On the other hand, the statistical features approach involved extraction of statistical values from the raw data. These statistical features were then normalized such that their values would fall within [−1, 1]. The rest of the procedure was the same for both approaches. Before the sample data were presented to the ANN for the learning process, they were divided into training (60%), validation (20%)
and preliminary testing (20%) sets (Demuth and Beale 1998). These sample sets were
then randomized to avoid possible bias in the presentation order of the sample
patterns to the ANN.
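A minimal sketch of this divide-and-randomize step, assuming the sample patterns and their target labels are held in NumPy arrays (the names and the shuffle-before-split order are ours):

```python
import numpy as np

def split_and_shuffle(X, T, rng=None):
    """Shuffle the sample patterns, then split 60% / 20% / 20% into
    training, validation and preliminary testing sets."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(len(X))
    X, T = X[order], T[order]
    n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
    train = (X[:n_train], T[:n_train])
    val = (X[n_train:n_train + n_val], T[n_train:n_train + n_val])
    test = (X[n_train + n_val:], T[n_train + n_val:])
    return train, val, test
```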
The training procedure was conducted iteratively covering ANN learning,
validation of in-training ANN and preliminary testing. During learning, a training
data set (2160 patterns) was used for updating the network weights and biases. The
ANN was then subjected to in-training validation using the validation data set (720
patterns) for early stopping to avoid overfitting. The error on the validation set typically begins to rise when the network begins to overfit the data. The training process was stopped when the validation error increased for a specified number of
iterations. In this study, the maximum number of validation failures was set to five
iterations. Demuth and Beale (1998) provide further discussion on the use of early
stopping for improving the generalization of the network. The ANN was then
subjected to preliminary performance tests using the testing data set (720 patterns).
The testing set errors were not used for updating the network weights and biases.
However, the results from the preliminary tests in terms of percentage of correct
[Figure 3. Training procedure for the recognizers using raw data and statistical features — flowchart: process data enter the observation window; the input data representation is pre-processed (standardisation for raw data; feature extraction and normalisation for statistical features); the sample patterns are divided and randomized into training, validation and preliminary testing sets; ANN learning (backpropagation) proceeds epoch by epoch with validation of the in-training ANN until the training stopping criteria (validation test) are met; the trained recogniser is then tested and, if the acceptance criteria are not satisfied, retrained using a new data set while retraining is still allowed, otherwise the best trained recogniser is selected.]
Table 4. Training specifications.

Specification                                 Value
Number of training samples                    2160 patterns
Training stopping criteria                    maximum no. of epochs = 300; error goal = 0.01;
                                              maximum no. of validation failures = 5
Acceptance criteria for trained recognizer    MSE ≤ 0.01; percentage of correct classification ≥ 95%,
                                              or the best recognizer after retraining
Maximum number of retraining allowed          twice
classification and the test set errors were used as acceptance criteria of the trained
recognizers. In other words, the decision on either to accept the trained recognizers
or to allow for retraining was made based on these preliminary performance tests.
The training was stopped whenever one of the following stopping criteria was
satisfied: the performance error goal was achieved, the maximum allowable number
of training epochs was met or the maximum number of validation failures was
exceeded (validation test).
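The stopping logic can be sketched as follows, using the limits from table 4; the two callables stand in for the toolbox training and validation routines and are purely illustrative.

```python
def train_with_stopping(train_one_epoch, validation_mse,
                        max_epochs=300, error_goal=0.01, max_val_failures=5):
    """Sketch: iterate epochs until the error goal, the epoch limit or the
    maximum number of validation failures (early stopping) is reached.

    train_one_epoch() should run one backpropagation pass and return the
    training MSE; validation_mse() should return the current MSE on the
    validation set. Both are placeholders, not toolbox functions.
    """
    best_val, failures = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_err = train_one_epoch()
        val_err = validation_mse()
        if val_err < best_val:
            best_val, failures = val_err, 0
        else:
            failures += 1                      # validation error did not improve
        if train_err <= error_goal:
            return "error goal reached", epoch
        if failures >= max_val_failures:
            return "validation test (early stopping)", epoch
    return "maximum epochs reached", max_epochs
```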
Once the training stopped, the trained recognizer was evaluated for acceptance.
The acceptance criteria as given in table 4 were compared with the recognizer’s
preliminary performance results. The recognizer would be retrained using a totally
new data set if its performance remained poor. This procedure was intended to
minimize the effect of poor training sets.
Each type of recognizer (statistical features input and raw data input) was replicated by exposing them to 10 different training cycles, giving rise to 10 different
trained recognizers for each type. These recognizers are labelled 1.1–1.10 in table 5
and 2.1–2.10 in table 6 for the raw data and statistical features input, respectively.
All 10 recognizers in each type have the same architecture and differ only in the
training data sets used. Recognizer 1.j and Recognizer 2.j, for j = 1, 2, ..., 10, were trained using the same training data set. Discussion of the training and recall performance provided in tables 5 and 6 is given in section 6.
5.2. Recall or recognition phase
Once accepted, the trained recognizer was tested (recall phase) using 10 different sets of fresh, totally unseen data. The testing procedure for the recall phase is
shown in figure 4. Results of the recall phase are presented and discussed below.
6. Results and discussion
This section presents results and comparisons of the performance between the
feature-based recognizers trained and tested using the six recommended statistical
features as given in table 2 and the recognizers trained and tested using raw data.
Tables 5 and 6 show the training and recall performance of the 10 raw data-based
recognizers and the 10 feature-based recognizers, respectively. It was noted during
training that feature-based recognizers were more easily trained. None of the feature-based recognizers required retraining whilst all of the raw data-based recognizers
Table 5. Training and recall performance for raw data-based recognizers (input: raw data).

                    Training phase                                   Recall phase
Recognizer no.      Percentage correct   Training error   No. of     Percentage correct classification
                    classification       (MSE)            epochs     Mean        SD
1.1                 90.14                0.0216           177        89.961      0.683
1.2                 94.31                0.0113           294        94.103      0.534
1.3                 92.64                0.0124           278        93.220      0.488
1.4                 92.78                0.0235           181        92.205      0.506
1.5                 92.64                0.0161           278        93.156      0.536
1.6                 95.42                0.0131           198        92.880      0.441
1.7                 92.5                 0.0146           300        92.236      0.547
1.8                 92.64                0.0124           278        93.220      0.488
1.9                 91.81                0.0207           279        91.321      0.467
1.10                94.44                0.0121           217        92.851      0.511
Overall mean        92.93                0.0158           248        92.515
SD                  1.4863               0.0045           48.84      1.169
Table 6. Training and recall performance for feature-based recognizers (input data: statistical features).

                    Training phase                                   Recall phase
Recognizer no.      Percentage correct   Training error   No. of     Percentage correct classification
                    classification       (MSE)            epochs     Mean        SD
2.1                 96.96                0.0099           213        96.84       0.314
2.2                 98.06                0.0099           241        97.11       0.399
2.3                 96.81                0.0098           219        96.30       0.408
2.4                 97.78                0.0099           220        96.78       0.222
2.5                 97.78                0.0100           204        96.36       0.387
2.6                 96.81                0.0100           265        97.18       0.303
2.7                 95.97                0.0100           225        96.98       0.351
2.8                 96.39                0.0100           224        97.01       0.377
2.9                 98.19                0.0100           221        96.80       0.289
2.10                97.5                 0.0099           220        96.58       0.226
Overall mean        97.22                0.00994          225.2      96.79
SD                  0.7467               0.00007          16.81      0.30
required such retraining before they could be accepted. The overall mean percentages
of correct recognition of raw- and feature-based recognizers were 92.5 and 96.8%,
respectively. The percentages ranged from 89.96 to 94.10% and from 96.30 to
97.18% for the raw data- and feature-based recognizers, respectively. The results
for statistical significance tests are summarized in table 7. Paired t-tests (α = 0.01) were conducted as described in Walpole et al. (1998) for 10 pairs of raw data- and feature-based recognizers for their performance in terms of percentage of correct classification and the training error (MSE).

[Figure 4. Testing procedure for the trained recognizers (recall phase) — flowchart: process data (testing sets) enter the SPC monitoring window and, depending on which recogniser is to be tested, are pre-processed by standardisation (raw-data-based recogniser) or by feature extraction and normalisation (feature-based recogniser) before testing the trained ANN recogniser and evaluating its performance.]
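As an illustration of the paired comparison described above, the sketch below reproduces the recall-performance test using the per-recognizer recall means of tables 5 and 6; SciPy is our choice of tool here, not necessarily the one used by the authors. The resulting T should be close to the value reported in table 7.

```python
import numpy as np
from scipy import stats

# Mean recall percentages of the ten matched recognizer pairs (tables 5 and 6).
raw = np.array([89.961, 94.103, 93.220, 92.205, 93.156,
                92.880, 92.236, 93.220, 91.321, 92.851])
feature = np.array([96.84, 97.11, 96.30, 96.78, 96.36,
                    97.18, 96.98, 97.01, 96.80, 96.58])

t_stat, p_two_sided = stats.ttest_rel(feature, raw)     # paired t-test
p_one_sided = p_two_sided / 2                            # H1: mean(feature - raw) > 0
t_critical = stats.t.ppf(1 - 0.01, df=len(raw) - 1)      # alpha = 0.01, 9 d.f.

print(f"T = {t_stat:.3f}, p (one-sided) = {p_one_sided:.2e}, "
      f"t_crit = {t_critical:.3f}, reject H0: {t_stat > t_critical}")
```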
The results in table 7 suggest that the difference in recognition accuracy between
the two types of recognizers was significant. This confirms that features as input data
representation give better recognition performance compared with raw data. This
finding is consistent with those reported in Pham and Wani (1997) and Wani and
Pham (1999) although they used different sets of features. Further, the above comparison shows that the difference between the training errors (MSE) for the training
phase between the two types of recognizers was significant. This result indicates that
Table 7. Statistical significance tests for difference in performance.

Performance measure    Hypotheses                           t-statistic (T)   t-critical (t_α)   Decision
Recall performance     H0: μ_recall(Feature−Raw) = 0        11.148            2.821              reject H0
                       H1: μ_recall(Feature−Raw) > 0
MSE                    H0: μ_MSE(Raw−Feature) = 0           4.100             2.821              reject H0
                       H1: μ_MSE(Raw−Feature) > 0
more training efforts would be required if the raw data-based recognizer were to
achieve the required error margin.
6.1. Confusion matrix
The confusion matrix is a table summarizing the tendency of the recognizer to
classify a recognized pattern into a correct class or into any of the other five possible
(wrong) classes. Confusion matrices, as given in tables 8 and 9, provide the overall
mean percentages for confusions among pattern classes for 10 raw data-based and 10
feature-based recognizers, respectively. In other words, they are the mean scores
from 100 such matrices (10 recognizers × 10 testing sets).
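A sketch of how one such matrix can be accumulated from a recognizer's decisions is given below; the array names and class ordering are illustrative only.

```python
import numpy as np

CLASSES = ("Random", "Trend-up", "Trend-down", "Shift-up", "Shift-down", "Cyclic")

def confusion_percentages(true_labels, predicted_labels, n_classes=len(CLASSES)):
    """Row-normalised confusion matrix: rows are true classes (in percent),
    columns are the classes identified by the recognizer."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1.0
    totals = counts.sum(axis=1, keepdims=True)
    return 100.0 * counts / np.where(totals == 0, 1.0, totals)

# Averaging such matrices over 10 recognizers and 10 testing sets gives
# the mean confusion percentages of tables 8 and 9.
```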
Tables 8 and 9 show that there is confusion in the classification process for both
types of recognizers. For the raw data-based recognizer, there was a tendency for the random pattern to be most confused with the cyclic pattern, the linear trend-up pattern with sudden shift-up, and the linear trend-down pattern with sudden shift-down. The feature-based recognizers demonstrated a similar confusion tendency except for
the random patterns. These pairings could be the result of the confused pairs sharing
many similar characteristics.
Random patterns were the hardest to classify for the raw data-based recognizers (85.7%). They misclassified about 6% of random patterns as cyclic patterns and about 5% of cyclic patterns as random ones. However, for the feature-based recognizers, shift patterns were the hardest to classify (about 94%). These patterns tended to be confused with random and linear trend patterns in about 2–3% of
cases.
Generally, the results for classification of random patterns in table 8 (85.7%) and table 9 (95.2%) suggest that the type I error performance for both types of recognizers does not seem to be very good.
Table 8. Mean percentage for confusion using raw data as the input vector.

                      Pattern class identified by raw data-based recognizer
True pattern class    Random    Trend-up   Trend-down   Shift-up   Shift-down   Cyclic
Random                85.73     1.24       1.35         2.99       2.71         5.98
Trend-up              0.01      98.31      0.00         1.68       0.00         0.10
Trend-down            0.01      0.00       97.44        0.00       2.55         0.00
Shift-up              3.35      7.64       0.00         88.62      0.00         0.38
Shift-down            2.81      0.00       6.81         0.00       90.28        0.09
Cyclic                5.27      0.01       0.00         0.01       0.00         94.71
Table 9. Mean percentage for confusion using statistical features as the input vector.

                      Pattern class identified by feature-based recognizer
True pattern class    Random    Trend-up   Trend-down   Shift-up   Shift-down   Cyclic
Random                95.24     0.00       0.01         2.30       1.89         0.56
Trend-up              0.00      99.23      0.00         0.74       0.00         0.03
Trend-down            0.00      0.00       99.36        0.00       0.63         0.01
Shift-up              2.33      2.90       0.00         94.50      0.00         0.27
Shift-down            2.62      0.00       2.62         0.00       94.32        0.45
Cyclic                0.91      0.00       0.00         0.44       0.54         98.10
Table 10. Recognition performance (percentage correct recognition) of raw data-based and feature-based recognizers.

            Random           Trend-up         Trend-down       Shift-up         Shift-down       Cyclic
Data set    Feature   Raw    Feature   Raw    Feature   Raw    Feature   Raw    Feature   Raw    Feature   Raw
1           96.58     87.25  98.83     98.73  99.30     97.85  99.30     88.44  93.72     90.27  97.62     93.85
2           95.28     86.58  99.42     98.65  99.52     97.75  99.52     87.92  94.62     90.30  97.93     95.53
3           95.75     86.43  99.30     98.47  99.27     97.93  99.27     90.67  93.80     90.02  98.57     94.62
4           94.62     84.93  99.25     98.19  99.60     97.05  99.60     99.63  95.15     90.50  97.62     93.98
5           95.65     86.02  99.03     98.82  99.28     98.05  99.28     89.52  93.92     89.28  98.48     95.28
6           95.03     86.77  99.49     98.05  99.05     97.33  99.05     88.58  95.08     91.25  98.55     94.77
7           94.05     83.97  99.23     97.98  99.44     97.03  99.44     87.60  93.38     90.12  97.17     93.30
8           95.80     85.00  99.27     97.92  99.45     97.45  99.45     87.90  94.10     89.25  98.23     94.48
9           94.88     86.07  99.02     98.08  99.48     96.62  99.48     88.95  95.07     91.98  98.95     95.68
10          94.75     84.32  99.45     98.18  99.23     97.33  99.23     88.03  94.40     89.85  97.92     95.58
Table 11. Statistical significance test for recognition performance of raw data-based and feature-based recognizers.

Type of pattern       Hypotheses                              t-statistic (T)   t-critical (t_α)   Decision
Random                H0: μ_random(Feature−Raw) = 0           37.91             2.82               reject H0
                      H1: μ_random(Feature−Raw) > 0
Trend-up              H0: μ_trend-up(Raw−Feature) = 0         6.32              2.82               reject H0
                      H1: μ_trend-up(Raw−Feature) > 0
Trend-down            H0: μ_trend-down(Raw−Feature) = 0       11.26             2.82               reject H0
                      H1: μ_trend-down(Raw−Feature) > 0
Sudden shift-up       H0: μ_shift-up(Raw−Feature) = 0         34.86             2.82               reject H0
                      H1: μ_shift-up(Raw−Feature) > 0
Sudden shift-down     H0: μ_shift-down(Raw−Feature) = 0       19.94             2.82               reject H0
                      H1: μ_shift-down(Raw−Feature) > 0
Cyclic                H0: μ_cyclic(Raw−Feature) = 0           18.11             2.82               reject H0
                      H1: μ_cyclic(Raw−Feature) > 0
This is possibly due to the unpredictable structure of random data streams, which makes them relatively more difficult to recognize than the unstable patterns. Unstable data streams, on the other hand, have a tendency for successive data to be correlated. As such, the structures of their patterns are more predictable and this may have contributed towards easier recognition of unstable patterns.
The confusion among patterns could also possibly be attributed to some vague
patterns (due to low amplitude, gradient, etc.) and interference from baseline noise.
The recognizers were designed to select the pattern corresponding to the output node
with the maximum value. One possible approach to overcome this confusion is by
considering the quality of each output node value. A weak output should be classified as a reject class even though it is the maximum value.
The average recognition performances of the respective 10 raw data-based and 10 feature-based recognizers in correctly recognizing the different types of patterns are given in table 10.
The recognition performance for each type of pattern was compared statistically using paired t-tests (α = 0.01) and the results are summarized in table 11. These results clearly suggest that the feature-based recognizers are statistically less confused for all types of patterns compared with the raw data-based recognizers (all H0 were rejected).
7. Conclusions
The objective of this study was to evaluate the relative performance of feature-based SPC recognizers compared with the raw data-based recognizers. The MLP
neural network was used as a generic recognizer to classify six different types of SPC
chart patterns. A set of six statistical features was used in training and testing the
feature-based recognizers.
In this study, feature-based recognizers achieved a statistically significant
improvement in recognition performance. Further, the use of the statistical feature
set required less training effort and resulted in better recall performance. These
confirm the expectation that a feature-based input vector representation results in
better recognizer performance. It is important to note that this is true only when a
proper set of representative features is used. Thus, summary statistics can be used as
a reliable and better alternative in representing the SPC chart pattern data. The
feature-based scheme used in this study is capable of coping with a high degree of
pattern variability that is within the control limits.
The findings can be used as guidelines to develop better SPC pattern recognizers.
Currently, we are extending this work to recognize transitional SPC chart patterns
for real-time monitoring and recognition. In this effort, mechanisms to improve the type I error performance are also being addressed. Other pattern types such as stratification, mixture
and systematic are to be included in future studies, as well as other features. This
work can also be extended to investigate online learning and effect of costs on the
decisions.
Acknowledgements
The authors thank Professor D. T. Pham for giving the opportunity to A. H. to
conduct part of this study in the Intelligent Systems Laboratory, Cardiff University
of Wales, UK. They also acknowledge the anonymous referees for comments and
suggestions.
Appendix: Mathematical expressions for the statistical features
The mathematical expressions for extracting statistical features (mean, median,
standard deviation and range) are widely available in most of the texts on SPC or
statistics. The mathematical expressions for extracting the rest of the statistical
features are as follows:
A.1. Mean-square value

\tilde{x}^2 = \frac{x_0^2 + x_1^2 + x_2^2 + \cdots + x_N^2}{N + 1} = \frac{1}{N + 1} \sum_{i=0}^{N} x_i^2.    (A1)
A.2. Cusum

The tabular Cusum accumulates deviations from the target value μ_0 that are above target with the statistic C^+ and deviations below target with the statistic C^−. The statistics C^+ and C^−, which represent the one-sided upper and lower Cusums, are computed as follows (Montgomery 2001a):

C_i^+ = \max[0, x_i - (\mu_0 + K) + C_{i-1}^+]    (A2)

C_i^- = \max[0, (\mu_0 - K) - x_i + C_{i-1}^-],    (A3)

where the starting values are C_0^+ = C_0^- = 0.

In this study, the last Cusum statistic for each data stream was taken as the representative feature. The reference value K was set to half the shift magnitude in order to provide sensitivity for detecting a shift of 1σ.
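A direct transcription of recursions (A2) and (A3) in Python (the function name is ours; μ_0 = 0 for the standardized data considered here):

```python
def tabular_cusum(x, mu0=0.0, k=0.5):
    """Return the final one-sided upper (C+) and lower (C-) Cusum statistics."""
    c_plus = c_minus = 0.0                           # starting values C0+ = C0- = 0
    for xi in x:
        c_plus = max(0.0, xi - (mu0 + k) + c_plus)       # equation (A2)
        c_minus = max(0.0, (mu0 - k) - xi + c_minus)     # equation (A3)
    return c_plus, c_minus
```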
A.3. Skewness
Skewness (a_3) provides information concerning the shape of the distribution. It indicates any lack of symmetry in the data distribution. The skewness of a frequency distribution is given by (Besterfield 1994):

a_3 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^3 / n}{s^3},    (A4)

where n is the number of observed values, s is the sample standard deviation, f_i is the frequency in a cell or the frequency of an observed value, \bar{X} is the average of the observed values and X_i is an observed value.
A.4. Kurtosis

Kurtosis (a_4) is the peakedness of the data. Similar to skewness, it provides information concerning the shape of the distribution. The kurtosis value is used as a measure of the height of the peak in a distribution (Besterfield 1994):

a_4 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^4 / n}{s^4}.    (A5)
A.5. Slope
Let \bar{Y} denote the mean of the Y_i and \bar{X} denote the mean of the X_i for the (X_i, Y_i) pairs of sample observations. The best-fitting straight line is given by Y_i = b_0 + b_1 X_i, where b_1 is the slope. The least-squares line and slope are obtained as follows (Kleinbaum et al. 1988, Neter et al. 1996):

\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X    (A6)

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}    (A7)

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.    (A8)

The intercept is given by \hat{\beta}_0 and \hat{\beta}_1 is the slope of the fitted line. The value of \hat{\beta}_1 was used as the feature in this study.
A.6. Average autocorrelation
Autocorrelation exists when one part of a signal depends on another part of the
same signal. It measures the dependence of the data at one instant in time on the data at another instant in time (Brook and Wynne 1988). A signal will show the wildest fluctuation with time if there is no correlation between any two neighbouring values. For a random signal, one can expect its value in the distant future to be negligibly dependent on its present value.
The autocorrelation function is defined as follows (Brook and Wynne 1988):

R_{xx}[k] \cong \frac{1}{N + 1 - k} \left[ x_0 x_k + x_1 x_{1+k} + \cdots + x_{N-k} x_N \right],    (A9)

where N is the number of observations and k is the lag. In this study, the average of the autocorrelations at lags 1 and 2 was used as the feature. These lags were chosen based on preliminary simulation runs.
References
Amin, A., 2000, Recognition of printed Arabic text based on global features and decision tree
learning techniques. Pattern Recognition, 33, 1309–1323.
Anagun, A. S., 1998, A neural network applied to pattern recognition in statistical process
control. Computers & Industrial Engineering, 35, 185–188.
Battiti, R., 1994, Using mutual information for selecting features in supervised neural net
learning. IEEE Transactions on Neural Networks, 5, 537–550.
Brook, D. and Wynne, R. J., 1988, Signal Processing: Principles and Applications (London:
Edward Arnold).
Cheng, C.-S., 1989, Group technology and expert systems concepts applied to statistical
process control in small-batch manufacturing. Unpublished PhD dissertation,
Arizona State University.
Demuth, H. and Beale, M., 1998, Neural Network Toolbox User’s Guide (Natick, MA: Math
Works).
Hassan, A., 2002, On-line recognition of developing control chart patterns. PhD thesis,
Universiti Teknologi Malaysia.
Haykin, S., 1999, Neural Networks: A Comprehensive Foundation, 2nd edn (Englewood Cliffs,
NJ: Prentice-Hall).
Hush, D. R. and Horne, B. G., 1993, Progress in supervised neural networks: what’s new
since Lippmann? IEEE Signal Processing Magazine, January, 8–39.
Hwarng, H. B. and Hubele, N. F., 1991, X-bar chart pattern recognition using neural nets.
ASQC Quality Congress Transactions, 884–889.
Hwarng, H. B. and Hubele, N. F., 1993, Back-propagation pattern recognisers for X̄ control
charts: methodology and performance. Computers and Industrial Engineering, 24, 219–
235.
Jones, B., 1991, Design of experiments. In T. Pyzdek and R. W. Berger (eds), Quality
Engineering Handbook (New York: Marcel Dekker), pp. 329–387.
Montgomery, D. C., 2001a, Introduction to Statistical Quality Control, 4th edn (New York:
Wiley).
Montgomery, D. C., 2001b, Design and Analysis of Experiments, 5th edn (New York: Wiley).
Neter, J., Kutner, M. H., Natchtsheim, C. J. and Wasserman, W., 1996, Applied Linear
Statistical Models, 4th edn (Chicago: Irwin).
Oakland, J. S., 1996, Statistical Process Control (Oxford: Butterworth-Heinemann).
Pandya, A. S. and Macy, R. B., 1996, Pattern Recognition with Neural Networks in C++
(Boca Raton, FL: CRC Press).
Patterson, D. W., 1996, Artificial Neural Networks: Theory and Applications (Singapore:
Prentice-Hall).
Pham, D. T. and Oztemel, E., 1993, Control chart pattern recognition using combinations of
multilayer perceptrons and learning vector quantisation neural networks. Proceedings of
the Institution of Mechanical Engineers, 207, 113–118.
Pham, D. T. and Wani, M. A., 1997, Feature-based control chart pattern recognition.
International Journal of Production Research, 35, 1875–1890.
Ross, P. J., 1996, Taguchi Techniques for Quality Engineering (New York: McGraw-Hill).
Swift, J. A., 1987, Development of a knowledge based expert system for control chart pattern
recognition and analysis. Unpublished PhD dissertation, Graduate College, Oklahoma
State University.
Tontini, G., 1996, Pattern identification in statistical process control using fuzzy neural networks. In Proceedings of the 5th IEEE International Conference on Fuzzy Systems, 3, pp.
2065–2070.
Tontini, G., 1998, Robust learning and identification of patterns in statistical process control
charts using a hybrid RBF fuzzy artmap neural network. The 1998 IEEE International
Joint Conference on Neural Network Proceedings (IEEE World Congress on
Computational Intelligence), 3, pp. 1694–1699.
Utku, H., 2000, Application of the feature selection method to discriminate digitised wheat
varieties. Journal of Food Engineering, 46, 211–216.
Walpole, R. E., Myers, R. H. and Myers, S. L., 1998, Probability and Statistics for
Engineers and Scientists, 6th edn (New York: Macmillan).
Wani, M. A. and Pham, D. T., 1999, Efficient control chart pattern recognition through
synergistic and distributed artificial neural networks. Proceedings of the Institution of
Mechanical Engineers: Part B, 213, 157–169.
Zeki, A. A. and Zakaria, M. S., 2000, New primitive to reduce the effect of noise for handwritten features extraction. IEEE 2000 Tencon Proceedings: Intelligent Systems and
Technologies for the New Millennium, pp. 24–27.