
Chapter 10
Analysis of alternative features for
model building
The model for VFD tested on a passenger vehicle is analysed for
performance improvement using alternative features. The use of
wavelet features is presented in Section 10.2, followed by the use of
feature fusion in Section 10.3 and the use of spectral features in
Section 10.4.
10.1 INTRODUCTION
The development of a comprehensive model for a misfire and multi-class vehicle fault
detection system using a single low cost sensor requires the assessment of alternative
signal features. The assessment identifies the options available for achieving 100%
classification accuracy across all the vehicle faults under consideration. In this
chapter, alternative features based on spectral information and wavelet decomposition
of the time series signals are considered. This attempt is necessary because the model
developed using statistical features did not reach 100% classification accuracy in all
the classes, as observed from the results presented in Chapter 9. Alternative features
were consciously avoided at the initial stage because Discrete Wavelet Transforms (DWT),
the Discrete Fourier Transform (DFT) computed using the Fast Fourier Transform (FFT),
and Power Spectral Density (PSD) all involve intensive computation and would require
complex onboard computing infrastructure in the vehicle. This increases the cost of the
setup and conflicts with the aim of developing a low cost system. A judicious decision
based on the cost-benefit relationship needs to be taken after performance analysis.
The vibration signature of the engine block is acquired as the base signal from which other
transforms are obtained to build the new sets of features. Misfire and other fault
simulations are as described in Section 4.5.2.
The following data transforms were used as base formulations for new feature extraction:
• Discrete Wavelet Transform of vibration signals
   o Haar
   o Daubechies db2 to db9
• Feature fusion
   o DWT features with statistical features of the vibration signal
• Spectral decomposition of vibration signals
   o Discrete Fourier Transform using Fast Fourier Transform (FFT)
   o Power Spectral Density (PSD)
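
As an illustration of the first transform in the list, the following is a minimal sketch of a single-level Haar decomposition followed by a few summary features. It is not the implementation used in this work: the function names `haar_dwt` and `summarise`, the three features chosen, and the synthetic sine record are all assumptions made for the example.

```python
import math
import statistics


def haar_dwt(signal):
    """Single-level orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def summarise(coeffs):
    """Hypothetical feature set: mean, standard deviation and energy."""
    return {
        "mean": statistics.fmean(coeffs),
        "std": statistics.pstdev(coeffs),
        "energy": sum(c * c for c in coeffs),
    }


# Synthetic stand-in for one vibration record (not measured data).
record = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
approx, detail = haar_dwt(record)
features = {"approx": summarise(approx), "detail": summarise(detail)}
```

Because the Haar transform is orthonormal, the energy of the approximation and detail coefficients together equals the energy of the input record, which gives a simple sanity check on any implementation.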
The frequency plot of all the conditions is presented in Figure 10.1. The plots show the
presence of noise in a wide bandwidth.
[Figure 10.1: six frequency plots, one per condition (Good, Misfire, Engine high rpm,
Gear knock, Choking, Low tyre pressure); horizontal axis: frequency in Hz]
Figure 10.1 Frequency plots of various vehicle faults
Moreover, different faults are represented by different frequencies; hence the frequency
plot cannot be used directly for interpreting the vehicle condition.
[Figure 10.2: six PSD plots, one per condition (Misfire, Good, Engine high rpm,
Gear knock, Low tyre pressure, Choking)]
Figure 10.2 PSD plots of various vehicle faults
The PSD plots presented in Figure 10.2 also show an almost constant energy content over the
entire frequency band considered. Hence the direct use of these features might not be very
feasible. An alternative feature formulation based on this spectral information needs to be
used.
10.2 MODEL ANALYSIS USING DISCRETE WAVELET TRANSFORMS
The DT-CFS-KD and RF-CFS-KD models, which use CFS based FSS followed by Kononenko
discretisation of the data, are taken for evaluation with Haar and Daubechies DWT
features. The parameters for the DWT based system are presented in Table 10.1.
Table 10.1 Classifier parameters for DWT based model in the passenger car

Parameters for evaluation          Decision tree                         Random forest
Model performance evaluation       10-fold stratified cross-validation   10-fold stratified cross-validation
Model building time                0.01 s                                0.01 s
Total Number of Instances          1200                                  1200
Correctly Classified Instances     985                                   969
Incorrectly Classified Instances   215                                   231
Classification accuracy (%)        82.1                                  80.8
Misfire detection accuracy (%)     100                                   100
Mean absolute error                0.0681                                0.0687
Root mean squared error            0.1901                                0.1984
MDL correction                     Incorporated                          Incorporated
Number of leaves                   46                                    -
Size of the tree                   56 levels                             10 trees
Features used                      Db7                                   Db7

The results presented in Table 10.1 are not very encouraging, since overall classification
accuracies of 82.1% and 80.8% were recorded for DT-CFS-KD and RF-CFS-KD
respectively. However, both the models recorded 100% for misfire detection.
The confusion matrix presented in Table 10.4 shows that no condition is misclassified as
misfire and that misfire is detected with 100% accuracy. The good condition is largely
misclassified as choking (102 instances) and as low tyre pressure (15 instances), and a
similar trend is observed in the choking condition as well. The other results are
comparable with DT-CFS-KD and RF-CFS-KD. The time taken for feature extraction and
classification using wavelets is very high when compared to models using statistical and
histogram features, as observed from Tables 10.2 and 10.3.
Table 10.2 Model performance using wavelet features with decision tree

Wavelet   Multi class   Misfire   Time taken (s)   MAE      RMSE
Haar      81.6          92        733              0.0716   0.1916
db2       80.8          98.5      792              0.0703   0.1931
db3       81.5          98        801              0.069    0.1903
db4       81.3          99        799              0.0694   0.1907
db5       81.3          99        856              0.0689   0.1903
db6       82.3          99        851              0.0663   0.1876
db7       82.1          100       974              0.0681   0.1901
db8       81.1          98.5      970              0.0695   0.1902
db9       81.9          99.5      978              0.0676   0.1875

From the results presented in Tables 10.2 and 10.3, it is clear that only db7 achieves
100% classification accuracy for misfire, with an overall performance of 80.8% with
random forest and 82.1% with the decision tree algorithm. The results are not very
satisfactory given the extremely high computational load encountered in the wavelet
decomposition.
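
The selection reasoning can be retraced with a short sketch over the decision tree figures of Table 10.2. The dictionary layout and variable names are assumptions made for illustration; only the accuracy values are taken from the table.

```python
# (multi-class accuracy %, misfire accuracy %) per wavelet,
# transcribed from Table 10.2 (decision tree).
decision_tree = {
    "Haar": (81.6, 92.0),
    "db2": (80.8, 98.5),
    "db3": (81.5, 98.0),
    "db4": (81.3, 99.0),
    "db5": (81.3, 99.0),
    "db6": (82.3, 99.0),
    "db7": (82.1, 100.0),
    "db8": (81.1, 98.5),
    "db9": (81.9, 99.5),
}

# Wavelets that meet the 100% misfire detection requirement.
perfect_misfire = [w for w, (_, misfire) in decision_tree.items() if misfire == 100.0]

# Wavelet with the highest multi-class accuracy, ignoring the misfire requirement.
best_multi_class = max(decision_tree, key=lambda w: decision_tree[w][0])
```

Applying the misfire requirement first and the multi-class accuracy second reflects the priority given to misfire detection in this chapter.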
Table 10.3 Model performance using wavelet features with Random forest

Wavelet   Multi class   Misfire   Time taken (s)   MAE      RMSE
Haar      81.7          93        734              0.0707   0.1902
db2       82.4          99.5      790              0.0685   0.1907
db3       80.6          99        803              0.0678   0.1897
db4       79.8          99        800              0.0687   0.1932
db5       81.5          99        857              0.0674   0.1904
db6       82.9          99        851              0.064    0.1856
db7       80.8          100       975              0.0687   0.1984
db8       80.7          99        970              0.0676   0.1884
db9       80.8          99.5      978              0.0668   0.1889

Table 10.4 Decision tree confusion matrix for model using db7 features

          Good   Mis1   GnoK   Choke   GrHiRpm   20Psi
Good      82     0      1      102     0         15
Mis1      0      200    0      0       0         0
GnoK      0      0      197    0       0         3
Choke     53     0      0      129     0         18
GrHiRpm   0      0      0      0       200       0
20Psi     38     0      0      15      0         177

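
The misfire observations drawn from Table 10.4 can be recomputed with a minimal sketch. It assumes that rows are the actual conditions and columns the predicted conditions, an orientation inferred from the statement that 102 good instances are classified as choking. The helper names `recall` and `false_alarms` are hypothetical.

```python
labels = ["Good", "Mis1", "GnoK", "Choke", "GrHiRpm", "20Psi"]

# Assumed orientation: rows = actual condition, columns = predicted condition.
# Values transcribed from Table 10.4.
matrix = [
    [82, 0, 1, 102, 0, 15],
    [0, 200, 0, 0, 0, 0],
    [0, 0, 197, 0, 0, 3],
    [53, 0, 0, 129, 0, 18],
    [0, 0, 0, 0, 200, 0],
    [38, 0, 0, 15, 0, 177],
]


def recall(m, i):
    """Fraction of instances of class i that are classified as class i."""
    return m[i][i] / sum(m[i])


def false_alarms(m, j):
    """Number of instances of other classes that are classified as class j."""
    return sum(row[j] for k, row in enumerate(m) if k != j)


mis = labels.index("Mis1")
good = labels.index("Good")

misfire_recall = recall(matrix, mis)              # every misfire instance detected
misfire_false_alarms = false_alarms(matrix, mis)  # nothing else labelled as misfire
good_recall = recall(matrix, good)
good_as_choke = matrix[good][labels.index("Choke")]
```

The same two helpers apply to any class, so the good and choking confusion can be quantified in the same way as the misfire result.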
The use of DWT features with a single level of decomposition and db wavelets is not a very
encouraging choice, although it performs satisfactorily. Additionally, a large number of
good conditions are misclassified as choking and vice versa. Hence this model is not
recommended for consideration.
10.3 MODEL PERFORMANCE ANALYSIS USING FEATURE FUSION
A new formulation using feature fusion was also considered for evaluation. The statistical
features of the time series vibration signal and the wavelet based features are used jointly
to evaluate whether performance improves beyond that of statistical or DWT features used
individually. The results of the analysis using decision tree and random forest are
presented in Tables 10.5 and 10.6.
In this analysis, random forest achieved a higher multi-class performance than the
decision tree. The feature fusion model delivers an impressive 95.5% multi class accuracy
and 100% misfire detection using statistical features, db2 features and random forest. A
comparable performance is observed with the model combination using statistical features,
Haar features and decision tree, with a maximum performance of 94.1% multi class accuracy
and 100% misfire detection. Since the random forest performs better than the decision
tree, it is recommended. Both the models have also recorded among the lowest mean absolute
error and root mean squared error.
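
Feature fusion in its simplest form is the concatenation of two feature vectors computed from the same record. The sketch below assumes a reduced set of time-domain statistics and Haar detail-coefficient features; the function names, the chosen features and the synthetic record are illustrative and do not reproduce the feature set of Section 3.2.1.

```python
import math
import statistics


def statistical_features(x):
    """Illustrative subset of time-domain statistics."""
    return [statistics.fmean(x), statistics.pstdev(x), min(x), max(x)]


def haar_detail(x):
    """Single-level Haar detail coefficients (a trailing odd sample is ignored)."""
    s = math.sqrt(2.0)
    return [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]


def wavelet_features(x):
    """Illustrative features of the Haar detail coefficients."""
    d = haar_detail(x)
    return [statistics.fmean(d), statistics.pstdev(d), sum(c * c for c in d)]


def fuse(x):
    """Feature fusion: concatenate both feature vectors into one instance."""
    return statistical_features(x) + wavelet_features(x)


# Synthetic stand-in for one vibration record (not measured data).
record = [math.sin(0.3 * n) + 0.1 * math.cos(2.9 * n) for n in range(128)]
fused = fuse(record)
```

The fused vector is then passed through feature selection and discretisation in the same way as either feature set on its own, so the classifier side of the pipeline is unchanged.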
Table 10.5 Model performance using feature fusion and decision tree

Wavelet   Multi class   Misfire   MAE      RMSE
Haar      94.1          100       0.0279   0.1297
db2       93.9          100       0.0285   0.1296
db3       93.8          100       0.0288   0.1314
db4       94            100       0.0277   0.1291
db5       93.75         100       0.0292   0.1334
db6       93.7          100       0.0291   0.1318
db7       94.1          100       0.0276   0.1305
db8       94.4          100       0.0265   0.1276
db9       94.4          100       0.0269   0.1263

Table 10.6 Model performance using feature fusion and random forest

Wavelet   Multi class   Misfire   MAE      RMSE
Haar      95.2          99.5      0.0247   0.1127
db2       95.5          100       0.0244   0.1109
db3       95.3          100       0.0255   0.1122
db4       95.5          99.5      0.0256   0.1147
db5       95            100       0.025    0.1133
db6       95.2          100       0.0248   0.1123
db7       94.8          100       0.0246   0.1129
db8       95.3          100       0.0242   0.1093
db9       94.9          100       0.0258   0.1155

Analysing the results presented in Table 10.7, it is evident that the random forest
based model with db7 wavelet features attains a maximum classification accuracy of 80.8%
in multi-class mode and 100% in two-class mode. The feature fusion results indicate that
the classification accuracy in multi-class mode varies between 94.8% and 95.5% and reaches
100% in two-class mode except for Haar and db4. Comparing these results with the
performance of statistical features, which recorded 94.6% and 100% as presented in
Table 9.3 of Section 9.3.1, the increase in classification accuracy is less than one percent.
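
The size of the improvement can be checked with simple arithmetic on the random forest multi-class accuracies of Table 10.6 and the 94.6% quoted for statistical features. The variable names are chosen for illustration.

```python
# Multi-class accuracy (%) of statistical features alone, as quoted from Table 9.3.
statistical_only = 94.6

# Multi-class accuracy (%) of feature fusion with random forest (Table 10.6).
fusion = {
    "Haar": 95.2, "db2": 95.5, "db3": 95.3, "db4": 95.5, "db5": 95.0,
    "db6": 95.2, "db7": 94.8, "db8": 95.3, "db9": 94.9,
}

# Gain in percentage points over statistical features, rounded to one decimal.
gains = {w: round(acc - statistical_only, 1) for w, acc in fusion.items()}

smallest_gain = min(gains.values())
largest_gain = max(gains.values())
```

The gain therefore lies between 0.2 and 0.9 percentage points across all nine wavelets.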
Table 10.7 Random forest model performance comparison

          Wavelet features          Statistical and wavelet feature fusion
Feature   Multi class   Misfire     Multi class   Misfire
Haar      81.7          93          95.2          99.5
db2       82.4          99.5        95.5          100
db3       80.6          99          95.3          100
db4       79.8          99          95.5          99.5
db5       81.5          99          95            100
db6       82.9          99          95.2          100
db7       80.8          100         94.8          100
db8       80.7          99          95.3          100
db9       80.8          99.5        94.9          100

Feature fusion as a concept is very encouraging from this analysis. The only setback for
this system is the time taken for formulating the DWT based features, as observed from
Tables 10.2 and 10.3.
10.4 MODEL PERFORMANCE ANALYSIS USING SPECTRAL INFORMATION
Frequency domain analysis is more commonly used for rotary machines, where the vibration
is easily segregated into distinct frequency regions and the frequency variation in each
region is easily identified. Moreover, the minimal presence of noise renders such
applications more effective. The use of the frequency domain for vehicle fault detection
is challenged by the presence of noise over a wide band of frequencies, which requires
multiple filters and processes. In addition, the use of frequency will make the system
less reliable over time, as the frequency or vibration signature of the system changes
due to wear and tear. However, features extracted from the spectral information of the
engine block can be used for effective model building. Detection of misfire alone could
be possible using DFT (Horner 1995). Many of the reported systems referred to in
Section 2.4 conducted the experiment on a stationary engine or vehicle, which reduces the
applicability of the results.
10.4.1 Spectral feature formulation
The engine block vibration acquired using the accelerometer is transformed to the frequency
domain using the Discrete Fourier Transform (DFT). DFT computation is a computationally
intensive process; hence an efficient algorithm known as the Fast Fourier Transform (FFT),
using the Cooley and Tukey algorithm (Cooley and Tukey 1965), was adopted to perform the
DFT. The conversion of the time domain signal to a frequency domain signal is the first
step towards any frequency domain analysis. The PSD essentially identifies the energy
content in a frequency or band of frequencies. Since the FFT and PSD data contain a large
portion of noise, they are evaluated as a base for statistical feature formulation rather
than being used directly. The use of the mean of the FFT signal for pump fault
identification has been reported by Al-Hashmi (Al-Hashmi 2008). The statistical features
presented in Section 3.2.1 are extracted using the DFT and PSD as base inputs
independently. The extracted features are processed using FSS and discretisation before
being fed as input to the classification algorithms for model building and evaluation.
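
The formulation can be sketched as follows, assuming a record whose length is a power of two, a simple periodogram as the PSD estimate (no windowing or sampling-rate scaling), and an illustrative three-feature summary in place of the Section 3.2.1 set. The function names and the synthetic cosine record are assumptions made for the example.

```python
import cmath
import math
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def periodogram(x):
    """Simplified PSD estimate: |X[k]|^2 / N for bins k = 0..N/2."""
    n = len(x)
    spectrum = fft(x)
    return [abs(spectrum[k]) ** 2 / n for k in range(n // 2 + 1)]


def summarise(values):
    """Hypothetical statistical features of a spectral sequence."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "max": max(values),
    }


# Synthetic stand-in for one vibration record: a cosine centred on bin 8.
n = 64
record = [math.cos(2 * math.pi * 8 * i / n) for i in range(n)]

magnitude = [abs(c) for c in fft(record)[: n // 2 + 1]]
psd = periodogram(record)

dft_features = summarise(magnitude)  # features with the DFT as base input
psd_features = summarise(psd)        # features with the PSD as base input
```

Summarising the spectrum with statistics, instead of reading individual bins, keeps the feature vector short and avoids tying the classifier to specific frequencies.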
10.4.2 Model performance analysis using spectral features
The model performance based on spectral features is presented in Table 10.8. It is evident
that spectral features are not suitable even for misfire detection, since they do not
achieve the mandatory 100% misfire classification accuracy. Of the two, DFT performs
better than PSD. The multi-class identification accuracy of DFT reaches 93.3%, whereas it
is very poor at 59.9% for PSD. An interesting observation is that, in spite of its poor
multi-class accuracy, the PSD records 92% misfire detection accuracy. The time taken for
building the model is also considerably higher than with statistical features, for which
it is less than one second, as presented in Table 9.3 of Section 9.3.1.
Table 10.8 Model performance using DFT and PSD features

              Discrete Fourier transform      Power spectral density features
              Decision Tree   Random Forest   Decision Tree   Random Forest
Multi class   93.3            92.7            59.9            58.4
Misfire       96.5            96              92              92
Time taken    102             103             126             126
MAE           0.0303          0.0315          0.1564          0.1547
RMSE          0.1323          0.1411          0.2803          0.2797

Based on the results, it is concluded that the use of spectral features is not very
encouraging for building the model.
10.5 CONCLUSION
Wavelet features alone give about 82% overall accuracy and feature fusion returns 94 to
95%, compared to 94.6% for the statistical features reported in Chapter 9. It is clear
that the heavy computational load due to the calculation of two diverse sets of features,
including wavelets, is not a favourable choice.
In this analysis, the feature fusion model delivers an impressive 95.5% multi class
accuracy and 100% misfire detection using statistical features, db2 features and random
forest. A comparable performance is observed with the model combination using statistical
features, Haar features and decision tree, with a maximum performance of 94.1% multi class
accuracy and 100% misfire detection. Since the random forest performs better, that model
is recommended. Both the models have also recorded among the lowest mean absolute error
and root mean squared error.