2006 IEEE International Conference on Management of Innovation and Technology (ICMIT 2006), July 27, 2006

Software Release Time Management: How to Use
Reliability Growth Models to Make Better Decisions
Chu-Ti Lin and Chin-Yu Huang
Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan
Abstract-In recent years, owing to the significance of software
applications, professional software testing has become an
increasingly important task, and project managers must determine
when to stop testing and release the software. Software reliability
is closely related to many aspects of software, including its
structure, the operational environment, and the amount of testing.
In fact, software reliability analysis is a key factor of software
quality and can be used for planning and controlling testing
resources during development. Over the past three decades, many
software reliability growth models (SRGMs) have been proposed.
Most traditional SRGMs share the common assumption that the
fault detection rate is constant over time. However, the fault
detection process in the operational phase differs from that in the
testing phase. Thus, in this paper, we use the testing compression
factor (TCF) to reflect this fact and describe the resulting
behavior. In addition, the one-to-one mapping between failures
and faults is sometimes unrealistic. Therefore, we also
incorporate a quantified ratio of faults to failures, not necessarily
equal to 1, into software reliability growth modeling. We estimate
the parameters of the proposed model from a real software failure
data set and give a fair comparison with other SRGMs. Finally,
we show how to use the proposed model to conduct software
release time management.
I. INTRODUCTION
The application of computer software has progressively
become ubiquitous and worldwide. Many software
application systems are responsible for critical services, such
as weapon systems, air traffic control systems, payment
systems, and so on. Due to the significance of software
applications, professional software testing has become an
increasingly important task. During testing, engineers repeatedly
exercise the program with specified test cases to remove the
detected faults and approach bug-free software. However,
testing resources are limited. Project managers need to
determine when to stop testing and must make an interim risk
evaluation and management (E/M). In fact,
software risk management is a software engineering practice
with tools, methods, and processes for managing risks in a
software project. While some risk assessment techniques
advocate assigning quantitative probabilities to perceived
likelihoods, in practice it is very difficult to agree on such
subjective numbers. In general, risk involves two
characteristics: uncertainty and loss. Risk management is
concerned with identifying risks which may affect the project
and planning to ensure that these risks do not develop into
major threats [1]. Thus, we can clearly see that risk exposure is
mainly dominated by the probability of an unsatisfactory
outcome. In fact, we can easily relate such ideas to software
reliability modeling and analysis.
Software reliability is the probability that given software
will be functioning correctly under a given environment
during a specified period of time [2-4]. It is a key factor of
software quality and can be used for planning and controlling
all testing resources during development. Besides, it can also
give us confidence in the correctness of the software. Many
software reliability growth models (SRGMs) have been
proposed over the past three decades [2-5]. SRGMs can be used
to quantitatively evaluate software development status and
software reliability engineering technology. The quantified
data can greatly benefit software release time management.
However, the motive for exercising the software determines
the set of inputs supplied for program evaluation: the fault
detection and operational behavior of the software in the
operational phase differ from those in the testing phase.
Thus, we use the testing compression factor (TCF) to reflect
this fact and describe the transition. Besides, we notice that the
correlation between failures and faults may not be limited to a
one-to-one mapping during the execution of a software system.
We therefore also incorporate the concept of a quantified ratio
of faults to failures into software reliability growth modeling.
Finally, the proposed models are also used to establish
optimal software release time policies.
The rest of this paper is organized as follows. Section II
briefly reviews the concept of TCF and also discusses the
correlations between faults and failures. In Section III, we
propose a new SRGM which incorporates the TCF and
quantified ratio of faults to failures. Section IV shows the
experiment results through a real data set. We further discuss
the optimal software release time management based on the
proposed model in Section V. Finally, the conclusions are
given in Section VI.
1-4244-0148-8/06/$20.00 ©2006 IEEE

II. QUANTIFIED RATIO OF FAULTS TO FAILURES AND TESTING COMPRESSION FACTOR

A. The Correlation between Failures and Faults

In general, the software debugging process consists of
three steps: fault detection, fault isolation, and fault
correction. Each time a failure occurs, it may not be possible
to immediately remove the fault which caused it. Before the
specific root cause is found, a failure may be recounted due to
the inability to locate and correct the fault [3]. During operation,
software maintenance relies heavily on error reports
or comments from users. Sometimes many users
suffer the same failure before the root cause of the
fault is removed; thus several failures caused by the same fault
may be identified as different ones. Error recognition in
distributed systems may face a similar problem [6]. It is
clear, then, that a many-to-one mapping between
failures and faults certainly exists. On the other hand, a failure
may also be the result of a combination of several mutually
dependent faults. The expected service will not be provided
until a series of dependent faults are all recognized and
removed. We can demonstrate this phenomenon through the
faulty program given in Fig. 1.
As seen from Fig. 1, the path selection made in line 32 (S2)
depends on the definition of the variable loop in S1. That is, the
definition of loop in line 15 determines the number of
executions of S3 and thus affects the value of count on escaping
from line 51. Note that line 66 will not print out the
expected message unless the fault of misusing the operator in S4
is corrected, i.e., "++" should be placed after "count".
Besides, the faulty definition of loop in line 15 will cause an
unexpected value of count. Thus, in addition to removing the
fault in S4, we still have to correct the leading fault in S1,
which shows the one-to-many mapping between failures and
faults in the program.
It seems that the correlation between failures and faults is
not limited to a one-to-one mapping. Thus, the quantified ratio
of faults to failures may not be equal to 1.
Line  Label  Code
 :
15    S1:    loop = loop % 5          Bug! Should be "/"
 :
32    S2:    while (loop < 100)
33           {
 :
42    S3:      count = count * 10     Bug! Should be "count++"
 :
51           }
 :
66    S4:    printf(++count)
 :

Fig. 1. An example of a faulty program.
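To make the dependence between the faults concrete, here is a small Python analogue of the situation in Fig. 1. This is hypothetical illustration code, not the original program: the loop increment and the input value are our assumptions, and the two faults are modeled generically as the loop-definition fault (`%` instead of integer division) and the count-update fault (multiplication instead of increment). Fixing either fault alone still yields a wrong result; the expected output appears only after both are corrected.

```python
def run(loop, fix_loop_def=False, fix_count=False):
    """Hypothetical Python analogue of the dependent faults in Fig. 1."""
    # Fault 1: "%" is the bug; integer division was intended.
    loop = loop // 5 if fix_loop_def else loop % 5
    count = 0
    while loop < 100:                                   # cf. S2
        # Fault 2: multiplying is the bug; an increment was intended.
        count = count + 1 if fix_count else count * 10  # cf. S3
        loop += 1                                       # assumed increment
    return count                                        # cf. printing in S4
```

With input 10 the intended result is 98 (loop runs from 2 to 99); with only one fault fixed the output is still wrong (0 or 100), demonstrating that a single failure can require correcting several dependent faults.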
B. Testing Compression Factor
An input variable is a variable that exists external to the
program and is used by the program in executing its function.
In the execution of software, a number of input variables are
usually associated with the program, and each set of values of
these variables characterizes an input state [3]. Malaiya et al. [7]
reported that the input states during testing may be more
effective than random choices from the operational profile
during the operational phase. That is, during testing, test teams
always want to increase testing coverage within finite testing
resources, so there are few overlaps of input states between
different test cases. Compared with professional test
engineers, end users do not care about the content of input
states and exercise the software only according to their actual
needs. Consequently, the occurrences of failures may tend to be
sparse.
Musa defined the testing compression factor (TCF) as "the
ratio of execution time required in the operational phase to
execution time required in the test phase to cover the input
space of the program" [3, 8]. If equivalence partitioning is
applied to the design of test cases, the TCF will increase. In
addition, Carman et al. [9] calculated the TCF by taking the
ratio of the failure rate (intensity) at the end of system test to
the failure rate at the start of field operation. Later, Jung and
Seong [10] reported that the testing environment factor is the
combination of the coverage factor and the aging factor. The
aging factor is a measure that quantifies the relative number of
executions of the more probable functions in the invalid
input space compared to the ordinary operation of the program. In
fact, the concept of the aging factor is very similar to the TCF.
III. SOFTWARE RELIABILITY MODELING

Assumptions [2, 4, 6]:
(1) The software system is subject to failures at random times
caused by the manifestation of remaining faults in the
system.
(2) The total number of faults at the beginning of testing is
finite, and the failures caused by them are also finite.
(3) The mean number of expected failures in the time interval
(t, t+Δt] is proportional to the mean number of remaining
faults in the system. A fault may generate more than one
failure, and a failure may be caused by a series of
dependent faults.
(4) The proportionality is not necessarily constant; it may
change at some time moment called the change-point.
Besides, the efficiency of fault detection is affected
by the input states.
(5) Each time a failure occurs, the fault that caused it is
perfectly removed and no new faults are introduced.
Let the mean value function m(t) represent the expected
number of software failures by time t, and let {N(t), t ≥ 0} be a
counting process representing the cumulative number of
failures at time t. N(t) has a Poisson distribution with
expected value m(t) [2]. That is,

Pr[N(t) = n] = ([m(t)]^n / n!) × exp[−m(t)] = poim(n, m(t)), n = 0, 1, 2, …,   (1)

where poim(x, y) is the Poisson pmf with mean y. Then we
can describe the stochastic behavior of software failure
phenomena through the N(t) process.
Since the fault detection and debugging activities in operation
are different from those during testing, the input states
may vary. Here, we can use the testing compression factor c
to reflect the possible change of test efficiency. Let τ be the
time of transition between the two stages, called a
change-point (CP) [11]. From assumptions (3)-(4), we have

dm(t)/dt = r(t) × [a − α × m(t)],   (2)

and

r(t) = r for 0 ≤ t < τ,
r(t) = r/c for t ≥ τ,   (3)

where a is the expected number of initial faults, r is the fault
detection rate before the CP, r/c is the fault detection rate after the CP,
and α is the quantified ratio of faults to failures in the software
system.
Solving the above two equations under the boundary condition
m(0) = 0, we have

m(t) = (a/α) × (1 − exp[−rαt]) for 0 ≤ t < τ,
m(t) = (a/α) × (1 − exp[−rα((t − τ)/c + τ)]) for t ≥ τ.   (4)
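As a quick numerical sketch of Eqs. (1) and (4) (function and variable names are ours, not the paper's), the mean value function and the Poisson probability of observing n failures can be computed directly:

```python
from math import exp, factorial

def mean_failures(t, a, r, alpha, c, tau):
    """Eq. (4): expected number of failures m(t) observed by time t."""
    if t < tau:
        return (a / alpha) * (1.0 - exp(-r * alpha * t))
    # Past the change-point tau, calendar time is rescaled by the TCF c.
    return (a / alpha) * (1.0 - exp(-r * alpha * ((t - tau) / c + tau)))

def prob_n_failures(n, t, **params):
    """Eq. (1): Poisson pmf of the counting process N(t) with mean m(t)."""
    m = mean_failures(t, **params)
    return (m ** n) * exp(-m) / factorial(n)
```

Note that m(0) = 0 and that the two branches of Eq. (4) agree at t = τ, so m(t) is continuous across the change-point; this is easy to check numerically.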
IV. NUMERICAL RESULTS

A. Data Set and Comparison Criteria
The selected real data set is from a Brazilian Electronic
Switching software system [12]. The number of failures was
counted per time unit of ten days. The software system was
employed in system validation during the first 30 time units
and was then under field trial test until the 42nd time unit.
During these 42 time units, 352 faults were removed. Moreover,
another 108 corrections were made during the first year of
operation, i.e., from the 43rd time unit to the 81st time unit.
In this paper, we use three criteria to judge the performance
of the proposed model.
(1) The mean square error (MSE) is defined as [13]:

MSE = Σ_{i=1}^{K} [m(t_i) − m_i]² / K,   (5)

where m_i is the observed number of faults by time t_i, m(t_i)
is the estimated number of faults by time t_i, and K is the
sample size of the selected data set.
(2) The relative error (RE) is defined as [3]:

RE = [m(t_i) − m_i] / m_i.   (6)

It can also be used to represent the capability of the model
to predict failure behavior.
(3) The Kolmogorov-Smirnov (KS) goodness-of-fit distance is defined
as [2, 13]:

D_K = sup_x |F*(x) − F(x)|,   (7)

where F*(x) is the normalized observed cumulative
distribution at the x-th time point, and F(x) is the expected
cumulative distribution at the x-th time point, based on the
model.
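The three criteria are straightforward to implement. The following is a minimal sketch (the function names and list-based interfaces are our own):

```python
def mse(observed, estimated):
    """Eq. (5): mean square error over K observation points."""
    k = len(observed)
    return sum((e - o) ** 2 for o, e in zip(observed, estimated)) / k

def relative_error(estimated_i, observed_i):
    """Eq. (6): relative error at a single time point."""
    return (estimated_i - observed_i) / observed_i

def ks_distance(observed_cdf, estimated_cdf):
    """Eq. (7): Kolmogorov-Smirnov distance between the normalized
    observed and model cumulative curves, sampled at the same points."""
    return max(abs(f_star - f)
               for f_star, f in zip(observed_cdf, estimated_cdf))
```

For example, an estimate that overshoots an observation of 100 by 10 has RE = 0.1, and two cumulative curves that differ by at most 0.1 at any sampled point have D_K = 0.1.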
B. Performance Validation
In addition to the proposed model (i.e., Eq. (4)), some
existing SRGMs, such as the Goel-Okumoto model (GO), the
Yamada delayed S-shaped model (YSS), and the Ohba-Chou
imperfect debugging model (OCID), are selected for
comparison [4-5, 14]. Least squares estimation (LSE) is
used to estimate the models' parameters; it is preferred
because it produces unbiased results [2-4, 15]. Because a field
trial test usually takes place at a site close to the
operational environment, we treat the input states of the field
trial test as those of the operational phase. Here we assume that
the 31st time unit is the change-point, i.e., τ = 31. We choose
the failure data collected during the testing phase (including
system validation and the field trial test) to estimate the
parameters and then use the estimates to forecast the
operational failure data.
Table I gives the estimated parameters of all selected models.
The performance comparisons of the different models are shown
in Table II, which covers the 1-42, 43-81, and 1-81 time-unit
data. As seen from Table II, the proposed model gives a good
fit to the failure data during the testing phase (both the MSE
and the KS values are the lowest). Considering the capability
of predicting operational failure data, the proposed model also
gives the lowest MSE. Note that although the GO model and
the OCID model give the lowest KS, their much higher MSE
shows a weakness in predicting this failure data during
operation. Finally, if we take both the testing and operational
failure data into consideration, the proposed model provides the
lowest values of both MSE and KS. Fig. 2 shows the RE curves.
It is obvious that the proposed model gives the slightest bias
during operation. On the whole, the proposed model not only
fits the testing failure data well but also provides outstanding
prediction capability in the operational phase.
Given x as the length of the considered period (x ≥ 1), we can
define the time period from the (31−x)-th to the 30th time unit as
the end of system validation, and the time period from the 31st
to the (30+x)-th time unit as the start of the field trial. If the TCF
is calculated by taking the ratio of the failure rate at the end of
system validation to the failure rate at the start of the field trial [9],
the estimated values of the TCF for various x are given in Fig. 3.
Musa reported that a reasonable value of the TCF lies between 8
and 21 when the number of input states ranges from 10³ to 10⁹ [3, 8].
Thus, the estimated TCF here is relatively low, which means that
the design of test cases during system validation may not have been
efficient. Moreover, α = 0.323 shows the high frequency of
failure recurrence in the execution of the software system.
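The windowed TCF estimate behind Fig. 3 can be sketched as follows. The indexing convention is our assumption: `failures_per_unit` holds the failure count of each ten-day unit with 1-based unit numbering, and `cp` is the change-point unit (31 here).

```python
def tcf_estimate(failures_per_unit, cp, x):
    """Estimate the TCF as the ratio of the mean failure rate over the
    last x units before the change-point cp (end of system validation)
    to the mean rate over the first x units from cp on (start of the
    field trial), following the ratio-of-rates idea of [9]."""
    end_validation = failures_per_unit[cp - 1 - x : cp - 1]  # units cp-x .. cp-1
    start_field = failures_per_unit[cp - 1 : cp - 1 + x]     # units cp .. cp+x-1
    return (sum(end_validation) / x) / (sum(start_field) / x)
```

On synthetic data with a constant rate of 4 failures per unit before the change-point and 2 after it, the estimate is 2.0 for every window length x, as expected.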
TABLE I
PARAMETER ESTIMATION OF SELECTED MODELS.

Model     a        r          Remarks
Eq. (4)   509.51   2.14×10⁻²  α = 3.23×10⁻¹, c = 1.98, τ = 31
GO        727.57   1.65×10⁻²  —
YSS       382.06   9.34×10⁻²  —
OCID      420.69   2.86×10⁻²  β = 4.22×10⁻¹
TABLE II
COMPARISON RESULTS OF SELECTED MODELS.

          MSE                               KS
Model     TP*      OP**      Total***      TP*         OP**        Total***
Eq. (4)   63.96    402.24    174.03        5.02×10⁻²   2.29×10⁻¹   9.09×10⁻²
GO        97.86    1258.70   656.79        8.91×10⁻²   1.78×10⁻¹   1.28×10⁻¹
YSS       238.18   2327.67   1244.23      8.94×10⁻²   3.30×10⁻¹   1.60×10⁻¹
OCID      97.86    1258.70   656.79        8.91×10⁻²   1.78×10⁻¹   1.28×10⁻¹

*: criteria evaluated between the 1st and the 42nd time units.
**: criteria evaluated between the 43rd and the 81st time units.
***: criteria evaluated between the 1st and the 81st time units.
Fig. 2. Relative error curves for selected models. (Relative error versus normalized time for Eq. (4), GO & OCID, and YSS.)

Fig. 3. Estimated values of the TCF versus the length of period x (time units), by Eq. (4) and by the method in [9].

V. SOFTWARE RELEASE TIME MANAGEMENT

Given the intense competition in the marketplace, releasing
the software at the right time has become a critical factor in
determining the success of a software development team. In
general, the optimal release time is determined based on two
aspects: the reliability requirement and the total cost over the
software life cycle [2, 4, 16-18]. In this section, we discuss
the optimal release time policies.

A. Software release time based on reliability requirement

Failure intensity is a common reliability metric [3]. It can be
defined as the expected number of failures per time unit. If T
is the length of testing, the failure intensity function of Eq. (4)
is given as

λ(T) = ar × exp[−rαT] for 0 ≤ T < τ,
λ(T) = (ar/c) × exp[−rα((T − τ)/c + τ)] for T ≥ τ.   (8)

When c ≥ 1, λ(T) is a monotonically decreasing function. If the
failure intensity objective (FIO) is F0 and T1 is the time to meet
the desired failure intensity (i.e., satisfying λ(T1) = F0), we have

T1 = ln[ar/F0]/(rα) for 0 ≤ T1 < τ,
T1 = (1 − c)τ + c × ln[ar/(cF0)]/(rα) for T1 ≥ τ.   (9)

If (ar/c) × exp[−rατ] ≤ F0 < ar × exp[−rατ], then T1 = τ.
On the other hand, we can also define software reliability as
follows [2, 4]:

R(ΔT | T) = exp[−(m(T + ΔT) − m(T))], T ≥ 0, ΔT > 0,   (10)

which represents the probability that a software failure does not
occur in the time interval (T, T + ΔT]. If the acceptable reliability
R0 is given, we can obtain a unique T2 satisfying R(ΔT | T2) = R0.
That is,

T2 = ln[a(1 − exp[−rα × ΔT])/(−α × ln R0)]/(rα) for 0 ≤ T2 < τ,
T2 = (1 − c)τ + c × ln[a(1 − exp[−rα × ΔT/c])/(−α × ln R0)]/(rα) for T2 ≥ τ.   (11)

B. Software release time based on cost criterion

In general, the total cost during the software life cycle can be
formulated as follows [4, 17-18]:

C(T) = C1 × α × m(T) + C2 × α × [m(TLC) − m(T)] + C3 × T,   (12)

where TLC is the length of the software life cycle, C1 is the cost of
removing a fault during testing, C2 is the cost of removing a
fault during operation (C2 >> C1 > 0), and C3 is the cost per unit
time of software testing.
Thus,

dC(T)/dT = C1 × α × λ(T) − C2 × α × λ(T) + C3 = 0,   (13)

and

λ(T) = C3/[α × (C2 − C1)].   (14)

The determination of Tc can be discussed in three cases.
Case 1: If λ(0) ≤ C3/[α × (C2 − C1)], then we find that
λ(T) ≤ C3/[α × (C2 − C1)] for 0 < T < TLC. Hence, in this case,
we have Tc = 0.
Case 2: If λ(TLC) ≥ C3/[α × (C2 − C1)], then it is found that
λ(T) ≥ C3/[α × (C2 − C1)] for 0 < T < TLC. Therefore, Tc = TLC.
Case 3: If λ(0) > C3/[α × (C2 − C1)] > λ(TLC), then there exists
a unique time Tc satisfying λ(Tc) = C3/[α × (C2 − C1)]. Solving
Eq. (13), we have

Tc = ln[ar × α(C2 − C1)/C3]/(rα) for 0 ≤ Tc < τ,
Tc = (1 − c)τ + c × ln[ar × α(C2 − C1)/(C3 × c)]/(rα) for Tc ≥ τ.   (15)

C. Software release time based on cost-reliability criterion

In this section, our goal is to estimate the time T1* which
minimizes C(T) subject to the constraint λ(T) ≤ F0 [4].
Differentiating Eq. (8) with respect to T, we obtain

dλ(T)/dT = −ar²α × exp[−rαT] for 0 ≤ T < τ,
dλ(T)/dT = −(ar²α/c²) × exp[−rα((T − τ)/c + τ)] for T ≥ τ,   (16)

which is negative in the time interval (0, T].
If T1 > Tc, the testing has already minimized the
expected total cost at time Tc but has not yet satisfied the desired
reliability. Then, testing should be continued until time
T1, which satisfies λ(T1) ≤ F0. Therefore, T1* = T1. If T1 ≤ Tc, the
failure intensity reaches the FIO at T1. Because additional
testing can reduce the software cost, we should continue testing
until time Tc. That is, T1* = Tc. Consequently,

T1* = max{T1, Tc}.   (17)

Similarly, if T2* is the time that minimizes the cost function
C(T) subject to the constraint R(ΔT | T) ≥ R0, then we have

T2* = max{T2, Tc}.   (18)

D. Numerical Examples

From the estimated parameters in Table I, we have a =
509.51, r = 2.14×10⁻², α = 3.23×10⁻¹, c = 1.98, and τ = 31. If the
FIO is given as 4 failures per period of ten days, T1 is
estimated as 61.06. Besides, given R0 = 0.95 and ΔT = 0.015, we
estimate T2 as 106.02. That is, after about 106 time units
of testing, the probability of failure-free execution over a time
period ΔT = 0.015 is no less than 95%. Considering the total
cost during the life cycle, we assume C1 = $100, C2 = $900,
C3 = $1000, and TLC = 1000. From Eq. (15), the
minimum cost is about $395,554 if the software is released at
time Tc = 70.39. Besides, from Eq. (17) and Eq. (18), T1* and
T2* are estimated as 70.39 and 106.02, respectively. Finally,
the estimated failure intensity and related costs versus time
are depicted in Fig. 4. The estimated conditional reliability
and expected costs versus time are given in Fig. 5.
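These numbers can be reproduced, to within the rounding of the published parameter estimates, directly from Eqs. (9), (11), (12), and (15). The following sketch hardcodes the Table I estimates; the clamping at τ for targets falling in the downward jump of λ(T) at the change-point is our addition.

```python
from math import exp, log

# Estimated parameters of Eq. (4) from Table I.
a, r, alpha, c, tau = 509.51, 2.14e-2, 0.323, 1.98, 31.0

def mean_failures(t):
    """Eq. (4)."""
    if t < tau:
        return (a / alpha) * (1.0 - exp(-r * alpha * t))
    return (a / alpha) * (1.0 - exp(-r * alpha * ((t - tau) / c + tau)))

def t1_for_fio(f0):
    """Eq. (9): earliest T with failure intensity lambda(T) <= f0."""
    t = log(a * r / f0) / (r * alpha)          # pre-change-point candidate
    if t < tau:
        return t
    # Post-change-point branch, clamped at tau if f0 lies in the jump.
    return max(tau, (1 - c) * tau + c * log(a * r / (c * f0)) / (r * alpha))

def t2_for_reliability(r0, dt):
    """Eq. (11): earliest T2 with R(dt | T2) >= r0."""
    t = log(a * (1 - exp(-r * alpha * dt)) / (-alpha * log(r0))) / (r * alpha)
    if t < tau:
        return t
    return max(tau, (1 - c) * tau
               + c * log(a * (1 - exp(-r * alpha * dt / c))
                         / (-alpha * log(r0))) / (r * alpha))

def tc_for_cost(c1, c2, c3):
    """Eq. (15), Case 3: release time minimizing the cost of Eq. (12).
    Eq. (15) has the same shape as Eq. (9) with F0 replaced by the
    threshold intensity of Eq. (14), so we reuse t1_for_fio."""
    return t1_for_fio(c3 / (alpha * (c2 - c1)))

def total_cost(t, c1, c2, c3, t_lc):
    """Eq. (12)."""
    return (c1 * alpha * mean_failures(t)
            + c2 * alpha * (mean_failures(t_lc) - mean_failures(t))
            + c3 * t)
```

With F0 = 4, R0 = 0.95, ΔT = 0.015, C1 = 100, C2 = 900, C3 = 1000, and TLC = 1000, this yields T1 ≈ 61, T2 ≈ 106, Tc ≈ 70, and a minimum cost near $395,500, matching the figures above up to rounding of the published parameters.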
Fig. 4. The estimated failure intensity curve λ(T) vs. the related cost function C(T). (The minimum cost, about $395,554, occurs at Tc = T1* = 70.39; λ(T) reaches the FIO at T1 = 61.06.)

Fig. 5. The estimated conditional reliability curve R(T + ΔT) vs. the related cost function C(T). (R reaches 0.95 at T2 = T2* = 106.02; Tc = 70.39.)

VI. CONCLUSIONS

Software testing is a necessary but expensive process. It
helps software engineers achieve higher defect coverage
and thus improves software quality. In this paper, to describe
the fault detection and debugging processes more accurately,
we incorporated the ratio of faults to failures and the TCF into
software reliability modeling. The performance validation of the
proposed model was presented based on real data. Numerical
results show that, compared to existing traditional SRGMs,
the proposed model gives a better prediction capability on the
failure data during operation. Finally, we also discussed some
pragmatic software release policies based on a reliability
constraint and a cost criterion.

ACKNOWLEDGMENT

This research was supported by the National Science
Council, Taiwan, under Grant NSC 94-2213-E-007-087 and
also substantially supported by a grant from the Ministry of
Economic Affairs (MOEA) of Taiwan (Project No. 94-EC17-A-01-S1-038).

REFERENCES

[1] R. S. Pressman, Software Engineering: A Practitioner's Approach, 6th Edition, McGraw-Hill, 2005.
[2] M. R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
[3] J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction and Application, McGraw-Hill, 1987.
[4] M. Xie, Software Reliability Modeling, World Scientific Publishing Company, 1991.
[5] C. Y. Huang, M. R. Lyu, and S. Y. Kuo, "A Unified Scheme of Some Non-Homogenous Poisson Process Models for Software Reliability Estimation," IEEE Trans. on Software Engineering, Vol. 29, No. 3, pp. 261-269, March 2003.
[6] P. K. Kapur and S. Bhushan, "An Exponential SRGM with a Bound on the Number of Failures," Microelectron. Reliab., Vol. 33, No. 9, pp. 1245-1249, 1993.
[7] Y. K. Malaiya, A. von Mayrhauser, and P. K. Srimani, "An Examination of Fault Exposure Ratio," IEEE Trans. on Software Engineering, Vol. 19, No. 11, pp. 1087-1094, November 1993.
[8] N. Li and Y. K. Malaiya, "On Input Profile Selection for Software Testing," Proceedings of the 5th IEEE International Symposium on Software Reliability Engineering, pp. 196-205, November 1994, Monterey, CA, USA.
[9] D. W. Carman, A. A. Dolinsky, M. R. Lyu, and J. S. Yu, "Software Reliability Engineering Study of a Large-Scale Telecommunications Software System," Proceedings of the 3rd IEEE International Symposium on Software Reliability Engineering, pp. 350-359, October 1995, Toulouse, France.
[10] H. S. Jung and P. H. Seong, "Prediction of Safety Critical Software Operational Reliability from Test Reliability Using Testing Environment Factors," J. Korean Nuclear Society, Vol. 10, No. 1, pp. 49-57, Feb. 1999.
[11] M. Zhao, "Change-Point Problems in Software and Hardware Reliability," Communications in Statistics–Theory and Methods, Vol. 22, No. 3, pp. 757-768, 1993.
[12] K. Kanoun and J. C. Laprie, "Software Reliability Trend Analyses from Theoretical to Practical Considerations," IEEE Trans. on Software Engineering, Vol. 20, No. 9, pp. 740-747, Sept. 1994.
[13] M. R. Lyu and A. Nikora, "Applying Software Reliability Models More Effectively," IEEE Software, pp. 43-52, July 1992.
[14] M. Ohba and X. Chou, "Does Imperfect Debugging Affect Software Reliability Growth?" Proceedings of the 11th International Conference on Software Engineering, pp. 237-244, May 1989, Pittsburgh, USA.
[15] M. Xie, "Software Reliability Models: Past, Present and Future," Recent Advances in Reliability Theory: Methodology, Practice and Inference (eds. N. Limnios and M. Nikulin), Birkhauser, Boston, pp. 323-340, 2000.
[16] M. S. Krishnan, "Software Release Management: A Business Perspective," Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, pp. 36-48, October 31-November 3, 1994, Toronto, Ontario, Canada.
[17] C. Y. Huang and M. R. Lyu, "Optimal Release Time for Software Systems Considering Cost, Testing-Effort, and Test Efficiency," IEEE Trans. on Reliability, Vol. 54, No. 4, pp. 583-591, December 2005.
[18] H. Pham, Software Reliability, Springer-Verlag, 2000.