2006 IEEE International Conference on Management of Innovation and Technology (ICMIT 2006), July 27, 2006

Software Release Time Management: How to Use
Reliability Growth Models to Make Better Decisions
Chu-Ti Lin and Chin-Yu Huang
Department of Computer Science
National Tsing Hua University
Hsinchu, Taiwan
Abstract-In recent years, owing to the significance of software
applications, professional software testing has become an
increasingly important task, and project managers must determine
when to stop testing and release the software. Software reliability
is closely related to many aspects of software, including its
structure, the operational environment, and the amount of testing.
In fact, software reliability analysis is a key factor of software
quality and can be used for planning and controlling testing
resources during development. Over the past three decades, many
software reliability growth models (SRGMs) have been proposed.
Most traditional SRGMs share the common assumption that the
fault detection rate is constant over time. However, the fault
detection process in the operational phase differs from that in the
testing phase. Thus, in this paper, we use the testing compression
factor (TCF) to reflect this fact and describe the resulting
behavior. In addition, the one-to-one mapping between failures
and faults is sometimes unrealistic. Therefore, we also
incorporate a quantified ratio of faults to failures, not necessarily
equal to 1, into software reliability growth modeling. We estimate
the parameters of the proposed model from a real software failure
data set and give a fair comparison with other SRGMs. Finally,
we show how to use the proposed model to conduct software
release time management.
I. INTRODUCTION
The application of computer software has progressively
become ubiquitous and worldwide. Many software
application systems are responsible for critical services, such
as weapon systems, air traffic control systems, payment
systems, and so on. Due to the significance of software
applications, professional software testing has become an
increasingly important task. During testing, engineers repeatedly
exercise the program with specified test cases to remove the
detected faults and approach bug-free software. However,
testing resources are limited. Project managers need to
determine when to stop testing and must make an interim risk
evaluation and management (E/M). In fact,
software risk management is a software engineering practice
with tools, methods, and processes for managing risks in a
software project. While some risk assessment techniques
advocate assigning quantitative probabilities to perceived
likelihoods, in practice it is very difficult to agree on such
subjective numbers. In general, risk involves two
characteristics: uncertainty and loss. Risk management is
concerned with identifying risks which may affect the project
and planning to ensure that these risks do not develop into
major threats [1]. Thus, we can clearly see that risk exposure is
mainly dominated by the probability of an unsatisfactory
outcome. In fact, we can easily relate such ideas to software
reliability modeling and analysis.
Software reliability is the probability that given software
will be functioning correctly under a given environment
during a specified period of time [2-4]. It is a key factor of
software quality and can be used for planning and controlling
all testing resources during development. Besides, it can also
give us confidence in the correctness of the software. Many
software reliability growth models (SRGMs) have been
proposed over the past three decades [2-5]. SRGMs can be used
to quantitatively evaluate software development status and
software reliability engineering technology. The quantified
data can greatly benefit software release time management.
However, the motive for exercising the software determines
the set of inputs supplied for program evaluation: the fault
detection and operational behavior of the software in the
operational phase differ from those in the testing phase.
Thus, we use the testing compression factor (TCF) to reflect
this fact and describe the transition. Besides, we notice that the
correlation between failures and faults may not be limited to a
one-to-one mapping during the execution of a software system.
We therefore also incorporate the concept of a quantified ratio
of faults to failures into software reliability growth modeling.
Finally, the proposed models are also used to establish
optimal software release time policies.
The rest of this paper is organized as follows. Section II
briefly reviews the concept of TCF and also discusses the
correlations between faults and failures. In Section III, we
propose a new SRGM which incorporates the TCF and
quantified ratio of faults to failures. Section IV shows the
experiment results through a real data set. We further discuss
the optimal software release time management based on the
proposed model in Section V. Finally, the conclusions are
given in Section VI.
1-4244-0148-8/06/$20.00 ©2006 IEEE

II. QUANTIFIED RATIO OF FAULTS TO FAILURES AND TESTING COMPRESSION FACTOR

A. The Correlation between Failures and Faults

In general, the software debugging process consists of
three steps: fault detection, fault isolation, and fault
correction. Each time a failure occurs, it may not be possible
to immediately remove the fault which caused it. Before the
specific root cause is found, a failure may be recounted due to
the inability to locate and correct the fault [3]. During operation,
software maintenance relies heavily on error reports
or comments from users. Sometimes many users
suffer the same failure before the root cause of the
fault is removed; thus several failures caused by the same fault
may be identified as different ones. Error recognition in
distributed systems may face a similar problem [6]. It is
clear, then, that a many-to-one mapping between
failures and faults certainly exists. On the other hand, a failure
may also be the result of a combination of several mutually
dependent faults. The expected service will not be provided
until a series of dependent faults are all recognized and
removed. We can demonstrate this phenomenon through the
faulty program given in Fig. 1.
As seen from Fig. 1, the path selection made in line 32 (S2)
depends on the definition of the variable loop in S1. That is, the
definition of loop in line 15 determines the number of
executions of S3 and thus affects the value of count on escaping
from line 51. Note that line 66 will not print out the
expected message unless the fault of misusing the operator in S4
is corrected, i.e., "++" should be placed after "count".
Besides, the faulty definition of loop in line 15 will cause an
unexpected value of count. Thus, in addition to removing the
fault in S4, we still have to correct the leading fault in S1,
which shows the one-to-many mapping between failures and
faults in the program.
It seems that the correlation between failures and faults is
not limited to a one-to-one mapping. Thus, the quantified ratio
of faults to failures may not be equal to 1.
Line  Label  Code
 :
15    S1:    loop = loop % 5          Bug! Should be "/"
 :
32    S2:    while (loop < 100)
33           {
 :
42    S3:      count = count * 10     Bug! Should be "count++"
 :
51           }
 :
66    S4:    printf(++count)
 :

Fig. 1. An example of a faulty program.
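To make the dependence between the faults concrete, here is a small Python analogue of the situation in Fig. 1. This is hypothetical illustration code, not the original program: the loop increment and the input value are our assumptions, and the two faults are modeled generically as the loop-definition fault (`%` instead of integer division) and the count-update fault (multiplication instead of increment). Fixing either fault alone still yields a wrong result; the expected output appears only after both are corrected.

```python
def run(loop, fix_loop_def=False, fix_count=False):
    """Hypothetical Python analogue of the dependent faults in Fig. 1."""
    # Fault 1: "%" is the bug; integer division was intended.
    loop = loop // 5 if fix_loop_def else loop % 5
    count = 0
    while loop < 100:                                   # cf. S2
        # Fault 2: multiplying is the bug; an increment was intended.
        count = count + 1 if fix_count else count * 10  # cf. S3
        loop += 1                                       # assumed increment
    return count                                        # cf. printing in S4
```

With input 10 the intended result is 98 (loop runs from 2 to 99); with only one fault fixed the output is still wrong (0 or 100), demonstrating that a single failure can require correcting several dependent faults.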
B. Testing Compression Factor
An input variable is a variable that exists external to the
program and is used by the program in executing its function.
In the execution of software, a number of input variables are
usually associated with the program, and each set of values of
these variables characterizes an input state [3]. Malaiya et al. [7]
reported that the input states during testing may be more
effective than random choices from the operational profile
during the operational phase. That is, during testing, test teams
always want to increase testing coverage within finite testing
resources, so there are few overlaps of input states between
different test cases. Compared with professional test
engineers, end users do not care about the content of input
states and exercise the software only according to their actual
needs. Consequently, the occurrences of failures may tend to be
sparse.
Musa defined the testing compression factor (TCF) as "the
ratio of execution time required in the operational phase to
execution time required in the test phase to cover the input
space of the program" [3, 8]. If equivalence partitioning is
applied to the design of test cases, the TCF will increase. In
addition, Carman et al. [9] calculated the TCF by taking the
ratio of the failure rate (intensity) at the end of system test to
the failure rate at the start of field operation. Later, Jung and
Seong [10] reported that the testing environment factor is the
combination of the coverage factor and the aging factor. The
aging factor is a measure that quantifies the relative number of
executions of the more probable functions in the invalid
input space compared to the ordinary operation of the program. In
fact, the concept of the aging factor is very similar to the TCF.
III. SOFTWARE RELIABILITY MODELING

Assumptions [2, 4, 6]:
(1) The software system is subject to failures at random times
caused by the manifestation of remaining faults in the
system.
(2) The total number of faults at the beginning of testing is
finite, and the failures caused by them are also finite.
(3) The mean number of expected failures in the time interval
(t, t+Δt] is proportional to the mean number of remaining
faults in the system. A fault may generate more than one
failure, and a failure may be caused by a series of
dependent faults.
(4) The proportionality is not necessarily constant; it may
change at some time moment called the change-point.
Besides, the efficiency of fault detection is affected
by the input states.
(5) Each time a failure occurs, the fault that caused it is
perfectly removed and no new faults are introduced.
Let the mean value function m(t) represent the expected
number of software failures by time t, and let {N(t), t ≥ 0} be a
counting process representing the cumulative number of
failures at time t. N(t) has a Poisson distribution with
expected value m(t) [2]. That is,

Pr[N(t) = n] = ([m(t)]^n / n!) × exp[−m(t)] = poim(n, m(t)), n = 0, 1, 2, …,   (1)

where poim(x, y) is the Poisson pmf with mean y. Then we
can describe the stochastic behavior of software failure
phenomena through the N(t) process.
Since the fault detection and debugging activities in operation
are different from those during testing, the input states
may vary. Here, we can use the testing compression factor c
to reflect the possible change of test efficiency. Let τ be the
time of transition between the two stages, called a
change-point (CP) [11]. From assumptions (3)-(4), we have

dm(t)/dt = r(t) × [a − α × m(t)],   (2)

and

r(t) = r for 0 ≤ t < τ,
r(t) = r/c for t ≥ τ,   (3)

where a is the expected number of initial faults, r is the fault
detection rate before the CP, r/c is the fault detection rate after the CP,
and α is the quantified ratio of faults to failures in the software
system.
Solving the above two equations under the boundary condition
m(0) = 0, we have

m(t) = (a/α) × (1 − exp[−rαt]) for 0 ≤ t < τ,
m(t) = (a/α) × (1 − exp[−rα((t − τ)/c + τ)]) for t ≥ τ.   (4)
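As a quick numerical sketch of Eqs. (1) and (4) (function and variable names are ours, not the paper's), the mean value function and the Poisson probability of observing n failures can be computed directly:

```python
from math import exp, factorial

def mean_failures(t, a, r, alpha, c, tau):
    """Eq. (4): expected number of failures m(t) observed by time t."""
    if t < tau:
        return (a / alpha) * (1.0 - exp(-r * alpha * t))
    # Past the change-point tau, calendar time is rescaled by the TCF c.
    return (a / alpha) * (1.0 - exp(-r * alpha * ((t - tau) / c + tau)))

def prob_n_failures(n, t, **params):
    """Eq. (1): Poisson pmf of the counting process N(t) with mean m(t)."""
    m = mean_failures(t, **params)
    return (m ** n) * exp(-m) / factorial(n)
```

Note that m(0) = 0 and that the two branches of Eq. (4) agree at t = τ, so m(t) is continuous across the change-point; this is easy to check numerically.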
IV. NUMERICAL RESULTS

A. Data Set and Comparison Criteria
The selected real data set is from a Brazilian Electronic
Switching software system [12]. The number of failures was
counted per time unit of ten days. The software system was
employed in system validation during the first 30 time units
and was then under field trial test until the 42nd time unit.
During these 42 time units, 352 faults were removed. Moreover,
another 108 corrections were made during the first year of
operation, i.e., from the 43rd time unit to the 81st time unit.
In this paper, we use three criteria to judge the performance
of the proposed model.
(1) The mean square error (MSE) is defined as [13]:

MSE = Σ_{i=1}^{K} [m(t_i) − m_i]² / K,   (5)

where m_i is the observed number of faults by time t_i, m(t_i)
is the estimated number of faults by time t_i, and K is the
sample size of the selected data set.
(2) The relative error (RE) is defined as [3]:

RE = [m(t_i) − m_i] / m_i.   (6)

It can also be used to represent the capability of the model
to predict failure behavior.
(3) The Kolmogorov-Smirnov (KS) goodness-of-fit distance is defined
as [2, 13]:

D_K = sup_x |F*(x) − F(x)|,   (7)

where F*(x) is the normalized observed cumulative
distribution at the x-th time point, and F(x) is the expected
cumulative distribution at the x-th time point, based on the
model.
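The three criteria are straightforward to implement. The following is a minimal sketch (the function names and list-based interfaces are our own):

```python
def mse(observed, estimated):
    """Eq. (5): mean square error over K observation points."""
    k = len(observed)
    return sum((e - o) ** 2 for o, e in zip(observed, estimated)) / k

def relative_error(estimated_i, observed_i):
    """Eq. (6): relative error at a single time point."""
    return (estimated_i - observed_i) / observed_i

def ks_distance(observed_cdf, estimated_cdf):
    """Eq. (7): Kolmogorov-Smirnov distance between the normalized
    observed and model cumulative curves, sampled at the same points."""
    return max(abs(f_star - f)
               for f_star, f in zip(observed_cdf, estimated_cdf))
```

For example, an estimate that overshoots an observation of 100 by 10 has RE = 0.1, and two cumulative curves that differ by at most 0.1 at any sampled point have D_K = 0.1.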
B. Performance Validation
In addition to the proposed model (i.e., Eq. (4)), some
existing SRGMs, such as the Goel-Okumoto model (GO), the
Yamada delayed S-shaped model (YSS), and the Ohba-Chou
imperfect debugging model (OCID), are selected for
comparison [4-5, 14]. Least squares estimation (LSE) is
used to estimate the models' parameters; it is preferred
because it produces unbiased results [2-4, 15]. Because a field
trial test usually takes place at a site close to the
operational environment, we treat the input states of the field
trial test as those of the operational phase. Here we assume that
the 31st time unit is the change-point, i.e., τ = 31. We choose
the failure data collected during the testing phase (including
system validation and the field trial test) to estimate the
parameters and then use the estimates to forecast the
operational failure data.
Table I gives the estimated parameters of all selected models.
The performance comparisons of the different models are shown
in Table II, which covers the 1-42, 43-81, and 1-81 time-unit
data. As seen from Table II, the proposed model gives a good
fit to the failure data during the testing phase (both the MSE
and the KS values are the lowest). Considering the capability
of predicting operational failure data, the proposed model also
gives the lowest MSE. Note that although the GO model and
the OCID model give the lowest KS, their much higher MSE
shows a weakness in predicting this failure data during
operation. Finally, if we take both the testing and operational
failure data into consideration, the proposed model provides the
lowest values of both MSE and KS. Fig. 2 shows the RE curves.
It is obvious that the proposed model gives the slightest bias
during operation. On the whole, the proposed model not only
fits the testing failure data well but also provides outstanding
prediction capability in the operational phase.
Given x as the length of the considered period (x ≥ 1), we can
define the time period from the (31−x)-th to the 30th time unit as
the end of system validation, and the time period from the 31st
to the (30+x)-th time unit as the start of the field trial. If the TCF
is calculated by taking the ratio of the failure rate at the end of
system validation to the failure rate at the start of the field trial [9],
the estimated values of the TCF for various x are given in Fig. 3.
Musa reported that a reasonable value of the TCF lies between 8
and 21 when the number of input states ranges from 10³ to 10⁹ [3, 8].
Thus, the estimated TCF here is relatively low, which means that
the design of test cases during system validation may not have been
efficient. Moreover, α = 0.323 shows the high frequency of
failure recurrence in the execution of the software system.
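The windowed TCF estimate behind Fig. 3 can be sketched as follows. The indexing convention is our assumption: `failures_per_unit` holds the failure count of each ten-day unit with 1-based unit numbering, and `cp` is the change-point unit (31 here).

```python
def tcf_estimate(failures_per_unit, cp, x):
    """Estimate the TCF as the ratio of the mean failure rate over the
    last x units before the change-point cp (end of system validation)
    to the mean rate over the first x units from cp on (start of the
    field trial), following the ratio-of-rates idea of [9]."""
    end_validation = failures_per_unit[cp - 1 - x : cp - 1]  # units cp-x .. cp-1
    start_field = failures_per_unit[cp - 1 : cp - 1 + x]     # units cp .. cp+x-1
    return (sum(end_validation) / x) / (sum(start_field) / x)
```

On synthetic data with a constant rate of 4 failures per unit before the change-point and 2 after it, the estimate is 2.0 for every window length x, as expected.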
TABLE I
PARAMETER ESTIMATION OF SELECTED MODELS.

Model     a        r          Remarks
Eq. (4)   509.51   2.14×10⁻²  α = 3.23×10⁻¹, c = 1.98, τ = 31
GO        727.57   1.65×10⁻²  —
YSS       382.06   9.34×10⁻²  —
OCID      420.69   2.86×10⁻²  β = 4.22×10⁻¹
TABLE II
COMPARISON RESULTS OF SELECTED MODELS.

          MSE                               KS
Model     TP*      OP**      Total***      TP*         OP**        Total***
Eq. (4)   63.96    402.24    174.03        5.02×10⁻²   2.29×10⁻¹   9.09×10⁻²
GO        97.86    1258.70   656.79        8.91×10⁻²   1.78×10⁻¹   1.28×10⁻¹
YSS       238.18   2327.67   1244.23      8.94×10⁻²   3.30×10⁻¹   1.60×10⁻¹
OCID      97.86    1258.70   656.79        8.91×10⁻²   1.78×10⁻¹   1.28×10⁻¹

*: criteria evaluated between the 1st and the 42nd time units.
**: criteria evaluated between the 43rd and the 81st time units.
***: criteria evaluated between the 1st and the 81st time units.
Fig. 2. Relative error curves for selected models. (Relative error versus normalized time for Eq. (4), GO & OCID, and YSS.)

Fig. 3. Estimated values of the TCF versus the length of period x (time units), by Eq. (4) and by the method in [9].

V. SOFTWARE RELEASE TIME MANAGEMENT

Given the intense competition in the marketplace, releasing
the software at the right time has become a critical factor in
determining the success of a software development team. In
general, the optimal release time is determined based on two
aspects: the reliability requirement and the total cost over the
software life cycle [2, 4, 16-18]. In this section, we discuss
the optimal release time policies.

A. Software release time based on reliability requirement

Failure intensity is a common reliability metric [3]. It can be
defined as the expected number of failures per time unit. If T
is the length of testing, the failure intensity function of Eq. (4)
is given as

λ(T) = ar × exp[−rαT] for 0 ≤ T < τ,
λ(T) = (ar/c) × exp[−rα((T − τ)/c + τ)] for T ≥ τ.   (8)

When c ≥ 1, λ(T) is a monotonically decreasing function. If the
failure intensity objective (FIO) is F0 and T1 is the time to meet
the desired failure intensity (i.e., satisfying λ(T1) = F0), we have

T1 = ln[ar/F0]/(rα) for 0 ≤ T1 < τ,
T1 = (1 − c)τ + c × ln[ar/(cF0)]/(rα) for T1 ≥ τ.   (9)

If (ar/c) × exp[−rατ] ≤ F0 < ar × exp[−rατ], then T1 = τ.
On the other hand, we can also define software reliability as
follows [2, 4]:

R(ΔT | T) = exp[−(m(T + ΔT) − m(T))], T ≥ 0, ΔT > 0,   (10)

which represents the probability that a software failure does not
occur in the time interval (T, T + ΔT]. If the acceptable reliability
R0 is given, we can obtain a unique T2 satisfying R(ΔT | T2) = R0.
That is,

T2 = ln[a(1 − exp[−rα × ΔT])/(−α × ln R0)]/(rα) for 0 ≤ T2 < τ,
T2 = (1 − c)τ + c × ln[a(1 − exp[−rα × ΔT/c])/(−α × ln R0)]/(rα) for T2 ≥ τ.   (11)

B. Software release time based on cost criterion

In general, the total cost during the software life cycle can be
formulated as follows [4, 17-18]:

C(T) = C1 × α × m(T) + C2 × α × [m(TLC) − m(T)] + C3 × T,   (12)

where TLC is the length of the software life cycle, C1 is the cost of
removing a fault during testing, C2 is the cost of removing a
fault during operation (C2 >> C1 > 0), and C3 is the cost per unit
time of software testing.
Thus,

dC(T)/dT = C1 × α × λ(T) − C2 × α × λ(T) + C3 = 0,   (13)

and

λ(T) = C3/[α × (C2 − C1)].   (14)

The determination of Tc can be discussed in three cases.
Case 1: If λ(0) ≤ C3/[α × (C2 − C1)], then we find that
λ(T) ≤ C3/[α × (C2 − C1)] for 0 < T < TLC. Hence, in this case,
we have Tc = 0.
Case 2: If λ(TLC) ≥ C3/[α × (C2 − C1)], then it is found that
λ(T) ≥ C3/[α × (C2 − C1)] for 0 < T < TLC. Therefore, Tc = TLC.
Case 3: If λ(0) > C3/[α × (C2 − C1)] > λ(TLC), then there exists
a unique time Tc satisfying λ(Tc) = C3/[α × (C2 − C1)]. Solving
Eq. (13), we have

Tc = ln[ar × α(C2 − C1)/C3]/(rα) for 0 ≤ Tc < τ,
Tc = (1 − c)τ + c × ln[ar × α(C2 − C1)/(C3 × c)]/(rα) for Tc ≥ τ.   (15)

C. Software release time based on cost-reliability criterion

In this section, our goal is to estimate the time T1* which
minimizes C(T) subject to the constraint λ(T) ≤ F0 [4].
Differentiating Eq. (8) with respect to T, we obtain

dλ(T)/dT = −ar²α × exp[−rαT] for 0 ≤ T < τ,
dλ(T)/dT = −(ar²α/c²) × exp[−rα((T − τ)/c + τ)] for T ≥ τ,   (16)

which is negative in the time interval (0, T].
If T1 > Tc, the testing has already minimized the
expected total cost at time Tc but has not yet satisfied the desired
reliability. Then, testing should be continued until time
T1, which satisfies λ(T1) ≤ F0. Therefore, T1* = T1. If T1 ≤ Tc, the
failure intensity reaches the FIO at T1. Because additional
testing can reduce the software cost, we should continue testing
until time Tc. That is, T1* = Tc. Consequently,

T1* = max{T1, Tc}.   (17)

Similarly, if T2* is the time that minimizes the cost function
C(T) subject to the constraint R(ΔT | T) ≥ R0, then we have

T2* = max{T2, Tc}.   (18)

D. Numerical Examples

From the estimated parameters in Table I, we have a =
509.51, r = 2.14×10⁻², α = 3.23×10⁻¹, c = 1.98, and τ = 31. If the
FIO is given as 4 failures per period of ten days, T1 is
estimated as 61.06. Besides, given R0 = 0.95 and ΔT = 0.015, we
estimate T2 as 106.02. That is, after about 106 time units
of testing, the probability of failure-free execution over a time
period ΔT = 0.015 is no less than 95%. Considering the total
cost during the life cycle, we assume C1 = $100, C2 = $900,
C3 = $1000, and TLC = 1000. From Eq. (15), the
minimum cost is about $395,554 if the software is released at
time Tc = 70.39. Besides, from Eq. (17) and Eq. (18), T1* and
T2* are estimated as 70.39 and 106.02, respectively. Finally,
the estimated failure intensity and related costs versus time
are depicted in Fig. 4. The estimated conditional reliability
and expected costs versus time are given in Fig. 5.
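These numbers can be reproduced, to within the rounding of the published parameter estimates, directly from Eqs. (9), (11), (12), and (15). The following sketch hardcodes the Table I estimates; the clamping at τ for targets falling in the downward jump of λ(T) at the change-point is our addition.

```python
from math import exp, log

# Estimated parameters of Eq. (4) from Table I.
a, r, alpha, c, tau = 509.51, 2.14e-2, 0.323, 1.98, 31.0

def mean_failures(t):
    """Eq. (4)."""
    if t < tau:
        return (a / alpha) * (1.0 - exp(-r * alpha * t))
    return (a / alpha) * (1.0 - exp(-r * alpha * ((t - tau) / c + tau)))

def t1_for_fio(f0):
    """Eq. (9): earliest T with failure intensity lambda(T) <= f0."""
    t = log(a * r / f0) / (r * alpha)          # pre-change-point candidate
    if t < tau:
        return t
    # Post-change-point branch, clamped at tau if f0 lies in the jump.
    return max(tau, (1 - c) * tau + c * log(a * r / (c * f0)) / (r * alpha))

def t2_for_reliability(r0, dt):
    """Eq. (11): earliest T2 with R(dt | T2) >= r0."""
    t = log(a * (1 - exp(-r * alpha * dt)) / (-alpha * log(r0))) / (r * alpha)
    if t < tau:
        return t
    return max(tau, (1 - c) * tau
               + c * log(a * (1 - exp(-r * alpha * dt / c))
                         / (-alpha * log(r0))) / (r * alpha))

def tc_for_cost(c1, c2, c3):
    """Eq. (15), Case 3: release time minimizing the cost of Eq. (12).
    Eq. (15) has the same shape as Eq. (9) with F0 replaced by the
    threshold intensity of Eq. (14), so we reuse t1_for_fio."""
    return t1_for_fio(c3 / (alpha * (c2 - c1)))

def total_cost(t, c1, c2, c3, t_lc):
    """Eq. (12)."""
    return (c1 * alpha * mean_failures(t)
            + c2 * alpha * (mean_failures(t_lc) - mean_failures(t))
            + c3 * t)
```

With F0 = 4, R0 = 0.95, ΔT = 0.015, C1 = 100, C2 = 900, C3 = 1000, and TLC = 1000, this yields T1 ≈ 61, T2 ≈ 106, Tc ≈ 70, and a minimum cost near $395,500, matching the figures above up to rounding of the published parameters.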
Fig. 4. The estimated failure intensity curve λ(T) vs. the related cost function C(T). (The minimum cost, about $395,554, occurs at Tc = T1* = 70.39; λ(T) reaches the FIO at T1 = 61.06.)

Fig. 5. The estimated conditional reliability curve R(T + ΔT) vs. the related cost function C(T). (R reaches 0.95 at T2 = T2* = 106.02; Tc = 70.39.)

VI. CONCLUSIONS

Software testing is a necessary but expensive process. It
helps software engineers achieve higher defect coverage
and thus improves software quality. In this paper, to describe
the fault detection and debugging processes more accurately,
we incorporated the ratio of faults to failures and the TCF into
software reliability modeling. The performance validation of the
proposed model was presented based on real data. Numerical
results show that, compared to existing traditional SRGMs,
the proposed model gives a better prediction capability on the
failure data during operation. Finally, we also discussed some
pragmatic software release policies based on a reliability
constraint and a cost criterion.

ACKNOWLEDGMENT

This research was supported by the National Science
Council, Taiwan, under Grant NSC 94-2213-E-007-087 and
also substantially supported by a grant from the Ministry of
Economic Affairs (MOEA) of Taiwan (Project No. 94-EC17-A-01-S1-038).

REFERENCES

[1] R. S. Pressman, Software Engineering: A Practitioner's Approach, 6th Edition, McGraw-Hill, 2005.
[2] M. R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
[3] J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction and Application, McGraw-Hill, 1987.
[4] M. Xie, Software Reliability Modeling, World Scientific Publishing Company, 1991.
[5] C. Y. Huang, M. R. Lyu, and S. Y. Kuo, "A Unified Scheme of Some Non-Homogenous Poisson Process Models for Software Reliability Estimation," IEEE Trans. on Software Engineering, Vol. 29, No. 3, pp. 261-269, March 2003.
[6] P. K. Kapur and S. Bhushan, "An Exponential SRGM with a Bound on the Number of Failures," Microelectron. Reliab., Vol. 33, No. 9, pp. 1245-1249, 1993.
[7] Y. K. Malaiya, A. von Mayrhauser, and P. K. Srimani, "An Examination of Fault Exposure Ratio," IEEE Trans. on Software Engineering, Vol. 19, No. 11, pp. 1087-1094, November 1993.
[8] N. Li and Y. K. Malaiya, "On Input Profile Selection for Software Testing," Proceedings of the 5th IEEE International Symposium on Software Reliability Engineering, pp. 196-205, November 1994, Monterey, CA, USA.
[9] D. W. Carman, A. A. Dolinsky, M. R. Lyu, and J. S. Yu, "Software Reliability Engineering Study of a Large-Scale Telecommunications Software System," Proceedings of the 3rd IEEE International Symposium on Software Reliability Engineering, pp. 350-359, October 1995, Toulouse, France.
[10] H. S. Jung and P. H. Seong, "Prediction of Safety Critical Software Operational Reliability from Test Reliability Using Testing Environment Factors," J. Korean Nuclear Society, Vol. 10, No. 1, pp. 49-57, Feb. 1999.
[11] M. Zhao, "Change-Point Problems in Software and Hardware Reliability," Communications in Statistics–Theory and Methods, Vol. 22, No. 3, pp. 757-768, 1993.
[12] K. Kanoun and J. C. Laprie, "Software Reliability Trend Analyses from Theoretical to Practical Considerations," IEEE Trans. on Software Engineering, Vol. 20, No. 9, pp. 740-747, Sept. 1994.
[13] M. R. Lyu and A. Nikora, "Applying Software Reliability Models More Effectively," IEEE Software, pp. 43-52, July 1992.
[14] M. Ohba and X. Chou, "Does Imperfect Debugging Affect Software Reliability Growth?" Proceedings of the 11th International Conference on Software Engineering, pp. 237-244, May 1989, Pittsburgh, USA.
[15] M. Xie, "Software Reliability Models: Past, Present and Future," Recent Advances in Reliability Theory: Methodology, Practice and Inference (eds. N. Limnios and M. Nikulin), Birkhauser, Boston, pp. 323-340, 2000.
[16] M. S. Krishnan, "Software Release Management: A Business Perspective," Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, pp. 36-48, October 31-November 3, 1994, Toronto, Ontario, Canada.
[17] C. Y. Huang and M. R. Lyu, "Optimal Release Time for Software Systems Considering Cost, Testing-Effort, and Test Efficiency," IEEE Trans. on Reliability, Vol. 54, No. 4, pp. 583-591, December 2005.
[18] H. Pham, Software Reliability, Springer-Verlag, 2000.