ISSN: 2378315X BBIJ
Biometrics & Biostatistics International Journal
Research Article
Volume 2 Issue 4  2015
Second Order Optimality of Sequential Designs with Application in Software Reliability Estimation
Kamel Rekab* and Xing Song
Department of Mathematics and Statistics, University of MissouriKansas City, USA
Received: April 13, 2015  Published: April 27, 2015
*Corresponding author: Kamel Rekab, Department of Mathematics and Statistics, University of MissouriKansas City, PO Box 32464 Kansas City, MO 64171, USA, Tel: 8162694432; Email:
Citation: Rekab K, Song X (2014) Second Order Optimality of Sequential Designs with Application in Software Reliability Estimation. Biom Biostat Int J 2(4): 00037. DOI: 10.15406/bbij.2015.02.00037
Abstract
We propose three efficient sequential designs in the software reliability estimation. The fully sequential design the multistage sequential design and the accelerated sequential design. These designs make allocation decisions dynamically throughout the testing process. We then refine these estimated reliabilities in an iterative manner as we sample. Monte Carlo simulation seems to indicate that these sequential designs are second order optimal.
Keywords: Software reliability, Partition testing, Fully sequential design, Multistage sequential design, Accelerated sequential design.
Introduction
Reliability of a system is an important aspect of any system design since any user of the system would expect some type of guarantee that the system will function to some level of confidence. Failing to meet such guarantee will result in disastrous consequences. On the other hand, overly exceeding such guarantee level may incur additional and unnecessary expense to the developers. Moreover, for any nontrivial software system, an exhaustive testing among the entire input domain can be very expensive. By adopting the partition testing strategy, we attempt to break up the testable input domain of possible test cases into partitions, which must be nonoverlapping, such that if test case $i$
belongs to partition $j$
, then no partition other than $j$
will contain $i$
. Sayre and Poore[11] have given several possible mechanics to partition the domain into finitely many subdomains, ${X}_{ij}=\{\begin{array}{c}1,\text{iftest}j\text{takenfrompartition}i\text{isprocessedcorrectly}\\ 0,\text{otherwise}\end{array}$
, such that :
$D=\underset{i=1}{\overset{k}{{\displaystyle \cup}}}{D}_{i};{D}_{i}\cap {D}_{j}=\varnothing ,i\ne j$
which allows us to define the system reliability by a weighted sum of reliabilities of these subdomains, i.e.
$R={\displaystyle \sum}_{i=1}^{k}{p}_{i}{R}_{i}$
Where $R$
denotes the system reliability ${R}_{i}$
and is the reliability of each subdomain ${D}_{i}$
; and ${p}_{i}$
, parameters of the operational profile is the likelihood of this test case belongs to partition ${D}_{i}$
, which are assumed to be known [12]. As mentioned above, a complete testing of any software system of nontrivial size is practically impossible, ${R}_{i}$
are usually unknown parameters to us. So as to gain knowledge about ${R}_{i}$
, we must distribute the $k$
test cases among these $k$
partitions, and generate reasonable estimates for each. Specifically, we denote ${n}_{1},{n}_{2},\dots ,{n}_{k}$
as sizes of the samples which are taken from sub domain ${D}_{1},{D}_{2},\dots ,\text{'}{D}_{k}$
, respectively, where $${\sum}_{i=1}^{k}{n}_{i}=N$$
.
We model the outcome of the ${j}^{th}$
taken from the ${i}^{th}$
partition as a Bernoulli random variable ${X}_{ij}$
such that:
${X}_{ij}=\{\begin{array}{c}1,\text{iftest}j\text{takenfrompartition}i\text{isprocessedcorrectly}\\ 0,\text{otherwise}\end{array}$
and each ${X}_{ij}$
follows a Bernoulli distribution with parameter ${R}_{i}$
. Then, the estimate of the overall system reliability, $R$
denoted by $\widehat{R}$
can thus be defined as:
$\widehat{R}=\underset{i=1}{\overset{k}{\rangle}}{p}_{i}{\widehat{R}}_{i}$
where ${\widehat{R}}_{i}$
is the estimate of ${R}_{i}$
after ${n}_{i}$
test cases have been allocated to partition such that:
${\widehat{R}}_{i}=\frac{{\displaystyle {\sum}_{j=1}^{{n}_{i}}{X}_{ij}}}{{n}_{i}}$
and $$Var\left(\widehat{R}\right)={\displaystyle \sum}_{i=1}^{k}\frac{{p}_{i}^{2}{R}_{i}\left(1{R}_{i}\right)}{{n}_{i}}$$
(1.1)
Optimal Sampling Scheme
Ideally, we would like to execute all possible test paths through the software and determine the true overall reliability of the system. In practice though, resources are often limited, sample test cases must be chosen and allocated strategically to attain the best reliability estimate possible given all kinds of constraints. One of the criteria of distributing test cases among the partitions, which proceeds from rewriting (1.1) as follows:
$Var\left(\widehat{R}\right)=\frac{{\left[{{\displaystyle \sum}}_{i=1}^{k}{p}_{i}\sqrt{{R}_{i}\left(1{R}_{i}\right)}\right]}^{2}}{N}+\frac{1}{N}{\displaystyle \sum}_{i=1}^{k1}{\displaystyle \sum}_{j=i+1}^{k}\frac{{\left[{n}_{i}{p}_{j}\sqrt{{R}_{j}\left(1{R}_{j}\right)}{n}_{j}{p}_{i}\sqrt{{R}_{i}\left(1{R}_{i}\right)}\right]}^{2}}{{n}_{i}{n}_{j}}$
which is bounded below by the first term:
$Var\left(\widehat{R}\right)\ge \frac{{\left[{{\displaystyle \sum}}_{i=1}^{k}{p}_{i}\sqrt{{R}_{i}\left(1{R}_{i}\right)}\right]}^{2}}{N}$
with equality of (2.1) to hold is and only if:
$\frac{{n}_{i}}{{n}_{j}}=\frac{{p}_{i}\sqrt{{R}_{i}\left(1{R}_{i}\right)}}{{p}_{j}\sqrt{{R}_{j}\left(1{R}_{i}\right)}}$
for all . Hence, the optimal allocation is determined by:
$\frac{{n}_{i}}{N}=\frac{{p}_{i}\sqrt{{R}_{i}\left(1{R}_{i}\right)}}{{{\displaystyle \sum}}_{j=1}^{k}{p}_{j}\sqrt{{R}_{j}\left(1{R}_{i}\right)}}$
for $i=1,2,\dots ,k1$
, and
${n}_{k}=N\underset{i=1}{\overset{k1}{{\displaystyle \rangle}}}{n}_{i}$
and the variance incurred by this optimal allocation is:
$Var\left(O\right)=\frac{{\left[{{\displaystyle \sum}}_{j=1}^{k}{p}_{j}\sqrt{{R}_{j}\left(1{R}_{i}\right)}\right]}^{2}}{N}$
Note that the optimal allocation depends on the actual reliabilities ${R}_{1},{R}_{2},\dots ,{R}_{k},$
which are unknown. Therefore the optimal sampling scheme is not practical. It is this shortcoming that motivates us to adopt such dynamic allocation approaches that will be discussed in the following three sections.
Fully Sequential Sampling Scheme
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei [5] proposed a fully sequential design shown to be first order optimal. The fully sequential design is defined as follows;
We first test one case from each of the partitions and estimate the reliability for each of these partitions.
Stage 1:
After cases have been allocated, $l\ge k,$
the next test will be taken from partition $i$
if for all $j$
,
$\frac{{n}_{l,i}}{{n}_{l,j}}<\frac{{p}_{i}\sqrt{{\widehat{R}}_{l,i}\left(1{\widehat{R}}_{l,i}\right)}}{{p}_{j}\sqrt{{\widehat{R}}_{l,j}\left(1{\widehat{R}}_{l,j}\right)}}$
where ${n}_{l,i}$
is the cumulative test cases allocated to partition $i$
after $l$
tests have been allocated and the current estimated reliability for partition $i$
is determined by:
${\widehat{R}}_{l,i}=\frac{{{\displaystyle \sum}}_{m=1}^{{n}_{l,i}}{X}_{im}}{{n}_{l,i}}$
Stage 2:
Repeat step 2 sequentially until all the test cases are allocated, and the final estimate of reliability for partition is:
${\widehat{R}}_{i}=\frac{{{\displaystyle \sum}}_{m=1}^{{n}_{N,i}}{X}_{im}}{{n}_{N,i}}$
And thus, the estimate of the overall reliability of the system is:
$\widehat{R}=\underset{i=1}{\overset{k}{{\displaystyle \rangle}}}{p}_{i}{\widehat{R}}_{i}$
Multistage Sequential Sampling
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei [6] proposed a multistage sequential design shown to be first order optimal. Instead of making an allocation decision for each test at a time, the multistage sequential sampling allocates test cases among the partitions in stages in batches, where and are both prespecified. The multistage sequential design is defined as follows:
Stage 1:
We start with an initial sample of test cases, which are allocated among the partitions with a balanced allocation scheme, such that:
${S}_{1,i}=\frac{{S}_{1}}{k}$
and estimate the reliability for partition by:
${\widehat{R}}_{1,i}=\frac{{{\displaystyle \sum}}_{m=1}^{{s}_{1,i}}{X}_{im}}{{S}_{1,i}}$
Therefore,
${\widehat{C}}_{i}\left({S}_{1}\right)=\frac{{p}_{i}\sqrt{{\widehat{R}}_{1,i}\left(1{\widehat{R}}_{1,i}\right)}}{{{\displaystyle \sum}}_{j=1}^{k}{p}_{j}\sqrt{{\widehat{R}}_{1,j}\left(1{\widehat{R}}_{1,j}\right)}}$
Stage 2 through L:
At stage$j,$
for partition the test cases are distributed by the following way: $2\le j\le L,$
for partition $i=1,2,\dots ,k1,$
the test cases are distributed by the following way:
${S}_{j,i}=\left(\underset{l=1}{\overset{j}{{\displaystyle \rangle}}}{S}_{l}\right){\widehat{C}}_{i}\left({\overline{S}}_{j1}\right);$
and
${S}_{j,k}=\underset{l=1}{\overset{j}{{\displaystyle \rangle}}}{S}_{l}\underset{i=1}{\overset{k1}{{\displaystyle \rangle}}}{S}_{j,i}$
where
${\overline{S}}_{j1}=\underset{y=1}{\overset{j1}{{\displaystyle \rangle}}}{S}_{y}$
At the final stage , the cumulative number of tests allocated to partition is:
${N}_{i}=\mathrm{min}\left\{N\underset{j=1,j\ne i}{\overset{k1}{{\displaystyle \rangle}}}{S}_{L1,j},\mathrm{max}\left(N{\widehat{C}}_{i}\left({\overline{S}}_{L1}\right),{S}_{L1,i}\right)\right\}$
and
${N}_{k}=N\underset{i=1}{\overset{k1}{{\displaystyle \rangle}}}{N}_{i}$
Among several ways of determining the number of cases at each sampling stage, the simplest one is to select:
${S}_{1}={S}_{2}=\dots =N/L$
However, choosing stage sizes, especially the initial stage size, can be done by following some general criteria, a good initial stage size can be $\sqrt{N}$
for when a two stage sampling scheme is considered, and more generally, Rekab [9] shows that for a two stage procedure, must be chosen such that:
$\underset{N\to \infty}{\mathrm{lim}}{S}_{1}=\infty ,and\underset{N\to \infty}{\mathrm{lim}}\frac{{S}_{1}}{N}=0$
Accelerated Sampling Scheme
By adopting a fully Bayesian approach with Beta priors, Rekab, Thompson and Wei [6] proposed an accelerated sequential design shown to be first order optimal. The accelerated sampling scheme combines the fully sequential sampling scheme and the multistage sampling scheme. It is defined as follows:
Stage 1:
The procedure starts with an initial sample ${S}_{1}$
, which satisfies the conditions specified in (4.1). Then, allocate ${S}_{1}$
equally among partitions
Stage 2 through $L1$
At stage $j$$2\le j\le L1,$
for partition $i=1,2,\dots ,k1,$
the test cases are distributed by the following way:
${S}_{j,i}=\left(\underset{l=1}{\overset{j}{{\displaystyle \rangle}}}{S}_{l}\right){\widehat{C}}_{i}\left({\overline{S}}_{j1}\right);$
and
${S}_{j,k}=\underset{l=1}{\overset{j}{{\displaystyle \rangle}}}{S}_{l}\underset{l=1}{\overset{k1}{{\displaystyle \rangle}}}{S}_{j,i}$
where
${\overline{S}}_{j1}=\underset{y=1}{\overset{j1}{{\displaystyle \rangle}}}{S}_{y}$
At each stage, ${S}_{j}$
must satisfy the two conditions as ${S}_{1}$
.
Stage L:
In the final stage, we adopt a fully sequential approach by allocating one test from partition , if for all ,
$\frac{{n}_{l,i}}{{n}_{l,j}}<\frac{{p}_{i}\sqrt{{\widehat{R}}_{l,i}\left(1{\widehat{R}}_{l,i}\right)}}{{p}_{j}\sqrt{{\widehat{R}}_{l,j}\left(1{\widehat{R}}_{l,j}\right)}}$
until all the test cases have been allocated. Note that ${n}_{l,i}$
, ${n}_{l,i}$
are the cumulative test cases allocated to partition and after the allocations of a total of test cases, where
$\sum}_{j=1}^{L1}{S}_{j,i}\le l\le {\displaystyle \sum}_{j=1}^{L1}{S}_{j,i}+N{\displaystyle \sum}_{j=1}^{L1}{S}_{j$
Therefore, the estimate for the whole system is finally obtained as:
Optimality of Sequential Designs
First order optimality of these three sequential designs was established by Rekab, Thompson and Wu[5][6][7], although the focus here is on minimizing the variance incurred by the sequential designs rather than minimizing the Bayes risk incurred by these designs. For estimating the mean difference of two independent normal populations, Woodroofe and Hardwick [12] adopted a quasiBayesian approach. They determined an asymptotic lower bound for the integrated risk and proposed a threestage design that is secondorder optimal. For estimating the mean difference of two general one parameter exponential family, Rekab and Tahir [10] adopted a fully Bayesian approach with conjugate priors. They determined an asymptotic second order lower bound for the Bayes risk.
Monte Carlo Simulations
We consider the case where the test domain is partitioned into two subdomains ${D}_{1}$
and ${D}_{2}$
with reliability $R$
and ${R}_{2}$
respectively with equal usage probabilities ${p}_{1},{p}_{2}.$
Second order optimality of the three sequential designs is investigated through Monte Carlo simulations.
Table I. ${N}^{2}*\left(Var\left(\Delta \right)Var\left(O\right)\right)$
by Fully Sequential Scheme 
$\left({R}_{1},{R}_{2}\right)$

$N=300$

$N=500$

$N=800$

$N=2000$

$N=5000$

$N=8000$

0.1,0.9 
28.1562 
3.2104 
0.7390 
2.4225 
9.3040 
0.7133 
0.5,0.2 
29.0380 
6.1020 
0.6300 
7.9375 
4.6700 
4.0500 
0.5,0.5 
1.5972 
0.1460 
0.2120 
0.9125 
0.5120 
0.4933 
0.5,0.9 
80.7747 
71.9032 
19.9510 
7.9600 
5.7960 
1.2666 
0.9,0.3 
64.0246 
53.8008 
5.5497 
3.2565 
15.2546 
2.6517 
Table II. ${N}^{2}*\left(Var\left(\Delta \right)Var\left(O\right)\right)$
by Multistage Scheme 
$\left({R}_{1},{R}_{2}\right)$

$N=300$

$N=500$

$N=800$

$N=2000$

$N=5000$

$N=8000$

0.1,0.9 
10.9835 
0.0288 
4.0718 
55.7962 
2.3320 
1.8422 
0.5,0.2 
11.8285 
19.1720 
2.0555 
5.5100 
0.8432 
7.1258 
0.5,0.5 
0.02749 
0.0459 
0.1750 
0.4568 
0.47324 
0.1064 
0.5,0.9 
37.6665 
31.6257 
17.6417 
3.6377 
6.8483 
9.2952 
0.9,0.3 
20.2034 
2.86352 
7.6161 
3.1589 
11.7869 
5.2018 
Table III. ${N}^{2}*\left(Var\left(\Delta \right)Var\left(O\right)\right)$
by Accelerated Scheme 
$\left({R}_{1},{R}_{2}\right)$

$N=300$

$N=500$

$N=800$

$N=2000$

$N=5000$

$N=8000$

0.1,0.9 
30.5713 
6.5870 
6.4568 
17.4023 
2.1138 
1.0098 
0.5,0.2 
8.1424 
6.1346 
7.1683 
0.8968 
0.1462 
2.1009 
0.5,0.5 
0.03801 
0.23284 
0.0319 
0.7025 
0.0640 
0.0392 
0.5,0.9 
37.6665 
31.6257 
17.6417 
3.6377 
6.8483 
9.2952 
0.9,0.3 
48.4148 
13.8170 
7.8473 
8.4656 
5.8734 
1.6708 
Table I, II, III seem to indicate that the speed ${N}^{2}*\left(Var\left(\Delta \right)Var\left(O\right)\right)$
is bounded.
Conclusion
Second optimal designs are more efficient than the first order optimal designs especially when the total number of cases is very large. This is the main argument that led us to investigate the second order optimality of these three dynamic designs. Simulation studies seem to indicate that these designs are second order optimal. We conjecture that second order optimally may be obtained theoretically as well.
It is very common in parametric estimation to use the squared error loss. However, in reliability estimation one should distinguish between the cost of overestimating and underestimating the system reliability. Examples of practical loss functions were presented by Stüger[2]:
$l\left(\widehat{R},R\right)={c}_{o}{\left(\widehat{R}R\right)}^{2}{I}_{\left\{\widehat{R}>R\right\}}+{c}_{u}{\left(R\widehat{R}\right)}^{2}{I}_{\left\{\widehat{R}<R\right\}}$
and by Granger[1]:
$l\left(\widehat{R},R\right)={c}_{o}\left(\widehat{R}R\right){I}_{\left\{\widehat{R}>R\right\}}+{c}_{u}\left(R\widehat{R}\right){I}_{\left\{\widehat{R}<R\right\}}$
where represents the overestimation and underestimation costs, respectively.
References
 Sankaran M (1970) The discrete PoissonLindley distribution. Biometrics 26(1): 145149.
 Ghitany ME, AlMutairi DK (2009) Estimation Methods for the discrete PoissonLindley distribution. Journal of Statistical Computation and Simulation 79(1): 19.
 Shanker R, Mishra A (2014) A twoparameter PoissonLindley distribution. International Journal of Statistics and Systems 9(1): 7985.
 Shanker R, Mishra A (2013 a) A twoparameter Lindley distribution. Statistics in Transition new Series 14(1): 4556.
 Shanker R, Mishra A (2015) A quasi PoissonLindley distribution (submitted).
 Shanker R, Mishra A (2013 b) A quasi Lindley distribution. African journal of Mathematics and Computer Science Research 6(4): 6471.
 Shanker R, Sharma S, Shanker R (2012) A Discrete twoParameter Poisson Lindley Distribution. Journal of Ethiopian Statistical Association 21: 1522.
 Shanker R, Sharma S, Shanker, R (2013) A twoparameter Lindley distribution for modeling waiting and survival times data, Applied Mathematics 4: 363368.
 Shanker R, Tekie AL (2014) A new quasi PoissonLindley distribution. International Journal of Statistics and Systems 9(1): 8794.
 Shanker R, Amanuel AG (2013) A new quasi Lindley distribution, International Journal of Statistics and Systems 8(2): 143156.
 Johnson NL, Kotz S, Kemp AW (1992) Univariate Discrete Distributions, 2nd edition, John Wiley & sons Inc.
 Fisher RA, Corpet AS, Williams CB (1943) The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12(1): 4258.
 Kempton RA (1975) A generalized form of Fisher’s logarithmic series. Biometrika 62 (1): 2938.
 Tripathi RC, Gupta RC (1985) A generalization of the logseries distribution. Comm. in Stat. (Theory and Methods) 14(8): 17791799.
 Mishra A, Shanker R (2002) Generalized logarithmic series distributionIts nature and applications, Proceedings of the Vth International Symposium on Optimization and Statistics 2830: 155168.
 Loeschke V, Kohler W (1976) Deterministic and Stochastic models of the negative binomial distribution and the analysis of chromosomal aberrations in human leukocytes. Biometrische Zeitschrift 18: 427451.
 Janardan KG, Schaeffer DJ (1977) Models for the analysis of chromosomal aberrations in human leukocytes. Biometrical Journal (8)599612.
 Mc Guire JU, Brindley TA, Bancroft TA (1957) The distribution of European cornborer larvae pyrausta in field corn. Biometrics 13: 6578.
 Catcheside DG, Lea DE, Thoday JM (1946) Types of chromosome structural change induced by the irradiation on Tradescantia microspores. Journal of Genetics 47: 113136.
 Catcheside DG, Lea DE, Thoday JM (1946) The production of chromosome structural changes in Tradescantia microspores in relation to dosage, intensity and temperature. Journal of Genetics l.47: 137149.
 Lindley DV (1958) Fiducial distributions and Bayes theorem. Journal of Royal Statistical Society Ser. B Vol.20: 102107.

