1. Stochastic Orders

We all know how to compare real variables, but do we know how to compare random variables, random vectors, and functions of random vectors?
In the two ideal (trivial?) cases ≤ω and ≤as described below, the comparison of random
variables is essentially the same as that of real variables.
Definition 1.1. Sample-Path Dominance (≤ω): X ≤ω Y if X and Y are defined on the same
sample space Ω and X(ω) ≤ Y(ω) ∀ ω ∈ Ω.
◊◊
Definition 1.2. Almost Sure Sample-Path Dominance ≤as: X ≤ as Y if X and Y are defined on
the same Ω and X(ω) ≤ Y(ω) ∀ ω ∈ A ⊂ Ω, where P(A) = 1.
◊◊
In general, life is not as ideal as in ≤ω and ≤as, since random variables may not be defined on the same Ω, and even if they are, they may not be ordered at every sample point. In this section, we will compare random variables in more realistic settings. We would like to know: if X ≥ Y in some (stochastic) sense, can we say anything about P(X ≤ t) ≤ P(Y ≤ t) for some t, or for all t? Can we predict which of f(X) and f(Y), or E[f(X)] and E[f(Y)], is “larger” for a given function f? Do we need other conditions, such as a precise notion of order and some restrictions on f, in order to draw a concrete conclusion?
1.1. Stochastic Order ≤st
Stochastic order ≤st is the most common notion of comparison of random variables.
Let F and G be the distribution function of random variables X and Y, respectively. The
following is an inviting definition of ≤st for X and Y.
Trial Definition 1.1.1. X ≤st Y if E(X) ≤ E(Y).
◊◊
Does the above definition have nice properties? For example, if E(X) ≤ E(Y), will E[f(X)] ≤ E[f(Y)] for a wide class of functions f? Will E(X) ≤ E(Y) shed light on P(X ≤ t) ≥ P(Y ≤ t) for some t? For all t?
A more useful definition is:
Definition 1.1.2. X ≤st Y if Fc(t) ≤ Gc(t) ∀ t, where Fc ≡ 1 − F and Gc ≡ 1 − G denote the survival functions.
◊◊
Remark 1.1.3. It is equally valid to say that F ≤st G when X ≤st Y. Wherever applicable, we
can also say that the density function or the probability mass function of one random variable
is stochastically less than or equal to that of another random variable.
◊◊
Stochastic ordering is a partial ordering, i.e., it is reflexive (F ≤st F), transitive (F ≤st
G, G ≤st H ⇒ F ≤st H), and antisymmetric (F ≤st G and G ≤st F ⇒ F = G). It is not a complete
ordering since there exist random variables (distributions) which cannot be ordered through
the ordering. The antisymmetry defines stochastic equivalence: X (~ F) =st Y (~ G) if F = G
(X ≤st Y and Y ≤st X).
There are many equivalent definitions of ≤st.
Equivalent Definitions 1.1.4.
(a). X ≤st Y ⇔ F(t) ≥ G(t) ∀ t.
(b). X ≤st Y ⇔ P(X ≥ t) ≤ P(Y ≥ t) ∀ t.
(c). X ≤st Y ⇔ E[f(X)] ≤ E[f(Y)] ∀ increasing (non-decreasing) functions f.
(d). Coupling (sample-space construction): X ≤st Y ⇔ there exist random variables X̂ =st X and Ŷ =st Y, defined on a common constructed sample space Ω̂, such that X̂(ω) ≤ Ŷ(ω) ∀ ω ∈ Ω̂.
◊◊
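The coupling in (d) can be made concrete by the inverse-transform construction X̂ = F⁻¹(U), Ŷ = G⁻¹(U) for a common uniform U. A minimal simulation sketch (the exponential rates below are arbitrary assumptions, not part of the text):

    import numpy as np

    # Coupling construction for X ~ exp(rate lam) and Y ~ exp(rate mu), lam > mu,
    # so that X <=st Y: set X-hat = F^{-1}(U) and Y-hat = G^{-1}(U) for a common U.
    rng = np.random.default_rng(0)
    lam, mu = 2.0, 1.0                 # assumed rates; lam > mu gives X <=st Y
    u = rng.uniform(size=100_000)      # common uniform sample
    x_hat = -np.log(1.0 - u) / lam     # F^{-1}(u) for exp(lam)
    y_hat = -np.log(1.0 - u) / mu      # G^{-1}(u) for exp(mu)
    assert np.all(x_hat <= y_hat)      # sample-path dominance, as in (d)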
Example 1.1.5.
(a). If X ~ exp(λ) and Y ~ exp(µ) such that λ < µ, then X ≥st Y.
(b). If X ~ Poisson(λ) and Y ~ Poisson(µ) such that λ < µ, then X ≤st Y.
◊◊
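Part (b) can be checked numerically by comparing the Poisson survival functions, as in Definition 1.1.2; the rates 3 and 5 below are arbitrary assumptions:

    import numpy as np
    from scipy import stats

    # Check P(X >= n) <= P(Y >= n) for all n when X ~ Poisson(3), Y ~ Poisson(5).
    lam, mu = 3.0, 5.0                    # assumed rates with lam < mu
    n = np.arange(0, 40)
    sf_x = stats.poisson.sf(n - 1, lam)   # P(X >= n) = P(X > n - 1)
    sf_y = stats.poisson.sf(n - 1, mu)    # P(Y >= n)
    assert np.all(sf_x <= sf_y + 1e-12)   # survival functions are ordered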
Properties of Stochastic Order 1.1.6.
(a). Ordering of mean values: If X ≤st Y and E(Y) is well defined, then E(X) ≤ E(Y).
(b). Closed under transformation by increasing functions: If X ≤st Y, then f(X) ≤st f(Y) for any increasing function f (for decreasing f, the order reverses).
(c). Closed under scaling: X ≤st Y ⇒ cX ≤st cY ∀ c > 0.
(d). Closed under shifting: X ≤st Y ⇒ X + c ≤st Y + c ∀ c.
(e). Closed under positive part: The positive part of a random variable Z is Z+ ≡ max{0, Z}, and its negative part is Z− ≡ −min{0, Z}. If X ≤st Y, then X+ ≤st Y+ and Y− ≤st X−.
(f). X ≤st Y ⇒ −Y ≤st −X.
(g). Closed under mixture: If (X|Θ = θ) ≤st (Y|Θ = θ) ∀ θ in the support of Θ, then X ≤st Y.
(h). Closed under convolution: Let (X1, ..., Xm) and (Y1, ..., Ym) be two m-dimensional random vectors, each with independent components. If Xi ≤st Yi for all i, then ∑_{i=1}^m Xi ≤st ∑_{i=1}^m Yi.
(i). Closed under convergence in distribution: Suppose that Xn →D X, Yn →D Y, and Xn ≤st Yn ∀ n. Then X ≤st Y.
Example 1.1.7. Use of stochastic orderings
Let Dn be the delay of the nth customer in a GI/G/1 queue. Suppose that the queue starts empty. It is straightforward to check that Dn+1 = [Dn+Sn−Tn]+, where Sn and Tn are the service time of the nth customer and the inter-arrival time between the (n−1)st and the nth customers. Consider two GI/G/1 queues A1/G1/1 and A2/G2/1 with A2 ≤st A1 and G1 ≤st G2. Let Dn(i) be Dn in the ith queue. The properties above and the recursive expression Dn+1 = [Dn+Sn−Tn]+ imply that D2(1) ≤st D2(2). By induction, Dn(1) ≤st Dn(2). If the queues are stable and the limiting distributions of the D∞(i)'s exist, it follows that D∞(1) ≤st D∞(2).
◊◊
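A coupled simulation of the Lindley recursion illustrates the comparison; the exponential inter-arrival and service rates below are assumed for concreteness (any stable pair of distributions ordered as in the example would do):

    import numpy as np

    # Couple two GI/G/1 queues through common uniforms so that queue 2 has
    # pathwise smaller inter-arrival times and larger service times, then
    # iterate D_{n+1} = [D_n + S_n - T_n]^+.
    rng = np.random.default_rng(1)
    n = 10_000
    u_t, u_s = rng.uniform(size=n), rng.uniform(size=n)
    t1 = -np.log(1 - u_t) / 1.0   # queue 1 inter-arrival times (rate 1.0)
    t2 = -np.log(1 - u_t) / 1.2   # queue 2 inter-arrival times (rate 1.2), T2 <= T1
    s1 = -np.log(1 - u_s) / 2.0   # queue 1 service times (rate 2.0), S1 <= S2
    s2 = -np.log(1 - u_s) / 1.5   # queue 2 service times (rate 1.5)
    d1 = d2 = 0.0
    for i in range(n):
        d1 = max(0.0, d1 + s1[i] - t1[i])
        d2 = max(0.0, d2 + s2[i] - t2[i])
        assert d1 <= d2           # pathwise dominance D_n^(1) <= D_n^(2)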
Exercise 1.1.8.
(a). Extension of closure under convolution: Let {Xi} be m independent random variables and {Yi} be another m independent random variables such that Xi ≤st Yi. For any increasing function f:ℜm → ℜ, f(X1, ..., Xm) ≤st f(Y1, ..., Ym).
(b). Closed under random sum: Let {Xi} be a sequence of independent random variables; M be a positive integer-valued random variable independent of {Xi}. Let {Yi} be another sequence of independent, non-negative random variables; N be a positive integer-valued random variable independent of {Yi}. If M ≤st N and Xi ≤st Yi, then ∑_{i=1}^M Xi ≤st ∑_{i=1}^N Yi.
(c). Closed under order statistics: Let {Xi} be a (finite) collection of i.i.d. random variables and {Yi} be another collection of i.i.d. random variables. Suppose that Xi ≤st Yi. Then X(i) ≤st Y(i), where {X(i)} and {Y(i)} are the order statistics of {Xi} and {Yi}, respectively.
(d). Let f:ℜ→ℜ be such that x ≤ f(x) ∀ x. Then X ≤st f(X).
(e). (X|X ≤ t) and (X|X > t) are stochastically increasing in t, i.e., (X|X ≤ t1) ≤st (X|X ≤ t2) and (X|X > t1) ≤st (X|X > t2) for t1 ≤ t2.
(f). (i). If X ≤st Y and E(X) = E(Y), then X =st Y. (ii). If X ≤st Y and E[g(X)] = E[g(Y)] for some strictly increasing function g, then X =st Y.
(g). Let {Xi} be a collection of i.i.d. random variables and {Yi} be another collection of i.i.d. random variables. Let M and N be two non-negative integer-valued random variables such that M is independent of {Xi} and N is independent of {Yi}. Let k be a positive integer. Suppose that M ≥st kN and ∑_{i=1}^k Xi ≥st Y1. Then ∑_{i=1}^M Xi ≥st ∑_{i=1}^N Yi.
(h). Let Yi =st X(Θi), where X(θ1) ≤st X(θ2) if θ1 ≤ θ2. Suppose Θ1 ≤st Θ2. Then Y1 ≤st Y2.
Example 1.1.9. (Motivation for Bi-variate Characterization of Stochastic Order) n jobs with independent processing times X1, ..., Xn are available for processing. For any schedule (policy) π, the completion time of job i is Ciπ, and the total flow time of the jobs is Fπ = ∑_{i=1}^n Ciπ. We would like to find a policy π that minimizes Fπ, assuming that no new job arrives, and that the processing of any job cannot be preempted once it has been started.
Sol. Suppose first that the problem is deterministic, i.e., Xi = xi, i = 1 to n. The policy SEPT, shortest-expected-processing-time-first, minimizes Fπ. When the problem is stochastic, SEPT still minimizes the mean total flow time. However, what happens if the objective is to minimize f(Fπ), where f is some pre-specified function? Can we conclude E[f(Fπ1)] ≤ E[f(Fπ2)], or compare f(Fπ1) and f(Fπ2), simply by noting Fπ1, Fπ2, and properties of f?
To compare policies π1 and π2, the interchange argument basically leads to comparing functions of two random variables. So it is natural to find bi-variate characterizations of stochastic orders.
◊◊
By a bi-variate characterization of ≤st, we want to find (a collection of) functions g(X, Y) such that we can assert whether X ≤st Y from properties of g. Think about the deterministic case. How can we tell whether x > y, x = y, or x < y if we cannot see the values of x and y? Suppose that we need to determine which of x and y is bigger just by looking at the function value of g(x, y). What do we require the properties of g to be? A natural choice is g(x, y) = 0 if x = y; g(x, y) > 0 if x < y; g(x, y) < 0 if x > y. Then x ≤ y ⇔ g(x, y) ≥ 0. Any such g completely characterizes the relative magnitude of x and y.
Can we carry such an idea over to stochastic orderings? Let G = {g(x, y): g(x, y) = 0 if x = y; g(x, y) > 0 if x < y; g(x, y) < 0 if x > y}. Does the following work?
Trial Definition 1. Let g ∈ G. X ≤st Y ⇔ there exist independent random variables X̂ and Ŷ such that X̂ =st X, Ŷ =st Y, and E[g(X̂, Ŷ)] ≥ 0.
Remark 1.1.10. Stochastic order is defined through the marginal distributions of two random variables. It does not care whether the two random variables are independent or not. However, in bi-variate characterizations, the operations on g may depend on the dependence between the random variables. Presently, most (all?) bi-variate characterizations are defined on independent random variables. To simplify notation, in the following we will simply write E[g(X, Y)] ≥ 0 instead of saying that there exist independent X̂ =st X and Ŷ =st Y such that E[g(X̂, Ŷ)] ≥ 0. We can either assume that the original X and Y are independent, or assume that the construction of independent X̂ =st X and Ŷ =st Y is done automatically.
Does this definition work? No, it does not. For example, g(x, y) = y − x satisfies the conditions of the required function. However, E[g(X, Y)] ≥ 0 ⇔ E(X) ≤ E(Y), which is not sufficient to conclude X ≤st Y.
Trial Definition 2. X ≤st Y ⇔ E[g(X, Y)] ≥ 0 for any g ∈ G.
Suppose that E[g(X, Y)] ≥ 0 for any g ∈ G. Then for any increasing function f, g(x, y) = f(y) − f(x) ∈ G, and E[g(X, Y)] ≥ 0 ⇔ E[f(X)] ≤ E[f(Y)] ⇔ X ≤st Y. One side of the implication has been established. Suppose that X ≤st Y. Can we establish E[g(X, Y)] ≥ 0 for any g ∈ G? No; G is too loose without specifying the positive and negative values of g(x, y), which must bear some symmetry.
Define Gst = {g(x, y): g(x, y) − g(y, x) is decreasing in x for all y}. Note that implicitly g(x, y) − g(y, x) is then increasing in y for all x.
Definition 1.1.11. X ≤st Y ⇔ E[g(X, Y)] ≥ E[g(Y, X)] for any g ∈ Gst.
Proof. “⇐”: Let f be any increasing function. Then g(x, y) = −f(x) ∈ Gst. E[g(X, Y)] ≥ E[g(Y, X)] ⇔ −E[f(X)] ≥ −E[f(Y)] ⇔ E[f(X)] ≤ E[f(Y)] ⇔ X ≤st Y.
“⇒”: Suppose that X ≤st Y. Generate two independent pairs (X̂1, Ŷ1) and (X̂2, Ŷ2) such that X̂i ≤ Ŷi, i = 1, 2. Write ∆g(x, y) = g(x, y) − g(y, x). Since ∆g is decreasing in its first argument, ∆g(x, y) = −∆g(y, x), and X̂i ≤ Ŷi,
g(X̂1, Ŷ2) − g(Ŷ2, X̂1) + g(X̂2, Ŷ1) − g(Ŷ1, X̂2) = ∆g(X̂1, Ŷ2) + ∆g(X̂2, Ŷ1) ≥ ∆g(Ŷ1, Ŷ2) + ∆g(Ŷ2, Ŷ1) = 0.
Taking expectations, and noting that X̂1 is independent of Ŷ2 and X̂2 is independent of Ŷ1, gives 2E[g(X, Y)] − 2E[g(Y, X)] ≥ 0, i.e., E[g(X, Y)] ≥ E[g(Y, X)].
◊◊
The current form of Gst is not convenient to check. Are there any subsets of Gst which
characterize stochastic order and whose properties are easy to check?
Consider a function g(x, y) such that g is decreasing in x and increasing in y. Let X
and Y be independent random variables such that X ≤st Y. Can we say something about g(X,
Y) and g(Y, X)?
Construct two independent pairs (X̂1, Ŷ1) and (X̂2, Ŷ2) such that X̂i ≤ Ŷi, i = 1, 2. By construction, g(X̂1, Ŷ2) + g(X̂2, Ŷ1) ≥ g(Ŷ1, X̂2) + g(Ŷ2, X̂1), which implies E[g(X, Y)] ≥ E[g(Y, X)]. Now define
Gst2 = {g(x, y): g(x, y) is decreasing in x and increasing in y}.
Suppose that for all g ∈ Gst2 , E[g(X, Y)] ≥ E[g(Y, X)]. Then let f:ℜ → ℜ be an arbitrary
increasing function. –f ∈ Gst2 and hence -E[f(X)] ≥ -E[f(Y)] ⇔ E[f(X)] ≤ E[f(Y)] ⇔ X ≤st Y.
Hence, we have shown
Definition 1.1.12. X ≤st Y ⇔ E[g(X, Y)] ≥ E[g(Y, X)] for any g ∈ Gst2 .
Exercise 1.1.13. Let X and Y be independent random variables. X ≤st Y ⇔ g(X, Y) ≥st g(Y, X)
for any g ∈ Gst2 .
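As a sanity check on Definition 1.1.12, take g(x, y) = 1{y > x}, which is decreasing in x and increasing in y; then E[g(X, Y)] ≥ E[g(Y, X)] reads P(Y > X) ≥ P(X > Y) for independent X and Y. A Monte Carlo sketch with assumed exponential distributions:

    import numpy as np

    # Independent X ~ exp(rate 2) and Y ~ exp(rate 1), so X <=st Y.
    rng = np.random.default_rng(2)
    n = 200_000
    x = rng.exponential(scale=1 / 2.0, size=n)
    y = rng.exponential(scale=1 / 1.0, size=n)
    # g(x, y) = 1{y > x} lies in Gst2; compare E[g(X, Y)] with E[g(Y, X)].
    print((y > x).mean(), (x > y).mean())   # first estimate exceeds second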
Remark 1.1.14. We deliberately use the notation Gst2 , because there are other bi-variate
characterizations using related but different sets of functions Gsti . In some of these bi-variate
characterizations, weaker orders between bi-variate functions, such as the increasing convex
order, may imply the stochastic order between two random variables.
1.2. Hazard Rate Order
Hazard rate is a concept from reliability. Let X be a lifetime (i.e., a nonnegative random variable) with an absolutely continuous distribution function F. The hazard rate of X (or F) when X has survived t units, r(t), is defined as
r(t) = lim_{∆t→0} P(X fails within the next ∆t units | X survives t units)/∆t
     = lim_{∆t→0} P(t < X ≤ t + ∆t | X > t)/∆t = f(t)/Fc(t),   t ≥ 0.
X is of increasing failure rate (IFR) if r(t) is an increasing function of t. Note that r(t) = d[−log Fc(t)]/dt. Hence, X is IFR ⇔ −log Fc(t) is convex on {t: Fc(t) > 0}.
Let {r(t)} and {q(t)} be the hazard rate functions of (nonnegative) X and Y, respectively.
Definition 1.2.1. X is smaller than Y in the hazard rate order, i.e., X ≤hr Y, if r(t) ≥ q(t), t ≥ 0.
This definition can be generalized to any two random variables with absolutely
continuous distributions. In that case, X ≤hr Y iff r(t) ≥ q(t) for -∞ < t < ∞.
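For a quick numerical check of Definition 1.2.1, one can compare r(t) = f(t)/Fc(t) on a grid; the Weibull lifetimes below (common shape, different scales) are an assumed example:

    import numpy as np
    from scipy import stats

    # X ~ Weibull(shape 2, scale 1), Y ~ Weibull(shape 2, scale 2).
    # Then r(t) = 2t >= t/2 = q(t) for all t, so X <=hr Y.
    k = 2.0
    t = np.linspace(0.01, 10, 1000)
    r = stats.weibull_min.pdf(t, k, scale=1.0) / stats.weibull_min.sf(t, k, scale=1.0)
    q = stats.weibull_min.pdf(t, k, scale=2.0) / stats.weibull_min.sf(t, k, scale=2.0)
    assert np.all(r >= q)   # r(t) >= q(t) for all t on the grid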
Equivalent Definitions 1.2.2. (Continuous random variables) X ≤hr Y ⇔
(a). Fc(t)/Gc(t) is decreasing in t.
(b). Fc(u)Gc(v) ≥ Fc(v)Gc(u) for all u ≤ v.
(c). f(u)/Fc(v) ≥ g(u)/Gc(v) for all u ≤ v.
(d). Fc(t+s)/Fc(t) ≤ Gc(t+s)/Gc(t) for all s ≥ 0 and all t.
(e). P(X − t > s | X > t) ≤ P(Y − t > s | Y > t) for all s and all t.
(f). (X|X > t) ≤st (Y|Y > t) for all t.
(g). [1 − FG⁻¹(1 − u)]/u ≤ [1 − FG⁻¹(1 − v)]/v for all 0 < u ≤ v < 1.
Hazard rate order can be applied to non-negative discrete random variables X and Y defined
on {0, 1, 2, 3, …}.
Definition 1.2.3. (Discrete random variables) X ≤hr Y if
P(X = n)/P(X ≥ n) ≥ P(Y = n)/P(Y ≥ n),   n ∈ {0, 1, 2, …}.
Equivalent definitions can easily be deduced from those for continuous random variables.
For example, X ≤hr Y ⇔ P(X ≥ n1) P(Y ≥ n2) ≥ P(X ≥ n2) P(Y ≥ n1) for all n1 ≤ n2.
Properties of Hazard Rate Order 1.2.4.
(a). ≤hr ⇒ ≤st.
(b). Closed under increasing functions: If X ≤hr Y and f is an increasing function, then f(X) ≤hr f(Y).
(c). Closed under minimization: Let (Xi, Yi), i = 1, …, m, be independent pairs of random variables such that Xi ≤hr Yi, i = 1, …, m. Then min{X1, …, Xm} ≤hr min{Y1, …, Ym}.
(d). Closed under order statistics: Let Xi, i = 1, …, m, be i.i.d. random variables, and Yi, i = 1, …, m, be another set of i.i.d. random variables such that {Xi} is independent of {Yi}. Suppose that Xi ≤hr Yi, i = 1, …, m. Then X(k) ≤hr Y(k), k = 1, …, m.
(e). Closed under convolution: Let (Xi, Yi), i = 1, …, m, be independent pairs of random variables such that Xi ≤hr Yi, i = 1, …, m. If all Xi's and Yi's are IFR, then ∑_{i=1}^m Xi ≤hr ∑_{i=1}^m Yi.
(f). Closed under random convolution: Let {Xi} be a sequence of nonnegative IFR independent random variables. Let M and N be two discrete positive integer-valued random variables such that M ≤hr N. Then ∑_{i=1}^M Xi ≤hr ∑_{i=1}^N Xi.
(g). Closed under mixture: Let X, Y, and Θ be random variables such that [X|Θ = θ1] ≤hr [Y|Θ = θ2] for all θ1 and θ2 in the support of Θ. Then X ≤hr Y.
How can we find a bi-variate characterization of ≤hr? It is simple to check that X ≤hr Y ⇒ X ≤st Y. So if Ghr is a set of functions that characterizes ≤hr, Gst = {g(x, y): ∆g(x, y) = g(x, y) − g(y, x) is decreasing in x} ⊂ Ghr.
One definition of X ≤hr Y is that Fc(u)Gc(v) ≥ Fc(v)Gc(u) for all u ≤ v, which (for independent X and Y) is equivalent to
E[1{X > u, Y > v}] ≥ E[1{X > v, Y > u}] for all u ≤ v.
Now consider g(x, y) = 1{x > u, y > v} for fixed u ≤ v. Then ∆g(x, y) = 1{x > u, y > v} − 1{y > u, x > v}. Check that ∆g(x, y) is a decreasing function of x for x ≥ y. With this result, let us conjecture that
Definition 1.2.5. (Bi-variate Characterization of ≤hr)
X ≤hr Y ⇔ E[g(X, Y)] ≥ E[g(Y, X)] for all g ∈ Ghr = {g(x, y): ∆g(x, y) = g(x, y) − g(y, x) is decreasing in x for all x ≥ y}.
◊◊
Another bi-variate characterization is:
Definition 1.2.6. (Bi-variate Characterization of ≤hr)
X ≤hr Y ⇔ E[g(X, Y)] ≥ E[g(Y, X)] for all g ∈ Ghr2 = {g(x, y): g(x, y) is decreasing in x for all x ≥ y and increasing in y for all y ≥ x}.
◊◊
1.3. Likelihood Ratio Order X ≤lr Y
Likelihood ratio order is a strong form of stochastic order.
Definition 1.3.1. X ≤lr Y ⇔ f(t)/g(t) is decreasing in t over the union of the supports of X and Y.
◊◊
Equivalent Definitions 1.3.2. X ≤lr Y ⇔
(a). f(u)g(v) ≥ f(v)g(u) for all u ≤ v.
(b). GF⁻¹ is a convex function.
(c). (X|a ≤ X ≤ b) ≤st (Y|a ≤ Y ≤ b) whenever a ≤ b.
◊◊
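For discrete distributions the same definition applies with probability mass functions in place of densities; a numerical sketch for an assumed Poisson pair:

    import numpy as np
    from scipy import stats

    # For lam < mu, p_lam(n)/p_mu(n) = e^{mu-lam} (lam/mu)^n is decreasing in n,
    # so Poisson(lam) <=lr Poisson(mu).
    lam, mu = 2.0, 4.0                       # assumed rates
    n = np.arange(0, 30)
    ratio = stats.poisson.pmf(n, lam) / stats.poisson.pmf(n, mu)
    assert np.all(np.diff(ratio) <= 1e-15)   # ratio is decreasing in n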
Exercise 1.3.3. Show that ≤lr ⇒ ≤hr.
Properties of Likelihood Ratio Order 1.3.4.
(a). Closed under increasing functions: If X ≤lr Y and f is an increasing function, then f(X) ≤lr f(Y).
(b). Convolution: Let (Xi, Yi), i = 1, …, m, be independent pairs of random variables such that Xi ≤lr Yi, i = 1, …, m. If Xi, Yi, i = 1, 2, …, m, all have logconcave densities, except possibly one Xl and one Yk, l ≠ k, then ∑_{i=1}^m Xi ≤lr ∑_{i=1}^m Yi.
(c). Convolution: Let {Xi} be a sequence of nonnegative independent random variables with logconcave densities. Let M and N be two discrete positive integer-valued random variables such that M ≤lr N. Then ∑_{i=1}^M Xi ≤lr ∑_{i=1}^N Xi. The assertion holds even when M ≤hr N.
(d). Let {Xi} be a sequence of nonnegative IFR independent random variables. Let M and N be two discrete positive integer-valued random variables such that M ≤lr N. Then ∑_{i=1}^M Xi ≤hr ∑_{i=1}^N Xi.
(e). Closed under order statistics: Let Xi, i = 1, …, m, be i.i.d. random variables, and Yi, i = 1, …, m, be another set of i.i.d. random variables such that {Xi} is independent of {Yi}. Suppose that Xi ≤lr Yj for all choices of i and j. Then X(k) ≤hr Y(k), k = 1, …, m.
(f). Let X, Y, and Θ be random variables such that [X|Θ = θ1] ≤lr [Y|Θ = θ2] for all θ1 and θ2 in the support of Θ. Then X ≤hr Y.
(g). Consider a family of density functions {gθ, θ ∈ 𝒳}. Let Θ1 and Θ2 be two random variables with support in 𝒳 and distribution functions F1 and F2, respectively. Let Y1 and Y2 be two random variables such that Yi =st X(Θi), i = 1, 2; that is, suppose that the density function of Yi is given by hi(y) = ∫ gθ(y)dFi(θ), y ∈ ℜ, i = 1, 2. If X(θ1) ≤lr X(θ2) whenever θ1 ≤ θ2, and if Θ1 ≤lr Θ2, then Y1 ≤lr Y2.
Bi-variate Characterizations 1.3.5.
(a). X ≤lr Y ⇔ E[g(X, Y)] ≤ E[g(Y, X)] for all g ∈ Glr = {g(x, y): g(x, y) ≥ g(y, x) whenever x ≥ y}.
(b). X ≤lr Y ⇔ g(X, Y) ≤st g(Y, X) for all g ∈ Glr.
1.4. Monotone Convex Order
This order is again developed from a concept in reliability. Let X and Y be two random lifetimes. X is, in some sense, smaller than Y if at any t the expected remaining lifetime of X is less than that of Y. Such a notion of order is formalized and extended to any pair of random variables.
Definition 1.4.1. We say that X is smaller than Y in the monotone convex order, X ≤c Y (F ≤c G), if E[(X−t)+] ≤ E[(Y−t)+] ∀ t.
The name of this order is adopted from Shaked and Shanthikumar. This order is called the mean residual life order in Wolff, the variability order in Ross, and the convex order in Stoyan. Note that Shaked and Shanthikumar also define a related stochastic order, the convex order.
The monotone convex order is a partial ordering, i.e., it is reflexive (X ≤c X),
transitive (X ≤c Y, Y ≤c Z ⇒ X ≤c Z), and antisymmetric (X ≤c Y and Y ≤c X ⇒ X =st Y).
Proof. Reflexivity and transitivity are obvious. To show antisymmetry, we use the equivalent definition 1.4.2(b) below:
∫_t^∞ Fc(u)du = ∫_t^∞ Gc(u)du (< ∞) ∀ t ⇒ Fc(t) = Gc(t) ∀ t.
◊◊
The monotone convex order is not a complete order since there exist random variables
(distributions) that cannot be ordered through the ordering.
Equivalent Definitions 1.4.2.
(a). X ≤c Y (F ≤c G) ⇔ E[g(X)] ≤ E[g(Y)] for any increasing convex function g with well-defined moments.
Proof. “⇐”: g(x) = (x−t)+ is increasing and convex.
“⇒”: Consider a piecewise linear increasing convex function gn(x) defined by 0 = a0 < ... < an < an+1 = ∞ and 0 = s0 < ... < sn−1 < sn < ∞ such that gn(x) = gn(aj) + sj(x − aj) for x ∈ [aj, aj+1). Then gn(x) = ∑_{j=1}^n (sj − s_{j−1})[x − aj]+. X ≤c Y ⇒ E[gn(X)] ≤ E[gn(Y)]. For any increasing convex function g, we can find a sequence of such {gn(x)} ↑ g(x).
(b). X ≤c Y (F ≤c G) ⇔ ∫_t^∞ Fc(u)du ≤ ∫_t^∞ Gc(u)du < ∞ ∀ t.
Proof. E[(X−t)+] = ∫_0^∞ P[(X−t)+ > u]du = ∫_0^∞ P(X > u + t)du = ∫_t^∞ Fc(u)du.
(c). X ≤c Y (F ≤c G) ⇔ E[max(X, t)] ≤ E[max(Y, t)] < ∞ ∀ t.
Proof. E[max(X, t)] = t + E[max(X − t, 0)] = t + E[(X−t)+].
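Definition 1.4.1 and 1.4.2(b) are easy to probe numerically. The sketch below compares two normal distributions with equal means and different variances (an assumed example; cf. Example 1.4.4(a)):

    import numpy as np
    from scipy import stats, integrate

    # Check int_t^inf F^c(u) du <= int_t^inf G^c(u) du for X ~ N(0,1), Y ~ N(0,2).
    def stop_loss(dist, t):
        # E[(X - t)^+] = int_t^inf P(X > u) du
        val, _ = integrate.quad(dist.sf, t, np.inf)
        return val

    X, Y = stats.norm(0, 1), stats.norm(0, 2)
    for t in np.linspace(-5, 5, 21):
        assert stop_loss(X, t) <= stop_loss(Y, t) + 1e-9   # X <=c Y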
Properties of Monotone Convex Order 1.4.3.
(a). From 1.4.2(a), X ≤c Y, X and Y ≥ 0 ⇒ E(X^r) ≤ E(Y^r) for r ≥ 1 (whenever these expectations exist).
(b). If a = E(X), then a ≤c X.
Proof. g(x) = max(x, t) is an increasing convex function. By Jensen's inequality, E[max(X, t)] ≥ max(E(X), t) = E[max(a, t)], and the claim follows from 1.4.2(c).
(c). X ≤c X + c ∀ c > 0.
Combining 1.4.3(b) and (c): intuitively, smaller in ≤c generally means shorter, less variable, or both.
(d). ≤c is weaker than ≤st, i.e., X ≤st Y and E[Y+] < ∞ ⇒ X ≤c Y.
Proof. E[Y+] < ∞ guarantees E[(Y−t)+] ≤ E[Y+] + |t| < ∞. The rest follows since g(x) = (x−t)+ is an increasing convex function.
(e). Closed under Shift: X ≤c Y ⇒ X + c ≤c Y + c ∀ c.
(f). Closed under Scaling: X ≤c Y ⇒ cX ≤c cY ∀ c > 0.
(g). Closed under Positive Part: X ≤c Y ⇒ X+ ≤c Y+.
Proof. E[max(X+, t)] = E[max(max(0, X), t)] = E[max(max(0, t), X)] ≤ E[max(max(0, t), Y)] = E[max(Y+, t)]; the claim follows from 1.4.2(c).
(h). Closed under Negation: X ≤c Y and E(X) = E(Y) ⇒ −X ≤c −Y.
Proof. X − t = (X−t)+ − (X−t)− = (X−t)+ − max(0, −(X−t)) = (X−t)+ − [−X − (−t)]+; take expectations and use E(X) = E(Y).
(i). Closed under Negative Part: X ≤c Y and E(X) = E(Y) ⇒ X− ≤c Y−.
Proof. E[max(X−, t)] = E[max(max(−X, 0), t)] = E[max(−X, max(0, t))]; apply (h) and 1.4.2(c).
(j). Closed under Mixture: (X|Θ = θ) ≤c (Y|Θ = θ) ∀ θ ∈ support of Θ ⇒ X ≤c Y.
Proof. E[(X−t)+] = E{E[(X−t)+|Θ]} ≤ E{E[(Y−t)+|Θ]} = E[(Y−t)+].
(k). Closed under Convolution: Let {Xi} and {Yi}, Xi ~ Fi and Yi ~ Gi, be two m-dimensional random vectors, with independent elements within each vector. Xi ≤c Yi (Fi ≤c Gi) ∀ i ⇒ ∑_{i=1}^m Xi ≤c ∑_{i=1}^m Yi.
Proof. It suffices to prove the case of two pairs of r.v.'s. Assume that all r.v.'s are independent. From closure under shift, (X1 + X2 | X2) ≤c (Y1 + X2 | X2), which, by closure under mixture, gives X1 + X2 ≤c Y1 + X2. By the same token, Y1 + X2 ≤c Y1 + Y2, and the result follows from the transitivity of ≤c. Note that the marginal distributions of the convolution remain unchanged even if the two vectors are dependent.
(l). Closed under Convergence in Distribution: Suppose that Xi (Fi) →D X (F), Yi (Gi) →D Y (G), Xi ≤c Yi ∀ i, E(Xi+) → E(X+), and E(Yi+) → E(Y+). Then X (F) ≤c Y (G).
Proof. For t ≥ 0, E[(Xi − t)+] = E(Xi+) − ∫_0^t Fic(u)du. By bounded convergence, ∫_0^t Fic(u)du → ∫_0^t Fc(u)du.
Note. The conditions E(Xi+) → E(X+) and E(Yi+) → E(Y+) are necessary. Consider Xi = 1; Yi = 0 w.p. (i−1)/i, and = i w.p. 1/i. Then Xi ≤c Yi, but X ≥c Y.
(m). Closed under Transformation by increasing convex functions: X ≤c Y and g an increasing convex function ⇒ g(X) ≤c g(Y).
(n). Let {Xi} be m independent r.v.'s and {Yi} be another m independent r.v.'s with Xi ≤c Yi, and let ψ:ℜm → ℜ be an increasing and componentwise convex function. Then ψ(X1, ..., Xm) ≤c ψ(Y1, ..., Ym).
Proof. It suffices to show the case m = 2. Just as in proving (k), we can assume that all r.v.'s are independent. ψ(x1, ⋅) is an increasing convex function. By (m), [ψ(x1, X2)|X1 = x1] ≤c [ψ(x1, Y2)|X1 = x1]; by (j), ψ(X1, X2) ≤c ψ(X1, Y2). A similar argument gives ψ(X1, Y2) ≤c ψ(Y1, Y2), and the result follows from the transitivity of ≤c.
Example 1.4.4.
(a). If X ~ normal(µ1, σ1²), Y ~ normal(µ2, σ2²), µ1 ≤ µ2, and σ1 ≤ σ2, then X ≤c Y.
(b). GI/G/1 Queue (Wolff, pp 489). Refer to the two GI/G/1 queues A1/G1/1 and A2/G2/1 in the example for ≤st (Example 1.1.7). Suppose that A2 ≤c A1 and G1 ≤c G2. Let Dn(i) be Dn in the ith queue. As in Example 1.1.7, D2(1) ≤c D2(2). By induction, Dn(1) ≤c Dn(2). If the queues are stable and the limiting distributions of the D∞(i)'s exist, we can show that D∞(1) ≤c D∞(2).
(c). (Ross, pp 273) A random variable X is new better than used in expectation (NBUE) if E[X − t|X > t] ≤ E(X) for all t. Suppose that X (~F) is NBUE and E(X) = 1/µ. Then X ≤c Y ~ exp(µ).
Solution. X is NBUE ⇒ E[X − a|X > a] ≤ E(X) ∀ a, i.e.,
∫_a^∞ Fc(y)dy / Fc(a) ≤ 1/µ ⇔ µ ≤ Fc(a) / ∫_a^∞ Fc(y)dy.
Integrating both sides w.r.t. a from 0 to s, and changing variables by defining x = ∫_a^∞ Fc(y)dy (so that dx = −Fc(a)da and x runs from ∫_0^∞ Fc(y)dy = 1/µ down to ∫_s^∞ Fc(y)dy), we have
sµ ≤ ln[1/(µ ∫_s^∞ Fc(y)dy)],
which, after some algebra, gives
∫_s^∞ Fc(y)dy ≤ e^{−sµ}/µ = ∫_s^∞ e^{−µy}dy,
i.e., X ≤c Y by 1.4.2(b).
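A quick numerical illustration with an assumed Erlang lifetime (Erlang r.v.'s are NBUE; see the tail-properties section below):

    import numpy as np
    from scipy import stats, integrate

    # X ~ Erlang(2) with rate 2*mu, so E(X) = 1/mu; verify
    # int_s^inf F^c(y) dy <= e^{-s*mu}/mu, i.e. X <=c exp(mu), via 1.4.2(b).
    mu = 1.0
    X = stats.gamma(a=2, scale=1 / (2 * mu))
    for s in np.linspace(0, 5, 11):
        lhs, _ = integrate.quad(X.sf, s, np.inf)
        assert lhs <= np.exp(-s * mu) / mu + 1e-9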
(d). Combining the results of (b) and (c) (Ross, pp 275): Consider renewal arrivals to a stable single-server queue. If the inter-arrival distribution is NBUE with mean 1/λ, then the mean delay d ≤ λE(S²)/[2(1 − λE(S))], where S is the service time. The result follows since for Poisson arrivals with rate λ, the mean delay is exactly λE(S²)/[2(1 − λE(S))].
◊◊
Exercise 1.4.5.
(a). Closed under Random Sum: Let {Xi} be a sequence of independent r.v.'s and {Yi} be another sequence of independent r.v.'s with Xi ≤c Yi; let M and N be two positive integer-valued r.v.'s, independent of the sequences, such that M ≤c N. Then ∑_{i=1}^M Xi ≤c ∑_{i=1}^N Yi.
(b). Suppose that X ≤c Y and E(X) = E(Y). Show that E[g(X)] ≤ E[g(Y)] ∀ convex function g.
Note. It follows that X ≤c Y and E(X) = E(Y) ⇒ E(X^r) ≤ E(Y^r) for r = 2, 4, ....
(c). A sufficient condition for ≤c: We say that X ~ F and Y ~ G satisfy the cut criterion if E(X) ≤ E(Y) < ∞ and there exists t₀ (< ∞) such that F(t) ≤ G(t) ∀ t < t₀ and F(t) ≥ G(t) ∀ t > t₀. (It is possible that there exists a nondegenerate interval I such that F(t) = G(t) ∀ t ∈ I.) Suppose that X and Y satisfy the cut criterion. Then X ≤c Y.
(d). Comparing Renewal Processes (Ross, pp 257): Let {M(t)} and {N(t)} be renewal processes generated from inter-arrival distributions F and G, respectively. If F is NBUE (NWUE) and G is exponential with the same mean as F, then M(t) ≤c N(t).
◊◊
1.5. The Notion of Regular Stochastic Convexity
Let {Pθ: θ ∈ Θ} be a family of uni-variate distributions, and {X(θ): θ ∈ Θ} be a set of random variables such that X(θ) ~ Pθ. Often we are interested in the monotone properties (increasing or decreasing), the second-order properties (convex or concave), or both, of X(θ) in some stochastic sense as θ increases. We will use the following concepts: stochastically increasing (SI), stochastically decreasing (SD), stochastically convex (SCX), stochastically concave (SCV), stochastically increasing convex (SICX), stochastically increasing concave (SICV), stochastically decreasing convex (SDCX), stochastically decreasing concave (SDCV), stochastically increasing linear (SIL), and stochastically decreasing linear (SDL).
Example 1.5.1. Exponential Tandem Queue Under Communication Blocking
[Figure: two M-station tandem lines, each with an unlimited supply of customers in front of station 1, service rates µ1, ..., µM, and intermediate buffers of sizes b1, ..., bM−1.]
(a). Given µ1, ..., µM, the throughput TH is a function of θ = (b1, ..., bM−1). Knowing the condition under which TH(θ) is concave with respect to θ is useful in designing the line.
(b). Given b1, ..., bM−1 and Σµi = c > 0, will TH be a concave function of θ = (µ1, ..., µM)?
◊◊
There are different notions of stochastic convexity that look very similar. One must be careful to differentiate one notion from another, particularly regarding how one notion implies (is stronger than) another. Here we only show one notion, regular stochastic convexity. Many performance measures of stochastic models, e.g., the throughput of a queueing network, can be expressed in terms of this notion. It is by no means the easiest to prove, and in fact even regular stochastic convexity is sometimes established through stronger notions. The notion is handily summarized by the following table.
{X(θ)}         for every φ that is ...                     E[φ(X(θ))] is ...
SI (SD)        increasing                                  increasing (decreasing) in θ
SCX (SCV)      convex (concave)                            convex (concave) in θ
SICX (SICV)    increasing convex (increasing concave)      increasing convex (increasing concave) in θ, and {X(θ)} is SI
SDCX (SDCV)    increasing convex (increasing concave)      decreasing convex (decreasing concave) in θ, and {X(θ)} is SD
SIL            increasing convex or increasing concave     increasing convex or increasing concave in θ, correspondingly, and {X(θ)} is SI
SDL            increasing convex or increasing concave     decreasing convex or decreasing concave in θ, correspondingly, and {X(θ)} is SD
The table is read in the following way: {X(θ)} is stochastically increasing (SI) in θ if
for any increasing function φ, E[φ(X(θ))] is an increasing function in θ. {X(θ)} is
stochastically decreasing convex (SDCX) in θ if {X(θ)} is SD and for any increasing convex
function φ, E[φ(X(θ))] is decreasing convex in θ. Other items of the table are interpreted in a
similar way.
Equivalent Definitions 1.5.2.
(a). {X(θ)} is SICX (SICV) if, and only if, {X(θ)} is SI and ∫_x^∞ FXc(s, θ)ds (respectively ∫_{−∞}^x FX(s, θ)ds) is increasing (decreasing) convex in θ for all x.
(b). {X(θ)} is SDCX (SDCV) if, and only if, {X(θ)} is SD and ∫_x^∞ FXc(s, θ)ds (respectively ∫_{−∞}^x FX(s, θ)ds) is decreasing (increasing) convex in θ for all x.
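Since ∫_x^∞ FXc(s, θ)ds = E[(X(θ) − x)+], part (a) is easy to probe numerically. For X(θ) exponential with mean θ (an assumed example family), E[(X(θ) − x)+] = θe^{−x/θ}, and the sketch below checks monotonicity and discrete convexity in θ:

    import numpy as np

    # h(theta) = E[(X(theta) - x)^+] = theta * exp(-x/theta) for X(theta) ~ exp
    # with mean theta; SICX requires h to be increasing convex in theta.
    x = 1.0
    theta = np.linspace(0.2, 5.0, 200)
    h = theta * np.exp(-x / theta)
    assert np.all(np.diff(h) > 0)            # increasing in theta
    assert np.all(np.diff(h, 2) > -1e-12)    # second differences >= 0 (convex)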
Properties of regular stochastic convexity 1.5.3.
(a). Closed under addition: Let {X(⋅)} and {Y(⋅)} be two independent processes. If {X(θ)} and {Y(θ)} are SICX (SICV), then {X(θ) + Y(θ)} is SICX (SICV).
(b). Closed under composition: If {X(θ)} and {Y(λ)} are SICX (SICV, SIL), then {Y(X(θ))} is SICX (SICV, SIL). If {X(θ)} is SDCX (SDCV, SDL) and {Y(λ)} is SICX (SICV, SIL), then {Y(X(θ))} is SDCX (SDCV, SDL).
Read Meester and Shanthikumar [1990] for a stronger notion of stochastic convexity.
Meester, Ludolf E., and Shanthikumar, J. George [1990]. Concavity of the throughput of
tandem queueing systems with finite buffer storage space. Advances in Applied
Probability, 22, 764-767.
Tail Properties of (Non-negative) Distributions
Let X ∈ [0, b] ~ F (b can be ∞). The remaining life of X at age t is Xt ≅ (X − t | X > t). The mean residual life of X at age t is
E(Xt) = ∫_0^∞ P(Xt > u)du = ∫_0^∞ P(X − t > u | X > t)du = ∫_t^∞ Fc(v)dv / Fc(t).   (15)
X has bounded mean residual life (BMRL) if there exists a γ such that E(Xt) ≤ γ for t < b. If γ = E(X), then X has the property new better than used in expectation (NBUE). Similarly, X has the property new worse than used in expectation (NWUE) if E(Xt) ≥ E(X) ∀ t < b.
Example 6.
Erlang r.v.'s are NBUE; hyper-exponential r.v.'s are NWUE; exponential r.v.'s are both NBUE and NWUE. ♦
If Xt ≤st X ∀ t < b, then X has the property new better than used (NBU); if Xt ≥st X ∀ t < b,
then X has the property new worse than used (NWU). An equivalent definition of NBU
(NWU) is
X is NBU (NWU) ⇔ −ln Fc is superadditive (subadditive) on (0, ∞),
i.e., Fc(s+t) ≤ (≥) Fc(s)Fc(t) ∀ s, t > 0.   (16)
Proof. Directly from the definition of NBU (NWU). ♦
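Condition (16) is straightforward to verify on a grid; the Erlang lifetime below is an assumed example (Erlang is IFR, hence NBU by (19)):

    import numpy as np
    from scipy import stats

    # NBU check: F^c(s + t) <= F^c(s) * F^c(t) for all s, t > 0,
    # for an assumed Erlang(2) lifetime with unit scale.
    X = stats.gamma(a=2, scale=1.0)
    s = np.linspace(0.1, 5, 25)
    S, T = np.meshgrid(s, s)
    assert np.all(X.sf(S + T) <= X.sf(S) * X.sf(T) + 1e-12)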
X has the property decreasing mean residual life (DMRL) if X is NBUE and E(Xt) is
decreasing in t ∀ t < b. X has the property increasing mean residual life (IMRL) if X is
NWUE and E(Xt) is increasing in t ∀ t < b.
Let F and f be the distribution and density functions of a continuous r.v. X. Let
λ(t) ≅ lim_{∆t→0} P(t < X ≤ t + ∆t | X > t)/∆t = f(t)/Fc(t)
be the failure (hazard) rate of X at age t (> 0). X has increasing failure rate (IFR) if λ(t) is increasing ∀ t < b; X has decreasing failure rate (DFR) if λ(t) is decreasing ∀ t < b.
The failure rate function completely determines the distribution of X, since
Fc(t) = e^{−∫_0^t λ(u)du}.   (17)
Proof. ∫_0^t λ(u)du = ∫_0^t f(u)/Fc(u) du = −ln Fc(t). ♦
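Identity (17) can be verified numerically; the Weibull lifetime below is an assumed example:

    import numpy as np
    from scipy import stats, integrate

    # Reconstruct F^c(t) from the hazard rate lambda(u) = f(u)/F^c(u)
    # for an assumed Weibull(shape 1.5) lifetime.
    X = stats.weibull_min(c=1.5)
    lam = lambda u: X.pdf(u) / X.sf(u)
    for t in [0.5, 1.0, 2.0, 3.0]:
        integral, _ = integrate.quad(lam, 0, t)
        assert abs(np.exp(-integral) - X.sf(t)) < 1e-8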
An equivalent definition of IFR (DFR) is
X has IFR (DFR) ⇔ Xs ≥st Xt if s ≤ t < b (Xs ≤st Xt if s ≤ t < b).   (18)
Proof. Xs ≥st Xt ⇔ P(X − s > v | X > s) ≥ P(X − t > v | X > t) ∀ v > 0 ⇔ Fc(s+v)/Fc(s) ≥ Fc(t+v)/Fc(t) ∀ v > 0 ⇔ e^{−∫_s^{s+v} λ(u)du} ≥ e^{−∫_t^{t+v} λ(u)du} ∀ v > 0. ♦
It is clear from (17) and the definitions above that
IFR ⇒ (NBU and DMRL) ⇒ NBUE (DFR ⇒ (NWU and IMRL) ⇒ NWUE).   (19)
We can show that NBU and DMRL (NWU and IMRL) do not imply each other.
Exercise 8. Give examples to show that NBU and DMRL (NWU and IMRL) do not imply each other.
Another equivalent definition of IFR (DFR), which follows directly from differentiating −ln Fc, is
X has IFR (DFR) ⇔ −ln Fc is convex (concave) on {t | Fc(t) > 0}.   (20)
If X and Y are independent r.v.'s with IFR, then min(X, Y) has IFR.   (21)
Proof. A straightforward calculation shows that the failure rate of min(X, Y) is the sum of the failure rates of X and Y. ♦
IFR is closed under convolution, i.e.,
X and Y independent r.v.'s with IFR ⇒ X + Y is IFR.   (22)
The proof is a bit involved. See Barlow and Proschan, pp 100, or Shaked and Shanthikumar, pp 23.
DFR is closed under mixture. Let X and Θ be two r.v.'s;
(X|Θ = θ) is DFR ∀ θ ⇒ X is DFR.   (23)
Proof. Let f and fθ be the density functions of X and (X|Θ = θ), respectively, so that f(t) = E[fΘ(t)] and Fc(t) = E[FΘc(t)]. Then λ(t) = f(t)/Fc(t) = E(fΘ)/Fc(t). By straightforward differentiation, dλ(t)/dt = [Fc(t)E(dfΘ/dt) + E²(fΘ)]/[Fc(t)]². The failure rate is decreasing ⇔ Fc(t)E(dfΘ/dt) + E²(fΘ) ≤ 0 ⇔ E²(fΘ) ≤ E(FΘc)E(−dfΘ/dt). The Cauchy-Schwarz inequality E²(UV) ≤ E(U²)E(V²) gives E(FΘc)E(−dfΘ/dt) ≥ E²(√((−dfΘ/dt)FΘc)). Since each (X|Θ = θ) has decreasing failure rate, −(dfθ/dt)Fθc ≥ fθ², so E²(√((−dfΘ/dt)FΘc)) ≥ E²(fΘ), which establishes the result. ♦
Example 7.
From exponential r.v.'s, we see that IFR is not closed under mixture (a mixture of exponentials is hyper-exponential, hence DFR), while DFR is not closed under convolution (a sum of independent exponentials is Erlang, hence IFR).
Stochastic Ordering of Random Vectors and Stochastic Processes
Let X = (X1, ..., Xm) and Y = (Y1, ..., Ym) be two random vectors.
Definition. X ≤st Y ⇔ E[g(X)] ≤ E[g(Y)] ∀ increasing g: ℜm → ℜ.   (24)
Definition. The stochastic processes {X(t); t ≥ 0} ≤st {Y(t); t ≥ 0}
⇔ (X(t1), ..., X(tm)) ≤st (Y(t1), ..., Y(tm)) ∀ m and t1, ..., tm ≥ 0.   (25)
Example 8. Comparing Renewal Processes (Ross, pp 257)
Let {M(t)} and {N(t)} be renewal processes with inter-arrival distributions F and G, respectively. If F ≤st G, then {M(t)} ≥st {N(t)}.
Solution. Generate Xi ~ F and Yi ~ G such that Xi ≤ Yi. Then for all t, M(t) ≥ N(t). ♦
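A coupled simulation sketch of this argument (the exponential inter-arrival distributions are assumed for concreteness):

    import numpy as np

    # Generate X_i <= Y_i pathwise via common uniforms; then the F-renewal
    # process has at least as many arrivals by any time t.
    rng = np.random.default_rng(3)
    u = rng.uniform(size=5000)
    x = -np.log(1 - u) / 1.5                   # F = exp(rate 1.5)
    y = -np.log(1 - u) / 1.0                   # G = exp(rate 1.0), so X_i <= Y_i
    for t in [1.0, 10.0, 100.0]:
        m = np.searchsorted(np.cumsum(x), t)   # M(t)
        n = np.searchsorted(np.cumsum(y), t)   # N(t)
        assert m >= n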
Example 9. Stochastic Monotonicity of a Birth and Death Process (B&D) (Ross, pp 257)
Let {X(t); t ≥ 0} be a B&D. Then {X(t)} is stochastically increasing in X(0).
Solution. Let {X1(t)} and {X2(t)} be two B&D's with the same infinitesimal generator such that X1(0) = i < j = X2(0). We will construct a process {X3(t)} =st {X2(t)} with X3(t) ≥ X1(t) ∀ t. Define T ≅ the first time that X1(t) = X2(t); T ≅ ∞ if X1(t) ≠ X2(t) ∀ t. Construct X3(t) ≅ X2(t) for t < T, and ≅ X1(t) for t ≥ T. By the structure of a B&D process, if T = ∞, the sample paths of X1(t) and X2(t) never cross; X2(t) ≥ X1(t) and we are done. For finite T, the Markov property guarantees that {X3(t)} has the same infinitesimal generator as X1(t) and X2(t). ♦
Exercise 9. (Ross, pp 258)
For a B&D process, show that P(X(t) ≥ j|X(0) = 0) is increasing in t ∀ j.
Note. This result gives an upper bound on P(X(t) ≥ j|X(0) = 0).
Example 10. Exponential Convergence in Discrete-Time Markov Chains (DTMC) (Ross, pp 258)
For an irreducible, aperiodic finite-state DTMC, the n-step transition probability p_ij^n converges to the stationary distribution πj exponentially fast.
Solution. Suppose that P(X0 = i) = 1, and the chain has N states. We will show that p_ij^n → πj at an exponential rate.
First construct a process {Yn}, independent of {Xn}, which has the same transition probability matrix as {Xn}. Set Y0 ~ π (and hence Yn ~ π for all n). Let T = inf{n: Xn = Yn}. Since {T > mK} ⊂ {XK ≠ YK, X2K ≠ Y2K, ..., XmK ≠ YmK}, P(T > mK) ≤ P(XK ≠ YK)P(X2K ≠ Y2K | XK ≠ YK)⋯P(XmK ≠ YmK | XK ≠ YK, ..., X(m−1)K ≠ Y(m−1)K). Observe that for an irreducible, aperiodic finite-state DTMC {Xn}, there exists a K such that p_ij^K > 0 for all i and j (and hence p_ij^n > 0 ∀ n ≥ K). Choose an ε > 0 such that p_ij^K > ε ∀ i, j. So in K transitions, P(XK = j) and P(YK = j) > ε for every j; P(XK = YK) > Nε²; P(XK ≠ YK) ≤ 1 − Nε² ≅ 1 − α. By the same argument, P(XjK ≠ YjK | ⋅) ≤ 1 − Nε², so P(T > mK) ≤ (1−α)^m, and hence T is a finite r.v.
Now define a second process {Zn} such that Zn ≅ Yn for n ≤ T, and Zn ≅ Xn for n > T. It is clear that {Zn} is a DTMC with the same transition probability matrix as {Xn} and {Yn}. By our initial setting, Z0, and hence every Zn, ~ π. p_ij^n = P(Xn = j) = P(Xn = j|T ≤ n)P(T ≤ n) + P(Xn = j|T > n)P(T > n); πj = P(Zn = j) = P(Zn = j|T ≤ n)P(T ≤ n) + P(Zn = j|T > n)P(T > n). Since Xn = Zn on {T ≤ n}, it follows that |p_ij^n − πj| ≤ P(T > n) ≤ (1−α)^{n/K − 1}. ♦
Note. The uniqueness of stationary distributions also follows from the argument.
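A small numerical sketch of the geometric decay (the 3-state transition matrix below is an arbitrary assumption):

    import numpy as np

    # Track max_j |p_0j^n - pi_j| for an assumed irreducible aperiodic chain.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])
    w, v = np.linalg.eig(P.T)                 # stationary dist: left eigenvector
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi /= pi.sum()
    row = np.array([1.0, 0.0, 0.0])           # start in state 0
    for n in range(1, 20):
        row = row @ P
        print(n, np.abs(row - pi).max())      # gap shrinks geometrically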