III. HOMEWORK PROBLEMS
8. GOAL SOFTENING (3)
It is well known that the accuracy of a sampling scheme depends only on
the size of the sample and NOT on the size of the underlying population,
which we shall assume to be infinite in this problem. Suppose the distribution
of performances J of a system is normal, N(0, σ_J),
when plotted against a design variable q. We randomly sample N
designs, q_1, q_2, ..., q_N,
in an experiment and observe the system performance J(q_i)
in additive noise N(0, σ_noise), i.e., J(q_i)_observed
= J(q_i) + noise. We now ask: what is
Prob{max[J(q_i)_observed, i = 1, ..., N] is in the top-5% of J(q)} = p = ?
assuming we are interested in maximizing performance.
-
Purely as a test of your probabilistic intuition, what do you think is
the likely value of p for the case σ_J = σ_noise
and N = 100?
A. p < 0.4
B. 0.4 <= p < 0.75
C. p > 0.75
Choose one alternative and then calculate the answer to see if you are
correct.
-
Whatever the value of p in the above, now consider doing m
independent experiments of N samples each and ask
Prob{at least 1 of the m values max[J(q_i)_observed,
i = 1, ..., N] is in the top-5% of J(q)}
= p(m) = ?
-
Calculate p(m) as a function of m and p. What
did you learn from this calculation?
(Try out a couple of your guesses from the previous question for different
values of m.)
-
Suppose instead of doing m experiments of N samples each, we did ONE
experiment with mN samples and ask
Prob{max[J(q_i)_observed, i = 1, ..., mN] is in the top-5% of J(q)} = p*
= ?
-
Is p* > p(m) or p(m) > p*?
Can you relate this to the ideas of Ordinal Optimization?
SOLUTION:
-
The correct estimate is that p ≈ 0.52, which is not very good. In
fact, if we increase the sample size N to 300, p increases
only slightly (see 3. below). The easiest way to calculate this probability
is by direct simulation.
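A minimal sketch of such a direct simulation, assuming σ_J = σ_noise = 1 and N = 100 (the seed and trial count here are arbitrary illustrative choices):

```python
import random
from statistics import NormalDist

random.seed(0)  # arbitrary seed, for repeatability
sigma_J = sigma_noise = 1.0
N, trials = 100, 10000

# Top-5% cutoff of the TRUE performance distribution N(0, sigma_J)
cutoff = NormalDist(0.0, sigma_J).inv_cdf(0.95)

hits = 0
for _ in range(trials):
    J = [random.gauss(0.0, sigma_J) for _ in range(N)]        # true performances
    J_obs = [j + random.gauss(0.0, sigma_noise) for j in J]   # noisy observations
    best = max(range(N), key=J_obs.__getitem__)               # best OBSERVED design
    if J[best] >= cutoff:                                     # is its TRUE value in the top 5%?
        hits += 1

p_hat = hits / trials
print(f"estimated p = {p_hat:.3f}")
```

Each trial draws N true performances, corrupts them with noise, picks the design whose observed value is largest, and checks whether its true value lies in the top 5%; the fraction of successful trials estimates p.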
-
To have at least ONE of the best designs of the m separate experiments,
say m = 3, of 100 samples each belong to the top 5% is equivalent
to at least one success in 3 Bernoulli trials with success probability p (obtained
in the first question). This is given by
p(3) = 1 - (1 - p)^3
-
For p = 0.4, p(3) = 0.784, which is considerably more interesting.
Note also that once p is determined, we can calculate p(m)
for different m without further simulation.
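The formula can be tabulated directly; a short sketch (the values of p and m tried here are illustrative):

```python
def p_m(p: float, m: int) -> float:
    """Prob{at least one of m independent experiments succeeds},
    where each succeeds independently with probability p."""
    return 1.0 - (1.0 - p) ** m

# Tabulate for a guessed p = 0.4 and the simulated p = 0.52
for p in (0.4, 0.52):
    for m in (1, 2, 3, 5):
        print(f"p = {p:.2f}, m = {m}: p(m) = {p_m(p, m):.3f}")
```

Even a mediocre single-experiment probability p compounds quickly with m: for p = 0.4, p(3) = 1 - 0.6^3 = 0.784.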
-
We assert that p(3) > p*. This advantage is achieved
by asking a softer question (vs. asking that the best of 3*100 = 300
samples belong to the top-5%).