## II. SIMPLE EXPLANATIONS OF KEY IDEAS IN ORDINAL OPTIMIZATION

#### II.2. SOME NOTATIONS AND CONCEPTS OF ORDINAL OPTIMIZATION

1. OO is a generalization of traditional optimization, obtained by relaxing some of its concepts. We first introduce the notion of a "Good Enough" subset G of the search space. In traditional optimization, G is a singleton, namely the optimum. In OO, G is usually a multi-element set, say the top-5%. Similarly, we introduce the "Selected" subset S of the search space. Again, traditionally S is a singleton, the estimated optimum, and we require it to coincide with the true optimum. This is akin to hitting one speeding bullet with another. In OO, we only require that G ∩ S be non-empty, which is more like hitting a truck with a shotgun [Ho-1994, Ho-1995].

2. In particular, we are interested in the probability that |G ∩ S| ≥ k for a given problem and a given method of selecting S, i.e.,

Alignment Probability = AP = P(|G ∩ S| ≥ k) = ?

We often let G be the top-n% of the search space, uniformly sample N points of the search space, and select the top-m% of the observed performances as S. We call this method of selection the Horse Race method. Clearly, the noise/error of the observed performance, σ², as well as the number of samples N, can affect the AP. In the limit σ² → ∞, this method is equivalent to blindly picking |S| points from the search space. The AP can be, and has been, calculated for a range of N, σ², |G|, |S|, and the category of the problem, C. We call these the Universal Alignment Probability Curves in the sense that they can be applied to ANY system optimization problem [Lau-Ho 1995].
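The Horse Race selection and its alignment probability are easy to estimate by Monte Carlo. The following sketch assumes uniformly distributed true performances on [0, 1] (smaller is better) and additive Gaussian observation noise; all parameter values are illustrative, not taken from the published curves.

```python
import numpy as np

def alignment_probability(n_designs=1000, g_frac=0.05, s_frac=0.05,
                          sigma=0.5, k=1, trials=2000, seed=0):
    """Monte Carlo estimate of AP = P(|G ∩ S| >= k) under Horse Race selection.

    True performances are drawn uniformly on [0, 1] (smaller is better);
    observed performance = true performance + N(0, sigma^2) noise.
    """
    rng = np.random.default_rng(seed)
    g = max(1, int(g_frac * n_designs))   # |G|: truly good-enough designs
    s = max(1, int(s_frac * n_designs))   # |S|: designs kept after the race
    hits = 0
    for _ in range(trials):
        true_perf = rng.uniform(size=n_designs)
        observed = true_perf + rng.normal(0.0, sigma, size=n_designs)
        G = set(np.argsort(true_perf)[:g])     # top-g by true performance
        S = set(np.argsort(observed)[:s])      # top-s by observed performance
        hits += len(G & S) >= k
    return hits / trials
```

With sigma = 0 the observed order equals the true order, so the estimate is exactly 1; as sigma grows, the estimate degrades toward the blind-pick probability.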

3. Another concept in OO is that of the Performance Density/Distribution Function, PDF, of a problem. Consider the thought experiment of evaluating the performances of ALL possible designs in the search space and plotting the histogram (density function) of the performances, as illustrated in Fig. 1.

Fig.1 The PDF.

A related notion is that of the Ordered Performance Curve, OPC. This is a plot of the performances ordered 1st, 2nd, 3rd, ..., etc., which by definition must be monotonically increasing if the best (#1) design is identified with the minimum. If we normalize the best and worst performances between 0 and 1 and turn the OPC sideways, it becomes the Performance Distribution Function, the integral of the Performance Density Function.
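The OPC-distribution relationship above can be verified numerically: the normalized OPC, read sideways, is the empirical distribution function of the sampled performances. A minimal sketch, assuming normally distributed performances purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
perf = rng.normal(size=10_000)          # performances of sampled designs

# Ordered Performance Curve: performances sorted best (minimum) first.
opc = np.sort(perf)

# Normalize the best and worst performances to [0, 1].
opc_norm = (opc - opc[0]) / (opc[-1] - opc[0])

# "Turning the OPC sideways" gives the empirical distribution function:
# the fraction of designs at or below a given performance level.
def ecdf(x):
    return np.searchsorted(opc_norm, x, side="right") / len(opc_norm)

# The i-th ordered performance sits at cumulative fraction (i + 1)/N.
i = 2500
assert abs(ecdf(opc_norm[i]) - (i + 1) / len(opc_norm)) < 1e-9
```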

4. Because of the monotonic nature of the OPC, only a limited number of shapes are possible. We submit that five major categories are sufficient to characterize all problems: mostly bad designs, mostly good designs, either good or bad designs, mostly mediocre designs, and uniformly distributed designs. The typical shapes of these five categories are illustrated in Fig. 2.

Fig.2 The Five Categories of Problems
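The five category shapes can be mimicked with simple parametric densities. In the sketch below, Beta distributions on [0, 1] (minimization, best = 0) stand in for the five performance densities; the particular Beta parameters are my own illustrative choices, not from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000

# Illustrative performance densities for the five categories (minimization,
# performances scaled to [0, 1]); the Beta parameters are arbitrary choices.
categories = {
    "mostly bad designs":      rng.beta(3.0, 1.0, N),  # mass near the worst
    "mostly good designs":     rng.beta(1.0, 3.0, N),  # mass near the best
    "either good or bad":      rng.beta(0.5, 0.5, N),  # bimodal
    "mostly mediocre designs": rng.beta(5.0, 5.0, N),  # mass in the middle
    "uniformly distributed":   rng.beta(1.0, 1.0, N),  # flat
}

# Each category's OPC is its sorted sample; all are monotonically increasing.
opcs = {name: np.sort(x) for name, x in categories.items()}
```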

• In practice, of course, we cannot observe either the PDF or the OPC of a problem. We can only estimate them via a model using simulation or approximate computation. We designate such models as surrogate models. The point here is that these surrogate models can be VERY crude and totally inadequate for estimating the value of the system performances. However, because of their simplicity, we can use them to calculate reasonable estimates of the PDF and OPC via sampling. A typical assumption is that

Performance_observed = Performance_true + noise/error

which is known as the Thurstone model. However, the scales of the observed and true performance values may be quite different in some problems, e.g., the probability of a rare event under normal conditions vs. the probability of the same event under stress conditions. In such cases, we can postulate that the two sets of probabilities are still ordered approximately the same with respect to parametric variations, i.e.,

Order_observed = Order_true + Perturbation in Order

In either case, given the model, the category of the problem, N, σ², and |G|, we have calculated the required |S| in order to have AP = 0.95 for different values of k. In other words, for ANY problem we can narrow the search from N to |S| if we wish to guarantee with probability 0.95 that at least k of the top-|G|/N% of the designs are in the selected set, S [Lau-Ho 1995].
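Under the additive-noise (Thurstone) model, the required |S| for a target AP can be reproduced by Monte Carlo: simply grow |S| until the estimated alignment probability reaches the target. This is a sketch with illustrative parameters, not the published tables.

```python
import numpy as np

def estimate_ap(n, g, s, sigma, k, trials, rng):
    """Monte Carlo estimate of P(|G ∩ S| >= k) under Horse Race selection."""
    hits = 0
    for _ in range(trials):
        true_perf = rng.uniform(size=n)
        observed = true_perf + rng.normal(0.0, sigma, size=n)
        G = set(np.argsort(true_perf)[:g])
        S = set(np.argsort(observed)[:s])
        hits += len(G & S) >= k
    return hits / trials

def required_s(n=1000, g=50, sigma=0.5, k=5, target=0.95,
               trials=500, seed=3):
    """Smallest |S| whose estimated AP reaches the target probability."""
    rng = np.random.default_rng(seed)
    for s in range(k, n + 1):               # need |S| >= k for any alignment
        if estimate_ap(n, g, s, sigma, k, trials, rng) >= target:
            return s
    return n
```

As a sanity check, with zero noise the top-|S| observed designs are exactly the top-|S| true designs, so the required |S| is just k.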

• With these definitions, the following chess playing paradigm becomes meaningful. At some midway point in a chess game, suppose you have move A vs. move B. It is simply impossible for you to determine the actual consequences of A or B. (Note we are brainwashed to believe that to solve a decision problem we should evaluate and compare the expected utility of all choices.) However, you really only need to decide whether or not A is better than B. You don't have to know how much better. The latter is an ordinal concept while the former is a cardinal one. To decide A vs. B we often use a surrogate model of evaluating moves A and B a few moves deep and then use the resulting board positions to decide A vs. B. This decision, of course, is not always right. But it is much better than blind choice, and it converges to the optimal one exponentially fast as the difference between A and B grows (corresponding to the size of our good-enough set G and the selection set S in OO). Many chess programs that play at grandmaster level and are capable of beating world champions are designed this way. This is a perfect example of OO in action and explains why we can get away with short simulations and crude models that have large statistical errors.
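The A-vs-B point can be made quantitative with the Thurstone model: the probability that one noisy evaluation of each move ranks the truly better move first grows quickly with the true gap between them. A minimal sketch (gap and noise values are illustrative):

```python
import numpy as np

def p_correct_order(gap, sigma=0.2, trials=20_000, seed=4):
    """Probability that a single noisy evaluation of each move ranks the
    truly better move (lower cost) first, under additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    a = 0.0 + rng.normal(0.0, sigma, trials)   # observed value of move A
    b = gap + rng.normal(0.0, sigma, trials)   # move B is worse by `gap`
    return np.mean(a < b)

# Even a small gap beats a coin flip, and the error probability shrinks
# rapidly as the gap grows.
```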

• OO is also a quantification of heuristic notions such as the 80/20 solution, the ballpark estimate, and order-vs-value. In fact, OO can be considered a justification of how we get along in life without enough information, enough time, or enough resources for most problems. The point is that for many problems it is sufficient to know whether or not alternative A is better than B without necessarily knowing what A will turn out to be eventually, e.g., marriage and career choices.
