## II. SIMPLE EXPLANATIONS OF KEY IDEAS IN ORDINAL OPTIMIZATION

#### II.2. SOME NOTATIONS AND CONCEPTS OF ORDINAL OPTIMIZATION

1. OO is a generalization of traditional optimization, obtained by relaxing some of its concepts. We first introduce the notion of a "Good Enough" subset G of the search space. In traditional optimization, G is a singleton, namely the optimum. In OO, G is usually a multi-element set, say the top-5%. Similarly, we introduce the "Selected" subset S of the search space. Again, traditionally S is a singleton, the estimated optimum, and we require it to coincide with the true optimum. This is akin to hitting one speeding bullet with another. In OO, we only require that G ∩ S be non-empty, which is more like hitting a truck with a shotgun [Ho-1994, Ho-1995].

2. In particular, we are interested in the probability that |G ∩ S| ≥ k for a given problem and a given method of selecting S, i.e.,

Alignment Probability = AP = P(|G ∩ S| ≥ k) = ?

We often let G be the top-n% of the search space, uniformly sample N points of the search space, and select the top-m% of the observed performances as S. We call this method of selection the Horse Race method. Clearly, the noise/error of the observed performance, σ², as well as the number of samples N, can affect the AP. In the limit σ² → ∞, this method is equivalent to blindly picking |S| points from the search space. The AP can be, and has been, calculated for a range of N, σ², |G|, |S|, and the category of the problem, C. We call these the Universal Alignment Probability Curves in the sense that they can be applied to ANY system optimization problem [Lau-Ho 1995].
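The Horse Race selection and its alignment probability are easy to estimate by Monte Carlo. The following sketch assumes uniformly distributed true performances on [0, 1] (smaller is better) and additive Gaussian observation noise; all parameter values are illustrative, not taken from the published curves.

```python
import numpy as np

def alignment_probability(n_designs=1000, g_frac=0.05, s_frac=0.05,
                          sigma=0.5, k=1, trials=2000, seed=0):
    """Monte Carlo estimate of AP = P(|G ∩ S| >= k) under Horse Race selection.

    True performances are drawn uniformly on [0, 1] (smaller is better);
    observed performance = true performance + N(0, sigma^2) noise.
    """
    rng = np.random.default_rng(seed)
    g = max(1, int(g_frac * n_designs))   # |G|: truly good-enough designs
    s = max(1, int(s_frac * n_designs))   # |S|: designs kept after the race
    hits = 0
    for _ in range(trials):
        true_perf = rng.uniform(size=n_designs)
        observed = true_perf + rng.normal(0.0, sigma, size=n_designs)
        G = set(np.argsort(true_perf)[:g])     # top-g by true performance
        S = set(np.argsort(observed)[:s])      # top-s by observed performance
        hits += len(G & S) >= k
    return hits / trials
```

With sigma = 0 the observed order equals the true order, so the estimate is exactly 1; as sigma grows, the estimate degrades toward the blind-pick probability.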

3. Another concept in OO is that of the Performance Density/Distribution Function, PDF, of a problem. Consider the thought experiment of evaluating the performances of ALL possible designs in the search space and plotting the histogram (density function) of the performances, as illustrated in Fig. 1.

Fig.1 The PDF.

A related notion is that of the Ordered Performance Curve, OPC. This is a plot of the performances ordered 1st, 2nd, 3rd, ..., etc., which by definition must be monotonically increasing if the best (#1) design is identified with the minimum. If we normalize the best and worst performances between 0 and 1 and turn the OPC sideways, it becomes the Performance Distribution Function, the integral of the Performance Density Function.
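The OPC-distribution relationship above can be verified numerically: the normalized OPC, read sideways, is the empirical distribution function of the sampled performances. A minimal sketch, assuming normally distributed performances purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
perf = rng.normal(size=10_000)          # performances of sampled designs

# Ordered Performance Curve: performances sorted best (minimum) first.
opc = np.sort(perf)

# Normalize the best and worst performances to [0, 1].
opc_norm = (opc - opc[0]) / (opc[-1] - opc[0])

# "Turning the OPC sideways" gives the empirical distribution function:
# the fraction of designs at or below a given performance level.
def ecdf(x):
    return np.searchsorted(opc_norm, x, side="right") / len(opc_norm)

# The i-th ordered performance sits at cumulative fraction (i + 1)/N.
i = 2500
assert abs(ecdf(opc_norm[i]) - (i + 1) / len(opc_norm)) < 1e-9
```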

4. Because of the monotonic nature of the OPC, only a limited number of shapes are possible. We submit that five major categories are sufficient to characterize all problems: mostly bad designs, mostly good designs, either good or bad designs, mostly mediocre designs, and uniformly distributed designs. The typical shapes of these five categories are illustrated in Fig. 2.

Fig.2 The Five Categories of Problems
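The five category shapes can be mimicked with simple parametric densities. In the sketch below, Beta distributions on [0, 1] (minimization, best = 0) stand in for the five performance densities; the particular Beta parameters are my own illustrative choices, not from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000

# Illustrative performance densities for the five categories (minimization,
# performances scaled to [0, 1]); the Beta parameters are arbitrary choices.
categories = {
    "mostly bad designs":      rng.beta(3.0, 1.0, N),  # mass near the worst
    "mostly good designs":     rng.beta(1.0, 3.0, N),  # mass near the best
    "either good or bad":      rng.beta(0.5, 0.5, N),  # bimodal
    "mostly mediocre designs": rng.beta(5.0, 5.0, N),  # mass in the middle
    "uniformly distributed":   rng.beta(1.0, 1.0, N),  # flat
}

# Each category's OPC is its sorted sample; all are monotonically increasing.
opcs = {name: np.sort(x) for name, x in categories.items()}
```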

• In practice, of course, we cannot observe either the PDF or the OPC of a problem. We can only estimate them via a model using simulation or approximate computation. We designate such models as surrogate models. The point here is that these surrogate models can be VERY crude and totally inadequate for estimating the value of the system performances. However, because of their simplicity, we can use them to calculate reasonable estimates of the PDF and OPC via sampling. A typical assumption is that

Performance_observed = Performance_true + noise/error

which is known as the Thurstone model. However, the scales of the observed and true performance values may be quite different in some problems, e.g., the probability of a rare event under normal conditions vs. the probability of the same event under stress conditions. In such cases, we can postulate that the two sets of probabilities are still ordered approximately the same with respect to parametric variations, i.e.,

Order_observed = Order_true + Perturbation in Order

In either case, given the model, the category of the problem, N, σ², and |G|, we have calculated the required |S| in order to have AP = 0.95 for different values of k. In other words, for ANY problem we can narrow the search from N to |S| if we wish to guarantee with probability 0.95 that at least k of the top-|G|/N% of the designs are in the selected set, S [Lau-Ho 1995].
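Under the additive-noise (Thurstone) model, the required |S| for a target AP can be reproduced by Monte Carlo: simply grow |S| until the estimated alignment probability reaches the target. This is a sketch with illustrative parameters, not the published tables.

```python
import numpy as np

def estimate_ap(n, g, s, sigma, k, trials, rng):
    """Monte Carlo estimate of P(|G ∩ S| >= k) under Horse Race selection."""
    hits = 0
    for _ in range(trials):
        true_perf = rng.uniform(size=n)
        observed = true_perf + rng.normal(0.0, sigma, size=n)
        G = set(np.argsort(true_perf)[:g])
        S = set(np.argsort(observed)[:s])
        hits += len(G & S) >= k
    return hits / trials

def required_s(n=1000, g=50, sigma=0.5, k=5, target=0.95,
               trials=500, seed=3):
    """Smallest |S| whose estimated AP reaches the target probability."""
    rng = np.random.default_rng(seed)
    for s in range(k, n + 1):               # need |S| >= k for any alignment
        if estimate_ap(n, g, s, sigma, k, trials, rng) >= target:
            return s
    return n
```

As a sanity check, with zero noise the top-|S| observed designs are exactly the top-|S| true designs, so the required |S| is just k.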

• With these definitions, the following chess playing paradigm becomes meaningful. At some midway point in a chess game, suppose you have move A vs. move B. It is simply impossible for you to determine the actual consequences of A or B. (Note we are brainwashed to believe that to solve a decision problem we should evaluate and compare the expected utility of all choices.) However, you really only need to decide whether or not A is better than B. You don't have to know how much better. The latter is an ordinal concept while the former is a cardinal one. To decide A vs. B we often use a surrogate model of evaluating moves A and B a few moves deep and then use the resulting board positions to decide A vs. B. This decision, of course, is not always right. But it is much better than blind choice, and it converges to the optimal one exponentially fast as the difference between A and B grows (corresponding to the size of our good-enough set G and the selection set S in OO). Many chess programs that play at grandmaster level and are capable of beating world champions are designed this way. This is a perfect example of OO in action and explains why we can get away with short simulations and crude models that have large statistical errors.
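The A-vs-B point can be made quantitative with the Thurstone model: the probability that one noisy evaluation of each move ranks the truly better move first grows quickly with the true gap between them. A minimal sketch (gap and noise values are illustrative):

```python
import numpy as np

def p_correct_order(gap, sigma=0.2, trials=20_000, seed=4):
    """Probability that a single noisy evaluation of each move ranks the
    truly better move (lower cost) first, under additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    a = 0.0 + rng.normal(0.0, sigma, trials)   # observed value of move A
    b = gap + rng.normal(0.0, sigma, trials)   # move B is worse by `gap`
    return np.mean(a < b)

# Even a small gap beats a coin flip, and the error probability shrinks
# rapidly as the gap grows.
```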

• OO is also a quantification of heuristic notions such as the 80/20 solution, the ballpark estimate, and order-vs-value. In fact, OO can be considered a justification of how we get along in life without enough information, enough time, or enough resources for most problems. The point is that for many problems it is sufficient to know whether or not alternative A is better than B without necessarily knowing what A will turn out to be eventually, e.g., marriage and career choices.
