Optimal Control Law/Decision Rule/Strategy

II. SIMPLE EXPLANATIONS OF KEY IDEAS IN ORDINAL OPTIMIZATION

II.5. The Search for Optimal "Feedback Control Law", "Decision Rule" and "Strategy"

The Holy Grail of control theory, decision analysis, and game theory is the determination of the optimal choice of decision as a function of the available information, i.e., a mapping from the space of information to the space of decisions g(.)'s belong to G: I -> U. Now the space G is uncountably large in all but the simplest toy problems. For example, if we have three input variables and one output variable which we quantize to 10 divisions each, a very rough division, then the total number of possible strategies, i.e., output as a function of the inputs, |G| is given by 10¹⁰⁰⁰, a number larger than the total estimated number of atoms in the universe! Thus, no search procedure can deal with such Combinatorial space no matter how efficient. Getting within say 0.001% of the best strategy in such a space is no comfort at all. This is where knowledge about the problem and representation of the search scheme comes in. Let us explain this via a simple example.

Suppose we are searching for maximum of J(q) for q in the interval [0,1]. We choose to uniformly sample 1,000 points in [0,1] and select the best observed J(q), denoted as J_max. Simple statistics states that the probability of another sample exceeding J_max is given by

since expected value of J_max is equal to 0.999. However, if by some knowledge about the problem we have learned that the maximum must lie in the region [0,0.5], then performing the same 1,000 samples in this reduced interval will change the probability to

If the PDF of J is uniform then a 50% reduction of the search space produces a 100% improvement in the chance of finding the maximum. Of course, there are many ways to reduce the search region using knowledge other than simple partitioning. Some ways are more efficient than others. Properties such as monotonicity, continuity, linearity, etc., can produce even more significant reductions [Deng-Ho 1995].

Each time we sample from a search space, we of course gain information about the distribution of the J(q)'s, i.e., we "learn". This learned knowledge should be used to direct additional searches leading to an iterative approach to sampling in the tradition of all successive improvement methods such as gradient schemes and TABU search. The major difference here is instead of iterate along a path or a sequence of points, the iteration is done along a sequence of sampling representations. For each representation we sample repeatedly to learn the global feature of the distribution before changing the search representation. Ordinarily, this can be very expensive leading to a huge number of required samples. It is here that Ordinal Optimization makes a contribution rendering the evaluation of J(q) much less costly by using very approximate J(q)'s. If one representation yields a better looking observed PDF, then we should concentrate and refine our search in that representation. This is based on a gneralization of the following simple fact. If two numbers A < B are observed under i.i.d. noise, the the observed PDF of A will dominate that of B as illustrated below:

Now if instead A and B are random variables with distributions of their own, we can still use the observed PDFs to guide our selection of promising representations.

Using this approach of iterating along a sequence of possible representations, we were able to

find the optimal strategy for a linear-quadratic optimal control problem;
find a solution to the famous Witsenhausen problem that is 50% better than anything found so far [Deng-Ho 1995]

To Table of Content