That's a very good idea, in principle. It's a little tricky to implement, since a value of "UNKNOWN" is not a number. Thus, the first time the classifier matches, the system will have to detect that the value is still unknown and leave that classifier's "prediction" out of the calculation of the system's prediction. However, that can certainly be done. And then, as you say, the prediction can be set to the first experienced value.
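
To make the bookkeeping concrete, here is a minimal sketch of the "UNKNOWN" idea, not XCS's actual code. The names `Classifier`, `system_prediction` and `update`, the use of `None` for "UNKNOWN", and the learning rate `beta=0.2` are all assumptions made for illustration; the fitness-weighted average is one plausible way to form the system's prediction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classifier:
    prediction: Optional[float] = None  # None stands in for "UNKNOWN" (assumption)
    fitness: float = 1.0


def system_prediction(match_set: List[Classifier]) -> Optional[float]:
    """Fitness-weighted average over classifiers whose prediction is known."""
    known = [c for c in match_set if c.prediction is not None]
    total_fitness = sum(c.fitness for c in known)
    if total_fitness == 0:
        return None  # nothing known yet, so the system has no prediction either
    return sum(c.prediction * c.fitness for c in known) / total_fitness


def update(cl: Classifier, payoff: float, beta: float = 0.2) -> None:
    """Set the prediction on first experience, then move it a fraction beta toward each payoff."""
    if cl.prediction is None:
        cl.prediction = payoff
    else:
        cl.prediction += beta * (payoff - cl.prediction)
```

The extra `None` checks in both functions are the "something will have to detect this fact" cost mentioned above.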

Actually, XCS operates almost this way, as you will see in Section 3.2, using Venturini's "MAM" technique. The only difference is that XCS initializes the prediction with a number, not a warning like "UNKNOWN". In practice, the effect of having an arbitrary number in there during the first match appears to be negligible.
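
As a rough sketch of why the initial number hardly matters under a MAM-style update: while the classifier has fewer than 1/beta updates, the prediction is a plain running average of the payoffs seen so far, and the very first update divides by an experience of 1, which overwrites the initial value completely. The function name `mam_update` and the value `beta=0.2` are assumptions for illustration only.

```python
def mam_update(prediction, experience, payoff, beta=0.2):
    """One MAM-style prediction update; returns (new_prediction, new_experience)."""
    experience += 1
    if experience < 1.0 / beta:
        # Early phase: running average of the payoffs experienced so far.
        prediction += (payoff - prediction) / experience
    else:
        # Later phase: ordinary fixed-rate (Widrow-Hoff) update.
        prediction += beta * (payoff - prediction)
    return prediction, experience


# Two different arbitrary initial predictions give the same result after one update.
p_a, exp_a = mam_update(10.0, 0, 1000.0)
p_b, exp_b = mam_update(500.0, 0, 1000.0)
```

After the first call both `p_a` and `p_b` equal the experienced payoff, so the arbitrary initial number only has an effect if the classifier's prediction is used before its first update.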

Why do you need an initial prediction estimate? Would it not be more expedient to say "UNKNOWN"? When the classifier first experiences a value, the prediction becomes known and is set to that experienced value. In this way the classifier's prediction would move to its 'true' value more quickly and would be independent of arbitrary initial values.