Your questions about the updating order make sense. I have fiddled with that a lot. On one problem a particular order seems to work best, and on another problem a different order may be better. The differences are quite small, however.
At present, I seem to have settled on this order ("epred" is prediction error, "pred" is prediction, "fit" is fitness):
epred (using MAM) - pred (MAM) - fit (non-MAM, i.e. the regular delta rule).
This is a conservative order. Under MAM, the initial value of epred doesn't matter: on the first update it is replaced by the difference between the first payoff and the initial value of pred, and that replacement is probably large. Next, the initial pred is replaced by the first payoff. Finally, the fitness is updated from that first sample of epred, and that update is gradual, because non-MAM (i.e. MAM with a waiting period of 0) is used.
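A minimal Python sketch of the conservative order for a single classifier may make this concrete. The MAM rule (plain averaging while the experience count is below 1/beta, the delta rule afterwards), the parameter values, the initial values and the accuracy function are assumptions borrowed from the usual XCS description, not details given here; fitness is updated from raw accuracy rather than relative accuracy over the action set, purely to keep the example short.

```python
from dataclasses import dataclass

BETA = 0.2            # learning rate (assumed value)
EPSILON_0 = 10.0      # error below which a classifier counts as accurate (assumed)
ALPHA, NU = 0.1, 5.0  # accuracy fall-off parameters (assumed values)


@dataclass
class Classifier:
    pred: float = 10.0  # hypothetical initial prediction
    epred: float = 0.0  # hypothetical initial prediction error
    fit: float = 0.01   # hypothetical initial fitness
    exp: int = 0        # number of updates seen so far


def mam(value: float, target: float, n: int, beta: float = BETA) -> float:
    """MAM: average the samples while n < 1/beta, then switch to the delta rule."""
    if n < 1.0 / beta:
        return value + (target - value) / n
    return value + beta * (target - value)


def accuracy(epred: float) -> float:
    """XCS-style accuracy: 1 below EPSILON_0, steep power-law fall-off above it."""
    if epred < EPSILON_0:
        return 1.0
    return ALPHA * (epred / EPSILON_0) ** -NU


def update_conservative(cl: Classifier, payoff: float) -> None:
    """Order: epred (MAM), pred (MAM), fit (plain delta rule, i.e. no MAM)."""
    cl.exp += 1
    # epred is measured against the prediction *before* it sees this payoff.
    cl.epred = mam(cl.epred, abs(payoff - cl.pred), cl.exp)
    cl.pred = mam(cl.pred, payoff, cl.exp)
    # Simplification: raw accuracy instead of relative accuracy in the action set.
    cl.fit += BETA * (accuracy(cl.epred) - cl.fit)


cl = Classifier()
update_conservative(cl, 1000.0)
print(cl.epred, cl.pred, round(cl.fit, 4))  # prints 990.0 1000.0 0.008
```

With a first payoff of 1000, the error jumps straight to 990 and the prediction to 1000, while fitness only moves a fraction of the way toward the (very low) accuracy, so a new classifier is not trusted quickly.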
Sorry about saying in the ECJ paper that fitness was updated first; I didn't stay with that order very long. Another alternative is the order pred, epred, fit, which can give somewhat faster results on easy problems, I think because it is more aggressive. The first pred update produces an immediate zero error (pred is set to the first payoff, so epred, updated next, is zero), and so fitness is pushed up right away. If a new offspring is actually fairly good, its effects propagate faster, and offspring will tend to be good in easier problems. In hard problems, however, the value of an offspring takes *many* matches to evaluate, so it seems that updating needs to be conservative.
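To see why the pred, epred, fit order is more aggressive, a small sketch can apply one update to a fresh classifier in each order. As before, the MAM rule, the parameter values, the initial values and the raw-accuracy fitness update are assumptions for illustration only.

```python
BETA = 0.2            # learning rate (assumed)
EPSILON_0 = 10.0      # accuracy threshold (assumed)
ALPHA, NU = 0.1, 5.0  # accuracy fall-off (assumed)


def mam(value, target, n, beta=BETA):
    """Average while n < 1/beta, then use the delta rule."""
    if n < 1.0 / beta:
        return value + (target - value) / n
    return value + beta * (target - value)


def accuracy(epred):
    return 1.0 if epred < EPSILON_0 else ALPHA * (epred / EPSILON_0) ** -NU


def first_update(order, payoff, pred=10.0, epred=0.0, fit=0.01):
    """Apply one update (n = 1) to a fresh classifier in the given order."""
    n = 1
    for name in order:
        if name == "epred":
            epred = mam(epred, abs(payoff - pred), n)
        elif name == "pred":
            pred = mam(pred, payoff, n)
        elif name == "fit":
            # Plain delta rule on raw accuracy (no action-set normalisation).
            fit += BETA * (accuracy(epred) - fit)
    return pred, epred, fit


for order in (("epred", "pred", "fit"), ("pred", "epred", "fit")):
    pred, epred, fit = first_update(order, 1000.0)
    print(" -> ".join(order), f"epred={epred:g} fit={fit:.3f}")
```

In the second order the error is zero after the first payoff, so accuracy is maximal and fitness rises from 0.01 to about 0.21 in one step, instead of falling slightly as in the conservative order.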
I have not used any regimes in which updating is delayed, as you suggest.
Using the MAM technique, the first few estimates are an average of the samples generated so far. However, given the order in which the parameters are updated (fitness, prediction error and then prediction) and the way they depend on each other (fitness on prediction error, prediction error on prediction), no real samples are available for the first update of prediction error or for the first two updates of fitness; only the default values of the parameters they depend on are available. This implies that averaging of the values for prediction error and fitness must be delayed until actual samples have been generated for the parameters they depend on. I have tried running XCS with and without this sort of delay, and overall it works better with it. Is this the correct way to implement the system? If so, why are fitness, prediction error and prediction updated in that order? Would it not be easier to update them in the reverse order? That way you would not need the delay.
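The delay described in the question might be sketched as below, assuming the same MAM rule and parameter values as in a standard XCS description (MAM is applied to fitness too, since the question averages it). In this hypothetical version each parameter keeps its own count of real samples and is simply left at its default until the parameter it depends on holds at least one real sample; because MAM's first averaging step replaces the value outright, this gives the same value as updating from defaults and restarting the averaging at that point. The class and counter names are illustrative, and fitness again uses raw accuracy for brevity.

```python
BETA = 0.2            # learning rate (assumed)
EPSILON_0 = 10.0      # accuracy threshold (assumed)
ALPHA, NU = 0.1, 5.0  # accuracy fall-off (assumed)


def mam(value, target, n, beta=BETA):
    """Average while n < 1/beta, then use the delta rule."""
    if n < 1.0 / beta:
        return value + (target - value) / n
    return value + beta * (target - value)


def accuracy(epred):
    return 1.0 if epred < EPSILON_0 else ALPHA * (epred / EPSILON_0) ** -NU


class DelayedClassifier:
    """Hypothetical classifier that delays MAM until its inputs are real samples."""

    def __init__(self, pred=10.0, epred=0.0, fit=0.01):
        self.pred, self.epred, self.fit = pred, epred, fit
        # Number of real samples folded into each parameter so far.
        self.n_pred = self.n_epred = self.n_fit = 0

    def update(self, payoff):
        # Order from the question: fitness, prediction error, prediction.
        if self.n_epred > 0:  # epred is no longer just its default value
            self.n_fit += 1
            self.fit = mam(self.fit, accuracy(self.epred), self.n_fit)
        if self.n_pred > 0:  # pred is no longer just its default value
            self.n_epred += 1
            self.epred = mam(self.epred, abs(payoff - self.pred), self.n_epred)
        self.n_pred += 1  # the payoff itself is always a real sample
        self.pred = mam(self.pred, payoff, self.n_pred)


cl = DelayedClassifier()
for _ in range(3):
    cl.update(1000.0)
print(cl.n_pred, cl.n_epred, cl.n_fit)  # prints 3 2 1
```

The counters show the lag: prediction error starts averaging on the second update and fitness on the third. In the reverse order (prediction, then prediction error, then fitness), every input would already be a real sample when it is used, so no such counters would be needed.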