The empirical evidence suggests that XCS does indeed learn complete maps of its payoff landscape. What isn't clear is how XCS makes sure that each action has a representative in each match set. Are there mechanisms that guarantee this?

In practice, this doesn't seem to be a problem. Watching a run, you see the different actions come in; e.g., the "vector diagram" of Figure 7 gradually fills in with something in each direction.

One mechanism is surely just mutation: if a particular action is missing from a match set, mutation will eventually provide it. But another source may be the fact that, due to don't cares, classifiers generated by the GA fall into other match sets besides the match set they were created in. As a side effect, their action is present in those other match sets too, so it is available to the GA there as well. I'm suggesting that the mere fact of generalization results in a "diffusion" of action possibilities to other match sets, just as it causes diffusion of condition schemata. Thus there doesn't seem to be a need for the system to incorporate a mechanism that "makes sure" every action is represented.
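The diffusion effect can be illustrated with a minimal sketch of ternary matching. This is not XCS's actual implementation; the names (`matches`, `match_set`) and the example conditions are purely illustrative. The point is that a classifier bred in one match set can, through its '#' don't-care positions, also appear in other match sets, carrying its action along with it.

```python
def matches(condition: str, state: str) -> bool:
    """True if every position of the ternary condition is '#' or equals the state bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def match_set(population, state):
    """All classifiers (condition, action) whose condition matches the state."""
    return [cl for cl in population if matches(cl[0], state)]

# Suppose the GA, acting in the match set for state "0011", creates a
# generalized classifier with two don't-cares:
generalized = ("0#1#", "left")
population = [generalized, ("0011", "right")]

# The generalized classifier matches "0011", where it was created...
print(match_set(population, "0011"))   # both classifiers present
# ...but it also matches "0110", so the action "left" is now available
# in that match set too, without any dedicated covering mechanism.
print(match_set(population, "0110"))   # only the generalized classifier
```

Under this toy model, generalization alone spreads the action "left" into the match set for "0110", which is the diffusion of action possibilities described above.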