Actually, the probability that the GA produces a classifier identical to one of its parents is not so low. In the experiment of Figure 6, the crossover probability is 0.8, the mutation probability is 0.01 per position, and a classifier has 24 condition positions plus 1 action position (the action is coded in decimal). The probability that crossover does not occur and that a given offspring suffers no mutation is 0.2 × 0.99^25 ≈ 0.156, which is appreciable. If such "clone" classifiers also have high fitness, their number will rise relative to other classifiers, which makes a macroclassifier representation advantageous for efficiency (fewer comparisons during matching, etc.).
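A minimal sketch of the calculation above. It assumes mutation acts independently at each of the 25 positions and counts only the no-crossover route to a clone. Crossover between similar parents can also reproduce a parent, so the figure is a lower bound. The function name `clone_probability` is illustrative, not taken from any particular implementation.

```python
def clone_probability(p_cross: float, p_mut: float, length: int) -> float:
    """Probability that a given offspring is an exact copy of a parent via the
    no-crossover route: crossover is skipped (1 - p_cross) and none of the
    `length` positions is mutated ((1 - p_mut) ** length)."""
    return (1.0 - p_cross) * (1.0 - p_mut) ** length


# Figure 6 settings: 24 condition positions + 1 action position.
p = clone_probability(p_cross=0.8, p_mut=0.01, length=25)
print(f"{p:.3f}")  # 0.156
```

With the same rates, a 20-position genotype gives about 0.164.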

However, as the classifier length goes up, the probability of cloning would appear to fall, as you note. I think there are at least two assumptions that have to be examined here:

(1) Will classifier length go up that much in "real" problems? Suppose you have a robot. Will designers cope with the potential detail of "reality" by indefinitely adding bits to the visual input? Or will they instead use more limited, or heterogeneous, resolution plus some form of active vision in which the robot's view changes in order to gather detail? I am suggesting that the ultimate need for huge classifier lengths is not proven.

(2) It is also not clear that, even if huge lengths are used, the mutation rate should stay the same. Perhaps it should go down, roughly in inverse proportion to the length. It is sometimes said that Nature arranges for about one mutation per genotype. If that is a good rule for classifier systems, then the probability of cloning will be roughly independent of length: with a per-position mutation rate of 1/L, the chance that an L-position classifier escapes mutation entirely approaches 1/e ≈ 0.37 as L grows.
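Both points can be checked numerically. This sketch assumes the crossover probability of 0.8 from Figure 6 and, as in the first calculation, counts only clones produced when crossover is skipped and a given offspring escapes mutation. It compares a fixed per-position mutation rate of 0.01 with a rate of 1/L. The lengths chosen are arbitrary illustrations.

```python
def clone_probability(p_cross: float, p_mut: float, length: int) -> float:
    # No crossover, and no mutation at any of the `length` positions.
    return (1.0 - p_cross) * (1.0 - p_mut) ** length


P_CROSS = 0.8
for length in (25, 50, 100, 200, 400):
    fixed = clone_probability(P_CROSS, 0.01, length)          # rate held constant
    scaled = clone_probability(P_CROSS, 1.0 / length, length)  # about one mutation per genotype
    print(f"L={length:4d}  mu=0.01: {fixed:.4f}  mu=1/L: {scaled:.4f}")
```

With the fixed rate, the clone probability decays geometrically in L. With the 1/L rate, it climbs slowly toward 0.2/e ≈ 0.0736 and stays near 0.07 at every length shown.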

I like macroclassifiers because, at least in the present problems, they are effective (i.e., the numerosities of high-fitness macros do grow quite dramatically). Besides making matching more efficient, macros let the researcher see much more quickly what is really going on in the population in terms of "winning" classifiers: they are the ones with high numerosities. Finally, the size M of the population, considered as a list of unique macroclassifiers, is a useful measure of the degree of generalization that has occurred and of the space complexity of the system's model.
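To make the bookkeeping concrete, here is a minimal sketch of a population stored as macroclassifiers. It assumes a ternary condition string over {0, 1, #} and an integer action. The class and method names are hypothetical, and the per-classifier parameters a real system keeps (prediction, error, fitness, etc.) are omitted. Inserting a clone increments a numerosity instead of adding an entry. Matching scans the M unique entries rather than every copy, and sorting by numerosity surfaces the "winning" rules.

```python
from dataclasses import dataclass


@dataclass
class Macroclassifier:
    condition: str       # ternary string over '0', '1', '#'
    action: int
    numerosity: int = 1  # number of identical (micro)classifiers represented


def matches(condition: str, state: str) -> bool:
    # '#' is a don't-care; other symbols must equal the input bit.
    return all(c == "#" or c == s for c, s in zip(condition, state))


class Population:
    """Population held as unique macroclassifiers keyed by (condition, action)."""

    def __init__(self) -> None:
        self._macros: dict = {}

    def insert(self, condition: str, action: int) -> None:
        key = (condition, action)
        if key in self._macros:
            self._macros[key].numerosity += 1  # a clone only bumps the count
        else:
            self._macros[key] = Macroclassifier(condition, action)

    def macro_size(self) -> int:
        return len(self._macros)  # M: number of distinct classifiers

    def micro_size(self) -> int:
        return sum(m.numerosity for m in self._macros.values())

    def match_set(self, state: str) -> list:
        # One comparison per macroclassifier, not per microclassifier.
        return [m for m in self._macros.values() if matches(m.condition, state)]

    def top(self, k: int) -> list:
        # Highest-numerosity rules first: the "winning" classifiers.
        return sorted(self._macros.values(), key=lambda m: m.numerosity, reverse=True)[:k]


pop = Population()
for _ in range(3):
    pop.insert("1#0", 0)
pop.insert("11#", 1)
print(pop.macro_size(), pop.micro_size())
```

Here four microclassifiers collapse into M = 2 entries. The gap between the two counts grows as clones of a successful rule accumulate.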

The macroclassifier mechanism seems quite odd for non-toy problems. Even with a genotype length as short as 20, the probability of generating two identical rules is nearly zero. So what good is it?