A perfectly accurate classifier is one whose prediction is correct in every situation that the classifier matches. Suppose that #01#11:3 is an example of such a classifier. I.e., in the four situations it matches, its prediction exactly equals the payoff received if the system chooses action "3".
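As a minimal sketch of what "matches" means here, assuming binary input strings and the usual convention that # in a condition matches either bit (the function names are illustrative, not taken from any XCS implementation):

```python
from itertools import product

def matches(condition: str, situation: str) -> bool:
    """A condition matches a situation if every non-# position agrees."""
    return len(condition) == len(situation) and all(
        c == "#" or c == s for c, s in zip(condition, situation)
    )

def matched_situations(condition: str) -> list[str]:
    """Enumerate every binary input string that the condition matches."""
    all_inputs = ("".join(bits) for bits in product("01", repeat=len(condition)))
    return [s for s in all_inputs if matches(condition, s)]

# The condition of #01#11:3 has two #'s, so it matches 2**2 = 4 situations.
print(matched_situations("#01#11"))
# -> ['001011', '001111', '101011', '101111']
```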

Now consider the possible generalizations of this classifier, for instance ##1#11:3, #0##1#:3, etc. Each possible generalization by definition matches in more situations than the first classifier. As a result, I think you can say that the more general a classifier is (i.e., the more #'s it has relative to the first classifier), the less accurate it will tend to be. Conversely, the more specific a classifier, the more accurate it will tend to be relative to its more general "cousins".
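The relationship between a classifier and its generalizations can be checked mechanically. The sketch below assumes binary inputs, so a condition with k #'s matches 2**k situations; the helper names are hypothetical.

```python
def is_generalization(general: str, specific: str) -> bool:
    """True if `general` keeps or wildcards every position of `specific`
    and wildcards at least one position that `specific` specifies."""
    if len(general) != len(specific) or general == specific:
        return False
    return all(g == "#" or g == s for g, s in zip(general, specific))

def match_count(condition: str) -> int:
    """Number of binary situations the condition matches."""
    return 2 ** condition.count("#")

base = "#01#11"
for cond in (base, "##1#11", "#0##1#"):
    print(cond, match_count(cond), is_generalization(cond, base))
# -> #01#11 4 False
# -> ##1#11 8 True
# -> #0##1# 16 True
```

Each generalization matches every situation the original matches plus others, and it is in those extra situations that its prediction can turn out to be wrong.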

Thus an increase in specificity tends to parallel an increase in accuracy, which in XCS means fitness.

Now consider crossover. It is an operator that recombines parts of classifier conditions. In particular, it will often combine the specified bits of the parent conditions, resulting in an even more specific offspring. Then bring in selection itself. Selection will tend to choose parents that are more accurate, i.e., that tend to have more specified bits than other classifiers. Together, selection and crossover will tend, I believe, to rapidly emphasize and generate increasingly specific--and thus increasingly accurate and fit--classifiers.
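To make the recombination step concrete, here is a small sketch that assumes simple one-point crossover with a fixed cut point (the text does not say which crossover variant is used, and the parent conditions are made up for illustration):

```python
def one_point_crossover(a: str, b: str, point: int) -> tuple[str, str]:
    """Swap the tails of two equal-length conditions at `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def specificity(condition: str) -> int:
    """Number of specified (non-#) positions."""
    return len(condition) - condition.count("#")

# Two hypothetical parents, each with two specified bits.
parent1, parent2 = "#01###", "####11"
child1, child2 = one_point_crossover(parent1, parent2, 3)

print(child1, specificity(child1))  # -> #01#11 4
print(child2, specificity(child2))  # -> ###### 0
```

In this example one offspring collects the specified bits of both parents and the other receives none. Crossover alone only redistributes specified bits; it is selection that then favors the more specific, and thus usually more accurate, offspring.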

Thus the GA is in fact highly "genetic" and not random, since the search proceeds by selecting fit parents and producing offspring that have a high probability of being more fit. In "schema" terms, the specified bits of less specific classifiers are in effect schemata that are combined to produce the more specific (and usually more fit) schema of the offspring.

But why doesn't the population end up consisting of completely specific classifiers? Answer: because after a certain point, increased specificity does not result in increased accuracy. In fact, the generalization mechanism of XCS will cause the "winning" classifiers to be those that are both accurate and maximally general. But that is the story discussed in Section 4.1.
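The point that extra specificity eventually stops paying off can be illustrated with a toy environment. Everything here is an assumption made for illustration: the payoff function is invented, and the spread of payoffs over the matched situations is only a crude stand-in for XCS's actual accuracy measure.

```python
from itertools import product

def matches(condition: str, situation: str) -> bool:
    """A condition matches a situation if every non-# position agrees."""
    return all(c == "#" or c == s for c, s in zip(condition, situation))

def toy_payoff(situation: str) -> int:
    """Hypothetical payoff for action 3: depends only on bits 1 and 2."""
    return 1000 if situation[1] == "0" and situation[2] == "1" else 0

def payoff_spread(condition: str) -> int:
    """Max minus min payoff over all matched situations (0 = perfectly accurate)."""
    inputs = ("".join(bits) for bits in product("01", repeat=len(condition)))
    payoffs = [toy_payoff(s) for s in inputs if matches(condition, s)]
    return max(payoffs) - min(payoffs)

for cond in ("##1###", "#01###", "#01#11"):
    print(cond, payoff_spread(cond))
# -> ##1### 1000
# -> #01### 0
# -> #01#11 0
```

In this toy setting #01### is already perfectly accurate, so the more specific #01#11 gains nothing in accuracy while matching fewer situations.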

(By the way, it's not true that the GA is "only used to remove inaccurate rules". You may be referring to the second deletion mechanism of Section 3.3, in which deletion probability can depend inversely on fitness. This is a nuance, and the first mechanism, which does not depend on fitness, works nearly or equally well.)

*
In your GA with fitness based on accuracy, why should the offspring
be more accurate? The search you are proposing seems more random
than genetic, because the GA is only used to remove inaccurate rules.
*