The answer is yes: reinforcement (updates) did continue during the last 2500 problems, but the GA did not. Updates also occur on test problems that may be intermixed with learning problems during the learning phase. In general, we've done updates during testing on multi-step problems because, if the animat gets stuck in a loop, updating makes the predictions for the loop actions fall to the point where another action is chosen and the system breaks out of the loop. This reason does not apply to single-step problems, and in that case updating is usually not done during testing.
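
To make the loop-breaking argument concrete, here is a minimal sketch in Python. It is not the classifier system from the paper. It assumes a single loop state with two hypothetical actions ("stay", which returns to the same state with zero reward, and "exit"), one tabular prediction per action, deterministic greedy action selection on test, and a Q-learning-style update. The values of beta, gamma, and the initial predictions are made up for illustration.

```python
def run_test_problem(update_on_test, max_steps=50, beta=0.2, gamma=0.71):
    """Return the step on which the animat leaves the loop, or None if it never does.

    All names and numbers here are illustrative assumptions, not taken from the paper.
    """
    # Hypothetical predictions: the looping action starts out looking better.
    predictions = {"stay": 500.0, "exit": 400.0}

    for step in range(1, max_steps + 1):
        # Deterministic (greedy) selection, as on a test problem.
        action = max(predictions, key=predictions.get)
        if action == "exit":
            return step

        if update_on_test:
            # "stay" earns zero reward and lands in the same state, so the
            # target is the discounted best prediction in that same state.
            target = 0.0 + gamma * max(predictions.values())
            predictions["stay"] += beta * (target - predictions["stay"])

    return None  # still looping when the step limit was reached


if __name__ == "__main__":
    print("with updates:   ", run_test_problem(update_on_test=True))
    print("without updates:", run_test_problem(update_on_test=False))
```

With updating on, each pass through the loop shrinks the "stay" prediction by a factor of 1 - beta + beta*gamma, so under these assumed numbers it drops below the "exit" prediction after four updates and the animat leaves on step 5. With updating off, the predictions never change and the greedy choice repeats until the step limit.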

I have a quick question: In the experiments described in "Toward Optimal Classifier System Performance in Non-Markov Environments," did reinforcement continue during the last 2500 test problems?