Yes, reinforcement (updates) did occur during the last 2500
problems, but the GA did not run. Updates also occur on
test problems that may be intermixed with learning problems
during the learning phase. In general, we've done updates
during testing on multi-step problems because if the animat
gets stuck in a loop, updating will make the predictions for
the loop actions fall to the point where another action is
chosen and the system breaks out of the loop. This reason
does not apply to single-step problems, so in that case
updating is usually not done during testing.
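
To make the loop-breaking argument concrete, here is a minimal sketch in
Python. It is not XCS and not the code used in the experiments: it is a
plain tabular stand-in with one non-goal state and two hypothetical actions,
"stay" (returns to the same state, zero reward) and "advance" (reaches the
goal). The initial predictions, the discount factor GAMMA, and the learning
rate BETA are all assumed values chosen only for illustration.

```python
GAMMA = 0.71  # discount factor (assumed value)
BETA = 0.2    # learning rate (assumed value)


def run_test_episode(update, max_steps=200):
    """Run one test problem with deterministic (best-prediction) action choice.

    Returns (reached_goal, steps_taken).
    """
    # Hypothetical predictions for the single non-goal state. The looping
    # action "stay" is overestimated, so pure exploitation prefers it.
    predictions = {"stay": 500.0, "advance": 100.0}

    for step in range(1, max_steps + 1):
        # Exploit: pick the action with the highest prediction.
        action = max(predictions, key=predictions.get)
        if action == "advance":
            return True, step  # goal reached, problem ends

        # "stay" leads back to the same state with zero reward.
        if update:
            # Q-learning-style target: reward plus the discounted best
            # prediction in the next state (here, the same state).
            target = 0.0 + GAMMA * max(predictions.values())
            predictions["stay"] += BETA * (target - predictions["stay"])

    return False, max_steps  # still looping when the step limit is hit


if __name__ == "__main__":
    print("updates on: ", run_test_episode(update=True))
    print("updates off:", run_test_episode(update=False))
```

With updates on, each pass through the loop pulls the "stay" prediction
toward its discounted self, so it decays until "advance" wins and the episode
ends. With updates off, the predictions never change and the animat stays in
the loop until the step limit.
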

I have a quick question: In the experiments described in "Toward Optimal
Classifier System Performance in Non-Markov Environments," did
reinforcement continue during the last 2500 test problems?