Another possibility for a decaying adaptation rate has been proposed by Ritter et al. (1991) in the context of self-organizing maps. They propose an exponential decay according to
\[
\epsilon(t) = \epsilon_i \left( \epsilon_f / \epsilon_i \right)^{t/t_{\max}}, \tag{4.18}
\]
where $\epsilon_i$ and $\epsilon_f$ denote the initial and final values of the learning rate and $t_{\max}$ is the total number of adaptation steps.
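As a concrete illustration of the schedule (a sketch added here, not part of the original text; the parameter values are arbitrary example choices, not those used for figure 4.7), it can be evaluated as follows:

```python
import numpy as np

# Illustrative parameter values (not those used for figure 4.7).
eps_i, eps_f, t_max = 0.5, 0.0005, 5000

t = np.arange(1, t_max + 1)
eps_exp = eps_i * (eps_f / eps_i) ** (t / t_max)  # equation (4.18)
eps_harmonic = 1.0 / t                            # harmonic series g(t) = 1/t

# Apart from the very first steps, the exponential schedule is much
# larger than 1/t over the whole run, ending at eps_f = 0.0005 versus
# 1/t_max = 0.0002 for the harmonic series.
```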
In figure 4.7 this kind of learning rate is compared to
the harmonic series for a specific choice of parameters. In particular, at the beginning of the simulation the exponentially decaying learning rate is considerably larger than the rate dictated by the harmonic series.
This can be interpreted as introducing noise to the system which is
then gradually removed and, therefore, suggests a relationship to
simulated annealing techniques (Kirkpatrick et al., 1983). Simulated
annealing gives a system the ability to escape from poor local minima
to which it might have been initialized. Preliminary experiments
comparing k-means and hard competitive learning with a learning rate
according to (4.18) indicate that the latter method is less
susceptible to poor initialization and for many data distributions
gives a lower mean square error. Small constant learning rates also usually give better results than k-means. Only in the special case of a single reference vector ($k = 1$) is it impossible to beat k-means on average, since in this case k-means realizes the optimal estimator (the mean of all samples seen so far; see the short derivation below).
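To make the single-vector case explicit (a short derivation sketched here; $w_t$ denotes the reference vector after $t$ inputs and $\xi_t$ the $t$-th input signal): with only one unit the win count equals $t$, so k-means adapts with rate $\epsilon_t = 1/t$ and
\[
w_t = w_{t-1} + \frac{1}{t}\left(\xi_t - w_{t-1}\right) = \frac{t-1}{t}\, w_{t-1} + \frac{1}{t}\,\xi_t ,
\]
which by induction gives $w_t = \frac{1}{t}\sum_{i=1}^{t} \xi_i$, exactly the mean of all samples seen so far.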
These observations are in complete agreement with Darken and Moody (1990), who investigated k-means and a number of different learning rate schedules, among them constant learning rates and a learning rate which is the square root of the rate used by k-means ($\epsilon(t) = \sqrt{1/t}$). Their results indicate that for $k$ larger than 1, k-means is inferior to the other learning rate schedules; in the examples they give, the difference in distortion error is up to two orders of magnitude.
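For concreteness, the two update rules under comparison can be sketched in Python as follows (a minimal sketch, not from the original text; the function names, the initialization from randomly drawn data points, and the default values of $\epsilon_i$ and $\epsilon_f$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_competitive_learning(data, k, t_max, eps_i=0.5, eps_f=0.0005):
    """Online hard competitive learning with the exponentially
    decaying learning rate (4.18). Defaults are illustrative."""
    w = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for t in range(1, t_max + 1):
        xi = data[rng.integers(len(data))]            # draw an input signal
        s = np.argmin(((w - xi) ** 2).sum(axis=1))    # winner: nearest vector
        eps = eps_i * (eps_f / eps_i) ** (t / t_max)  # equation (4.18)
        w[s] += eps * (xi - w[s])                     # move winner towards xi
    return w

def online_kmeans(data, k, t_max):
    """Online k-means: each unit s adapts with rate 1/n(s), where n(s)
    is the number of inputs for which s has been the winner so far."""
    w = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    n = np.zeros(k)
    for _ in range(t_max):
        xi = data[rng.integers(len(data))]
        s = np.argmin(((w - xi) ** 2).sum(axis=1))
        n[s] += 1
        w[s] += (xi - w[s]) / n[s]                    # running per-cell mean
    return w
```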
Figure 4.7: Comparison of the exponentially decaying learning rate (4.18) and the harmonic series $g(t) = 1/t$ for a particular choice of the parameters $\epsilon_i$, $\epsilon_f$, and $t_{\max}$. The displayed difference between the two learning rates can be interpreted as noise which, in the case of the exponentially decaying learning rate, is introduced to the system and then gradually removed.
Figure 4.8 shows some stages of a simulation for a simple ring-shaped data distribution. Figure 4.9 displays the final results after 40000 adaptation steps for three other distributions.
The parameters used in both examples were the same, in particular $t_{\max} = 40000$ together with fixed values of $\epsilon_i$ and $\epsilon_f$.
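Input signals for the ring-shaped example of figure 4.8 might be generated as follows (again only a sketch; the radii, sample count, and number of reference vectors are illustrative choices, and hard_competitive_learning is the hypothetical function sketched above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample uniformly from a ring with inner radius 0.5 and outer radius 1.0
# (illustrative values); sqrt makes the density uniform over the area.
angle = rng.uniform(0.0, 2.0 * np.pi, 10000)
radius = np.sqrt(rng.uniform(0.5 ** 2, 1.0 ** 2, 10000))
ring = np.column_stack((radius * np.cos(angle), radius * np.sin(angle)))

w = hard_competitive_learning(ring, k=20, t_max=40000)
```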
Figure 4.8: Hard competitive learning simulation sequence for a ring-shaped uniform probability distribution. An exponentially decaying learning rate was used. a) Initial state. b-f) Intermediate states. g) Final state. h) Voronoi tessellation corresponding to the final state.
Figure 4.9: Hard competitive learning simulation results after 40000 input signals for three different probability distributions (described in the caption of figure 4.4). An exponentially decaying learning rate was used.