
Reservoir-computing based associative memory and itinerancy for complex dynamical attractors

Index-based reservoir-computing memory

In the “location-addressable” or “parameter-addressable” scenario, the stored memory states within the neural network are activated by a specific location address or an index parameter. The stimulus that triggers the system to switch states can be entirely unrelated to the content of the activated state, and the pairing between memory states and stimuli can be defined arbitrarily. For instance, the stimulus can be an environmental condition such as the temperature, while the corresponding neural-network state could represent a specific behavioral pattern of an animal. An itinerancy among different states is thus possible, given a fluctuating environmental factor. (The “parameter-addressable” scenario differs from the “content-addressable” scenario, which requires a stimulus correlated with a memory state to activate it.)

Storage of complex dynamical attractors

The architecture of our index-based reservoir computing memory consists of a standard reservoir computer (i.e., echo state network)11,12,13 but with one feature specifically designed for location-addressable memory: an index channel, as shown in Fig. 1(A). During training, the time series from a number of dynamical attractors to be memorized are presented as the input signal u(t) to the reservoir machine, each associated with a specific value of the index p so that the attractors can be recalled after training using the same index value. This index value p modulates the dynamics of the RC network through an index input matrix Windex that projects p to the entire hidden layer with weights defined by the matrix entries. The training process is an open-loop operation with input u(t) and the corresponding index value p through the index channel. To store or memorize a different attractor, its dynamical trajectory becomes the input, together with the index value for this attractor. After training, the reservoir output is connected to its input, resulting in a closed-loop operation so that the reservoir machine is now capable of executing self-dynamical evolution starting from a random initial condition, “controlled” by the specific index value. More specifically, to retrieve a desired attractor, we input its index value through the index channel, and the reservoir machine will generate a dynamical trajectory faithfully representing the attractor, as schematically illustrated in Fig. 1(A) with two examples: one periodic and one chaotic attractor. The mathematical details of the training and retrieval processes are given in Supplementary Information (SI).
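The open-loop training pass and closed-loop retrieval described above can be sketched in a few lines. This is a minimal illustration, not the authors' exact implementation: the network size, leak rate, and weight scales are placeholder values, and the readout `W_out` would be obtained by ridge regression as detailed in SI.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3                                  # reservoir size, state dimension (illustrative)
W = rng.normal(0, 1 / np.sqrt(N), (N, N))      # random recurrent weights
W_in = rng.uniform(-0.5, 0.5, (N, D))          # input weights
W_index = rng.uniform(-1.0, 1.0, (N, 1))       # index channel: projects scalar p to every neuron

def step(r, u, p, leak=0.5):
    """One leaky-tanh update; the index term acts as a per-neuron bias."""
    pre = W @ r + W_in @ u + (W_index * p).ravel()
    return (1 - leak) * r + leak * np.tanh(pre)

def collect_states(u_series, p, r0=None):
    """Open-loop training pass: drive with the attractor time series and its index p."""
    r = np.zeros(N) if r0 is None else r0.copy()
    states = []
    for u in u_series:
        r = step(r, u, p)
        states.append(r)
    return np.array(states)

def retrieve(W_out, p, r0, n_steps):
    """Closed-loop retrieval: the output is fed back as input, 'controlled' by p."""
    r, traj = r0.copy(), []
    for _ in range(n_steps):
        v = W_out @ r          # readout
        r = step(r, v, p)      # output fed back as input
        traj.append(v)
    return np.array(traj)
```

The per-neuron bias role of the index term `(W_index * p)` is the same mechanism invoked later to explain the switching dynamics.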

Fig. 1: Memorizing attractors with index-based reservoir computing.

A A schematic illustration of index-based reservoir-computing memory. B Retrieval of six memorized chaotic and periodic attractors using a scalar index pi (one integer value for each memorized dynamical attractor), where the blue and red trajectories in the top and bottom rows, respectively, represent the original and retrieved attractors. Visually, there is little difference between the original and memorized/retrieved attractors. C Memorizing 16 chaotic attractors with two indices. The successfully retrieved attractors are presented in SI. All the memorized attractors are dynamically stable states in the closed-loop reservoir dynamical system.

To demonstrate the workings of our reservoir computing memory, we generate a number of distinct attractors from three-dimensional nonlinear dynamical systems. We first conduct an experiment of memorizing and retrieving six attractors: two periodic and four chaotic attractors, as represented by the blue trajectories in the top row of Fig. 1(B) (the ground truth). The index values for these six attractors are simply chosen to be pi = i for i = 1, …, 6. During training, the time series of each attractor is injected into the input layer, and the reservoir network receives the specific index value, labeling this attractor to be memorized so that the reservoir-computing memory learns the association of the index value with the attractor to be memorized. To retrieve a specific attractor, we set the index value p to the one that labeled this attractor during training and allow the reservoir machine to execute a closed-loop operation to generate the desired attractor. For the six distinct dynamical attractors stored, the respective retrieved attractors are shown in the second row in Fig. 1(B). The recalled attractors closely resemble the original attractors and for any specific value of the index, the reservoir-computing memory is capable of generating the dynamical trajectory for an arbitrarily long period of time. The fidelity of the recalled states, in the long run, can be assessed through the maximum Lyapunov exponent and the correlation dimension. In particular, we train an ensemble of 100 different memory RCs, recall each of the memory states, and generate outputs continuously for 200,000 steps. The two measures are calculated during this generation process and compared with the ground truth. 
As an example, for the Lorenz attractor with the true correlation dimension of 2.05 ± 0.01 and largest Lyapunov exponent of about 0.906, 96% and 91% of the retrieved attractors have their correlation dimension values within 2.05 ± 0.02 and exponent values in 0.906 ± 0.015, indicating that the memory RC has learned the long-term “climate” of the attractors and is able to reproduce them (see SI for more results).
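The largest Lyapunov exponent used in this fidelity check can be estimated with a standard Benettin-type two-trajectory method. The sketch below is a generic estimator for a discrete-time map, not the authors' code; `step_fn` is any user-supplied map, and the separation scale and renormalization interval are illustrative choices. (The correlation dimension would be estimated separately, e.g., by the Grassberger-Procaccia algorithm.)

```python
import numpy as np

def largest_lyapunov(step_fn, x0, d0=1e-8, n_steps=3000, renorm_every=5):
    """Estimate the largest Lyapunov exponent of the map x -> step_fn(x)
    by tracking the divergence of two nearby trajectories, periodically
    renormalizing their separation back to d0 (Benettin's method)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    rng = np.random.default_rng(0)
    v = rng.normal(size=x.shape)
    v *= d0 / np.linalg.norm(v)                 # initial separation of size d0
    y = x + v
    log_sum, t_sum = 0.0, 0
    for t in range(1, n_steps + 1):
        x, y = step_fn(x), step_fn(y)
        if t % renorm_every == 0:
            d = np.linalg.norm(y - x)
            log_sum += np.log(d / d0)           # accumulated expansion rate
            t_sum += renorm_every
            y = x + (y - x) * (d0 / d)          # renormalize the separation
    return log_sum / t_sum
```

For the logistic map at r = 4, whose exponent is exactly ln 2 ≈ 0.693, the estimator returns a value close to this.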

For a small number of dynamical patterns (attractors), a scalar index channel suffices for accurate retrieval, as illustrated in Fig. 1(B). When the number of attractors to be memorized becomes large, it is useful to increase the dimension of the index channel. To illustrate this, we generate 16 distinct chaotic attractors from three-dimensional dynamical systems whose velocity fields consist of different combinations of power-series terms53, as shown in Fig. 1(C). For each attractor, we use a two-dimensional index vector \({p}_{i}=({p}_{i}^{1},\, {p}_{i}^{2})\) to label it, where \({p}_{i}^{1},\, {p}_{i}^{2}\in \{1,2,3,4\}\). The attractors can be successfully memorized and faithfully retrieved by our index-based reservoir computer, with detailed results presented in SI.

Given a number of dynamical patterns to be memorized, there can be many different ways of assigning the indices. In addition to using one and two indices to distinguish the patterns, as shown in Fig. 1(B) and (C), we studied two alternative index-assignment schemes: binary and one-hot coding. For the binary assignment scheme, we store K dynamical patterns using \(\lceil {\log }_{2}K \rceil\) index channels, where the index value in each channel takes one of two possible values. For the one-hot assignment scheme, the number of index channels equals the number K of attractors to be memorized. The index value in each channel is again one of two possible values, for instance, either 0 or 1. In the one-hot assignment, each attractor is associated with one channel: only that channel takes the value one for the attractor, while the values in all other channels are zero. Equivalently, for each channel, the index value is one if and only if the attractor being trained/recalled is the state with which the channel is associated. Arranging all the index values as a matrix, the one-hot assignment leads to an identity matrix. For a given reservoir hidden-layer network, different ways of assigning indices result in different index-input schemes to the reservoir machine and can thus affect the memory performance and capacity. The effects are negligible if the number of dynamical attractors to be memorized is small, e.g., a dozen or fewer. However, the effects can be pronounced if the number K of patterns becomes larger. One way to determine the optimal assignment rule is to examine, under a given rule, how the memory capacity depends on or scales with the size of the reservoir neural network in the hidden layer, which we will discuss later.
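The three assignment schemes can be written down explicitly. A small sketch (the function names are ours, not from the paper):

```python
import numpy as np

def scalar_codes(K):
    """One scalar index per attractor: p_i = i."""
    return np.arange(1, K + 1).reshape(K, 1)

def binary_codes(K):
    """ceil(log2 K) channels; each channel takes one of two values (0 or 1)."""
    n_ch = int(np.ceil(np.log2(K)))
    return np.array([[(i >> b) & 1 for b in range(n_ch)] for i in range(K)])

def one_hot_codes(K):
    """K channels; exactly one channel is 1 per attractor (an identity matrix)."""
    return np.eye(K, dtype=int)
```

For K = 16, the binary scheme needs only 4 channels while one-hot needs 16; the trade-off is the correlation among the resulting bias patterns, discussed with the scaling laws below.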

Transition matrices among stored attractors

We study the dynamics of memory toggling among the stored attractors. As each memorized state si is trained to be associated with a unique index value pi, the reservoir network’s dynamical state can be switched among the learned attractors by simply altering the input index value. Several examples are shown in Fig. 2(A) and (B), where p is switched among various values multiple times, and the dynamical state activated in the reservoir network switches among the learned attractors accordingly. For instance, at step 800, for a switch from p = 1 to p = 6, the reservoir state switches from a periodic Lissajous state to a chaotic Hindmarsh-Rose (HR) neuron state. However, not all such transitions are successful – a failed example is illustrated in the bottom panel of Fig. 2(C). In this case, after the index switch, the reservoir system falls into an undesired state that does not belong to any of the trained states. The probability of a failed switch is small and asymmetric with respect to the two states before and after the switch. A weighted directed network can thus be defined among all the memorized states, where the weights are the success probabilities, represented by a transition matrix. Making index switches at many random time steps and counting the successful versus the failed switches, we numerically obtain the transition matrix, as exemplified in Fig. 2(D) for two different randomly generated indexed RCs. Figure 2(E) shows the average transition matrix of an ensemble of 25 indexed RCs. Figure 2(F) shows that the variance of the success rate across different columns (i.e., different destination states) is much larger than the variance across different rows (i.e., different starting states), implying that the success rate depends significantly more on the destination attractor than on the starting attractor.
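Numerically, the transition matrix is obtained by repeated switching trials. The sketch below shows only the counting logic; `attempt_switch` is a hypothetical stand-in for toggling the index at a random time step, running the closed-loop RC, and checking whether it settles on the destination attractor.

```python
import numpy as np

def estimate_transition_matrix(attempt_switch, K, n_trials=100, seed=0):
    """Estimate the K-by-K matrix of switching success probabilities.

    attempt_switch(i, j, rng) -> bool is a placeholder: it should switch
    the index from p_i to p_j at a random time step and report whether
    the reservoir ends up on attractor j."""
    rng = np.random.default_rng(seed)
    T = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            if i == j:
                continue                      # self-transitions not counted
            wins = sum(attempt_switch(i, j, rng) for _ in range(n_trials))
            T[i, j] = wins / n_trials
    return T
```

Comparing the variance of `T` across columns versus across rows then quantifies the destination-versus-start asymmetry shown in Fig. 2(F).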

Fig. 2: Transitions among different memorized attractors in an indexed memory RC.

A, B By changing the index value p as in panel (A), one can toggle among different states shown in panel (B) that have been memorized as attractors in the indexed memory RC (Dimensions vx and vz are not shown). C An instance of a failed switch, where the memory RC evolves to an untrained state after changing p. D Two typical transition matrices where six different states [as the ones in Fig. 1(B)] were memorized. E The average of transition matrices from 25 memory RCs with different random seeds and the same settings as in panel (D). F Variances in the success rate with respect to the starting and destination attractors, revealing a stronger dependence on the latter. The result is from the ensemble of 25 RC networks, each trained with ten different chaotic attractors. G A deliberately chosen memory RC with 16 memory states has relatively low success rates among many states for visualizing the basin structures shown below. H Attracting regions of two different destination states on the manifold of a starting state. The memory RC originally operates on this shell-shaped starting attractor, and the index p is toggled at different time steps when the memory RC is at different positions on this attractor. Such a position can determine whether the switching is successful, where the successful positions are marked blue and failed positions are marked lighter orange. I Local two-dimensional slices of the basin structures of the two destination states from panel (H) in the high-dimensional RC hidden space. Again, the blue and lighter orange regions mark the basins of the memorized state and of some untrained states leading to failed switches, respectively. The quantities ϵ1 and ϵ2 define the scales of perturbations to the RC hidden space in two randomly chosen perpendicular directions. The ϵ1 = ϵ2 = 0 origin points are the RC hidden states taken from random time steps while the memory RC operates at the corresponding destination states.

What is the origin of the asymmetric dependence in the transition matrices, and what is the dynamical mechanism behind the switch failures? Two observations are helpful. The first concerns the dynamical consequence of a switch in the index value p. Incorporating p into the reservoir computer is equivalent to adding an adjustable bias term to each neuron in the reservoir hidden layer. Different values of p thus directly result in different bias values on the neurons and hence different dynamics; the same indexed RC under different p can be treated as different dynamical systems. Second, note that, during a switch, one does not directly interfere with the state input u or the hidden state r of the RC network, but only changes the value of p. Consequently, a switch in p as in Fig. 2(A) switches the dynamical equations of the RC network while keeping the RC state rlast and the output vlast at the last time step, which are passed from the previous dynamical system to the new system after the switch. This pair rlast and vlast becomes the initial hidden state and initial input under the new RC dynamical equations.

The two observations suggest examining the basin of attraction of the trained state in the memory RC hidden space under the corresponding index value, which typically does not fully occupy the entire phase space. Figure 2(I) shows the basin structure of two arbitrary states in an indexed RC trained with 16 attractors. The blue regions, leading to the trained states, leave some space for the orange regions that lead to untrained states and hence to failed switches. If the RC state at the last time step before switching lies outside the blue region, the RC network will evolve to an unwanted state, and a switch failure will occur. We further plot the points from the attractor before switching in blue (successful) and orange (unsuccessful), as shown in Fig. 2(H). They are the projections of the basin structure of the new attractor onto the previous attractor. This picture provides an explanation for the strong dependence of the success rates on the destination states. In particular, the success rate is determined by two factors: the relative size of the basin of the new state after switching and the degree of overlap between the previous attractor and the new basin. While the degree of overlap depends on both the starting and destination states, the relative size of the new state's basin is solely determined by the destination state, resulting in a strong dependence of the switching success rate on the destination state. [Further illustrations of (i) the basin structure of the memorized states in an indexed RC, (ii) projections of the basin structure of the destination state back to the starting state attractor, and (iii) how switch success is strongly affected by the basin structure of the destination state are provided in SI.]

Control strategies for achieving high memory transition success rates

For a memory system to be reliable, accessible, and practically useful, a high success rate of switching from one memorized attractor to another upon recall is essential. As the switching-failure issue is rooted in whether an initial condition lies inside the attracting basin of the destination state, this type of failure can be universal for models that memorize not the (static) training data themselves but the dynamical rules of the target states, regenerating the target states by evolving the memorized rules. Even if one uses a series of networks to memorize the target states separately, the problem of properly initializing each memory network during a switch or a recall persists. A possible solution to this initial-state problem is to optimize the training scheme or the RNN architecture to make the memorized state globally attractive in the entire hidden space – an unsolved problem at present. Here we focus on noninvasive control strategies to enhance the recall or transition success rates without altering the training technique or the RC architecture, assuming a memory RC is already trained and fixed. The established connections or weights are not modified, ensuring that the memory attractors already in place are preserved.

The first strategy, named “tactical detour,” utilizes successful pathways in the transition matrix to build indirect detours that enhance the overall transition success rates. Instead of switching from an initial attractor to a destination attractor directly, going through a few intermediate states can result in a higher success rate, as exemplified in Fig. 3(A). This method can be relatively simple to implement in many scenarios, as all that is needed is switching the p value a few more times. However, this strategy has two limitations. First, some information about the transition matrix is required to search for an appropriate detour and estimate the success rate. Second, the dynamical mechanism behind failed switches suggests that the success rate mostly depends on the relative size of the attracting basin of the destination state, implying that a state with a small attracting basin is difficult to reach from most other states. As a result, the success rate even with a detour can remain below 100%. As exemplified in Fig. 3(B), the success rate with detours, while significantly increased within just a few steps, can saturate at a level lower than one.
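If the switching events along a detour are treated as independent, the overall success probability of a path is the product of the entries of the transition matrix along it, and finding the best detour becomes a shortest-path problem under -log weights. The sketch below is our formulation of this search, not the paper's algorithm, and the path-length cutoff makes it a heuristic rather than an exact search:

```python
import heapq
import numpy as np

def best_detour(T, start, dest, max_len=4):
    """Find the index sequence from `start` to `dest` maximizing the product
    of success probabilities T[i, j], via Dijkstra on -log(T) weights.
    Returns (path, success_probability), or (None, 0.0) if unreachable."""
    K = len(T)
    finalized = {}
    heap = [(0.0, start, (start,))]            # (cost = -sum log T, node, path)
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dest:
            return list(path), float(np.exp(-cost))
        if node in finalized and finalized[node] <= cost:
            continue
        finalized[node] = cost
        if len(path) > max_len:                # limit detour length
            continue
        for nxt in range(K):
            if nxt != node and T[node, nxt] > 0:
                heapq.heappush(heap, (cost - np.log(T[node, nxt]), nxt, path + (nxt,)))
    return None, 0.0
```

For example, if the direct transition succeeds with probability 0.1 but a one-stop detour has legs of probability 0.9 each, the search returns the detour with overall probability 0.81.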

Fig. 3: Control strategies for achieving high memory transition success rates.

A Achieving higher switching success rates using detours, where the switching from attractor 2 to attractor 6 via a detour through attractor 4 significantly increases the success rate. The right panel illustrates the improved transition matrix with detours as compared to the original one on the left. B Average success rate with different maximum detour lengths. A minimal detour is typically sufficient – just a single detour step eliminates more than two-thirds of the failure probability, obtained from the same ensemble of memory RCs as in Fig. 2(F). C An instance of feedback control to correct a failed switch, following the scheme in panel (D). The switch at the 400th time step leads to an untrained static state. Since the classifier RC cannot recognize the memory RC output as the desired state, a short random signal is applied to perturb the memory RC, which adjusts the state of the memory RC to achieve a successful retrieval. D Illustration of the controlled memory retrieval by a classifier RC and random signals, where a classifier RC is used to identify the output from the memory RC. The outcome of this identification decides whether to apply a random perturbation or to use the memory RC output as the final result. E The rates of successfully corrected failed switches by the control mechanism illustrated in (C) and (D). The denominator of this correction success rate is the number of failed switches, not all switches. Each perturbation is implemented by ten steps of standard Gaussian noise. Just one “perturb and classify” cycle can correct about 80% of the failed switches. The results of this approach with different parameter choices are shown in SI. F Performance of the cue-based method to correct failed switches with respect to cue length. Within 25 steps of the cue, the correction success rate reaches 99%. Panels (E) and (F) are generated from the same ensemble of 25 memory RCs, each trained to memorize the six attractors as in Fig. 1.

Our second strategy is of the feedback control type with a classifier reservoir computer and random signals to perturb the memory reservoir computer until the latter reaches the desired memory state, as illustrated and explained in Fig. 3(C, D). This trial-and-error approach is conceptually similar to the random memory search model in psychology54. A working example is shown in Fig. 3(C), where the input index p first activates certain outputs from the memory RC, which are fed to a classifier reservoir computer trained to distinguish among the target states and between the target states and non-target states. As detailed in SI, a simple training of this classifier RC can result in a remarkably high accuracy. The output of the classifier RC is compared with the index value. If they agree, the output signal represents the correct attractor; otherwise, a random signal is injected into the memory RC to activate a different outcome. The feedback loop continues to function until the correct attractor is reached. This control strategy can lead to a nearly perfect switch success rate, at the cost of a possibly long switching time. Let Tn be the time duration that the random perturbation lasts, Tc be the time for making a decision about the output of the classifier, and ηi,j be the switching success rate from a starting attractor si to a destination attractor sj. The estimated time for such a feedback control system to reach the desired state is Treach,i,j = (Tn + Tc)/ηi,j, which defines an asymmetric distance between each pair of memorized attractors.
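The perturb-and-classify feedback loop can be sketched as follows, with `run_closed_loop` and `classify` as hypothetical stand-ins for the memory RC and the classifier RC; the noise length and amplitude are illustrative. The expected time to reach the target then follows the estimate Treach,i,j = (Tn + Tc)/ηi,j given above.

```python
import numpy as np

def controlled_recall(run_closed_loop, classify, p_target, r_init,
                      noise_len=10, noise_std=1.0, max_cycles=50, seed=0):
    """Trial-and-error feedback control.

    run_closed_loop(r, p, noise=None) -> (r_next, output_segment) stands in
    for evolving the memory RC (optionally perturbed by an input noise burst);
    classify(output_segment) -> index stands in for the classifier RC.
    Returns (final_state, number_of_perturbation_cycles used)."""
    rng = np.random.default_rng(seed)
    r = r_init
    for cycle in range(max_cycles):
        r, seg = run_closed_loop(r, p_target, noise=None)
        if classify(seg) == p_target:          # classifier confirms target state
            return r, cycle
        noise = rng.normal(0.0, noise_std, size=(noise_len, np.size(r)))
        r, _ = run_closed_loop(r, p_target, noise=noise)   # random kick
    return None, max_cycles
```

The loop terminates as soon as the classifier agrees with the requested index, so the number of cycles is geometrically distributed with mean 1/η for a per-cycle success probability η.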

To provide a quantitative estimate of the performance of this strategy, we run 90,000 switching trials in 25 different memory RCs trained with the six periodic and chaotic states illustrated in Fig. 1(B). The size of the reservoir network is 1000, and the other hyperparameter values are listed in SI. Among the 90,000 switchings, 8110 trials (about 9%) failed, which we used as a pool to test the control strategies. To make the results generic, we test 12 different control settings: four lengths of the noisy perturbation period (1, 3, 10, and 30 steps) combined with Gaussian white noise at three levels (standard deviations σ = 0.3, σ = 1, and σ = 3). We find that with an appropriate choice of the control parameters, 10 periods of 10-step noise perturbations (100 steps of perturbation in total) can eliminate 99% of the failed switchings, as illustrated in Fig. 3(E). The full results of the 12 different control settings are presented and discussed in SI.

Our third control strategy is also motivated by daily experience: recalling an object or an event can be facilitated by relevant, reminding cue signals. We propose using a cue signal to “warm up” the memory reservoir computer after switching the index p to the one associated with the desired attractor to be recalled. The advantage is that the memory-access transition matrix is no longer needed, but success depends on whether the cue is relevant and strong enough. The cue signals are thus state-dependent: different attractors require different cues, so an additional memory device may be needed to store the warming signals, which contain less information than the attractors to be recalled. This leads to a hierarchical structure of memory retrieval: (a) a scalar or vector index p, (b) a short warming signal, and (c) the whole memorized attractor, similar to the workings of human memory suggested in ref. 55. We also examine whether state-independent cues (uniform cues for all the memory states) can be helpful, and find that in many cases a simple globally uniform cue can make most memorized attractors almost randomly accessible, but there is also a probability that this cue almost entirely blocks a few memory states.
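Cue-based recall is a two-phase procedure: drive the RC open-loop with a short cue under the target index, then close the loop. A minimal sketch with the trained RC update and readout abstracted into `step` and `readout` (names are ours):

```python
def cued_recall(step, readout, r, p, cue, n_generate):
    """Warm up the memory RC with a cue, then generate autonomously.

    step(r, u, p) stands in for the trained RC update under index p;
    readout(r) stands in for applying the trained output matrix W_out."""
    for u in cue:                  # phase 1, open loop: the cue drives the input
        r = step(r, u, p)
    traj = []
    for _ in range(n_generate):    # phase 2, closed loop: output fed back as input
        v = readout(r)
        r = step(r, v, p)
        traj.append(v)
    return r, traj
```

The cue length trades off against reliability: per Fig. 3(F), a few tens of warm-up steps suffice to place the hidden state inside the destination basin in nearly all failed-switch cases.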

With this third, cue-based control strategy, we again employ the pool of 8110 failed switchings to test the performance. As shown in Fig. 3(F), the correction success rate increases nearly exponentially with the cue length. Almost 90% of the failed switchings can be eliminated with just 8 steps of cue, and 24 steps of cue can eliminate more than 99% of the failed switchings.

Scaling law for memory capacity

Intuitively, a larger RNN in the hidden layer would enable more dynamical attractors to be memorized. Quantitatively, such a relation can be characterized by a scaling law between the number K of attractors and the network size N. Our numerical method to uncover the scaling law is as follows. For a fixed pair of (K, N) values, we train an ensemble of reservoir networks, each of size N, using the K dynamical attractors. The fraction of the networks with validation errors less than an empirical threshold gives the success rate of memorizing the patterns, which in general increases with N. A critical network size Nc can be defined when the success rate is about 50%. (Here, we use 50% as a threshold to minimize the potential error in Nc due to random fluctuations. In SI, we provide results with 80% as the success rate threshold. The conclusions remain the same under this change.)
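The extraction of Nc can be summarized as a threshold search over the success-rate curve. In this sketch, `success_rate(N)` is a placeholder for the expensive step of training an ensemble of size-N reservoirs on the K attractors and measuring the fraction with validation error below the empirical cutoff:

```python
def critical_size(success_rate, sizes, threshold=0.5):
    """Pick N_c as the network size whose ensemble success rate is closest
    to `threshold` (50% by default), mirroring the procedure described.

    success_rate(N) -> float in [0, 1] is a stand-in for training an
    ensemble of reservoirs of size N and computing the success fraction."""
    rates = [success_rate(N) for N in sizes]
    best = min(range(len(sizes)), key=lambda i: abs(rates[i] - threshold))
    return sizes[best], rates[best]
```

Repeating this for a range of K values and fitting log Nc against log K then yields the scaling exponent γ reported below.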

We use three different types of validation performance measures. The first is the prediction horizon, defined as the prediction length before the deviation of the predicted time series reaches 10% of the oscillation amplitude of the real state (half of the difference between the maximum and minimum values in a sufficiently long time series of that target state). The oscillation amplitude and the corresponding deviation rate are calculated for each dimension of the target state, and the shortest prediction horizon among all the dimensions is taken. The prediction horizon is rescaled by the period of the target system to facilitate comparison. For a chaotic state, we use the average period, defined as the average shortest time between two local maxima in a sufficiently long time series of that target state. The threshold of this prediction horizon for a successful recall is set to two periods of each memory state. The second performance measure is the root-mean-square error (RMSE) with a validation time of four periods (or average periods). The third measure is defined by the time before the predicted trajectories leave a rectangular region surrounding the real memory state for an undesired region in the phase space; this region is set to be 10% larger than the rectangle defined by the maximum and minimum values of the memory state in each dimension. For these three measures, the thresholds for a successful recall can differ across datasets but are always fixed within the same task (the same scaling curve). The thresholds for the prediction-horizon-based and region-based measures are several periods of oscillation (or average periods for chaotic states) of the target state.
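The prediction-horizon measure can be sketched directly from its definition (a straightforward reading of the text, not the authors' code; rescaling by the period is omitted):

```python
import numpy as np

def prediction_horizon(truth, pred, tol_frac=0.10):
    """First time step at which |pred - truth| exceeds tol_frac of the
    oscillation amplitude, computed per dimension; the shortest horizon
    over all dimensions is returned. Amplitude = half of (max - min)
    over the reference series, per dimension."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    amp = 0.5 * (truth.max(axis=0) - truth.min(axis=0))   # per-dimension amplitude
    dev = np.abs(pred - truth) > tol_frac * amp           # boolean array (T, D)
    horizons = []
    for d in range(truth.shape[1]):
        bad = np.nonzero(dev[:, d])[0]
        horizons.append(bad[0] if bad.size else truth.shape[0])
    return int(min(horizons))
```

The region-based measure is analogous, replacing the per-step deviation test with a test against the 10%-enlarged bounding box of the memory state.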

Figure 4(A) shows the resulting scaling law for the various tasks, with different datasets of states, different training approaches, and different coding schemes for the indexed RCs. The first dataset (dataset #1) consists of many thousands of different periodic trajectories. With index-based RCs, we use one-hot coding, binary coding, and two-dimensional coding with this dataset, as shown in Fig. 4(A). They all lead to similar algebraic scaling laws \({N}_{c}\propto {K}^{\gamma }\) that are close to a linear law: γ = 1.08 ± 0.01 for both the one-hot coding and binary coding, and γ = 1.17 ± 0.02 for the 2D coding. In all three cases, Nc grows slightly faster than linearly. A zoomed-in comparison of the three coding schemes is shown in Fig. 4(B), which indicates that the one-hot coding is more efficient than the other two. Figure 4(B) also demonstrates how the data points in the scaling laws in Fig. 4(A) are gathered: a success-rate-versus-N curve is plotted for each K value in each task, and the data point on that curve closest to a 50% success rate is taken to be Nc.

Fig. 4: Scaling behaviors of the memory capacity.

A Algebraic scaling relation between the number K of dynamical patterns to be memorized and the required size Nc of the RC network for various tasks and different memory coding schemes. Details of each curve/task are discussed in the main text. The “separate Wout” and “bifurcation” tasks are shown with prediction-horizon-based measures, while all the other curves are plotted with the region-based measure. B Examples of how the success rate increases with respect to N and comparisons among different coding schemes, where the success rate of accurate memory recalling (using the region-based performance measure) of three coding schemes on the dataset #1 with K = 64 is shown. All three curves have a sigmoid-like shape between zero and one, where the data points closest to the 50% success rate threshold correspond to Nc. All data points in the scaling plots are generated from this type of curve (success rate versus N) to extract Nc. The one-hot coding is the most efficient coding of the three we tested for this task. C Comparisons among different recalling performance measures. The three different measures on the dataset #1 with a one-hot coding have indistinguishable scaling behavior with only small differences in a constant factor.

Why is the one-hot coding more efficient than the other coding schemes we test? In our index-based approach to memorizing dynamical attractors, an index value pi is assigned to each attractor si and acts as a constant input through the index channel to the reservoir neural network, which effectively modulates the biases of the network. Since the index input pi is multiplied by a random matrix before entering the network, each neuron receives a different random level of bias modulation, affecting its dynamical behavior through the nonlinear activation function. In particular, the hyperbolic tangent activation function has several distinct regions: an approximately linear region for small input and two nearly constant saturated regions for large inputs. With one specific pi value injected into the index channel, different neurons in the hidden layer are distributed across the different functional regions (a detailed demonstration is given in SI). When the index value is switched to another one, the functional regions of the neurons are redistributed, leading to a different dynamical behavior of the reservoir computer. Among the three coding schemes, the one-hot scheme utilizes one column of the random index input matrix per attractor, so the bias distributions for different attractors are unrelated to each other, yielding a minimal correlation among the distributions of the neurons' functional regions for different values of pi. For the other two coding schemes, there is a certain degree of overlap among the input-matrix entries for different attractors, leading to additional correlations in the bias distributions and thereby reducing the memory capacity.
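The redistribution of neurons across the tanh functional regions under different index values can be illustrated directly. The region boundaries (0.5 and 2.0) below are arbitrary illustrative cutoffs, and the index column is a random draw, as in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
W_index = rng.uniform(-1.0, 1.0, N)   # one column of the random index input matrix

def region_counts(p, lin_cut=0.5, sat_cut=2.0):
    """Classify each neuron's bias magnitude |p * W_index| into tanh's
    functional regions: near-linear (|x| < lin_cut), saturated
    (|x| > sat_cut), or transitional (in between)."""
    b = np.abs(p * W_index)
    return (int(np.sum(b < lin_cut)),
            int(np.sum(b > sat_cut)),
            int(np.sum((b >= lin_cut) & (b <= sat_cut))))
```

A small index value leaves most neurons in the near-linear region; a larger one pushes a substantial fraction into saturation, changing the effective dynamics of the network.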

To make the stored states more realistic, we use a dataset derived from the ALOI videos56, which contains short videos of rotating small everyday objects, such as a toy duck or a pineapple. In each video, a single object rotates horizontally through a full 360-degree turn, returning to its initial angle; repeating the video generates a periodic dynamical system. To reduce the computational load due to the high spatial dimensionality of the video frames, we apply principal component analysis to the original videos, noting that most pixels are black background and many pixels of the object are highly correlated. We take the first two principal components of each video as the target states. The results are shown in Fig. 4(A) as the “ALOI videos” task. Similar to the previous three tasks, we obtain an algebraic scaling law \({N}_{c}\propto {K}^{\gamma }\) that is slightly faster than a linear law, with γ = 1.39 ± 0.01.
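The PCA reduction of the video frames can be sketched with a plain SVD (a generic stand-in for the preprocessing described; the function name is ours):

```python
import numpy as np

def first_components(frames, k=2):
    """Project flattened video frames onto their first k principal
    components via SVD, yielding a low-dimensional periodic trajectory."""
    X = frames.reshape(frames.shape[0], -1).astype(float)  # (n_frames, n_pixels)
    Xc = X - X.mean(axis=0)                                # center the pixel features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)      # rows of Vt = components
    return Xc @ Vt[:k].T                                   # (n_frames, k) trajectory
```

Looping this k-dimensional trajectory (here k = 2, as in the text) then serves as the periodic target state presented to the memory RC.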

The “bifurcation” task is an example showing the potential of our index-based approach, where Nc grows much more slowly than linearly with K and can even decrease in certain cases. It consists of 100 dynamical states gathered from the same chaotic food-chain system but with different parameter values. We sweep an interval in the bifurcation diagram of the system containing both periodic states of different periods and chaotic states with different average periods. For index coding, we use a one-dimensional index and assign the states by sorting the system parameter values used to generate them. The index values lie in the interval [−2.5, 2.5] and are evenly spaced. As shown in Fig. 4(A), the scaling of this “bifurcation” task is much slower than a linear law, suggesting that when the target states are correlated and the index values are assigned in a “meaningful” way, the reservoir computer can utilize the similarities among target states for more efficient learning and storing. Moreover, Nc can even decrease over a certain interval of K, as the amount of total training data increases with larger K, making it possible for the reservoir computer to reach a similar performance with a smaller Nc. In the other tasks with other training sets, the training data from different target states are independent of each other. In summary, training with the index-based approach on this dataset is more resource-efficient than training a series of separate reservoir computers, one per target state, with an additional selection mechanism to switch among them when executing switching or itinerant behaviors.

One of the most important features of reservoir computing is that the input layer and the recurrent hidden layer are generated randomly and are thus essentially independent of the target state. One way to exploit this convenient feature is to use the same input and hidden layers for all target states but train a separate output layer for each. In this approach, an additional mechanism for selecting the correct output layer (i.e., Wout) during retrieval is necessary, unlike the other approaches, which use the same integrated reservoir computer for all the different target states. Training with separate Wout can lower the computational complexity in some scenarios, as shown by the “separate Wout” task in Fig. 4(A), where Nc increases as a power law in K that is much slower than linear. However, with separate Wout the overall model complexity still grows at least linearly, since K independent output matrices are needed, one per target state. For a scenario such as the “bifurcation” task, the vanilla index-based approach is thus more efficient. Since the input and hidden layers are shared by all the different target states, the issue of failed switching still exists, so control is required.
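The "separate Wout" variant can be sketched as follows: one randomly generated input layer and reservoir are shared across all targets, while each target state receives its own ridge-regressed readout. The toy sinusoidal targets, network size, and ridge parameter are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Shared random input layer and reservoir; one ridge-regressed readout
# per target state (toy next-step-prediction targets).
rng = np.random.default_rng(2)
N, d, K, T, beta = 200, 1, 3, 500, 1e-6
W_in = 0.5 * rng.uniform(-1, 1, size=(N, d))
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # spectral radius 0.9

def drive(u):
    """Open-loop run; returns hidden states for input series u (len(u) x d)."""
    r = np.zeros(N)
    R = np.empty((len(u), N))
    for t, ut in enumerate(u):
        r = np.tanh(W @ r + W_in @ ut)
        R[t] = r
    return R

W_outs = []
for k in range(K):                                  # one readout per target state
    u = np.sin(2 * np.pi * (k + 1) * np.arange(T) / 100)[:, None]
    R = drive(u[:-1])                               # states while predicting next step
    y = u[1:]
    # Ridge regression: W_out = y^T R (R^T R + beta I)^(-1)
    W_outs.append(y.T @ R @ np.linalg.inv(R.T @ R + beta * np.eye(N)))
```

At retrieval time, some external mechanism must pick the correct `W_outs[k]`, which is exactly the extra selection step discussed above.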

We further demonstrate the efficiency of our index-based approach compared with the index-free approach. In the “index-free” task shown in Fig. 4(A), dataset #1 was used. While the index-based approach has an almost linear scaling, the scaling of the index-free approach grows much faster than a power law. For instance, the critical network size with K = 20 is about Nc = 2.4 × 10^4, almost as large as the Nc for K = 256 with the index-based memory RC on the same dataset, and the critical network size for K = 32 exceeds 10^5. The comparison of the two approaches on the same task reveals that, while the index-free memory RC has the advantage of content-addressable memory, it is computationally costly compared with the index-based approach: for large K values, the severe overlap among different target states requires a large memory RC to distinguish the states.

To test the genericity of the scaling law, we compare the results from the same task (the “one-hot coding” task) under the three different measures, as shown in Fig. 4(C). All three curves exhibit approximately the same scaling behavior, except for a minor difference in a constant factor.

Index-free reservoir-computing based memory – advantage of multistability

The RC neural network is a high-dimensional system with rich dynamical behaviors49,50,51,52,57,58, making it possible to exploit multistability for index-free memory. In general, in the high-dimensional phase space, various coexisting basins of attraction can be associated with the different attractors or dynamical patterns to be memorized. As we will show, this coexistence can be achieved with a training (storing) process similar to that of index-based memory but without an index. The stored attractors can then be recalled through the content-addressable approach, using an appropriate cue related to the target attractor. To retrieve a stored attractor, one drives the networked dynamical system into the attracting basin of the desired memorized attractor using the cue, mimicking a spontaneous dynamical response characteristic of generalized synchronization between the driving cue and the responding RC50,59,60.
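The two-phase recall just described, open-loop warming by a cue followed by closed-loop autonomous evolution, can be sketched with a toy echo state network. The two stored "attractors" here are simple periodic signals, and all network sizes and parameters are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

# Toy index-free memory RC: one shared readout trained on two periodic
# target signals; recall = open-loop warming with a cue, then closed loop.
rng = np.random.default_rng(3)
N, T = 400, 1000
W_in = 0.5 * rng.uniform(-1, 1, size=(N, 1))
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # spectral radius 0.9

signals = [np.sin(2 * np.pi * np.arange(T + 1) / p)[:, None] for p in (23, 61)]

# Storing: concatenate open-loop runs over both target signals, then fit a
# single next-step readout by ridge regression.
R_all, y_all = [], []
for s in signals:
    r = np.zeros(N)
    for t in range(T):
        r = np.tanh(W @ r + W_in @ s[t])
        if t >= 100:                                 # washout of transients
            R_all.append(r.copy())
            y_all.append(s[t + 1])
R, y = np.array(R_all), np.array(y_all)
W_out = y.T @ R @ np.linalg.inv(R.T @ R + 1e-6 * np.eye(N))

def recall(cue, n_free=200):
    """Warm open-loop with the cue, then run closed-loop autonomously."""
    r = np.zeros(N)                                  # random/neutral initial state
    for u in cue:                                    # open-loop warming
        r = np.tanh(W @ r + W_in @ u)
    out = []
    for _ in range(n_free):                          # closed loop: output -> input
        u = W_out @ r
        r = np.tanh(W @ r + W_in @ u)
        out.append(u)
    return np.array(out)

out = recall(signals[0][:80])                        # cue: short segment of target 0
```

Whether `out` stays on the cued attractor is exactly the success/failure question the classifier RC below is built to answer.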

While the possibility of index-free memory RC was pointed out previously50,51,52, a quantitative analysis of the retrieval mechanism was lacking. Such an analysis should include how the cue signals affect the success rate, what the basin structure of the reservoir computer is, and a dynamical understanding connecting the two. A difficulty is how to determine, efficiently and accurately from a short segment of the RC output time series, which memory state has been recalled and whether any target memory has been recalled at all. For this purpose, short-term performance measures such as the RMSE and the prediction horizon are not appropriate, as the ground truth of a specific trajectory on the corresponding attractor is not available, especially for the chaotic states. Measures based on the maximum Lyapunov exponent or the correlation dimension are not suitable either, as they require long time series. In realistic scenarios, maintaining the memory accurately for dozens of periods can be sufficient, without any requirement of long-term accuracy.

Our solution is to train another RC network (the classifier RC) to perform a short-term classification task: determining, with high accuracy, which memorized state has been recalled or whether no successful recall has been made at all. We tested its performance on 1,200 trials and found only 6 deviations from a human labeler (details in SI), all of which are intrinsically ambiguous to classify. The classifier RC is also used in the control scheme for the index-based memory RC, as shown in Fig. 3(D). The classifier RC provides a tool to understand the retrieval process, e.g., the type of driving signals that can be used to recall a desired memorized attractor. The basic idea is that the cue serves as a kind of reminder of the attractor to be recalled, so a short segment of the time series from that attractor serves the purpose. Other cue signals are also possible, such as those based on partial information of the desired attractor, or noisy signals, for warming.

We use the six different attractors in Fig. 1(B) as the target states, all located in a three-dimensional phase space. They do not occupy distinct regions but overlap significantly. However, the corresponding hidden states of the reservoir neural network, because of its much higher-dimensional phase space, can live in distinct regions. Using a short-term time series of the target attractor as a cue or warming signal to recall it, we achieve a near 100% recall success rate once the length of the warming signal exceeds some critical value, as shown in Fig. 5(A). Before a warming signal is injected, the initial state of the neural network is set randomly, so the success rate without the cue should be about 1/6. As a cue is applied and its length increases, a relatively abrupt transition to near 100% success rate occurs for all six stored attractors. The remarkable retrieval success rate suggests that all six attractors can be trained to coexist in this one high-dimensional dynamical system (i.e., the memory RC), each with its own separate basin of attraction in the phase space of the neural network hidden state r. The finding that the attractors, despite their very different properties (periodic or chaotic, with varying periods or maximal Lyapunov exponents, among others), all share nearly identical thresholds for the warming length also suggests that this threshold is characteristic of the reservoir computer itself rather than of the target states.

Fig. 5: Index-free memory recalling.

A1–A6 Success rate of memory retrieval of the six attractors in Fig. 1(B) versus the length of the warming cue. In the 3D phase space of the original dynamical systems, these attractors are located in approximately the same region and overlap. B A 2D projection of a typical basin structure of the reservoir dynamical network, where different colors represent the initial conditions leading to different memorized attractors. The central regions have relatively large contiguous areas of uniform color, while the basin structure is fragmented in the surrounding regions. C Success rate for the memory reservoir network to keep the desired attractor versus the magnitude of a random perturbation. The two plateau regions with ~100% and 1/6 success rates correspond to the two typical basin features shown in panel (B). D Distance (averaged over an ensemble) between the dynamical state of the reservoir neural network and that of the target attractor versus the cue duration during warming. The cyan curves are obtained from 100 random trials, with the red curve as the average. The two horizontal black dashed lines correspond to the perturbation magnitudes of 10^0.5 and 10^−1 in (C); their intersection points with the red curve occur at warming lengths of 10 and 50 time steps, respectively, consistent with the transition regions in (A) and suggesting a connection between the basin structure and the retrieval behavior.

It is possible to arbitrarily pick a “menu” of attractors and train them to coexist in a memory RC as one single dynamical system. A natural question is: what are the typical structures of the basins of attraction of the memorized attractors? An example of a 2D projection of the basin structure is shown in Fig. 5(B). In the 2D plane, there are open areas of different colors, corresponding to the basins of different attractors, which are critical for the stability of the stored attractors. More specifically, when the basin of attraction of a memorized attractor contains an open set, it is unlikely for a dynamical trajectory on the attractor to be “kicked” out of the basin by small perturbations, ensuring stability. Figure 5(C) shows the probability (success rate) for a dynamical trajectory of the reservoir network to stay in the basin of a specific attractor after being “kicked” by a random perturbation. For a small perturbation, the probability is approximately one, indicating that the trajectory will always stay near the original (correct) attractor, leading essentially to zero retrieval error in the dynamical memory. Only when the perturbation is sufficiently large does the probability decrease to 1/6, signifying random transitions among the attractors and, consequently, failure of the system as a memory. We note recent work on basin structures in a high-dimensional Kuramoto model in which a large number of attractors coexist61, which exhibits patterns similar to those observed in our memory RCs. Further investigation is required to determine whether these patterns are truly generic across different dynamical systems.
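The perturbation experiment behind Fig. 5(C) can be sketched as follows: kick the hidden state r by a random vector of prescribed magnitude and let the closed-loop reservoir evolve; in the full experiment, the resulting trajectory would then be classified to see whether the original attractor was kept. The readout matrix here is an untrained stand-in, so only the kick-and-evolve mechanics are shown.

```python
import numpy as np

# Kick-and-evolve mechanics of the basin-stability test.
rng = np.random.default_rng(5)
N = 300
W_in = 0.5 * rng.uniform(-1, 1, size=(N, 1))
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_out = rng.normal(size=(1, N)) / np.sqrt(N)      # stand-in for a trained readout

def kick_and_run(r, eps, n_steps=100):
    """Perturb r by a random vector of magnitude eps, then run closed-loop."""
    v = rng.normal(size=N)
    r = r + eps * v / np.linalg.norm(v)           # random kick of magnitude eps
    traj = []
    for _ in range(n_steps):                      # autonomous closed-loop evolution
        r = np.tanh(W @ r + W_in @ (W_out @ r))
        traj.append(W_out @ r)
    return r, np.array(traj)

r0 = np.tanh(rng.normal(size=N))                  # state assumed to be on an attractor
r_end, traj = kick_and_run(r0, eps=10 ** -1)
```

Repeating this over many random kicks and classifying each `traj` yields the success-rate curve of Fig. 5(C).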

The echo state property of a reservoir computer stipulates that a trajectory on an attractor in its original phase space corresponds to a unique trajectory in the high-dimensional phase space of the RNN in the hidden layer11,62,63,64. That is, the target attractor is embedded in the RC hidden space. Figure 5(D) shows how the RC state approaches this embedded target state during warming by the cue. The panel shows the distances (cyan curves) between each of 100 trajectories of the reservoir neural network and the embedded memorized target attractor versus the cue duration during warming. (The details of how this distance is calculated are given in SI.) The ensemble-averaged distance (the red curve) decreases approximately exponentially with the cue length, indicating that the RC rapidly approaches the embedding of the memory state.
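One plausible way to compute such a distance (the paper's exact definition is given in its SI) is the minimum Euclidean distance from the current hidden state to a stored set of hidden states sampled along the embedded target trajectory:

```python
import numpy as np

# Distance from the current hidden state r to the embedded target attractor,
# represented by a stored set of hidden states R_target. This min-distance
# definition is an assumption for illustration.
rng = np.random.default_rng(6)
N = 50
R_target = rng.normal(size=(500, N))      # stand-in for the embedded trajectory

def dist_to_embedding(r, R_target):
    """Minimum Euclidean distance from r to the sampled embedded attractor."""
    return np.min(np.linalg.norm(R_target - r, axis=1))

# A state close to one point of the embedding should yield a small distance.
r = R_target[123] + 1e-3 * rng.normal(size=N)
d = dist_to_embedding(r, R_target)
```

Evaluating `d` after each warming step, and averaging over trials, produces curves like those in Fig. 5(D).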

The results in Fig. 5 suggest the following picture. The basin structure in Fig. 5(B) indicates that, when a trajectory approaches the target attractor, it usually begins in a riddled-like region containing points that belong to the basins of different memorized attractors, before landing in the open area containing the target attractor. The result in Fig. 5(C) can then be interpreted as follows. When the RC state is away from the open areas and still in the riddled-like region, recall is almost purely random (success rate 1/K). In an intermediate range of distances where the two types of regions are mixed (as the open areas do not appear to have a uniform radius), the recall success rate increases rapidly, reaching 100% once the state enters the purely open region. This is further verified by Fig. 5(D), where the two horizontal black dashed lines indicate the two distances equal to the perturbation magnitudes in Fig. 5(C) at which the rapid decline of the success rate begins and ends. That is, the interval between these two dashed lines in Fig. 5(D) is where the RC dynamical state traverses the mixed region. The cue lengths at the two ends of this interval on the curve in Fig. 5(D) are 10 and 50 time steps, which agree well with the transition interval in Fig. 5(A). Taken together, during warming by the cue, the memory RC starts in the riddled-like region, travels through a mixed region, and finally reaches the open areas. This journey is reflected in the changes in the recall success rate in Fig. 5(A).

It is worth studying whether partial information can also successfully recall memorized states in an index-free memory RC. For the results in Fig. 5(A1–A6), the cue signals used to retrieve a stored chaotic attractor have the same dimensionality as the original dynamical system generating the attractor. What if the cues are partial, with some dimension missing? For example, if the attractor is from a 3D system, a full cue signal has three components; if one is missing, the actual input cue is 2D. However, since the reservoir network still has three input channels, the missing component can be compensated by feeding the corresponding output component of the memory RC back as input. (This rewiring scheme was suggested previously50.) Figure 6 shows, for the six attractors in Fig. 5(A1–A6), the retrieval success rate versus the cue length for three distinct cases: full 3D cue signals, partial 2D cue signals with one missing dimension, and partial 1D cue signals with two missing dimensions. A 100% success rate can still be achieved in most scenarios, albeit with longer signals. The cue-length thresholds, at which the success rate begins to rise significantly above random recall (about 1/6), are postponed by a similar amount for all the 2D cases across the different attractors, and by another similar amount for all the 1D cases. This further suggests that the cue-length threshold is a feature of the RC structure and properties rather than of the specific target attractor, with the same feedback-rewiring scheme yielding the same thresholds. (A discussion taking into account all possible combinations of missing dimensions is provided in SI.)
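The feedback rewiring can be sketched as follows: during warming, the available cue dimensions drive their input channels directly, while each missing channel is fed from the corresponding component of the reservoir's own output. The readout matrix here is an untrained stand-in, so only the rewiring mechanics are shown.

```python
import numpy as np

# Warming with a partial cue: missing input channels are closed through the
# reservoir's own output (the rewiring scheme for partial cues).
rng = np.random.default_rng(7)
N, d = 300, 3                                     # hidden size, 3D input
W_in = 0.5 * rng.uniform(-1, 1, size=(N, d))
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_out = rng.normal(size=(d, N)) / np.sqrt(N)      # stand-in trained readout

def warm_partial(cue, missing):
    """Drive with the cue; channels listed in `missing` are fed back from output."""
    r = np.zeros(N)
    for u_t in cue:
        u = u_t.copy()
        y = W_out @ r
        u[missing] = y[missing]                   # feedback fills missing dims
        r = np.tanh(W @ r + W_in @ u)
    return r

cue = np.sin(np.linspace(0, 6 * np.pi, 100))[:, None] * np.ones((1, d))
r = warm_partial(cue, missing=[2])                # 2D cue, third dimension missing
```

With `missing=[1, 2]`, the same function implements the 1D-cue case.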

Fig. 6: Attractor retrieval with partial cues in index-free reservoir-computing memory.

A–F The success rate of memory retrieval of the six different chaotic attractors arising from 3D dynamical systems, respectively, with (i) full 3D cues, (ii) 2D cues with one dimension missing, and (iii) 1D cues with two dimensions missing. For 2D cues, a longer warming time is needed to achieve a 100% success rate. For 1D cues, a 100% success rate can be achieved in most cases with a longer warming time, but it can also happen that a perfect success rate is never reached, e.g., in (A) and (F).

A further question is: what if the warming signals for the index-free reservoir-computing memory contain random errors or are noisy? Figure 7 shows the retrieval success rate versus the cue length for three different levels of Gaussian white noise in the 3D cue signal. For all six chaotic attractors, a 100% success rate can still be achieved for relatively small noise, but the achievable success rate decreases as the noise level increases. Unlike the scenarios with partial dimensionality, the cue-length threshold at which the success rate jumps significantly above random retrieval (1/6) remains the same across noise levels. We conclude that noisy cues lower the saturated success rate, while partial information (with rewired feedback loops) slows the transition from random retrieval to saturation and can sometimes also lower the saturated success rate.
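The noisy-cue condition amounts to adding Gaussian white noise to every component of the 3D cue before warming. The cue signal and noise level below are illustrative assumptions.

```python
import numpy as np

# Adding Gaussian white noise of standard deviation sigma to a 3D cue.
rng = np.random.default_rng(8)
t = np.arange(200)
cue = np.stack([np.sin(0.1 * t), np.cos(0.1 * t), np.sin(0.2 * t)], axis=1)
sigma = 0.1                                       # noise level (illustrative)
noisy_cue = cue + sigma * rng.normal(size=cue.shape)
```

Sweeping `sigma` and measuring the saturated success rate reproduces the kind of comparison shown in Fig. 7.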

Fig. 7: Attractor retrieval with noisy cues from index-free reservoir memories.

A–F The retrieval success rate of the six distinct chaotic attractors in Fig. 1(B) (from left to right), respectively, versus the cue length, for three levels of Gaussian white noise in the cue signal.

 
