Latching dynamics as a basis for short-term recall

We discuss simple models for the transient storage in short-term memory of cortical patterns of activity, all based on the notion that their recall exploits the natural tendency of the cortex to hop from state to state, known as latching dynamics. We show that in one such model, and in simple spatial memory tasks we have given to human subjects, short-term memory can be limited to a similarly low capacity by interference effects when tasks are terminated by errors, and can exhibit similar sublinear scaling when errors are overlooked. The same mechanism can drive serial recall if combined with weak order-encoding plasticity. Finally, even when storing randomly correlated patterns of activity, the network demonstrates correlation-driven latching waves, which are reflected at the outer extremes of pattern space.

memory function. In practice, for the sake of a fair comparison among the mechanisms, in the simulations we change each parameter the other way around, as explained in 2.4 below.

The first mechanism models increased depth of the attractors in the patches of cortex where any of the L patterns is active, which could reflect a generic short-term potentiation of the synaptic connections among pyramidal cells in those patches, which in the Potts network is summarily represented by the parameter w ([8, 9]). In the model, each of the L items is active over aN Potts units, and their active states are shared with many other items not intended to go into STM. This is the coarseness that leads to limited capacity of

Model 1: Stronger local feedback for the items held in STM
where $\xi_i^\mu$ is the state of pattern $\xi^\mu$ at unit i, Θ(·) is the Heaviside step function and $\delta_{\xi_i^\mu, k}$ is the Kronecker delta symbol.

In the second mechanism, a parameter regulating firing rate adaptation is reduced selectively for the neurons that are active, in those patches, in the representation of the L items. That is, we decrease adaptation by subtracting a term ∆θ from the adapted threshold ($\theta_i^k$) for the Potts states that are active in any one of the L patterns.
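The selective threshold reduction described for the second mechanism can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kicked_thresholds`, the array layout, and the encoding of a quiescent unit as state 0 are all assumptions introduced here.

```python
import numpy as np

def kicked_thresholds(theta, patterns, stm_items, delta_theta):
    """Subtract delta_theta from theta_i^k for every (unit, state) pair
    that is active in at least one of the patterns held in STM.

    theta       : float array, shape (N, S), adapted thresholds theta_i^k
    patterns    : int array, shape (p, N); values 1..S give the active state,
                  0 marks a quiescent unit (encoding assumed for this sketch)
    stm_items   : indices of the L patterns held in STM
    delta_theta : size of the threshold reduction
    """
    mask = np.zeros(theta.shape, dtype=bool)
    for mu in stm_items:
        active_units = np.nonzero(patterns[mu])[0]
        # states are 1-based in `patterns`, columns of theta are 0-based
        mask[active_units, patterns[mu, active_units] - 1] = True
    return theta - delta_theta * mask, mask

# Toy example: p = 3 patterns, N = 3 units, S = 3 states, L = 2 items in STM.
toy_patterns = np.array([[1, 0, 2],
                         [1, 3, 0],
                         [2, 2, 2]])
toy_theta, toy_mask = kicked_thresholds(np.zeros((3, 3)), toy_patterns, [0, 1], 0.5)
```

The returned mask also gives the fraction of (unit, state) pairs affected by the reduction, which is the quantity that grows with L and eventually blurs the distinction between STM and LTM items.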

The third mechanism we consider is the one acting on the long-distance synaptic connections between neurons, represented in the Potts model [8] by the tensor connections between Potts units. We model short-term potentiation of the synaptic connections by stronger tensor connections. Since the latter connect separate

Model 3a: Model 3 with only autoassociative connections in short-term memory
where $J_{ij}^{kl}$ is the strength of connections that do not belong to any one of the L patterns in STM, given in Eq. (6). Here we say that a connection belongs to a pattern when the two states that are paired by the connection participate in the representation of the pattern.
where $J_{ij}^{kl}$ is again given in Eq. (6). In this model, we potentiate extra connections in addition to those that

We find that for some of the mechanisms, latching dynamics are effectively constrained to the L items, but only up to a given value of L (Fig. 1b). We can understand this limitation as being due to interference from the LTM items.

Let us consider the proportion of elements (units, states and connections for Models 1, 2 and 3, respectively) that are kicked for a given number L. If we randomly pick one unit (state or connection), then the probability of it belonging to one of the L patterns in STM can be written as

Model 3b (Fig. 1d). All of these quantities approach 1 when L becomes very large. As a rough estimate, we can set a criterion of $P_L = 1 - 1/e$, above which one cannot discriminate STMs from LTMs. We can then

for Model 3b given the parameters for which we run the simulations, S = 7, a = 0.25).
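To make the criterion $P_L = 1 - 1/e$ concrete, the sketch below works it out under a simplifying assumption that is not taken from the source: the L patterns are independent, and a randomly picked element belongs to any single pattern with probability q, so that $P_L = 1 - (1 - q)^L$. Taking q = a for units and q = a/S for states is likewise an assumption for illustration. The expressions for the connection-based models are not reproduced here.

```python
import math

def prob_kicked(q, L):
    """Probability that a random element belongs to at least one of L
    independent patterns, each containing it with probability q (assumed)."""
    return 1.0 - (1.0 - q) ** L

def critical_L(q):
    """Value of L at which prob_kicked reaches the criterion 1 - 1/e."""
    return -1.0 / math.log(1.0 - q)

S, a = 7, 0.25                      # simulation parameters quoted in the text
L_units = critical_L(a)             # units, q = a: about 3.5
L_states = critical_L(a / S)        # states, q = a/S: about 27.5
```

Under these assumptions the criterion is reached at a much smaller L when whole units are kicked than when individual states are kicked, in line with the qualitative argument that the coarser mechanism saturates first.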

When we increase L, there are two factors that affect $M_{\mathrm{corr}}$. The first one is that the seemingly quasi-

with Models 2 and 3.

In contrast, Models 2 and 3 appear broadly equivalent in terms of $\Delta M_{\mathrm{corr}}$, except that the mechanism acting on the long-range connections, in its variant 3a, is not affected by interference until much higher L values. This is because in this case, the kick is more selectively affecting a subset of the very many N × C × S × (S − 1)/2 tensor connection values in the model.

There is, however, a major difference between Models 2 and 3 if we consider their overall propensity to latch. To measure this propensity, we first cue the network with one of the memorised patterns, after which we count the total number of latches that occur until the dynamics stop on their own (Fig. 1c). We can see that with Model 2, constraining the dynamics to be among the L items actually enhances the mimicked by $\theta_i^k$, a pattern cannot be revisited soon after it is first visited. Therefore we see a slow increase (with respect to L) in latching propensity for Model 3 (Fig. 1c). But for Model 2, the "refractory" effect of $\theta_i^k$ is greatly screened due to the direct manipulation of $\theta_i^k$. At L = 4 for Model 2 we observe clean latching, in the sense that the condensed pattern is almost perfectly retrieved whereas non-condensed patterns have virtually zero overlap (Fig. S5).

In summary, when considering the quantity $\Delta M_{\mathrm{corr}}$ as well as the propensity to latch, Model 2 expresses a functionality more plausibly associated with what we would expect from short-term memory. First, the quantity $\Delta M_{\mathrm{corr}}$ does not increase indefinitely upon increasing L, and has a limit. Moreover, the basic propensity to latch also falls off with increasing L, reminiscent of the slowing down of retrieval from memory as the set size increases [7]. Therefore, in the remainder of this work, we focus on Model 2.

Having discussed three different models for short-term recall, we study in detail Model 2, which uses a lower adaptive threshold as a mechanism to hold items in STM. We run extensive simulations to see how it functions, comparing it also with experimental data.

the putative mechanisms that hinder recall, a conceptual model for memory recall has been proposed [6,11].

In this model, a quantity R is defined as the number of recalled items until the searching trajectory enters

In contrast, the dynamics in our model are not deterministic (we will discuss this point in Section 5), and we hardly ever observe a loop in the network trajectories. However, we can still compute a measure similar to R, labeled $M_{it}$, which is the number of retrieved patterns until the network repeats one transition.

Compared to the measure R computed in [6], our measure $M_{it}$ has a steeper slope, though again lower than 1 (Fig. 2a). Thus, $M_{it}$ still grows sublinearly with respect to L but it does not match the R measure, whose slope is 0.5. Alternatively, we have computed the number of retrieved items until the network simply revisits one of the already-visited items, labeled $M_{i1}$. In contrast to $M_{it}$, this latter quantity shows a slope lower than 0.5, which again does not match the R trend in [6] (Fig. 2a). For this reason, we define a new quantity called $M_i(L)$, which is the number of recalled items until one item is repeated twice. This somewhat contrived quantity indeed behaves similarly to what is theoretically expected from the quantity R, that is, with a slope of 0.5 (Fig. 2b).
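The three stopping rules can be made explicit with a short sketch that operates on a sequence of retrieved item labels. The function name `recall_measures` is hypothetical, and two readings are assumed here rather than taken from the source: each measure counts distinct items retrieved before the stopping event, and "repeated twice" means an item is visited for the third time.

```python
from collections import Counter

def recall_measures(seq):
    """Return M_i1, M_it and M_i for one sequence of retrieved items.

    M_i1: distinct items before the first revisit of any item
    M_it: distinct items before any transition (a -> b) occurs a second time
    M_i : distinct items before any item is visited a third time
    If a stopping event never occurs, the whole sequence is counted.
    """
    visits = Counter()
    transitions = set()
    stop = {"M_i1": None, "M_it": None, "M_i": None}
    for t, item in enumerate(seq):
        visits[item] += 1
        if visits[item] == 2 and stop["M_i1"] is None:
            stop["M_i1"] = t
        if visits[item] == 3 and stop["M_i"] is None:
            stop["M_i"] = t
        if t > 0:
            step = (seq[t - 1], item)
            if step in transitions and stop["M_it"] is None:
                stop["M_it"] = t
            transitions.add(step)
    # seq[:None] is the full sequence, so unreached stops count everything
    return {name: len(set(seq[:end])) for name, end in stop.items()}

example = recall_measures([1, 2, 3, 1, 4, 2, 3, 5, 1, 6])
```

In this example the first revisit happens at the fourth step, the first repeated transition (2 to 3) at the seventh, and the first third visit (item 1) at the ninth, so the three counts increase in that order. Extra-list items are assumed to have been removed from the sequence beforehand.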

In computing these three measures, we have ignored errors (extra-list items) in order to compare with [6,11]. Note that errors are not discussed in their conceptual model and experiment, in which retrieval of

Clearly, the parameters of the experimental protocol can be expected to affect recall, including the amount of time allocated for recall. However, in our experiment, participants only need to click on the correct locations (as opposed to typing in the words they recall [6]), and setting a fixed recall time may seem ad hoc. As an alternative, and to further explore the validity of latching dynamics as a model for this experiment, we

If limited by errors, the network cannot recall beyond its STM capacity
The measure $M_{\mathrm{corr}}$ was introduced and discussed in section 2 for comparing three different models. Here we compute the same quantity with a slight modification; in order to compare with our experimental data, we consider sequences of variable length that depends both on list length L and time. We consider lengths of

We have found that repetition-limited and duration-limited measures of performance in free recall in the Potts model endowed with short-term memory function can express quasi-square-root behaviour in the number of items in the list. One question that naturally arises is whether the same model can express behaviour similar to serial recall, a paradigm very similar to free recall, but with a crucial difference. Here, participants are instructed to recall items in the same order as they have been presented, making the task more difficult.

We have performed serial recall experiments with three different types of materials. We asked the

increased, until a memory capacity limit for this stimulus type and presentation time was reached. In this way we measure the memory capacity for serial recall by computing the Area Under the Curve (Fig. 6).

Figure 6: Short-term memory capacity for serial recall does not markedly depend on stimuli type. Memory capacity for serially presented stimuli for different presentation times: bars correspond to the average across participants of the longest correctly recalled sequence, while the distance from the bar to the dot of the same colour corresponds to the standard deviation of the mean. We performed the experiment for three different stimulus types, shown in different colours.
Our experiment yields two main results (Fig. 6). The first is that the type of stimulus does not affect the recall probability, except for a slight disadvantage in the discrete Locations condition, suggesting a universal mechanism for recall independent of the material, which manifests itself at the systems level.

consider the result with digits in order to establish a comparison with our model.

We used Model 2 (lower adaptive threshold for items held in STM) to constrain the dynamics into a subset of L = 6 patterns intended as the 6 digits of our experiment. In addition to that, we introduced heteroassociative weights, similar to Model 3, to provide the sequential order of presented digits (see Eq. (13)).

We find a good agreement between our experimental data and the model (Fig. 7). In addition, we find that human subjects perform better if the to-be-memorised digit series include ABA or AA (Figs. 7a, 7c

The heteroassociative component of the learning rule (Eq. (13)) provides "instructions" to the network regarding the sequential order of recall, allowing it to perform serial recall (this is to be contrasted with the model with a purely autoassociative learning rule, performing free recall). The strength of such instructions is expressed through the parameter λ. We find that this parameter plays a role similar to that of presentation time in our experiments; increasing it enhances performance, just as increasing the presentation time increases the performance of human subjects (Fig. 7). However, values of λ that are too large again make performance worse and deteriorate the quality of latching (Fig. 7d). The dynamics then become a stereotyped sequence of patterns (see Fig. S9), without any nonlinear convergence towards attractors, and the sequence itself is progressively harder to decode. Therefore, the most functional scenario is when the heteroassociative instruction acts as a bias or a perturbation to the spontaneous latching dynamics rather than enforcing strictly guided latching in the Potts model. This is in sharp contrast with the mechanism for sequential retrieval envisaged in the model considered in [17], where the heteroassociative connections are the main and only factor driving the sequential dynamics; in that case, without them, there are no dynamics but rather, at most, the retrieval of only the first item. The effect of a lower adaptive threshold (expressed by ∆θ) on latching sequences is to constrain the dynamics to a subset of presented items among p patterns, but values of ∆θ that are too high degrade the performance as well as the quality of latching (Fig. 7b, 7d).

ones.
We find that the capacity of the model (denoted as AUC in the legend in Fig. 7e) increases by as much as 1 item for the congruous case relative to the incongruous case.

These results, together with those from the previous two sections, indicate that intrinsic latching dynamics, similar to a random walk, can serve short-term memory (e.g., free recall). Furthermore, latching dynamics can also serve serial recall, if supplemented by biases that modify the random walk trajectory; the modification (or perturbation) should be a quantitative one, which biases the random walk character of the trajectories, rather than an all-or-none, or qualitative one, that inhibits it. This is consistent with our recent experimental result, where "guided" serial recall leads to poorer performance than a non-guided control (O. Soldatkina, in

we define a quantity, called d, which is an index of "semantic" distance between two patterns in their

where $C_{as}$ and $C_{ad}$ measure the correlation between two patterns (see Eqs. (??)).

We consider the distribution of $d(\mu_n, \mu_{n+z})$ for 6 values of z (Fig. 8b). At z = 1, latching occurs mostly between highly correlated patterns as expected, where the higher correlation is expressed by lower d. At

with the initially retrieved one at the third and fourth step. So we can say that the network reaches the most "distant" pattern from its "initial" pattern around z = 3.5, which is the "reflection" point of the wave (Fig. 8c). As z increases further to reach 6, the density curve gets closer to the curve for z = 1, thus approaching the periodicity mentioned above. This periodicity is confirmed by Figs. S12, S13.

These results indicate that latching trajectories by Potts networks have a quasi-random walk character,  for quasi-discrete sequences of states, with properties that turn out to be similar to those of random walks.

This happens, however, only within a specific parameter range, and only to a partial extent, so that often one has in practice several intertwined sequences, with simultaneous activation of multiple patterns, as well as pathological transitions, all characteristics with potential to account for psychological phenomena, and which are lost in a more abstract purely symbolic model. We have thus discussed three generic neural mechanisms that may contribute to restrict the random walk, approximately, from p to L items. Although they are not mutually exclusive, we have argued that the second such mechanism is the one most relevant to account for the recall of lists of unrelated items.

To model the recall of ordered lists, an additional heteroassociative mechanism can be activated, which biases the random walk, but again approximately, resulting in frequent errors and limited span. We have observed that, at least in the Potts network, if the heteroassociations, which amount to specific instructions, dominate the dynamics, the random character is lost, and with it the entire latching dynamics, which cannot be harnessed to just passively follow instructions.

close repetitions, as a result of the adaptation-based mechanism. Interestingly, a tendency to perceive random processes as less prone to repetition than they really are is a hallmark of human cognition [27]. Beyond

evidence has been difficult to obtain, because a process that appears to involve two items active together might in fact rapidly alternate between them. Recently, however, the genuinely concurrent activation of two items has been reported with a model-based analysis of EEG data [29]. In that study, holding on to the two items meant better performance in the task, so it reflects a capability, not a flaw, of the short-term mechanism. If extended to sequences of endogenously generated states, as the Potts model indicates would occur, at least in certain regimes, it would mean not only that the focus of attention when performing a similar task need not be unique, but also that parallel streams of thoughts can be entertained along partially

connections, whose values are pre-determined by a Hebbian-learning rule, where $c_{ij}$ is either 1 if there is a connection between units i and j or 0 otherwise. The average number of

The time evolution of the network is governed by the equations

where the variable $\theta_i^k$ is a specific threshold for unit i, specific also for state k, varying with time constant $\tau_2$, and intended to model adaptation, i.e. synaptic or neural fatigue selectively affecting the neurons active in state k, and not all neurons subsumed by Potts unit i. The "current" that the unit i in state k receives is

Here w is another parameter, the "local feedback term", first introduced in [9], to model the stability of local attractors in the full model, as justified with a semi-analytical derivation later [8]. It helps the network converge towards an attractor, by giving more weight to the most active states, and thus it effectively deepens the attractors.

The activation along each state is given by

where $r_i^0 \equiv U + \theta_i^0$.

Combining slow and fast inhibition
Latching dynamics, in the model equipped with either slow or fast inhibition, are studied in [9,18]. Here, we consider a more realistic case in which both slow and fast inhibition are taken into account. Formally, it is based on replacing the inhibitory or non-specific threshold $\theta_i^0$ with the sum $\theta_i^A + \theta_i^B$ (to denote fast, GABA_A, and slow, GABA_B, inhibition, respectively) and writing separate equations

where, instead of $\tau_3$ either short or long, one sets $\tau_A < \tau_1 \ll \tau_2 \ll \tau_B$ and $\gamma_A$ as the parameter setting

The Potts network has four different phases in the parameter space considered here (Fig. 9). The first one is the trivial no-latching phase, where the network dynamics stop without a retrieval of patterns other than the cued one. The second one is what we are interested in as a model for short-term memory, i.e., the finite-latching phase, where the quality of latching is good. The third phase is the infinite-latching one, where the dynamics go on endlessly once cued. In the fourth phase the retrieved pattern is not destabilized by adaptation, and remains as a steady state. We call this the stable attractor phase.

The average dwell time in a retrieved pattern increases continuously with growing w in the infinite latching phase until the network enters the stable attractor phase. This is observed, though, only for $\gamma_A < 0.4$, when other parameters are set as in Table 1. Except for this property, no major qualitative difference appears among the 3 regimes: slow, fast and intermediate inhibition.

In [9], these phases and their phase transitions are studied in a semi-analytical way by using an energy supporting the model of short-term memory as an activated portion of long-term memory can be found in [1].

When a subject is performing a task of immediate recall, we posit the following assumptions. network.

The facilitation of attractors for STM items can be done by changing some parameters of the network. We propose in the main text three different models for short-term memory function that can constrain latching dynamics to a small subset of activity patterns that represent items in long-term memory.
Quality of latching and correlation between patterns
The quality of latching is evaluated by means of $d_{12} - Q$. Here $d_{12}$ is the difference between the largest overlap and the next largest one, averaged over time and over so-called quenched variables [18], while

where, in the definition of q(t), $m_{\mu_i}$ is the overlap of the network activity with a pattern $\mu_i$, and $\mu_1, ..., \mu_{L-1}$ are the L − 1 patterns having the largest overlaps excluding the maximum overlap. This quantity is a measure of how "condensed", i.e., partially recalled, the non-recalled patterns are.
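The first ingredient, $d_{12}$, can be computed directly from a recorded time series of overlaps. The sketch below is illustrative only: the function name `d12` and the array layout (time steps along rows, patterns along columns) are assumptions, and it averages over time for a single realization. The average over quenched variables would be taken by averaging the result over independent runs. The quantity Q is not computed, because its defining expression is not available here.

```python
import numpy as np

def d12(overlaps):
    """Time-averaged gap between the largest and second-largest overlap.

    overlaps : float array, shape (T, p); overlaps[t, mu] is the overlap of
               the network state at time t with pattern mu (layout assumed)
    """
    overlaps = np.asarray(overlaps, dtype=float)
    top_two = np.sort(overlaps, axis=1)[:, -2:]   # columns: second largest, largest
    return float(np.mean(top_two[:, 1] - top_two[:, 0]))

# Two time steps, three patterns: gaps are 0.8 and 0.5.
toy_d12 = d12([[0.9, 0.1, 0.0],
               [0.2, 0.8, 0.3]])
```

A value close to 1 means that a single pattern dominates at almost all times, while a value near 0 signals that at least two patterns are retrieved to a similar degree.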

The correlation between patterns is measured by two quantities [9,33],

The average values of $C_{as}$ and $C_{ad}$ over different realizations of randomly-correlated patterns are given by

Network parameters
Network parameters used in this study are set as in Table 1

Experiments of free recall and serial recall
Both of the experiments were conducted online, with participants recruited through https://www.prolific.co/.

The 36 participants were instructed to watch a sequence appear on the computer screen and repeat the sequence just after, by clicking on the screen. They had to repeat sequences of L stimuli (L starting from 3).

In one of the conditions, there were 5 trials for each length L, with L growing until 3 out of 5 trials were incorrect; the last L is then called the limit capacity for this participant in this condition. For each participant the sequences were of all three stimulus variants: -(D) Digits out of {1, 2, 3, 4, 5, 6} on a black screen, presented one at

The same hexagonal grid as in serial recall is used (Fig. 10). In this experiment, the sets of stimuli were presented all at once, and the participants (N = 40) were instructed to repeat as many as they could recall,

where γ is the lower incomplete Gamma function, which for L → ∞ grows as a square root, $M \simeq \sqrt{(\pi/2)(L - 1)} + 1$.
One way to approximate this expression for L large is to assume $1 - \frac{k}{L-1} \simeq e^{-\frac{l}{L-1}}$, so that the product

Keeping only the first term in the exponent of the integral

To derive its asymptotic behaviour for large L, it is convenient to separate one term and write

and then use Stirling's approximation for the factorial to evaluate the sum as half an indefinite integral for −∞ < l < ∞, which can be evaluated at its saddle point near l = 1/2, yielding again, to leading order, $M \simeq \sqrt{(\pi/2)(L - 1)} + 1$.
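The square-root growth can be checked numerically for a simple random walk. The sketch below assumes, as an illustration and not as a statement of the model behind the expression above, that the walk starts on one item and at each step jumps uniformly to any of the other L − 1 items, and that M counts the distinct items visited before the first revisit. The function names are hypothetical.

```python
import math

def expected_distinct(L):
    """Exact expected number of distinct items visited before the first
    revisit, for a walk jumping uniformly to one of the other L - 1 items."""
    n = L - 1
    total = 1.0      # the starting item always counts
    survive = 1.0    # probability that no revisit has occurred so far
    for visited in range(1, L):
        # with `visited` items seen, visited - 1 of the n possible targets are old
        survive *= 1.0 - (visited - 1) / n
        total += survive
    return total

def sqrt_asymptote(L):
    """Leading-order large-L estimate, sqrt((pi/2)(L - 1)) + 1."""
    return math.sqrt(math.pi / 2 * (L - 1)) + 1

ratios = {L: expected_distinct(L) / sqrt_asymptote(L) for L in (10, 100, 1000)}
```

For small lists the exact values are easy to verify by hand (2 items for L = 2, and 2.5 for L = 3), and the ratio to the square-root estimate approaches 1 as L grows, consistent with agreement to leading order only.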