Amoeba-Inspired Heuristic Search Dynamics for Exploring Chemical Reaction Paths

We propose a nature-inspired model for simulating chemical reactions in a computationally resource-saving manner. The model was developed by extending our previously proposed heuristic search algorithm, called “AmoebaSAT [Aono et al. 2013],” which was inspired by the spatiotemporal dynamics of a single-celled amoeboid organism that exhibits sophisticated computing capabilities in adapting to its environment efficiently [Zhu et al. 2013]. AmoebaSAT is used for solving an NP-complete combinatorial optimization problem [Garey and Johnson 1979], “the satisfiability problem,” and finds a constraint-satisfying solution at a speed that is dramatically faster than one of the conventionally known fastest stochastic local search methods [Iwama and Tamaki 2004] for a class of randomly generated problem instances [http://www.cs.ubc.ca/~hoos/5/benchm.html]. In cases where the problem has more than one solution, AmoebaSAT exhibits dynamic transition behavior among a variety of the solutions. Inheriting these features of AmoebaSAT, we formulate “AmoebaChem,” which explores a variety of metastable molecules in which several constraints determined by input atoms are satisfied and generates dynamic transition processes among the metastable molecules. AmoebaChem and its developed forms will be applied to the study of the origins of life, to discover reaction paths for which expected or unexpected organic compounds may be formed via unknown unstable intermediates and to estimate the likelihood of each of the discovered paths.

As the value of N increases, the total number of possible assignments grows exponentially as 2 N and no polynomial-time algorithm for finding a solution is known. SAT belongs to the particularly difficult class of problems known as NP (nondeterministic polynomial time). Moreover, SAT was the first problem shown to be NP-complete; this means that any problem in NP may be reduced to SAT in polynomial time [Garey and Johnson 1979]. For this reason, fast algorithms and systems capable of solving SAT may be applied to solve an extremely large number of problems. Many of these problems are closely related to applications that span a wide range of fields, including automatic inference, software/hardware verification, information security, and bioinformatics. Aono et al. (2013) formulated the AmoebaSAT algorithm, which utilizes the spatiotemporal dynamics of a coupled system of 2N units corresponding to pseudopod-like branches of an amoeba, to solve the N-variable SAT problem. Each unit is assigned a variable name i∈{1,2, ⋯,N} and a true/false value v∈{0,1} and is associated with two variables X i,v and R i,v . If, at discrete time step t, a resource is supplied to unit (i,v) (corresponding to the elongation of the amoeba branch), we denote this by R i;v t ð Þ ¼ 1, and we interpret this as meaning that the system is considering the assignment x i = v. If no resource is supplied, we write this as R i;v t ð Þ ¼ 0: We define a variable X i,v ∈{−1,0,1} to represent the accumulated value of the resourcesupply variable R i,v : The quantity X i,v may be understood as an abstract representation of the displacement from the equilibrium volume of the amoeba branches with one of the three values {−1,0,1}. In each step, the variables X=(X 1,0 ,X 1,1 ,X 2,0 ,X 2,1 ,⋯X N,0 ,X N,1 ) are transformed into the variable as- Þaccording to the following rule: We put S i,v (t)=1orS i,v (t)=0 to indicate the application or non-application, respectively, of a signal (stimulus) that "bounces back" the supply of resources to R i;v (corresponding to an repulsive stimulus inhibiting the elongation of the amoeba branch). For now, we wish to focus on the leftmost clause (¬x 1 ∨x 3 ∨¬x 4 ) in f. If we have both x 1 =1 and x 3 =0, then we require that x 4 should not be 1 (i.e., x 4 ≠1) in order for this clause to be true; indeed, otherwise we find (¬(x 1 =1)∨x 3 =0∨¬(x 4 =1))=0. For this reason, if at step t we have both X 1,1 (t)=1andX 3,0 (t)= 1, then at step t+1 we apply a bounceback signal to R 4,1 (t) (i.e., we determine S 4,1 (t+1)=1, so that a resource is supplied to unit (4,1) with a certain low probability P 4,1 (t+1). We call this rule a "bounceback rule". Similarly, from the leftmost clause we can read off the bounceback rules X 1,1 (t)=1∧X 4,1 (t)=1⇒S 3,0 (t+1)=1andX 3,0 (t)=1∧X 4,1 (t)=1⇒S 1,1 (t+1)=1. We proceed similarly to investigate all clauses in f to analyze mutual interdependencies between the variables and determine a set of all bounceback rules. On the other hand, for each unit (i,v) in which the bounceback signal is not applied (i.e., S i,v (t)=0), the resource supply occurs (i.e., Under the bounceback rules defined in Aono et al. (2013), if a system state X=(X 1,0 ,X 1,1 , , then the system is "stable". If this stability criterion is not satisfied, there is a high probability that the sign of X i,v (t+1) differs from that of X i,v (t) depending on S i,v (t), and the system state X is unstable. Figure 1 shows an example of time evolution of AmoebaSAT.
To evaluate the solution-searching performance of AmoebaSAT, we focused on a group of problems known as Uniform Random-3-SAT, in which all clauses are formed from three literals from the benchmark problems offered by the online SATLIB library [http://www.cs. ubc.ca/~hoos/5/benchm.html]. We selected 100 instances with N=75 variables and 100 instances with N=100 variables. For performance comparison, we considered WalkSAT, one of the categories of local search algorithms presently known to be the fastest heuristic methods for randomly generated 3-SAT instances [Iwama and Tamaki 2004]. WalkSAT configures its initial state by assigning all variables to random true or false values. Then the algorithm selects at random one clause from among the clauses that are not satisfied (i.e., are false) with the variable assignments at a given time and then chooses at random a single variable from within that clause to flip (changing 0 to 1 or 1 to 0). The algorithm then iterates this basic behavior. For each problem instance, we ran 500 Monte Carlo simulations of both the AmoebaSAT and WalkSAT algorithms and compared the average number of time steps (number of iterations) t required to arrive at the solution. We found that AmoebaSAT is able to find the solution with a speed orders of magnitude greater than that of WalkSAT .
Understanding the origins of the high performance exhibited by AmoebaSAT is a subject of current investigation. Whereas WalkSAT only updates one variable in each step, AmoebaSAT incorporates many processes, which collectively update multiple variables and evolve Analytical results have been obtained that suggest that this unique "concurrent search" feature of the algorithm is the source of its high performance.
As shown in Fig. 1, AmoebaSAT not only stabilized in a first-found solution but also exhibited probabilistic transition behavior among a number of solutions. The duration for which a solution is maintained, which corresponds to the time spent in one of metastable states, could be seen as representing with the concept of "thermodynamic stability". The transition probabilities between two solutions vary depending on each pair of solutions, suggesting that the transition probability may contain information on "kinetics". For example, the transition probability from a solution (x 1 ,x 2 ,x 3 ,x 4 )= (1,1,1,1)to(1,1,1,0) is higher than that to (0,1,1,0), because the former occurs with a bit flip whereas the latter requires two simultaneous bit flips that occur only infrequently.
Here we propose a new model, called "AmoebaChem," that is an extended form of AmoebaSAT. In this article, we give only a brief explanation on AmoebaChem due to space limitations, and its detailed descriptions and results will be reported elsewhere. AmoebaChem considers a molecule as a (meta) stable solution in which several types of bounceback rules, which represent physical and chemical constraints including Lewis's "octet rule," are satisfied by all the atoms in the molecule. Figures 2 and 3 show the bounceback rules that are generated automatically when we input two nitrogen atoms and six hydrogen atoms. As shown in Fig. 4, each unit of AmoebaChem, which is depicted as a box marked with a 0 or 1 in Figs. 2 and 3, represents a bonding state of valence electrons owned by two atoms. Fig. 2 Bounceback rules in AmoebaChem, where two nitrogen atoms and six hydrogen atoms are introduced. The table shows that an N 2 molecule and three H 2 molecules are formed A major difference between AmoebaChem and AmoebaSAT is that the bounceback signal intensities of the former (S Type (t)) are given as parameters taking continuous values in a real interval [0.0,1.0] whereas that of the latter are binarized as S(t)∈{0,1}. With smaller bounceback signal intensities, the behavior of AmoebaChem becomes more "unstable" since  (perturbations) are introduced, implying that the equivalent of "temperature" can be raised by lowering the parameters. Moreover, by changing the parameters, the equivalent of "kinetics" can be modulated. In fact, Fig. 5 shows that, for example, the transition probability from the initially given 1-N 2 -3-H 2 state (Fig. 2) to a 2-NH 3 state (Fig. 3) changed significantly as the parameter set was altered. It would be important to establish a methodology to tune the parameters for making the behavior of AmoebaSAT more realistic so that it can be consistent with experimentally observed reaction data.
Although each unit of AmoebaChem introduced in this article represents a bonding state of valence electrons owned by two atoms, it can be replaced with other representations depending on purposes, for example, the number of bondings between two atoms. We can also simulate more realistic aspects in organic chemistry such as polarization, ionization, and radical reactions.
These models will be useful for finding unknown intermediate states that may exist before expected or unexpected organic products are formed.
If a variety of reaction paths can be found, we will be able to estimate the likelihood of each path by analyzing the durations and transition probabilities of the discovered intermediates.
We can also develop a simulator for RNA (secondary) structure prediction by considering each unit as representing a nucleotide and by introducing different bounceback rules that emulate complementary base-pairing rules and their related constraints. Furthermore, replacing nucleotides with amino acids, this simulator may be extended to protein (tertiary) structure prediction, if appropriate bounceback rules that represent physical and chemical constraints of protein folding such as hydrophobic-hydrophilic interactions can be introduced. Further investigations and developments will be devoted to advancing the study of the origins of life.