A Tutorial on Analysis and Simulation of Boolean Gene Regulatory Network Models

Driven by the desire to understand genomic functions through the interactions among genes and gene products, the research in gene regulatory networks has become a heated area in genomic signal processing. Among the most studied mathematical models are Boolean networks and probabilistic Boolean networks, which are rule-based dynamic systems. This tutorial provides an introduction to the essential concepts of these two Boolean models, and presents the up-to-date analysis and simulation methods developed for them. In the Analysis section, we will show that Boolean models are Markov chains, based on which we present a Markovian steady-state analysis on attractors, and also reveal the relationship between probabilistic Boolean networks and dynamic Bayesian networks (another popular genetic network model), again via Markov analysis; we dedicate the last subsection to structural analysis, which opens a door to other topics such as network control. The Simulation section will start from the basic tasks of creating state transition diagrams and finding attractors, proceed to the simulation of network dynamics and obtaining the steady-state distributions, and finally come to an algorithm of generating artificial Boolean networks with prescribed attractors. The contents are arranged in a roughly logical order, such that the Markov chain analysis lays the basis for the most part of Analysis section, and also prepares the readers to the topics in Simulation section.


INTRODUCTION
In most living organisms, genome carries the hereditary information that governs their life, death, and reproduction. Central to genomic functions are the coordinated interactions between genes (both the protein-coding DNA sequences and regulatory non-coding DNA sequences), RNAs and proteins, forming the so called gene regulatory networks (or genetic regulatory networks).
The urgency of understanding gene regulations from systems level has increased tremendously ever since the early stage of genomics research. A driving force is that, if we can build good gene regulatory network models and apply intervention techniques to control the genes, we may find better treatment for diseases resulting from aberrant gene regulations, such as cancer. In the past decade, the invention of high throughput technologies has made it possible to harvest large quantities of data efficiently, which is turning the quantitative study of gene regulatory networks into a reality. Such study requires the application of signal processing techniques and fast computing algorithms to process the data and interpret the results. These needs in turn have fueled the development of genomic signal processing and the use of mathematical models to describe the complex interactions between genes. Making it possible to analyze or intervene in the network through signal processing methods.
Among various mathematical endeavors are two Boolean models, Boolean networks (BNs) [1] and probabilistic Boolean networks (PBNs) [2], in which each node (gene) takes on two possible values, ON or OFF (or 1 and 0), and the way genes interact with each other is formulated by standard logic functions. They constitute an important class of models for gene regulatory networks, in that they capture some fundamental characteristics of gene regulations, are conceptually simple, and their rule-based structures bear physical and biological meanings. Moreover, Boolean models can be physically implemented by electronic circuits, and demonstrate rich dynamics that can be studied using mathematical and signal processing theory (for instance, Markov chains [2,3]).
In practice, Boolean models have been successfully applied to describe real gene regulatory relations (for instance, the drosophila segment polarity network [4]), and the attractors of BNs and PBNs have been associated with cellular phenotypes in the living organisms [5]. The association of network attractors and actual phenotypes has inspired the development of control strategy [6] to increase the possibility of reaching desirable attractors ("good" phenotypes) and decrease the likelihood of undesirable attractors ("bad" phenotypes such as cancer). The effort of applying control theory to Boolean models is especially appealing in the medical community, as it holds potential to guide the effective intervention and treatment in cancer.
The author would like to bring the fundamentals of Boolean models to a wider audience in light of their theoretical value and pragmatic utility. This tutorial will introduce the basic concepts of Boolean networks and probabilistic Boolean networks, present the mathematical essentials, and discuss some analyses developed for the models and the common simulation issues. It is written for researchers in the genomic signal processing area, as well as researchers with general mathematics, statistics, engineering, or computer science backgrounds who are interested in this topic. It intends to provide a quick reference to the fundamentals of Boolean models, allowing the readers to apply those techniques to their own studies. Formal definitions and mathematical foundations will be laid out concisely, with some in-depth mathematical details left to the references.

PRELIMINARIES
In Boolean models, each variable (known as a node) can take two possible values, 1 (ON) and 0 (OFF). A node can represent a gene, RNA sequence, or protein, and its value (1 or 0) indicates its measured abundance (expressed or unexpressed; high or low). In this paper, we use "node" and "gene" interchangeably.
A state in Boolean models is a binary vector of all the gene values measured at the same time, and is also called the gene activity (or expression) profile (GAP). The state space of a Boolean model consists of all the possible states, and its size will be n 2 for a model with n nodes. Definition 1 [2,7] A Boolean network is defined on a set of n binary-valued nodes (genes) {0,1} }, , , where each node i x has i k parent nodes (regulators) chosen from V , and its value at time t + 1 is determined by its parent nodes at t through a Boolean function i f , , the state transition is governed by f , written as In Boolean networks, genetic interactions and regulations are hard-wired with the assumption of biological determinism. However, any gene regulatory network is not a closed system and has interactions with its environment and other genetic networks, and it is also likely that genetic regulations are inherently stochastic; therefore, Boolean networks will have limitations in their modeling power.
Probabilistic Boolean networks were introduced to address this issue [2,7], such that they are composed of a family of Boolean networks, each of which is considered a context [8]. At any given time, gene regulations are governed by one component Boolean network, and network switchings are possible such that at a later time instant, genes can interact under a different context. In this sense, probabilistic Boolean networks are more flexible in modeling and interpreting biological data.
, and consists of r . At any time, genes are regulated by one of the BNs, and at the next time instant, there is a probability q (switching probability) to change network; once a change is decided upon, we choose a BN randomly (from r BNs) by the selection probabilities. Let p be the rate of random gene perturbation (flipping a gene value from 0 to 1 or 1 to 0), the state transition of PBN at t (assuming operation under ! j ) is probabilistic, namely [3], where ! is bit-wise modulo-2 addition, ) , , , and x(t ) ! " denotes a random perturbation on the state x(t ) (one or more genes are flipped). Let the set of network functions be } , , and we denote the PBN by G(V , F, c, p) (see Remark 1).
Alternatively, the PBN can be represented as In this representation, each node i x is regarded as being regulated by a set of l(i) Boolean functions } , with the corresponding set of function The two representations are related such that any network function j f is a realization of the regulatory functions of n genes by choosing one function from the function set i ! for each gene i x , and we can write Moreover, if it is an independent PBN, namely S is a set of 2 n vertices, each representing a possible state of a Boolean network; E is a set of n 2 edges, each pointing from a state to its successor state in state transition. If a state transits to itself, then the edge is a loop. The state transitions are computed by evaluating x(t + 1) = f (x(t )) exactly n 2 times, each time x(t ) being 00!0, 00!1,!,11!1 respectively. Fig. (1) is an example of state transition diagram of a three-node BN. Like BNs, a PBN also has finite state space. Although state transitions in a PBN are not deterministic, they can be represented probabilistically. We will show how to construct the state transition diagram of a PBN in the Simulation section.
With the help of state transition diagram, such as the one in Fig. (1), we can easily visualize that in a BN, any state trajectory in time x(0) ! x(1) ! x(2) ! ! must end up in a "trap", and stay there forever unless a gene perturbation occurs. Similarly, if neither gene perturbation nor network switching has occurred, a time trajectory in a PBN will end up in a "trap" in one of the component BNs too; however, either gene perturbation or a network switch may cause it to escape from the trap. In spite of this, when gene perturbation and network switching are rare, a PBN is most likely to reach a "trap" before either occurs and will spend a reasonably long time there.
In a BN, different basins of attraction are depicted in the state transition diagram as disjoint subgraphs. In Fig. (1),

S .
We are interested in the attractors of a Boolean model for at least two reasons: (1) Attractors represent the stable states of a dynamic system, thus they are tied to the long term behavior of Boolean models; (2) Earlier researchers demonstrated the association of cellular phenotype with attractors [5], thus giving a biological meaning to the attractors. Intuitively, when an attractor has a large basin of attraction, the corresponding phenotype is more likely than that of an attractor with much smaller basin of attraction. To develop intervention strategies that change the long term behavior of Boolean models, it is important to study the attractors.

ANALYSES OF BOOLEAN MODELS
Although analysis and simulation are two parallel subjects with Boolean models, the former includes some essential results that lay a foundation for the latter. In this section, we visit Boolean model analysis first.
One of the central ideas with Boolean models is their connection with Markov chains (subsection 3.1). Because of this, Boolean models, under certain conditions, possess steady-state distributions. The steady-state probabilities of attractors, which indicate the long-run trend of network dynamics, can be found analytically via Markov chain analysis (subsection 3.2). Moreover, the relationship between PBNs and Bayesian networks (another class of gene regulatory network models) can be established in a similar manner (subsection 3.3). Lastly, a subsection will be dedicated to structural analysis, which opens a door to other topics beyond this tutorial (such as control of genetic networks).

Markov Chain Analysis
As readers will find out soon, the transition probability matrix introduced below is not only a convenience in Markov chain analysis, but also finds itself useful in simulation, to be discussed in Section 4.

Transition Probability Matrix
On a Boolean model of n nodes, a transition probability matrix n n ij t T 2 2 ] [ = ! can be defined where ij t indicates the probability of transition from one state (which is equal to i ! 1 if we convert the binary vector to an integer) to another state (which corresponds to j ! 1 ).
In a Boolean network !(V, f ) , ij t can be computed by where dec(!) converts a binary vector to an integer, for instance, dec(00101) = 5 . Since BN is deterministic, T contains one 1 on each row, and all other elements are 0's.
In a PBN consisting of r BNs ! 1 (V, f 1 ),!, ! r (V, f r ) , ij t can be computed as follows [2,3]. Note that p (random gene perturbation rate) and ! are defined as in Definition 2, and k c is the selection probability of ! k .
, and l indicates the Hamming distance between s and w .
When taking a closer look at Eq. (6), we find that T is the sum of a fixed transition matrix T and a perturbation matrix ! T , where j T and j c are the transition probability matrix and the network selection probability of the j -th Boolean network, respectively; ij ! is the Hamming distance between states s and w , with dec(s) = i ! 1 and dec(w) = j ! 1 . Thus, the computational complexity for T is O(n ! r ! 2 n ) [9].

Boolean Models are Markov Chains
Given the definition of T matrix in Section 3.1.1, we can see that a Boolean model with n genes is a homogeneous Markov chain of A state x (a binary vector of length n ) in Boolean model has one-to-one correspondence with the i -th state ( What is the use of matrix T ? Let n T W = , we can show that the (i, j) -th element of W is equal to the probability of transition from the i -th state to the j -th state of the Markov chain in n steps, The proof is left as an exercise to the reader.
An N -state Markov chain possesses a stationary distribution (or invariant distribution) if there exists a probability distribution ) , , . Thus in a Markov chain with stationary distribution ! , if we start from the i -th state with probability i ! , the chance of being in any state j after an arbitrary number of steps is always j ! .
An N -state Markov chain possesses a steady-state distribution ) , , it means that regardless of the initial state, the probability of a Markov chain being in the i -th state in the long run is ! i " .
A Markov chain possessing a stationary distribution does not necessarily possess a steady-state distribution.
Why should it be of our concern if the Markov chain has a steady-state distribution or not? This is because we are interested in the Boolean model associated with the Markov chain, and would like to know how it behaves in the longrun. As a reminder, the attractors of a Boolean model are often associated with cellular phenotype, and by finding out the steady-state probabilities of a given attractor, we can have a general picture of the likelihood of a certain phenotype. When a Boolean model possesses (namely, its Markov chain possesses) a steady-state distribution, we can find those probabilities by simulating the model for a long time, starting from an arbitrary initial state x(0) . In fact, this implies the equivalence of "space average" and "time average", as is a common concept in stochastic processes.
When will a Markov chain possess a steady-state distribution? It turns out that an ergodic Markov chain will do. A Markov chain is said to be ergodic if it is irreducible and aperiodic [10].

Definition 5
A Markov chain is irreducible if it is possible to go from every state to every state (not necessarily in one move).

Definition 6
In a Markov chain, a state has period d if starting from this state, we can only return to it in n steps and n is a multiple of d . A state is periodic if it has some period > 1 . A Markov chain is aperiodic if none of its state is periodic.
A Boolean network possesses a stationary distribution, but not a steady-state distribution unless it has one singleton attractor and no other attractors. Here we show how to find a stationary distribution. Assume a BN has m singleton attractors, m a a , , 1 ! , or an attractor cycle } , , is a stationary distribution (the proof is left as an exercise to the reader). If a BN has a combination of singleton attractors and cycles, ! can be constructed such that the probabilities corresponding to the singleton attractors are equal, the probabilities corresponding to the states within each attractor cycle are equal, and 1 = . When there is only one attractor in the BN, the stationary distribution is unique.
When p, q > 0 , a PBN possesses steady-state distribution, because the Markov chain corresponding to the PBN is ergodic. Interested readers can find the proof in [3]. Now that PBN has a steady-state distribution, we can obtain such distribution in two ways: (2) solving the linear and interested readers can consult books on linear algebra; (2) using the empirical methods in Section 4.3. If we are interested in the steady-state probabilities of the attractors only, an analytic method exists, to be discussed next.

Analytic Method for Computing the Steady-State Probabilities of Attractors
Recall from Section 2 that attractors are important to the long-term behavior of Boolean models because they are associated with cellular phenotypes; now we also know that PBNs possess steady-state distributions, which means that a PBN has a unique long-term trend independent of initial state. Therefore, we would naturally ask the question, how can we find the long-term probabilities of these attractors which are so important to us?
In the following, we will present a Markov chain based analytic method that answers this question, and more details, including proofs, can be found in [11].

Steady-State Distributions of Attractors in a BN with Perturbations
First consider a special case of PBN, Boolean network with perturbations (BNp), in which any gene has a probability p of flipping its value. A BNp inherits all the attractors and corresponding basins of attraction from the original BN. Because of the random gene perturbations, BNps possess steady-state distributions (the proof is similar to that of PBN, and it is left as an exercise to the reader). ).
Therefore, we can compute the steady-state probability of any attractor k A by the following two steps: (1) the steadystate probability of basin k B , ) ( k B ! , and (2) the conditional probability of attractor k A given its being in k (I). Obtaining the Steady-State Probability of Basin, ) ( k B ! Define a random variable ! (t ) which measures the time elapsed between the last perturbation and the current time t . ! (t ) = 0 means a perturbation occurs at t . For any starting and define the conditional probability of being in state x ! B given that the system is inside a set B , prior to a perturbation, The following theorem represents the steady-state distribution of the basins as the solution of a group of linear equations, where the coefficients are ) ( k i B B P ! 's. The lemma that follows gives the formula for the coefficients.
is the probability that state transition goes from y to x in one step by gene perturbation.

Now the only unknown is
. When p is small, the system spends majority of the time inside an attractor, and we can use the following approximation, is the number of iterations of f needed to reach the attractor k A from the state x , then for Applying the two lemmas and letting , we can obtain the steady-state probability of attractor k A .
When p is small, using the approximation in Eq. (14),

Steady-State Distributions of Attractors in a PBN
In a PBN, we represent the pair ) , ( f x as the state of a homogeneous Markov chain, ) , ( t t F X , and the transition probabilities are defined as Assume the PBN is composed of r BNs From steps (1) and (2), we can obtain ) , ( ). , is unknown when k ! l , we use the following approximation when p is small, Thus,

Relationship Between PBNs and Bayesian Networks
Bayesian networks (BaN) are graphic models that describe the conditional probabilistic dependencies between variables, and have been used to model genetic regulatory networks [12]. An advantage of BaNs is that they involve model selection to optimally explain the observed data [2]; BaNs can use either continuous or discrete variables, which is more flexible for modeling. In comparison, Boolean models have explicit regulatory rules that carry biological information, which can be more appealing to biologist than the statistic representation of BaNs. Although Boolean models use binary-quantized variables which sets a limitation on the data usage, they are computational less complex than BaNs when learning the network structure from data (see Section 3.3 of [2] for a more detailed discussion and references). Since network structure learning is out of scope of this article, interested readers can refer to [12] for Bayesian learning, [13] for Boolean network learning, and [8,14] for PBN learning.
While BNs are deterministic, PBNs and BaNs are related by their probabilistic nature; like PBNs, dynamic BaNs can be considered as Markov chains too. In the following analysis, we will show that equivalence between PBNs and BaNs can be established under certain conditions [15]. In this analysis, the random gene perturbation rate p in PBN is assumed to be 0.
In a PBN G (V , F, (30) and we have . ). We should understand that those notations are only used as a convenience.

A Binary-Valued DBN as an Independent PBN
is a disjunction of conjunctions, and Define the corresponding function selection probabilities, , it can be verified that Therefore, a binary DBN can be represented as a PBN It should be noted that the mapping from a binary DBN to an independent PBN is not unique, and the above representation is one solution.
Summarizing subsections 3.3.1 and 3.3.2, we have the following theorem [15]. Ba are assumed to have only within and between consecutive slice connections, respectively, can represent the same joint distribution over their common variables.

Structural Analysis
Boolean models, like any other networks, have two issues of interest: Is the model robust? Is the model controllable? From the standpoint of system stability, we require the model be robust, namely, resistent to small changes in the network; from the standpoint of network intervention, we desire that the network be controllable, such that it will respond to certain perturbation. There needs to be a balance of the two properties. These two questions encourage researchers to do the following, (1) Find structural properties of the network that are related to robustness and controllability; (2) Seek ways to analyze the effect of perturbations and to design control techniques.
In 3.4.1, (1) is addressed. We review some structural measures of Boolean models that quantify the propagation of expression level change from one gene to others (or vice versa). In 3.4.2, (2) is partly addressed, where we review structural perturbations, and present a methodology that analyzes the perturbation on Boolean functions. Since the control techniques are out of the scope of this paper, interested readers can find more information in the review articles [16,17].

Quantitative Measures of the Structure
In gene regulatory networks, the interactions among genes are reflected by two facts: the connections among genes, and the Boolean functions defined upon the connection. No matter it is the robustness or the controllability issue we are interested in, it all boils down to one central question: how a change in the expression level of one gene leads to changes in other genes in the network and vice versa. Here, we introduce three measures of the structural properties that are related to the question: canalization, influence and sensitivity.
When a gene is regulated by several parent genes through function f , some parent genes can be more important in determining its value than others. An extreme case is canalizing function, in which one variable (canalizing variable) can determine the function output regardless of other variables.
In gene regulatory networks, canalizing variables are also referred to as the master genes. Canalization is commonly observed in real organisms, and it plays an important role in the stability of genetic regulation, as discussed in [19,20]. Mathematically, researchers have shown that canalization is associated with the stability of Boolean networks. For more theoretical work, see [21][22][23].
Other than canalization, the degree of gene-gene interaction can be described in more general terms, and we define two quantitative measures, influence and sensitivity, as follows.

Consider a Boolean function f with input variables
Note that the partial derivative of f with respect to i x is |, " (35) in which ) , , ,1 , In a BN, since each node i x has one regulatory function . In a PBN, let the set of regulating Related to influence, we define the sensitivity of a function, Then the average sensitivity of f with respect to distribution D is The meaning of average sensitivity is that, on average, how much the function f changes between the Hamming distance one neighbors (i.e., the input vectors differ by one bit). For PBNs, the average sensitivity of gene i x is (cf. Eq. (37)) Biologically, the influence of a gene indicates its overall impact on other genes. A gene with high influence has the potential to regulate the system dynamics and its perturbation has significant downstream effect. The sensitivity of a gene measures its stability or autonomy. Low sensitivity means that other genes have little effect on it, and the "house-keeping" genes usually have this property [2]. It is shown that such quantitative measures (or variants) can help guide the control of genetic networks [24] and aid in the steady-state analysis [25].

Structural Perturbation Analysis
There are two types of perturbation on Boolean models: perturbation on network states and perturbation on network structure. The former refers to a sudden (forced or spontaneous) change in the current state from x to x' , which causes the system dynamics to be disturbed temporarily. Such disturbance is transient in nature, because the network nodes and connections are intact, and the underlying gene regulation principles do not change. Therefore, the network attractors and the basins of attraction remain the same. However, if the perturbed Boolean model has multiple attractors, state perturbations may cause convergence to a different attractor than the original one, and may change the steady-state distribution of the network. This type of perturbation has been studied extensively (e.g. [26]), and finds its use in network control (e.g. [6]).
Perturbation on network structure refers to any change in the "wiring" or functions of the network. For instance, we may remove or add a gene to the network, change connections among genes, change the Boolean functions, or even change the synchronous Boolean network to an asynchronous model (where not all the genes are updated at the same time). Structural perturbation is more complex and less studied, compared to state perturbation. When network structure is perturbed, the network attractors and basins of attraction will be impacted, therefore the long-term consequence is more difficult to gauge than that of state perturbation.
The reasons for studying structural perturbation are: (1) modeling of gene regulatory networks is subject to uncertainty, and it is desirable to study the effect of small difference in network models on the network dynamic behavior; (2) it is likely that gene regulations, like other biological functions, have intrinsic stochasticity, and it is of interest to predict the consequence of any perturbation in regulation; (3) changing the network structure can alter the network steady-state distribution, thus structural perturbation can be an alternative way (with respect to state perturbation) of network control [25,27,28].
In [8], the authors developed theories to predict the impact of function perturbations on network dynamics and attractors, and main results are presented below. For more applications, see [28]. For further analysis in terms of steady-state distribution and application in network intervention, see [25].

Problem formulation. Given a Boolean network
. If we flip the output on row j , then we call it a one-bit function perturbation on i f , and denote it Any state transition s ! w contains n mappings, . We define ) , , , ( = ) ( In , which is a sub-vector of s that corresponds to the regulators of i x .
The following proposition and corollaries state the basic effects of one-bit function perturbation on the state transitions and attractors. Proofs and extensions to two-bit perturbations can be found in [28].

Proposition 1 A state transition s ! w is affected by one-bit perturbation
We use the following toy example to demonstrate the above results. From these results, more applications can be derived, such as controlling the network steady-state distribution through function perturbation, or identifying functional perturbation by observing phenotype changes [28].

Example 1 Consider a BN with
where the truth table of 3 f is shown below and the state transition diagram is shown in Fig. (2). states 100 and 101 no longer transit to 001 and 101 but to 000 and 100 respectively. Because of that, attractor cycle {001, 100} will be affected. Moreover, Corollary 2 predicts that the singleton attractor 000 is robust to the perturbation while 101 is not. The predictions are confirmed by the new state transition diagram shown in Fig. (3).  Finally, the author would like to remind the readers that other works on (various types of) structural perturbation are available. For instance, in [29], the authors added a redundant node to Boolean network, such that the bolstered network is more resistent to a one-bit function perturbation (as defined above). In [30], the effect of asynchronous update of a drosophila segment polarity network model is examined in terms of the phenotypes (steady-states). In [25], the authors derived analytical results of how function perturbations affects network steady-state distributions and applied them to structural intervention. In [31], the author modeled gene knockdown and broken regulatory pathway in Boolean networks, and analyzed the effects.

SIMULATION ISSUES WITH BOOLEAN MODELS
Recall from Section 2 that a Boolean model of n genes has a finite state space, and a BN has deterministic dynamic behavior which can be fully captured by the state transition diagram. A PBN is probabilistic in nature, therefore its state transition is also probabilistic. For both BNs and PBNs, attractors are characteristic of their long-term behavior. Given the above knowledge, if we would like to know anything about a Boolean model, we should find out its state transition diagram and attractors first. This is to be discussed in Section 4.1.
For Boolean models, the most commonly encountered simulation issues include: (1) how to generate the time sequence data of a network, (2) how to find the network steady-state distribution if it exists; and (3) how to produce artificial Boolean models with prescribed attractors to facilitate other studies. Among them, (1) is a basic practice that can be utilized in (2) and (3), and we will deal with them in Sections 4.2, 4.3 and 4.4 respectively. Note that the techniques in 4.1 is crucial to all the three issues.

Generating State Transition Diagram and Finding Attractors
To obtain the state transition diagram of a BN, we first compile a state transition  Fig. (4). State transition diagram of a Boolean network. transitions shown in Table 2, and their selection probabilities are 1 c and ! respectively, then when 0 = p , the PBN's state transition diagram is shown in Fig. (5). When 0 > p , a state transition can either be driven by some network function or by random gene perturbations, and we may refer to its state transition matrix T when constructing the state transition diagram. It should be noted that the sum of probabilities of all the edges exiting a vertex should always be 1.
The following is a simple algorithm for finding the attractors of BN based on the state transition table (using integer representation of the states).
One other issue of simulating a PBN is the choice of parameters p and q . As stated in Section 2, network switching probability q does not affect the probability of being at any constituent BN, and in theory we can choose any value for q ; however, we prefer to choose small q because in a biological system, switching network corresponds to the change of context (reflecting a change of regulatory paradigm, either caused by environment change or internal signals), which should not occur very often.
Moreover, if q is large, or even 1 = q , then network switching is frequent, and a short time sequence of data ) are more likely to come from several BNs instead of from one single BN. This may pose a difficulty if we try to identify the underlying PBN and its component BNs from the sequence data [32]. On the other hand, if q is too small, and the number of BNs in the PBN is large, it will take too long a time to obtain the steady-state distribution by simulation method. Usually p should be small to reflect the rarity of random gene perturbation, and we let q p << . Also, small p is helpful if the generated sequence data will be used as artificial time-series data for the identification the underlying PBN and its component BNs. However, if p is too small, it will take longer to obtain the steady-state distribution. Usually, we can choose q = 0.01 ! 0.2 and p = 0.01% ! 0.5% .

Power Method
As discussed in Section 3.1, a PBN possesses a steadystate distribution when 0 > , p q [3]. By definition, this distribution ! " is the solution to linear equations , is unique and can be estimated by iteration, given the transition probability matrix T (assuming n genes, and n N 2 = ).
) and T are defined as in Eqs. (7) and (8). If Tˆ is used in place of T , and the solution for T= ! ! is ! "ˆ, the expected relative error in steady-state distribution is shown to be bounded by The following is an alternative method of obtaining steady-state distribution. If we are interested in the attractors only, knowing that the majority of the steady-state probability mass is on attractors if p is small, we may apply the Markov chain based analytic method in Section 3.2.

Monte Carlo Simulation Method [34]
This method requires generation of a long time sequence of data, ) ( , (1), (0), T x x x ! , such that the frequencies of all the possible n 2 states approach the steady-state distribution.
In a given n -gene PBN with gene perturbation rate p , the smaller p is and the larger n is, the longer it takes to converge to the steady-state distribution. In general, we need to simulate at least To estimate when the PBN has converged to its steadystate distribution, we can use the Kolmogorov-Smirnov test. The basic idea of Kolmogorov-Smirnov test is to measure the closeness of an empirical probability distribution to the theoretical distribution. Since the latter (steady-state distribution) is unknown in this case, we will test the closeness of two empirical distributions.
To get two quasi-i.i.d (independently and identically distributed) samples in PBN, we select two , and the output equals 0 otherwise.

Generating Artificial BNs with Prescribed Attractors [35]
In a simulation study of Boolean models, it is often necessary to create artificial networks with certain properties. Of special interest is the problem of generating artificial BNs with a given set of attractors, since attractors are hypothesized to correspond to cellular phenotypes and play an important role in the long term behavior of Boolean models.
First, note that the state transition diagram can be partitioned into level sets, where level set j l consists of all states that transit to one of the attractors in exactly j steps, and the attractors belong to the level set 0 l .
Problem formulation [35] Given a set of n nodes } , , , a set A of d states (binary vectors of n bits), and integers l, L satisfying 0 < l < L , we will construct a BN defined on V , which satisfies the following constraints: the set of regulators of node Specifically, if we are interested in constructing a BN with only singleton attractors, its state transition diagram will be a d -forest (containing d single-rooted trees) if the BN has d singleton attractors. The following theorem gives the number of all possible state transition diagrams that only contain singleton attractors (the proof can be found in [35]).

Theorem 4
The cardinality of the collection of all forests on N vertices is Assuming only singleton attractors are allowed, the following algorithm is for solving the search problem formulated above. A second algorithm is also given in [35], but shown to be less efficient.
Algorithm 3 (Generating artificial Boolean network) 1. Randomly generate or give in advance a set A of d states (as singleton attractors).
2. Randomly generate a predictor set P , where each i P has k to K nodes. If Step 2 has been repeated more than a prespecified number of times, go back to Step 1.
3. Check if the attractor set A is compatible with P , i.e. only the attractors (each transits to itself) of the state transition diagram are checked for compatibility against P .
If not compatible, go back to Step 2.
4. Fill in the entries of the truth tables that correspond to the attractors generated in Step 1. Using the predictor set P and randomly fill in the remaining entries of the truth table. If Step 4 has been repeated more than a pre-specified number of times go back to Step 2.
5. Search for cycles of any length in the state transition diagram D based on the truth table generated in Step 4. If a cycle is found go back to Step 4, otherwise continue to Step 6.
6. If D has less than l or more than L level sets go back to Step 4.
7. Save the generated BN and terminate the algorithm.

CLOSING WORDS
This paper has presented the following analysis and simulation issues of Boolean networks and probabilistic Boolean networks, which are models for gene regulatory networks. • Analysis. An important aspect of Boolean models is that they can be viewed as homogeneous Markov chains; for a PBN, when the network switching probability q > 0 and gene perturbation rate p > 0 , it possesses a steady-state distribution. Markov analysis serves as a basis for finding the steady-state probabilities of attractors and for proving the equivalence of PBN and dynamic Bayesian networks. Finally, a structural analysis is provided, where quantitative measures of gene-to-gene relationships are introduced, and the effect of perturbation on Boolean functions are analyzed. • Simulation. Central to the simulation of Boolean models is the use of state transition diagram and transition probability matrix. In network simulation, different methods are presented and simple guidelines of parameter selection are provided. To test the convergence of a simulated PBN to its steady-state distribution, we can employ Kolmogorov-Smirnov statistic. Lastly, an algorithm for generating artificial BNs with prescribed attractors is presented.
To find more references on Boolean models, and obtain a MATLAB toolbox for BN/PBN, readers can go to the following website, http://personal.systemsbiology.net/ilya/ PBN/PBN.htm. Another online source of papers is http://gsp.tamu.edu/Publications/journal-publications.

ACKNOWLEDGEMENT
The author thanks Yuping Xiao for giving valuable critique on the initial manuscript. The anonymous reviewers' insightful comments have helped the author's revision considerably.