Logical Reduction of Biological Networks to Their Most Determinative Components

Boolean networks have been widely used as models for gene regulatory networks, signal transduction networks, or neural networks, among many others. One of the main difficulties in analyzing the dynamics of a Boolean network and its sensitivity to perturbations or mutations is the fact that it grows exponentially with the number of nodes. Therefore, various approaches for simplifying the computations and reducing the network to a subset of relevant nodes have been proposed in the past few years. We consider a recently introduced method for reducing a Boolean network to its most determinative nodes that yield the highest information gain. The determinative power of a node is obtained by a summation of all mutual information quantities over all nodes having the chosen node as a common input, thus representing a measure of information gain obtained by the knowledge of the node under consideration. The determinative power of nodes has been considered in the literature under the assumption that the inputs are independent in which case one can use the Bahadur orthonormal basis. In this article, we relax that assumption and use a standard orthonormal basis instead. We use techniques of Hilbert space operators and harmonic analysis to generate formulas for the sensitivity to perturbations of nodes, quantified by the notions of influence, average sensitivity, and strength. Since we work on finite-dimensional spaces, our formulas and estimates can be and are formulated in plain matrix algebra terminology. We analyze the determinative power of nodes for a Boolean model of a signal transduction network of a generic fibroblast cell. We also show the similarities and differences induced by the alternative complete orthonormal basis used. Among the similarities, we mention the fact that the knowledge of the states of the most determinative nodes reduces the entropy or uncertainty of the overall network significantly. In a special case, we obtain a stronger result than in previous works, showing that a large information gain from a set of input nodes generates increased sensitivity to perturbations of those inputs. Electronic supplementary material The online version of this article (doi:10.1007/s11538-016-0193-x) contains supplementary material, which is available to authorized users.

signal transduction network of a generic fibroblast cell. We also show the similarities and differences induced by the alternative complete orthonormal basis used. Among the similarities, we mention the fact that the knowledge of the states of the most determinative nodes reduces the entropy or uncertainty of the overall network significantly. In a special case, we obtain a stronger result than in previous works, showing that a large information gain from a set of input nodes generates increased sensitivity to perturbations of those inputs.
Keywords Boolean networks · Biological information theory · Mutual information · Sensitivity · Network reduction · Linear operators · Numerical simulations

Introduction
The past few decades have generated a large influx of data and information regarding a variety of real or artificial networks. The necessity to understand and use the data in a meaningful way has lead to various modeling approaches. In particular, Boolean network (BN) models introduced by Kauffman (1969) have acquired a significant importance in modeling networks where the node activity can be described by two states, 1 and 0, "ON and OFF," "active and nonactive," and where each node is updated based on logical relationships with other nodes. These models incorporate Boolean functions that are relevant to particular types of applications, such as signal transduction in cells (e.g., Helikar et al. 2008), genetic regulatory networks or other biological networks (e.g., Kauffman 1993;Shmulevich and Kauffman 2004;Shmulevich et al. 2002;Klemm and Bornholdt 2000;Albert and Othmer 2003), or neural networks (e.g., Huepe and Aldana 2002).
However, a simplification of the reality to binary states of the nodes does not ease the difficulty of studying large, complex networks for which the existing data may offer only partial information on the real interactions in the network and for which the dynamics are hard to study even under a deterministic approach. As a matter of fact, even smaller networks of only a few hundred nodes or less can pose serious difficulties in assessing the dynamics, given the exponential dependence of the state space on the number of nodes. Consequently, a number of approaches aiming at simplifying the computational difficulty of analyzing the dynamics have been proposed in recent years. For example, Goles et al. (2013) reduce the network and the number of updates needed to analyze the dynamics by generating sub-configurations of states that remain fixed regardless of the values of the other nodes and by identifying sets of updating schedules which have the same dynamics. They show that such networks are minimal in the sense that no edge can be deleted because every one of them represents a real interaction in the respective network. Various methods for reducing the network to a fairly small subset of nodes that are relevant for the long-term dynamics have been proposed. The definitions of "relevant" and "irrelevant" nodes differ depending on the actual approach. Some methods are related to eliminating stable nodes that end up in an attractor after a transient period and thus considered irrelevant. This may be paired with removing leaf nodes that do not contribute to the evolution of any other node, that is, with zero out-degree (outputs) like in Bilke and Sjunnesson (2001) or Richardson (2004), or with merging or collapsing mediator nodes with one in-degree (input) and one out-degree in Saadatpour et al. (2013). Other methods are based on eliminating irrelevant nodes that are frozen at the same value on every attractor together with nodes whose outputs go only to irrelevant nodes in Socolar and Kauffman (2003), Kaufman et al. (2005), or Kaufman and Drossel (2006). The basis of these methods is to reduce the "structure" of the network using some rules on the Boolean functions and then prove that such a reduction simplifies the identification of attractors. As expected, they may carry an intrinsic numerical burden. Furthermore, alternative methods for eliminating the need for complete enumeration of the states have been considered. For example, Devloo et al. (2003) propose another formalism which permits calculation of the steady states as solutions of a system of steady-state equations, via an image function which identifies a state by its image under a certain function. The authors use constraint programming to solve the derived system of equations in an efficient way from a computational point of view.
In addition to the previously mentioned methods for network reduction, the entropy of the relevant components of the network which are comprised of relevant nodes that eventually influence each other's state is used as a measure of uncertainty of the future behavior of a random state of the network by Krawitz and Shmulevich (2007a, b). The entropy is a measure of uncertainty that has been used also by Ribeiro et al. (2008) to find the average mutual information of a random Boolean model of regulatory network as a way to quantify the efficiency of information propagation through the entire network. In this context, one needs to consider pairs of connected nodes and the intrinsic Boolean functions that govern the node updates, as opposed to evolving the networks in order to identify the attractors. Further research by some of the authors of Ribeiro et al. (2008), in particular Lloyd-Price et al. (2012), uses mutual information to test for a relationship between the robustness to perturbations of an attractor in a random BN and the amount of information propagated within the network when in that attractor. They found that there is a trade-off between robustness and information propagation and that at the edge of chaos, robustness is not correlated with information propagation.
On the other hand, the notions of entropy and mutual information have been long used as measures of complexity of dynamical systems, such as BNs, as described, for example, by Feldman and Crutchfield (1998) or by Sole and Luque (1997). Luque and Ferrera are concerned with the mutual information contained in random BNs and its behavior as the networks undergo their order-disorder phase transition, showing that the mutual information stored in the network has a maximum at the transition point.
Only recently the mutual information has been used as a method for identifying the most powerful and therefore relevant nodes in a BN, thus offering an efficient alternative approach to network reduction to a relevant subset of nodes Heckel et al. (2013). The mutual information, as a basic concept in information theory, allows one to represent the reduction in the uncertainty or entropy of the state of a node due to the knowledge of any of its inputs. A summation of all mutual information quantities over all nodes having a common input can be viewed as the determinative power of that input node. The more powerful the node, the more the information gain provided by the knowledge of its state. In Heckel et al. (2013), the authors use harmonic analysis to compare the determinative power of a set of inputs to the sensitivity to perturbations to those inputs showing that an input with large sensitivity need not have a large determinative power. On the other hand, large information gain from a set of inputs generates large sensitivity to perturbations of those inputs. Moreover, by considering the feedforward regulatory network of E. coli, it is shown that the knowledge of the states of the most determinative nodes reduces the uncertainty of the overall network significantly. Thus, one could focus on the dynamics of the reduced network of the nodes with the most determinative power.
In Heckel et al. (2013), the mutual information formula is obtained in terms of Fourier coefficients expressed in the Bahadur basis which assumes independence of the inputs of a Boolean function. In a subsequent paper by Klotz et al. (2014), it is shown that canalizing Boolean functions maximize the mutual information under the same assumption as in Heckel et al. (2013). This assumption is strong, since in a BN, there are correlations between inputs that build up as the network evolves in time. Our goal is to relax this assumption and allow dependence of inputs, while exploring the impact of a (necessarily) different basis on the results regarding the mutual information and the sensitivity to perturbations. We notice that some results still hold; however, not all are independent of the basis. At the same time we are interested to see the impact of our approach on the network reduction based on most determinative nodes of a specific biological network. In particular, we use a Boolean model of the signal transduction network of a generic fibroblast cell and we obtain results similar to Heckel et al. (2013).
In Sect. 2, we provide the basic definitions, the mathematical setup, and we use elements of operator theory to generate formulas for finding the sensitivity to perturbations of the nodes of the network, quantified by the concepts of influence, average sensitivity, and strength (to be defined in that section). We also discuss the computational aspects of using those formulas in applications. In Sect. 3, we provide formulas and estimates for mutual information, determinative power, and strength, paired with simulations, and estimates that link the mutual information and the sensitivity to perturbations. We also consider a special case that allows us to compare our analytical results to those in Heckel et al. (2013). Conclusions and further directions of research are in Sect. 4.

Influence, Sensitivity, and Strength
In this section, we provide analytical formulas for the sensitivity to perturbations using a complete orthonormal basis that does not assume independence of the Boolean inputs. We pair this with some computational aspects regarding the application of the formulas to an actual biological network.

Analytical Approach
Let n = {0, 1} n and the random vector X valued in n . If P denotes the probability measure on the domain of definition of X , then denote P X −1 the (cumulative) distribution of X . A BN is modeled as the set [n] := {1, 2, . . . , n} of n nodes, each node being ON (that is in state 1) or OFF (that is in state 0). Then any ω ∈ n is a possible state of the network. Each node i ∈ [n] has an associated Boolean function f i : n → that governs the dynamics of the node. We are usually interested in how the network evolves by iterating the map F = ( f 1 , f 2 , . . . , f n ) a large number of times. Although the measures used in this paper are discrete, we use notation typical for measure theoretical arguments to write shorter and more elegant proofs.
Given a node i with Boolean function f i : n → , the influence of the jth input on f i has been formulated in various ways in the literature. Following the authors of Shmulevich and Kauffman (2004), Kahn et al. (1988), and Ghanbarnejad and Klemm (2012), we recall the following:

Definition 1
The influence, I j ( f i ), of variable x j on the Boolean function f i , or equivalently, the activity of node j on node i, is defined as follows: where X ⊕ e j is the random vector obtained by flipping the jth slot of X from 1 to 0 or viceversa. The average sensitivity, avs( f i ), of f i is the sum of its incoming activities (2) The strength, σ ( f i ), of f i is the sum of the outgoing activities Alternatively, I j ( f i ) can be regarded as the average of the Boolean partial derivative ∂ ( j) f i (X ) = δ f i (X ), f i (X ⊕e j ) with respect to the probability measure P as specified in Ghanbarnejad and Klemm (2012). Here δ is Kronecker's delta function which is equal to one if the two variables are equal and zero otherwise. The definition is originally introduced in the context of assuming the state of the BN as a random vector X = (X 1 , . . . , X n ) with independently distributed coordinates, but this property plays no role in it. However, the aforementioned property plays an essential role in papers like Heckel et al. (2013), Klotz et al. (2014), or Kahn et al. (1988) where it is essential in obtaining formulas for I j ( f i ) (and other related quantities relevant in the study of BNs) in terms of the Fourier coefficients of f i [see, e.g., (Kahn et al. 1988, Lemma 4.1)]. The Hilbert space where those formulas are obtained is L 2 ( n , d P X −1 ). The complete orthonormal basis used in Kahn et al. (1988) is the, so-called, Bahadur basis (see Heckel et al. 2013). In order for that family of functions to form a complete orthonormal basis of L 2 ( n , d P X −1 ) it is necessary that X 1 , . . . , X n be independently distributed. However, the nodes of a BN may mutually influence each other, so independence is a restrictive assumption.
A composition operator is an operator on a linear space S of functions defined on a set E. For any fixed self-map ϕ of E, the operator is called the composition operator with symbol ϕ or induced by ϕ. It is necessarily linear. We will use such operators on S = L 2 ( n , d P X −1 ). Let ϕ j be the jth slot flip map. This means that, for all ω ∈ n , ϕ j (ω) is the Boolean vector in n obtained by flipping the jth coordinate of ω. Observe that ϕ j are obviously self-inverse and hence so are the composition operators they induce, a fact which will be used without further comments throughout this paper. In the following, , denotes the usual inner product of L 2 ( n , d P X −1 ) and the norm induced by that inner product. Also, T * is the notation used for the adjoint of any operator T , whereas I denotes the identity operator. With these notations we prove: Proposition 1 For all Boolean functions f i , the following formulas hold: Proof Using a well-known change in measure formula, for any Boolean function f (we drop the index for simplicity of notation) one can write Proposition 2 For all j = 1, 2, . . . , n, let j denote the largest eigenvalue of T j = (I − C ϕ j ) * (I − C ϕ j ), respectively, whereas is the largest eigenvalue of T . The following estimates hold: Hence: where F = n j=1 f j .
Proof Indeed, both the operators T j and T are nonnegative operators, and hence, their numerical range is equal to the line interval with endpoints the least, respectively, the largest eigenvalue. On the other hand, formulas (4) and (5) show that I j ( f )/ f 2 and avs( f )/ f 2 belong to the numerical range of the operator T j , respectively, T .
Combining all that with the fact that f , being a Boolean function, satisfies condition E[ f (X )] = f 2 , proves (7) and (8).
The space L 2 ( n , d P X −1 ) has a simple and natural complete orthonormal basis, We assume all states are possible, so p ω > 0. Checking the fact that B is a complete orthonormal basis of L 2 ( n , d P X −1 ), whether X 1 , . . . , X n are independently distributed or not is left to the reader.
Proposition 3 For all j = 1, . . . , n, the operator T j = (I − C ϕ j ) * (I − C ϕ j ) has a matrix with respect to B whose entries are Hence, the entries in the matrix of the operator T are: Proof Given ω, η ∈ n , the entry a ω,η (T j ) in the matrix of ( Note that and hence Equalities (12) and (13) combine into establishing by a straightforward computation Eq. (10). Therefore, one can state the following: Corollary 1 Given a Boolean function f , the following practical formulas hold: and Hence: Indeed, (14) and (15) are immediate consequences of the matricial description of operators T j and T combined with the following computation of the Fourier coefficients c ω , ω ∈ n of f relative to B: Example 1 In this example, we show that formula (14) agrees with the definition of I j ( f ). To this goal, assume a product distribution, which is the basic assumption of Heckel et al. (2013), such that every state of the network is equally likely. Thus, the probability of any state is 1/2 n . Then where supp f is the support f −1 (1) of the function f . Let f be the Boolean function with support supp f = {(0, 1, 1), (1, 1, 1)}. Then obviously a flip of x 1 does not generate a flip in the output, so I 1 ( f ) = 0. Similarly, a flip in x 2 generates a flip of the output only for (x 1 , x 2 , x 3 ) = (0,1,1), (0,0,1), (1,0,1) or (1,1,1), so, by definition, I 2 ( f ) = 4/2 3 = 1/2. By symmetry, I 3 ( f ) = 1/2. On the other hand, and similarly so one can see that formula (14) agrees with the definition of I j ( f ).
In order to compute the influence of each node in the network on all its output nodes (those to which the node under consideration is an input), we generate MATLAB codes that involve nested "for loops." Note that, if we denote by k i the actual number of inputs to node i (its connectivity), for formula (14) there are i × k i × 2 2k i such loops. This exponential number of loops can easily make the computations prohibitive. As a matter of fact, in the actual simulations, even a single connectivity of at least 12 nodes turned out to be excessive for the capabilities of MATLAB. Thus, one has to rely either on easier estimates, such as the upper bounds of Proposition 2, or find alternative exact formulas. However, a quick analysis indicates that to compute the upper bounds one would still need exponentially many "for loops," since the procedure would require again the construction of the matrices T j and T . For this reason, in the sequel we use the following equivalent formula which follows from the definition.
Recall that, for any Boolean function f, supp f denotes the support of f , that is, supp f = f −1 (1).

Remark 1 The following formula holds
where the sum above is considered Hence: be the Boolean update function of node i. Then Observe that: (18) is computationally efficient because it identifies the influential nodes, namely those in the support of the given Boolean function which get mapped outside of support by that function. In particular, the variable x j has null influence on the Boolean function f , that is This agrees perfectly with the definition (1) of the influence as the probability of a change in the output of f when its jth input is flipped.
To understand the computational efficiency of formula (18), or equivalently the definition of the influence, we run MATLAB codes for both formula (14) and (18) on the same network, and compare the processing times. The results are included in Supplementary Material, Section 1. We present one more example.
Example 2 We call a BN: "Dominant states network (DSN)" if the update Boolean functions of nodes are characteristic functions of distinct states (called dominant states), that is if there is a set of n states S = {ω 1 , . . . , ω n }, so that f 1 = χ ω 1 , . . . , f n = χ ω n .
Our previous considerations and formulas show by straightforward computations that, in the case of a DSN, one has that: and

Determinative Power and Strength
In this section, we are comparing the impact of node strength to the so-called determinative power of nodes defined and explored in Heckel et al. (2013) under the assumption of a BN with product distribution of states. We recall the main definitions and concepts from Heckel et al. (2013) and Cover and Thomas (2006). These include the notion of entropy of random variables, which is a measure of uncertainty, and the mutual information which is a measure of dependence between two random variables and is defined in terms of the entropy.
Definition 2 Let X and Y be discrete random variables. The entropy of X is defined as which in binary reduces to the function The conditional entropy of Y conditional on the knowledge of X is The mutual information (MI) is the reduction in uncertainty of the random variable Y due to the knowledge of X . That is In principle, the mutual information is a measure of the "gain of information," or the determinative power (DP) of X over Y . The authors of Heckel et al. (2013) use the MI to construct the DP of a node j over the states of a BN, namely which represents a summation of all "information gains" obtained from node j over its outputs (i.e., nodes that have j as an input). Besides providing a number of related mathematical results to which we will refer below, the authors identify the nodes with the largest determinative power in a feedforward E. coli network, assuming a product distribution of the input states. The goal is to be able to reduce the network to a smaller sub-network whose knowledge can provide sufficient information about the entire network; in other words the entropy of the network conditional on the knowledge of this sub-network is small enough. The authors show that by considering the nodes with most DP one can reduce the network to less than a half of its original size, so that for larger sub-networks the entropy does not improve significantly once an approximate (threshold) network size is reached. As specified in Introduction, network reduction is an important topic in the literature, since many real networks have sizes that lead to prohibitive computations and manipulations as we can see in Supplementary Material with the computation of the influence. For instance, signal transduction networks such as that of a generic fibroblast cell, which we will use as an example, can have thousands of nodes that are interconnected. The analysis and dynamical study of such networks becomes prohibitive due to the computational burden despite the advances in technology and data science. Thus, finding meaningful ways to reduce the network to a significant "core" or "relevant component" has been of interest for a number of authors, and a number of procedures have been proposed. Ultimately, all of them generate a clear trade-off between accuracy and computational burden.
We are interested in comparing the effect of network reduction applied to the fibroblast network, by considering the nodes with largest DP on the one hand, and the nodes with largest strength values on the other hand. Before we compare them, let us focus on some theoretical results that supplement some of the formulas in Heckel et al. (2013) for the less restrictive case we consider in this paper, that is, not requiring a product distribution of the input values.
If X is the state of the network with values in n , let X A taking values in |A| be the collection of states of the nodes in set A ⊆ [n]. So X can be written as ( Theorem 1 The following formula for conditional entropy holds where E A refers to expected value with respect to the marginal distribution of X A .
Proof By the definition of the conditional entropy which in our binary case reduces to and the obvious equality But which is a number in [0, 1] and we can substitute it in (23) to get formula (22).
Formula (22) is exactly the analog of Theorem 1 of Heckel et al. (2013) where it is written for a system with states −1 and 1 as opposed to 0 and 1.

Proposition 4 The mutual information formula M I ( f (X ); X A ) can be written as
Proof The formula for MI follows again directly from the definition of the mutual information with the argument of h being in [0, 1] as needed. Substituting this and formula (22) in the definition of the mutual information we obtain formula (24). Now we focus on two special (extreme) cases considered in Heckel et al. (2013), to see if we can identify an analog of the results in that paper, where the authors use the additional assumption of independence paired with the Bahadur basis (a family of functions which form a complete orthonormal basis of L 2 ( n , d P X −1 ) provided that X 1 , X 2 , . . . , X n are independent).
Special cases: Then A c = ∅ and = P( f (X ) = 1) = 1/2. Hence, we are dealing with a nonbiased function. So the closer the function f is to a nonbiased function, the larger the MI between its output and all of its inputs. This is similar to what was observed in Heckel et al. (2013).
The mutual information can be written as In comparison with Heckel et al. (2013), this formula does not allow for a simplification to a small subset of Fourier coefficients in order to find the MI. In their formula, the authors of Heckel et al. (2013) identify three coefficients that act as independent variables. Based on them they manage to obtain some information on the behavior of the MI, which is not the case for our approach. The reason for considering this special case is that the quantity MI( f (X ); X i ) is also known as information gain, or informativeness, and is common in information theory. Thus, we note that a change in the underlying basis can induce different results and situations.
Now that we have a deeper understanding of the MI as used in formula (21), let us turn to the network under consideration, namely the signal transduction network of a generic fibroblast cell which consists of several main signaling pathways, including the receptor tyrosine kinase, the G-protein coupled receptor, and the integrin signaling pathway. A Boolean representation of this network has been provided in Helikar et al. (2008), and has been studied further in Kochi and Matache (2012) and Kochi et al. (2014). The fully annotated signal transduction model is freely available for simulations and/or download via the Cell Collective software from www.thecellcollective. org Helikar et al. (2012) and Helikar et al. (2013). Each node in the model represents a signaling molecule (mainly protein). The network has 130 nodes with connectivities that vary between 1 and 14 nodes.
Using formula (18), one can quickly determine the average sensitivity and the strength of all nodes in the fibroblast network. In Fig. 1, we plot them against the nodes (in alphabetic order), together with two more plots on the connectivity and the bias, i.e., P( f (X ) = 1), for an overall view of the main numerical characteristics of the fibroblast nodes.
We compute the DP of the nodes in the fibroblast network and compare them with the node strength σ . The results are shown in Fig. 2. For the network under consideration, the strength values seem to be slightly larger than the DP values. We have conducted a statistical analysis related to DP and σ values for the fibroblast network. In summary, there is enough statistical evidence that the average DP-σ is negative with a p value of basically zero. The paired test gives an upper bound of −0.14208 for a 95 % confidence interval for the difference DP−σ . On the other hand, a linear regression analysis indicates a fairly strong linear relationship between the two variables with a 75.1 % coefficient of determination (COD), and a higher COD of 82.4 % for the linear relationship between the average σ and the number of outlinks corresponding to the nodes. The average values are computed over all nodes with a given number of outlinks. This relationship is weaker for average DP versus number of outlinks with a COD of 60.3 %. We also note that the outliers occur mostly for nodes with a larger number of outlinks. In other words, fewer outlinks generate a stronger correlation between the DP or σ and the number of outlinks. For example, there is one particular node in the network, namely EGFR, that generates the maximum DP and is the only node with 13 outlinks. If we eliminate this node from the correlation analysis, the COD for average DP versus outlinks increases from 60.3 to 81.3 %. Notably, mutations of the EGFR, epidermal growth factor receptor, are known to be related to lung cancer, interfering with the signaling pathways within the cell triggered to promote cell growth and division (proliferation) and cell survival. The second node in the order of DP is ASK1, apoptosis signal-regulating kinase 1, and plays important roles in many stress-related diseases, including cancer, diabetes, cardiovascular, and neurodegenerative diseases. The third node is Src, proto-oncogene tyrosine-protein kinase, which is involved in the control of many functions, including cell adhesion, growth, movement, and differentiation. The fourth node is PIP3_345, phosphatidylinositol (3,4,5)-trisphosphate that functions to activate downstream signaling components, while the fifth node is PKC, protein kinase C, involved in receptor desensitization, in modulating membrane structure events, in regulating transcription, in mediating immune responses, in regulating cell growth, and in learning and memory. The DP procedure managed to capture the importance of these nodes in relationship to the rest of the network. Four of the top five DP nodes are also among the five strongest nodes which are: Src, PIP3_345, PKC, PIP2_45, and EGFR. Thus, the strength also captures biologically important nodes. Moreover, higher DP and strength values are correlated with a larger number of outlinks as seen from the figures, which means that this procedure can identify hubs in the network. It is also apparent from the figures that the COD increases when considering smaller DP and σ values. We have included relevant figures in Supplementary Material, Sect. 2 We would also like to point out at this time that the MI has been used as way to identify relevant pairs of genes in genetic expression data sets by Kohane (2000, 2003), and Jiang et al. (2009). Those authors identify relevance networks by selecting pairs of genes whose MI surpasses a given threshold. For example, in Butte  (2000) it is shown that the relevance networks are of the following types: those that link identical genes, those linking genes with similar functions, those linking genes in the same biological pathway, and combinations of these. In our work, the MI is directed from the node toward its outputs via the outlinks, so we do not involve the bidirectional aspect of the previous studies mentioned here. Moreover, we do not use genetic expression data sets, but actual Boolean functions. Nevertheless, it will be of interest to explore in the future the main types or classes of nodes/sub-networks identified by the DP procedure in a variety of cellular networks. Next, let us compute the network entropy generated by sub-networks chosen based on top DP or strength values of nodes. If we denote by A l the collection of the top l nodes in order of DP or σ , then we compute for l = 1, 2, 3, . . . , n.
We plot the values of the larger quantity in (25) against l and obtain the graph in Fig. 3, where we note that the entropy decreases with increased sub-network size l, and that the two curves are very close, with a slightly better result for σ which provides lower entropy values for most values of l, as seen from the bottom panel where we plot also the differences of entropy values by l. As the sub-network size increases, the strength provides a somewhat tighter upper bound for the entropy of the network. Note that the actual entropy will be smaller than the upper bound graphed in Fig. 3, and that sub-networks of sizes 60 or more (with approximation), do not yield a significant improvement of the entropy. Thus, it suffices to consider less than half of the original network to be able to predict the overall network behavior with fairly low uncertainty/entropy levels.
Knowing that the DP method allows a network reduction without a significant increase in entropy, one could use a tested approach for node elimination that preserves essential dynamical properties to reduce the network to the sub-network of the top DP nodes. For example, in Naldi et al. (2009b) the authors introduce a general method for eliminating nodes one by one by basically connecting directly the inputs of a removed node to its output nodes. Of course, one needs to either keep or consider some extra decisions on autoregulated nodes, that is nodes that are or may become self-inputs upon elimination of other nodes. One also needs to understand the impact of the order in which nodes are removed. In our case, the natural order is provided by the sorted DP values. In Naldi et al. (2009b), it is shown that with their approach the attractors and stable states are preserved. We note here that there are nodes whose removal may have no impact on the dynamics, like those with no outputs. Moreover, the authors of Naldi et al. (2009b) have developed a Java software Naldi et al. (2009a) that allows one to apply the reduction algorithm and analyze attractors. Alternative methods have been proposed by Veliz-Cuba and collaborators in Veliz-Cuba (2011); Veliz-Cuba et al. (2015) that could be used in conjunction with the DP method. Future work will explore those methods.
One could actually go one step further and provide the following estimates for the conditional entropy H ( f (X )|X A ), which are the analog of Theorem 2 of Heckel et al. (2013).

Theorem 2
The following estimates of the conditional entropy hold: where Proof We use the following inequality (found in Topsoe 2001 andused in Heckel et al. 2013) that provides lower and upper bounds on the binary entropy function h( p) For the upper bound, we use the fact that Then the double inequality (26) is immediate. Now, LB can be expressed as follows where the first term is obtained by the law of total probability. We are interested in identifying possible relationships or inequalities between the mutual information and the influence of a subset of nodes of the network. In Heckel et al. (2013), the authors show that for a given collection of nodes A ⊆ [n], the following holds where (x) = x (1/ ln 4) − x takes positive values that are very close to zero, and σ 2 i = V ar(X i ), under the assumption of independence of the random variables. Let us explore an alternative inequality using the results of this paper. For this purpose, we consider a special case that allows us to compare directly our results with inequality (28). In this special case, we consider a uniform distribution of inputs, which is actually a case of product distributions or independent inputs.

Special Case
Consider a network with n nodes, such that for each ω ∈ n , we have p ω = 1 2 n , so that we consider a uniform distribution of the inputs. Let |supp f | = K and A ⊆ [n]. Then Using formula (24), we can write the following for the MI, We immediately notice that 0 ≤ K ω A ≤ K for all ω A ∈ |A| , and that ω A ∈ |A| K ω A = K , so that we create a partition of supp f . Therefore, in the sum of (29), some of the terms could be zero, since not all ω A ∈ |A| need to be represented in supp f , leading to K ω A = 0, which in turn leads to h(0) = 0.
Let us focus on I A ( f ) = j∈A I j ( f ). By formula (18) we obtain and consequently Notice that 0 ≤ m j ≤ K and that m j is an even number since ϕ j is its own inverse. Observe that if in formula (28) of Heckel et al. (2013) we consider a uniform distribution of random variables, then all σ 2 i = 1, and thus the influence bounds above the MI minus a small positive quantity. We would like to check a similar inequality for our case. To this end, we first generate some simulations in which we plot both MI( f (X ); X A ) and I A ( f ) versus K for various values of |A| in Fig. 4, using formulas (29) and (30). Not only is I A ( f ) an upper estimate of M I ( f (X ); X A ), but also its values are significantly larger. So in our case, the inequality becomes  (29) and (30) versus K for a network with n = 8 nodes and |A| = 1, 2, . . . , n as specified in the subplots. The MI curve is very close to zero for most values of |A|, while the influence has much larger values in all cases. Recall also that MI is always a number in [0, 1]. The actual values increase with the increase in |A|. Notice also the expected symmetry as K crosses from values less than to values greater than 2 n /2 which is stronger than the corresponding inequality of Heckel et al. (2013). Besides, the quantity MI( f (X ); X A )− (Var( f (X ))) used in Heckel et al. (2013) takes on also negative values. For example, one can easily check that this is true for n = 8, K = 1, |A| = 3, 4, 5 under the assumptions of the uniform distribution (which leads to independence of random variables). Then the inequality becomes superfluous.
Observe that in this case, if A = { j}, and f i is the Boolean function associated with node i of the network, then the inequality (31) becomes MI( f i (X ); , or in other words DP( j) ≤ σ ( f j ). However, when looking at the relationship between the DP and strength values in Fig. 2, we observe that this inequality does not hold for all nodes of the fibroblast network. That is most likely due to the actual dependencies between the states of the nodes. Thus, dependent inputs may lead to different results. However, we notice in Fig. 2, top right graph, that the magnitudes of the positive differences DP−σ are generally smaller than the magnitudes for negative differences. Thus, it may be possible that a version of inequality (28) is still valid. We have not been able to find such an alternative inequality so far. On the other hand, all the examples involving dependent inputs we have looked at, support inequality (31). We provide one such example before returning to the Special Case 3.1. x 2 , x 3 ), represented by the truth table shown below and the corresponding probabilities of states. It is easy to check that this is not a product distribution, so the variables are dependent; for example, P(X 1 = 0, X 2 = 0) = 3/10 while P(X 1 = 0)P(X 2 = 0) = 1/4. One can also check that I 1 ( f ) = 3/5 and I 2 ( f ) = I 3 ( f ) = 1/2. In this case, for any A ⊆ [3] of cardinal at least two, we get automatically that MI( f (X ); X A ) ≤ 1 ≤ I A ( f ). On the other hand, for A = {1} we obtain via formula (24), that MI( f (X );
The inequality (32) takes on the particular form where |A| = k.
One obtains the following consequence, which is valid no matter the cardinality of supp f . This one holds since one can write h 1 2 n−k ≤ (n − k) 2 n−k−1 = n2 k − k2 k 2 n−1 ≤ 2 k n − n j=1 m j 2 n−1 .

Final Comments
The main conclusions of this work are that operator theory can offer computationally efficient ways to find or estimate important quantities used in assessing the sensitivity to perturbations of BNs, and to quantify the relevance of nodes using elements of information theory, in particular MI. We conclude that MI is an excellent tool for identifying a subset of relevant nodes in the network that offer the most information gain and whose knowledge reduces the entropy of the whole network significantly. Moreover, the MI provides a lower estimate for the influence of nodes in various scenarios. It would be of interest to continue this exploration under various scenarios of dependent nodes in the network, as well as to refine further some of the results of this paper. For example, could one strengthen inequality (32) and prove it in general or for different scenarios?
On the other hand, in Klotz et al. (2014) it is shown that MI is maximized for canalizing functions. However, real networks do not consist of a single type of Boolean function. Therefore, it would be of interest to explore a possible hierarchy of various types of functions regarding the information gain they provide. That could offer more information regarding the inequality between the MI and the influence. Besides, most of the functions in real networks, such as cellular networks, need not be strictly canalizing as considered in Klotz et al. (2014) (i.e., one value of one of the inputs forces the output to take on a certain fixed value regardless of the other inputs). In reality, functions may be only partially canalizing, allowing for multiple, but not necessarily all inputs, to be canalizing in a cascading fashion as discussed, for example, in Layne et al. (2012) or Dimitrova et al. (2015). Partially nested canalizing functions have been considered recently as a more realistic alternative to canalizing functions. Another type of function that is common in applications is a threshold function that turns a node ON provided that a sufficient number of inputs are ON. This type of function is typical for neural networks. This opens the door for a variety of directions of research that stem from the work in this paper.
Furthermore, as specified before, it is our intention to explore the Java software Naldi et al. (2009a) to actually perform network reduction preserving attractors and stability and use it to analyze dynamics of various signal transduction networks found in Cell Collective Helikar et al. (2012) and Helikar et al. (2013).
Finally, it would be of great interest to look further into other real networks to identify the possibility of reducing them to the most determinative nodes. For example, it would be interesting to compare the results of the fibroblast network to other networks such as the Boolean model of the influenza-host interactions during infection of an epithelial cell Madrahimov et al. (2013), to identify possible similarities and differences that may occur. At the same time, identifying possible classes of biological nodes/sub-networks obtained with the DP sub-network procedure in a variety of other networks could bring further clarifications on the advantages of the DP method. Even more, it is important to assess the degree to which the reduced network provides accurate information on specific tasks typical for the whole network, such as pattern recognition or decision making. At the same time, exploring the impact of considering DP versus strength might provide new insights.