Validation of Inference Procedures for Gene Regulatory Networks

The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a sample set of data and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, the kind of situation one faces when using real data.


INTRODUCTION
The construction of gene regulatory networks is among the most important problems in systems biology [1][2]. Network models provide quantitative knowledge concerning gene regulation and, from a translational perspective, they provide a basis for mathematical analyses leading to systems based therapeutic strategies [3]. Network models run the gamut from coarse-grained discrete networks to the detailed description of stochastic differential equations. The availability of high-throughput genomic data has motivated the development of numerous inference algorithms. The performance, or validity, of these algorithms must be quantified. An inference algorithm takes a sample set of data as input and outputs a model network. Its validity must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. Given a hypothetical model, generate data from the model, apply the inference procedure to construct an inferred model, and compare the hypothetical and inferred models via some objective function.
This paper mathematically formulates validation in terms of the distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions. As a function from a sample set to a class of network models, an inference procedure is a mathematical operator and its performance must be evaluated within a mathematical framework, in this case, distance functions. The paper sets up the validation framework in general terms, provides examples of t 2 < < t s < t is equal to the probability of X(t) conditioned on X(t s ). We also assume the process is homogeneous, meaning that the transition probabilities depend only on the time difference, that is, for any t and u, the u-step transition probability, p jk (u) = P(X(t + u) = x k | X(t) = x j ) (1) depends only on u. We are not asserting that the Markov property and homogeneity are necessary assumptions for gene regulatory networks. We make these assumptions to facilitate mathematically tractable modeling for the current study. Under these assumptions, we need only consider the one-step transition probability matrix,  (2) where the one-step transition probability, p jk , is given by p jk = p jk (1). We refer to P simply as the transition probability matrix. For t = 0, 1, 2,..., the state probability structure of the network is given by the t-state probability vector p(t) = (p 1 (t), p 2 (t),…, p N (t)) (3) where p j (t) = P(X(t) = x j ). p(0) is the initial-state probability vector.
Besides the state one-step probabilities, we can consider the gene one-step probabilities, p i (j, r) = P(X i (t + 1) = r| X(t) = x j ) (4) If, given the GAP at t, the conditional probabilities of the genes are independent, then p jk = p 1 (x j1 , x k ) p 2 (x j2 , x k )… p n (x jn , x k ) (5) Suppose that gene X i at time t + 1 depends only on values of genes (predictors) in a regulatory set, R i V, at time t, the dependency being independent of t. Then the gene one-step probabilities are given by In this form, we see that the Markov dependencies are restricted to regulatory genes.
The network has a regulatory graph consisting of the n genes and a directed edge from gene There is also a state-transition graph whose nodes are the N state vectors. There is a directed edge from state x j to state x k if and only if x j = X(t) implies x k = X(t + 1).
A homogeneous, discrete-time Markov chain with state space {x 1 , x 2 ,…, x N } possesses a steady-state distribution ( 1 , 2 ,…, N ) if, for all pairs of states x k and x j , p jk (u) k as u . If there exists a steady-state distribution, then, regardless of the state x k , the probability of the Markov chain being in state x k in the long run is k . In particular, for any initial distribution p(0), p k (t) k as t . Not all Markov chains possess steady-state distributions.

Rule-Based Networks
A basic type of regulatory model occurs when the transition X(t) X(t + 1) is governed by a rule-based structure, meaning there exists a state function f = (f 1 , f 2 ,…, f n ) such that X i (t + 1) = f i (R i (t)). A classical example of a rule-based network is a Boolean network (BN), where the values are binary, 0 or 1, and the function f i can be defined via a logic expression or a truth table consisting of 2 n rows, with each row assigning a 0 or 1 as the value for the GAP defined by the row [4][5]. As defined, the BN is deterministic and the entries in its transition probability matrix are either 0 or 1. The connectivity of the BN is the maximum number of predictors for a gene. If each has the same number of predictors, then we say that the network has uniform connectivity.
The model becomes stochastic if the BN is subject to perturbation, meaning that at any time point, instead of necessarily being governed by the state function f = (f 1 , f 2 ,…, f n ), there is a positive probability p < 1 that the GAP may randomly switch to another GAP. There are more refined ways of characterizing perturbations, such as defining perturbations at the gene level rather than the state level, but statelevel perturbation is easy to describe and is sufficient for our purposes here. For a BN with perturbation, the corresponding Markov chain possesses a steady-state distribution.
The long-run behavior of a deterministic BN depends on the initial state and the network will eventually settled down and cycle endlessly through a set of states called an attractor cycle. The set of all initial states that reach a particular attractor cycle forms the basin of attraction for the cycle. Attractor cycles are disjoint. With perturbation, in the long run the network may randomly escape an attractor cycle, be reinitialized, and then begin its transition process anew.

Distance Functions
To discuss validity, we must first discuss the manner in which we are to compare two networks. Given networks H and M, we need a function, μ(M, H), quantifying the difference between them. We require that μ be a semi-metric, meaning that it satisfies the following four properties:  As a semi-metric, μ is called a distance function. If μ should satisfy a fifth condition, then it is a metric. A distance function is often defined in terms of some characteristic, by which we mean some structure associated with a network, such as its regulatory graph, steady-state distribution, or probability transition matrix. This is why we do not require the fifth condition, μ(M, H) = 0 M = H, for a network distance function.
If we want to approximate one network by another, say for reasons of computational complexity, then a distance function can be used to measure the goodness of the approximation. If M 1 and M 2 are two approximations of network H, then M 1 is a better approximation than M 2 relative to μ if μ(M 1 , H) < μ(M 2 , H).
Because a network distance function need only be a semi-metric, one must be careful in applying propositions from the theory of metric spaces. For instance, in a metric space, if a sequence of points in the space is convergent, then the limit of the sequence is unique. When the points are networks, this is not necessarily true. A sequence of networks can converge to two distinct networks: {H i } can converge to both M and N, with M N.

Rule-Based Distance
For Boolean networks (with or without perturbation) possessing the same gene set, a distance is given by the proportion of incorrect rows in the function-defining truth tables. Denoting the state functions for networks H and M by f = (f 1 , f 2 ,…, f n ) and g = (g 1 , g 2 ,…, g n ), respectively, since there are n truth tables consisting of 2 n rows each, this distance is given by x x (7) where I denotes the indicator function, I[A] = 1 if A is a true statement and I[A] = 0 otherwise [6]. If we wish to give more weight to those states more likely to be observed in the steady state, then we can weight the inner sums in Eq. 7 by the corresponding terms in the steady-state distribution, = ( 1 , 2 ,…, N ). For Boolean networks without perturbation, μ fun is a metric. If there is perturbation, then μ fun is not a metric because two distinct networks may be identical with regard to the rules but possess different perturbation probabilities.

Topology-Based Distance
If one's focus is on the topology of a network, then a straightforward approach is to construct the adjacency matrix. Given an n-gene network, for i, j = 1, 2,…, n, the (i, j) entry in the matrix is 1 if there is a directed edge from the ith to the jth gene; otherwise, the (i, j) entry is 0. If A = (a ij ) and B = (b ij ) are the adjacency matrices for networks H and M, respectively, where H and M possess the same gene set, then the hamming distance between the networks is defined by Alternatively, the hamming distance may be computed by normalizing the sum, such as by the number of genes or the number of edges in one of the networks, for instance, when one of the networks is considered as representing ground truth. The hamming distance is a coarse measure since it contains no steady-state or dynamic information. Two networks can be very different and yet have μ ham (M, H) = 0.
If one of the networks in Eq. 8 is considered as ground truth, then the hamming distance can be reformulated in terms of the numbers of false-negative and false-positive edges. If H is the ground-truth network, then a false-negative edge is a directed edge not in M that is in H and a falsepositive edge is directed edge in M that is not in H. Letting FN and FP be the numbers of false-negative and falsepositive edges, respectively, the hamming distance is given by FN + FP. Because we are considering directed graphs, an incorrectly oriented edge in M between two genes is both a false-negative and false-positive edge, although one can slightly alter the definitions to avoid this kind of double counting. If we were to consider undirected graphs, then this anomaly would not occur because an edge would either be present or absent. In this case, the hamming distance is still defined by Eq. 8 but the adjacency matrix is symmetric.
Since our interest is measuring the closeness of an inferred network to the network generating the data, we concentrate on distance functions, in particular, the hamming distance, which has been used for this purpose [7,8]. Nondistance measures related to the hamming distance have been used in the context of regulatory graphs. Again let H denote the ground-truth network. A true-positive edge is a directed edge in both H and M, and a true-negative edge is a directed edge in neither H nor M (with analogous definitions holding for undirected graphs). Let TP and TN be the numbers of true-positive and true-negative edges, respectively. The positive predictive value is defined by TP/(TP + FP), the sensitivity is defined by TP/(TP + FN), and the specificity is defined by TN(TN + FP). These kinds of measures have been used in several regulatory-graph inference papers [8][9][10][11][12] and a study using these measures has been performed to evaluate a number of inference procedures [13].

Transition-Probability-Based Distance
Distances for Markov networks can be defined via their probability transition matrices by considering matrix norms. A norm is a function ||•|| on a linear (vector) space, L, such that: (1) ||v|| 0, Given a norm on L, a metric is defined on L by ||v w||.
For an n n matrix and r 1, the r-norm is defined by The supremum norm is defined by These norms are well-studied in linear algebra. Each yields a metric defined by ||P Q|| r . If P = (p ij ) and Q = (q ij ) are the probability transition matrices for networks H and M, respectively, then a network distance function is defined by Whereas ||•|| r defines a matrix metric, r prob μ is only a network semi-metric because two distinct networks may have the same transition probability matrix.

Long-Run Distance
Since steady-state behavior is of particular interest, for instance, being associated with phenotypes, a natural choice for a network distance is to measure the difference between steady-state distributions [14]. If = ( 1 , 2 ,…, N ) is a probability vector, then its r-norm is defined by for r 1, and its supremum norm is defined by If = ( 1 , 2 ,…, N ) and = ( 1 , 2 ,…, N ) are the steadystate distributions for networks H and M, respectively, then a network distance is defined by Other norms can be used to define the distance function.
Not all networks possess steady-state distributions. The long-run behavior of a deterministic rule-based network, such as a Boolean network, depends on the initial state. A rule-based finite-value network possesses attractor cycles that characterize its long-run behavior and we can consider comparing this long-run behavior. This can be done by considering the proportion of time spent in a state once an attractor cycle has been entered. For any initial state x k , the network eventually enters the attractor cycle, C k , whose basin contains x k . An arbitrary state x j either lies in C k or it does not. Let m k denote the number of states in C k and p k be the probability that the initial state is x k . We define the longrun probability of x j by Suppose all attractor cycles are singletons, so that m k = 1. Moreover, suppose we do not know the initial-state prob-abilities and we set p k = 1/N. If x k is an attractor, let b k denote the number of states in its basin; if x k is not an attractor, let b k = 0. Then Eq. 15 reduces to j = b j /N. To this point, = ( 1 , 2 ,…, N ) describes a probability density because its components sum to 1. Now suppose we ignore the basin sizes so that j = 1/N if x j is an attractor and j = 0 otherwise. If = ( 1 , 2 ,…, N ) and = ( 1 , 2 ,…, N ) correspond to networks H and M, respectively, then the network distance induced by the 1-norm is given by (16) where AH and AM are the attractor sets for H and M, respectively,

Trajectory-Based Distance
Continuing with rule-based finite-value networks, rather than simply focusing on the long-run probabilities, one can take a more refined perspective by considering differences in the trajectories. Continue to let m k denote the number of states in the cycle C k for initial state x k and let t k be the time it takes x k to reach C k . The time trajectory of the network is given by X(t) = (X 1 (t), X 2 (t),…, X n (t)). For a given initial state this trajectory is deterministic. For initial state x k , denote the trajectory by Given the initial state is x k , we define the amplitude cumulative distribution of gene X i by This increasing function of z counts the fraction of time that ) ( Given two attractor cycles, C k and C j , resulting from initializations x k and x j , respectively, we define a distance between the cycles relative to gene X i using the amplitude cumulative distributions, F i (•|k) and F i (•|j), by for some function norm ||•||. For example, we could use the L 1 norm The L 1 norm possesses an interesting interpretation if gene X i has constant amplitude values, a and b, on cycles C k and C j , respectively. In this case, F i (•|k) and F i (•|j) are unit step functions with steps at a and b, respectively. Hence, in this case the L 1 norm reduces to i (C k , C j ) = |a b| ( 2 2 ) and gives the distance, in amplitude, between the values of gene X i on the two cycles. For a Boolean network, i (C k , C j ) = 0 if the gene is either ON or OFF on both cycles and i (C k , C j ) = 1 if X i is ON for one cycle and OFF for the other (assuming X i is constant on both cycles).
Considering the full set of genes, we define a distance between two attractor cycles, C k and C j by Now consider two networks, M and H, having the same genes. We define the distance between M and H as the expected distance between attractor cycles over all possible initial states: where M k C and H k C are the attractor cycles corresponding to initialization by state x k in networks M and H, respectively [15].

Equivalence Classes of Networks
The previous examples of network distance functions demonstrate a common scenario: a network semi-metric is defined by a metric on some network characteristic, for instance, its regulatory graph, its transition probability matrix, etc. The metric requirement, μ(M, H) = 0 M = H, fails because distinct networks possess the same characteristic. To formalize the situation, let M and H denote the characteristic corresponding to networks M and H, respectively. If is a metric on a space of characteristics (directed graphs, matrices, probability densities, etc.), then a semi-metric μ is induced on the network space according to This is quite natural if our main interest is with the characteristic, not the specific network itself.
Focus on network characteristics leads to the identification of networks possessing the same characteristic. Given any set, U, a relation between elements of U is called an equivalence relation if it satisfies the following three properties for a, b, c U: (1) a a [reflexivity], If we define M H if M = H, then this is a network equivalence relation. If we focus on equivalence classes of networks rather than the networks themselves, we are in effect identifying equivalent networks. For instance, if we are only interested in steady-state distributions, then it may be advantageous to identify networks possessing the same steady-state distribution.

INFERENCE PERFORMANCE
An inference procedure operates on data generated by a network H and constructs an inferred network M to serve as an estimate of H, or it constructs a characteristic to serve as an estimate of the corresponding characteristic of H. For instance, the data may be used to infer a distribution that estimates the steady-state distribution of H. The data could be dynamical, consisting of time-course observations, or it might be taken from the steady state, as with microarray measurements assumed to come from the steady state of some phenotypic class. In the latter case, it makes sense to consider inference accuracy relative to the steady-state distribution of H, rather than H itself. For full network inference, the inference procedure is a mathematical operation, a mapping from a space of samples to a space of networks, and it must be evaluated as such. There is a generated data set S and the inference procedure is of the form (S) = M. If a characteristic is being estimated, then (S) is a characteristic, for instance, (S) = F, a probability distribution.

Measuring Inference Performance Using Distance Functions
Focusing on full network inference, the goodness of an inference procedure is measured relative to some distance, μ, specifically, μ(M, H) = μ( (S), H), which is a function of the sample S. In fact, S is a realization of a random set process, , governing data generation from H. In general, there is no assumption on the nature of . It might be directly generated by H or it might result from directly generated data corrupted by noise of some sort. μ( ( ), H) is a random variable and the performance of is characterized by the distribution of μ( ( ), H), which depends on the distribution of .
The salient statistic regarding the distribution of μ( ( ), H) is its mean, E [μ( ( ), H)], where the expectation is taken with respect to .
Rather than considering a single network, we can consider a distribution, , of random networks, where, by definition, the occurrences of realizations H of H are governed by a probability distribution. This is precisely the situation with regard to the classical study of random Boolean networks. Averaging over the class of random networks, our interest focuses on It is natural to define the inference procedure 1 better than the inference procedure 2 relative to the distance μ, the random network H, and the sampling procedure if Whether an inference procedure is "good" is not only relative to the distance function, it is relative to how one views the value of the expected distance. Indeed, it is not really possible to determine an absolute notion of goodness.
In practice, the expectation is estimated by an average, where S 1 , S 2 ,…, S m are sample point sets generated according to from networks H 1 , H 2 ,…, H m randomly chosen from H.
The preceding analysis applies virtually unchanged when a characteristic is being estimated. One need only replace H and H by and , where and are a characteristic and a random characteristic, respectively, and replace the network distance μ by the characteristic distance.
We next present three examples using previously introduced distance functions to measure inference performance. Algorithm description will be sketchy in order to avoid long digressions from the issue of distance illustration. We defer to the cited literature for details. Example 1. The Boolean network model has been in existence for a long time and various inference procedures have been proposed [16][17][18]. One proposed method for Boolean networks with perturbation is based on the observation of a single dynamic realization of the network [6]. This method will be discussed in some detail in Section 5 in regard to consistent inference; for now, we are only concerned with the distance between the inferred network and the original network generating the data, where the distance function is given by μ fun (M, H) in Eq. 7. Fig. (1) shows the average (in percentage) of the distance function using 80 data sequences generated from 16 randomly generated Boolean networks with 7 genes, perturbation probability p = 0.01, uniform connectivity k = 2 or k = 3, and data sequence lengths varying from 500 to 40,000. The reduction in performance from connectivity 2 to connectivity 3 is not surprising because the number of truth-table lines jumps dramatically. Fig. (1). Rule-based distance performance for Boolean-network inference for connectivity k = 2, 3.

Example 2.
There have been a number of papers addressing the inference of connectivity graphs using informationtheoretic approaches [9,10,19]. In a study proposing using the minimum description length (MDL) principle to infer regulatory graphs [8], the hamming distance was used to compare the performance of the newly proposed algorithm with an earlier information-theoretic algorithm, called RE-VEAL [9]. Fig. (2) compares the hamming distances between the inferred networks and the corresponding synthetic networks that generated the data relative to increasing sample size. It does so for the REVEAL algorithm and the MDL algorithm using three different settings for a user-defined parameter. The performance measures are obtained by averaging over 30 randomly generated networks, each containing 20 genes and 30 edges, with the distance function being normalized over 30, the number of edges in the synthetic networks. Fig. (2). Hamming distance performance for inferring regulatory graphs using information theory. Example 3. A probabilistic Boolean network (PBN) is a network defined as a collection of discrete-valued networks such that at any point in time one of the constituent networks is controlling the network dynamics [20]. In a contextsensitive PBN there is a binary random variable determining whether there should be a switch of constituent networks at that time point, the modeling assumption being that there are latent variables outside the model network whose changes induce stochasticity into the PBN [21]. Typically, there is also a probability of permutation. This example considers a Bayesian connectivity-based inference procedure for designing PBNs from steady-state data [22]. A synthetic PBN, H, composed of two constituent Boolean networks is used to generate a random sample of size 60 from its steady-state distribution and the inference procedure is used to construct a designed PBN composed of ten constituent PBNs (note that the inference procedure does not have input relating to the number of constituent BNs of the generating network). According to definition, the attractors of a PBN are the attractors of its constituent BNs. H has six singleton attractors, two of which, call them x a and x b , contain 0.99 of the steadystate mass. The designed PBN has more attractors, which is not uncommon, but x a and x b appear in all ten constituent networks as singleton attractors and they contain 0.78 of the steady-state mass. Since, for PBNs with low probability of network switching almost all of the steady-state mass lies in its attractors [21]

CONSISTENCY
The greater the amount of data, the better inference one can expect. The hope is that, for large data sets, the inferred network will be close to the generating network. We define an inference procedure, , to be consistent if μ * (H, , ) 0 as | | . We illustrate consistency using Boolean networks with perturbation. We use the inference procedure referred to in Example 1 that applies to a single observed time series [6] and the distance function μ fun of Eq. 7.
Owing to perturbation, the network has a steady-state distribution and all states communicate with each other. Hence, given a long time series we are likely to observe most of the states and their corresponding state-to-state transitions x k x (k) , for k = 1, 2,…, N, where x (k) denotes the next state following x k under the network state function. If we ignore perturbation, then using the observed state-to-state transitions we can construct a table of state-to-gene transitions of the form x k x i , for k = 1, 2,…, N and i = 1, 2,…, n. These define the functions f 1 , f 2 ,…, f n accordingly. Because the truth table for function f i has 2 n rows of the form f i (x k ), some rows may be empty owing to insufficient observations and these rows can be filled in randomly. As the length of the time series increases, the probability of not observing the state x k goes to 0. Indeed, for any positive integer c, if we let (x k ) denote the number of times x k is observed in the time series, then P( (x k ) c) 1 as | | , where the probability is with respect to the time series .
With perturbation, the state-to-state transitions do not directly define functions because state x k may transition to more than one state. However, assuming a perturbation probability less than 0.5, the transitions from x k will be dominated by the single transition determined by the state function f and this dominating choice can be used for inference. Letting j (x k ) denote the number times we observe the transition x k x j , if f(x k ) = x (k) is the function-defined transition, then and the inference procedure is consistent relative to μ fun .
The preceding argument assumes that the perturbation probability is known. A modification of the inference procedure yields an estimator for p [6]; however, if p is also being estimated, then the model space H is no longer finite and the consistency proof has to be modified. We do not believe this is the proper place to go into such mathematical issues.

APPROXIMATION
Inference performance is evaluated based on the ability of an inference procedure to identify the network from which the data have been derived. This can only be done exactly if the data-generating network is known. Suppose we do not know the random network, H, generating the data for which we want to evaluate the inference procedure, , but know a network N that we believe to be a good approximation to the networks in H. We might then compare the inferred network to N. In effect, such a comparison is approximating μ * (H, , The key issue is approximation accuracy. The triangle inequality implies   N)], which generally means that the number of sample sets is sufficiently large that the expectation is well-estimated by the average distance.
The preceding approximation methodology is common in the literature. A proposed inference procedure is applied to one or more real data sets. The inferred network is compared, not to the unknown random network generating the data, but to a model network that has been humanconstructed from the literature (and implicitly assumed to approximate the data-generating network). For instance, a directed graph (adjacency matrix), A, is constructed from relations found in the literature and the hamming distance is used in the approximating expectation, E [μ ham ( ( ), A)], in Eq. 35. The aim is to compare the result of the inference procedure to some characteristic related to existing biological knowledge. The problem is that the constructed regulatory graph may not be a good approximation to the regulatory graph for the system generating the data. This can happen because the literature is incomplete, there are insufficiently validated connections reported in the literature, or the conditions under which connections have been discovered, or not discovered, in certain papers are not compatible with the conditions under which the current data have been derived. As a result of any of these situations, the overall validation procedure is confounded by the precision (or lack thereof) of the approximation.

VALIDATION FROM EXPERIMENTAL DATA
Another form of approximation results from using experimental data for validation rather than synthetic data generated from a known, ground-truth model. In this situation, there is a test-data sampling procedure generating data from which an estimate of the desired characteristic corresponding to the underlying physical network is formed. Validation is then via the random variable μ( ( ), ( )), where is the training-data sampling procedure used to design the network and is a real-data test-sampling procedure to validate the designed network by direct construction of the characteristic via independent sampling. To simplify the notation we consider a single underlying network H rather than a random network H. Consider what happens if one only has data to estimate (train) the model, which may happen when data are limited on account of cost or the availability of samples. In this case, one tests on the same data, thereby having = in Eq. 36 and the resubstitution estimate, E [μ( ( ), ( ))], in Eq. 37. If is a consistent estimator of H and the single training sample is large, then the conclusion of Eq. 37 again holds. But we do not have a large sample. Hence, Eq. 36 cannot be used to insure good average performance. But it also cannot be used to insure good performance when there is a small independent test-data sample. In the independent case, we are concerned with the absolute difference As with classification, where resubstitution error estimation is usually biased low owing to overfitting by the classification rule, in the case of network validation, resubstitution is risky because the characteristic of the designed network is being compared to a characteristic inferred from the same data with which the network has been designed. According to Eq. 36, as in the case of classification, this is not a problem for large samples, but it can be a serious problem for small samples because overfitting can cause train to be much less than test . Whereas substantial effort has gone into studying these kinds of problems in pattern recognition, there appears to be an absence of the analogous study for network validation.

Example 4.
An attractor-preserving inference method for PBNs based on steady-state data has been proposed and applied to PBNs [23]. A PBN has been designed from cDNA microarray data using 7 genes: WNT5A, pirin, S100P, RET1, MART1, HADHB, and STC2. The steady-state distribution of the designed network has been compared to the histogram of the data, the histogram serving as an estimate of the steady-state distribution of the underlying physical network. Fig. (3) illustrates the comparison of the portion of the steady-state distribution corresponding to the data states with the data histogram. Referring to Eq. 14, the 1-norm and 2-norm yield the resubstitution error distances 1 stead μ ( (S), (S)) = 0.45 (out of a maximum of 2) and 2 stead μ ( (S), (S)) = 0.1262, respectively, the latter being the root-mean-square error. Fig. (3). Comparison of steady-state distribution for a designed network and data histogram.

CONCLUSION
This paper has proposed a mathematically rigorous framework for the validation of inference procedures for gene regulatory networks and has illustrated this framework employing validation methods used in the literature. Owing to the central role of regulatory networks in systems biology and the need to apply inference procedures to the massive data sets resulting from high-throughput technologies, validation cannot be left to ad hoc methods whose own performances are not understood. A formal framework is necessary. As should be clear from the paper, a great deal of work needs to be done to establish the properties of inference procedures under various conditions, such as the sampling procedure, model class, and validation criterion (distance function). Absent rigorous results in this regard, proposed inference procedures will remain speculative and the quality of their performances unknown. A sound epistemology will be lacking.