MRI Reconstruction Using Markov Random Field and Total Variation as Composite Prior

Reconstruction of magnetic resonance images (MRI) benefits from incorporating a priori knowledge about statistical dependencies among the representation coefficients. Recent results demonstrate that modeling intraband dependencies with Markov Random Field (MRF) models enable superior reconstructions compared to inter-scale models. In this paper, we develop a novel reconstruction method, which includes a composite prior based on an MRF model and Total Variation (TV). We use an anisotropic MRF model and propose an original data-driven method for the adaptive estimation of its parameters. From a Bayesian perspective, we define a new position-dependent type of regularization and derive a compact reconstruction algorithm with a novel soft-thresholding rule. Experimental results show the effectiveness of this method compared to the state of the art in the field.

Apart from CS-MRI iterative algorithms for solving the problem in (1), deep learning, which has been successful in solving many computer vision problems, has also been proved beneficial in tackling 2. Methodology

MRF Priors in MRI Reconstruction
Consider a Bayesian approach to recovering jointly representation coefficients θ = Wx and their bi-level significance map s with the maximum a posteriori probability (MAP) criterion: P(θ, s|y) ∝ p(y|θ)p(θ|s)P(s). Here, s ∈ {−1, 1} D are the latent variables formed in a lattice that encode the significance of the representation coefficients: s i = +1 on those positions i in lattice where θ i is significant, and s i = −1 conversely. P(s) is the prior probability of s while p(y|θ) and p(θ|s) are conditional probability density functions (p.d.f.). Equivalent formulations of the graphical model were presented, e.g., in [27][28][29] and a similar one in [26,35] for the signal sparse in image domain. Using the negative log transformation of P(θ, s|y), the MAP criterion translates into the joint minimization with respect to (w.r.t.) θ and s: min θ,s − log p(y|θ) + log p(θ|s) + log P(s) . ( We make a connection between (3) and formulations of type (1) by reordering the terms as follows: Assuming that measurements y are corrupted by independent identically distributed (i.i.d.) Gaussian noise, we have that p(y|θ) = N (y|AP H θ, I). It then follows that the first term f (W H θ) in (4) using a relation x = W H θ becomes equal to data fidelity part 1 2 ||y − Ax|| 2 2 in (1). ψ MRF (Wx) is the new type of regularization function, which is taking the role of the second term τφ(Bx) term from (1). Assuming conditional independence p(θ|s) = ∏ i p(θ i |s i ) and a Laplacian prior for image coefficients p(θ i ) we define accordingly conditionals p(θ i |s i ) as follows: p(θ i |s i = 1) ∝ 1 2b e − |θ i | b for |θ i | ≥ B and zero otherwise while p(θ i |s i = −1) is defined oppositely with respect to B. The threshold B is related to the noise level and estimated from the highest-frequency subband coefficients; the scale parameter b is adaptively estimated per subband [29]. We choose the Gibbs distribution (MRF prior) for the θ's support model P(s) = 1 Z e −H(s)/T , where the normalizing constant Z = ∑ s e −H(s)/T represents the partition function, and the temperature T controls the peaking in the probability density [36]. We employ a common model with pair-site cliques. However, in contrast to our previous work [27][28][29], we allow for different interaction coefficients for cliques of different orientations, which models better the actual subband statistics. The energy function of this anisotropic MRF is where α expresses the a priori preference for labels of one type over the other and β o the interaction strength for cliques with orientation o ∈ {h, v, d1, d2} shown in Figure 1. The estimation of the MRF parameters (β o , α) is treated in Section 2.3. Involving the composition of the MRF-based regularization function ψ MRF (Wx) and TV norm we arrive at the proposed optimization problem formulation: The optimization problem (6) is non-convex in general due to the presence of the non-convex ψ MRF regularization function and is very hard to globally solve exactly. In the following section we propose a computationally efficient method to suboptimally solve (6).

Proposed Algorithm
To solve the problem (6) we adopt the idea of fast composite splitting from [37] and extend this algorithm to deal with the MRF-based regularization. The key difference is that our model involves ψ MRF instead of 1 -norm in [37]. Hence, we have to solve the proximal mapθ = prox µ (ψ MRF )(Wx g ): Note that, in notation prox µ (ψ MRF )(Wx g ), ψ MRF is the function for which the proximal map is calculated, and θ g = Wx g is the argument at which it is evaluated. Since the evaluation of this proximal map is very hard, we adopt a suboptimal, block-coordinate approach explained in the following.
The key novel ingredient of the proposed algorithm is a computationally efficient approximation of the proximal map in (7) for a fixed x g . From (4), it is clear that the minimization in (7) needs to be carried out jointly w.r.t. θ and s in order to evaluate (7) exactly. We adopt a suboptimal yet computationally efficient approach where we first minimize the second term in (4) w.r.t. s ∈ {−1, 1} D for a fixed θ; then we fix the obtainedŝ and minimize the objective in (7) w.r.t. θ. For the former minimization step, we adopt notationŝ = MAP-support{θ}. The step is done via the Metropolis sampler procedure using a "warm-start" initial s as in [28,29]. The latter minimization (w.r.t. θ for a fixedŝ), as shown here, can be done in closed-form and leads to a novel soft-thresholding operation. This is accomplished by solving the following minimization problem Equation (8) is derived from (7) using a simple algebraic manipulation and omitting the terms that do not depend on θ. With the analytic form of p Θ i |S i (θ i |s i ) forŝ i = {−1, 1} (see Section 2.1), the closed form solution for each single component ofŝ (it turns out that the solution decouples component-wise) is derived in (9) (index i in the equation is omitted for notation simplicity). The solution (9) is illustrated in Figure 1 on range [0, ∞) due to its (odd) symmetry about the origin.
The described block-coordinate cyclic procedure should in principle proceed in several iterative rounds. As we demonstrate here numerically, it is sufficient to perform a single cycle. Once the approximation of the proximal map in (7) is available, we incorporate it in a fast-splitting framework akin to the one in [37]. That is, we first carry out a gradient-like step on x w.r.t. data fidelity term. Then, we perform in parallel the proximal maps that correspond to the two regularization functions in (6). The two outputs of the proximal mappings are then simply averaged and fed into a Nesterov-acceleration-like step. The motivation for the parallel-proximal-then-average approach is computational efficiency, as carrying out a joint proximal map (even approximate) w.r.t. ψ MRF and φ TV would be very hard. The motivation for the Nesterov-acceleration step is to further speed-up the method.
The overall method is presented in Algorithm 1. Recall the definition of ψŝ from (8). Note that steps 3-4 are the single block-coordinate cycle to approximate the operation in (7). The parameters τ 1 , τ 2 , µ for simplicity are all set to 1, although these are not the optimal values, unless it is otherwise stated. We refer to the proposed method as Fast Composite Lattice and TV regularization (FCLaTV). A version without acceleration CLaTV can also be used, and it is obtained after omitting steps 7 and 8 from FCLaTV and replacing r {k} in the step 1 with x {k} . Extensive numerical studies demonstrate that Algorithm 1 always converges. The rigorous convergence analysis is left for future work.

Parameter Estimation for the Anisotropic MRF Prior
Here we propose a data-driven approach for specifying the parameters {α, β h , β v , β d1 , β d2 } of the MRF model. The core idea is to relate the parameters of the prior P(s) for the support of θ to some measurable characteristics of the observed θ. The representation coefficients θ are re-estimated in each iteration of Algorithm 1 and so are the MRF parameters.
The four interaction coefficients {β h , β v , β d1 , β d2 } represent the clustering strength of the coefficient labels s in the corresponding four directions for the observed subband. We reason that the clustering strength in a particular direction should be proportional to the correlation among the corresponding representation coefficients θ in that direction. Therefore, we base the estimation of interaction coefficients on magnitude of representation coefficients θ. In order to reduce the effect the noise present in θ on estimation of the interaction coefficients, we used squared magnitude of θ and account only for those that were marked as significant by the initial s prior to its MAP estimationŝ. In what follows, θ S denotes a subband in which all the coefficients that were selected as significant by s are left unchanged ( . To express two-dimensional (2D) correlation, we need to revert to 2D spatial indices (k, l). Let r (i,j) denote the correlation coefficient for squared θ S corresponding to the spatial shift (i, j): We take four correlation coefficients corresponding to the smallest possible spatial shifts in the corresponding directions, indicated by arrows in Figure 1. These are: in the vertical direction. Now we can specify the four interaction coefficients as normalized correlation coefficients of θ S : The normalization by 2 -norm here is optional (as only the relative values of the MRF parameters with respect to each other actually matter) but we find it convenient in practice to have these parameters in the range [−1, 1] as it is now guaranteed. We also tested 1 and ∞ for the purpose of this normalization, but 2 led to best performances in our experiments.
It still remains to specify the parameter α, which represents a priori preference for one type of labels (−1 or +1) over the other. With α = 0 both labels are a priori equally likely and as α increases in magnitude the more preference goes to one of these labels. For α > 0, the labels −1 will be favoured, which means that significant coefficients (labelled by +1) will be sparse. We specify α as the mean energy of the coefficients in θ S relative to the energy of the largest coefficient in that subband: Note that we have omitted the iteration indices for compactness. In fact, we have sequences s (k) , θ S(k) and α (k) , β d2 that get improved through iterations k.

Complex Image Reconstruction and Multi-Coil Reconstruction
The proposed approach is developed for the reconstruction of MR image magnitude. Here we extend it for the reconstruction of complex images (magnitude and phase) and the reconstruction of magnitude images from multi-coil measurements. In cases where it is of interest to recover the image phase together with the magnitude, steps 4 and 5 in Algorithm 1 are repeated twice: first for the regularization of the real part and then, equivalently, for the regularization of the imaginary part of a complex image x. This approach showed promising performances as it will be seen in the following section. A multi-coil image reconstruction demands knowledge of sensitivity profiles for each coil and therefore different construction of the operator A during the reconstruction procedure. If we denote with C i the sensitivity profile for coil i then undersampled measurements from the same coil are y i = AC i x. Let us collect all available measurements from N c coils in one vector y = [y T 1 , y T 2 ...y T N c ] T and create an augmented vectorized image x a = Tx with the usage of matrix T which is formed by stacking the identity matrix I N×N N c times row-wise. Then the acquisition process is defined through the following equation where A B consists of repeated A along the diagonal while C B is formed by stacking coil sensitivity maps C i along diagonal. This way a multi-coil reconstruction problem can be solved using the single-coil reconstruction method developed in this paper. This generalization in the definition of the measurement operator M = A B C B is easily introduced in step 2 in Algorithm 1 instead of A corresponding to the single-coil reconstruction scenario. The computational complexity is marginally increased and doesn't require additional code optimization.

Results
In the experimental evaluation we used different MRI images, starting from high-resolution MRI images, shown in Figure 2, and using real data acquired in k-space. For the following experiments, we used simulated sampling trajectories on a Cartesian grid with different sampling rates, except for the last experiment, when the measurements are undersampled with the non-Cartesian radial sampling trajectory used in a real scan. First, we consider reconstruction of MR image magnitude for different sampling rate (SR) obtained using various sampling trajectories. For this we utilize dataset of 248 T1 MRI brain slices acquired on a Cartesian grid at Ghent University hospital (one sagittal slice is presented in Figure 2). Then we focus on the reconstruction of complex MR images from single and multi-coil undersampled measurements. Complex T2-weighted brain images axial-1, axial-3 from [31] and [38] respectively (their magnitudes are presented in Figure 2) are used in experiments for the reconstruction of single-coil complex images while the T1-weighted brain image axial-2 [38] is used in the multi-coil reconstruction experiment. We also present the reconstruction results on real radially acquired measurements in k-space on non-Cartesian grid. These data, consisting of the acquisitions of a pomelo fruit were supplied by the Bioimaging Lab in Antwerp. In our experiments for sparse signal representation we used the non-decimated wavelet transform with 3 scales. For comparison, we report the results of LaSAL and LaSAL2 from [29], FCSA [5], FCSANL [39] and WaTMRI [25] with the original implementations. All these methods, except LaSAL, employ a compound regularization. The reconstruction results for complex images from single-coil and multi-coil measurements, are compared with the corresponding results of pFISTA [31] and P-LORAKS [38,40] methods. For quantitative comparison between the reconstructed and the reference MR image we adopt Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM). PSNR is defined as: where MAX 2 MR denotes squared maximum possible pixel value in the magnitude of MR image and MSE is the mean squared error between magnitudes of reconstructedx and reference image x r . For the calculation of SSIM index we used the simplified equation form: where µx, µ x r , σx, σ x r , σx x r are the local means, standard deviations, and cross-covariance for imageŝ x, x r and C 1 = (0.01 * MAX MR ) 2 and C 2 = (0.03 * MAX MR ) 2 are regularization constants to avoid instability in index calculation. Local statistics is calculated using isotropic Gaussian function with radius 1.5. Through experiments pixel values are stored in double-precision floating-point format with dynamic range of [0, 255] unless otherwise stated. Figure 3 shows the PSNR and SSIM for the reconstructed sagittal MR image from radially undersampled measurements with sampling rate (SR) ranging from 14% to 48%. The MRF-based methods LaSAL, LaSAL2, CLaTV and FCLaTV achieve a consistent and significant improvement in PSNR (at some sampling rates more than 4 dB) compared to WaTMRI, FCSA and FCSANL. The proposed methods CLaTV and FCLaTV outperform LaSAL and yield only slightly lower reconstruction PSNR and equally good SSIM as the best reference method LaSAL2. These results are achieved with automatic estimation of MRF parameters and without tuning the regularization parameters (the µ,τ 1 and τ 2 parameters are all set to 1).  Performances of the proposed methods are further tested through reconstruction of all 248 T1 MRI slices from data set. Results are shown in Figure 4 where besides mean PSNR values through iterations we also provide distribution of PSNR values for CLaTV, FCLaTV and LaSAL2 methods. On average, FCLaTV reached the peak performance much before CLaTV and LaSAL2. The LaSAL2 achieved its highest PSNR in average after 80 iterations which is 2 times more after FCLaTV. CLaTV, FCLaTV and LaSAL2 reached the same maximal median PSNR value of 37.5 dB through all iterations. The proposed methods need less iterations than LaSAL2 to reach this value which is presented in Figure 4 with the shift of PSNR distribution towards higher values in the first 20 iterations. WaTMRI, LaSAL2 and FCLaTV are further tested in reconstruction from measurements undersampled with straight vertical lines in k-space which is usual sampling trajectory in real scanners. From the results of this experiment, shown in Figure 5, the FCLaTV method outperforms WaTMRI and LaSAL2 visually and in terms of PSNR and SSIM measure. We tested the proposed methods in the reconstruction of a complex T2-weighted MR image axial-1 slice, the magnitude of which is shown in Figure 2. For the performance measure we use relative 2 norm error (RLNE), defined as

Data Sets Acquired on Cartesian Grid
where x r is the reference image recovered from all measurements whilex denotes the estimated (reconstructed) image from undersampled measurements. Algorithm steps 4 and 5, which refer to proximal operators are simultaneously applied on real and imaginary parts of the temporary reconstructed image x g obtained from step 2. We found that this adaptation of algorithm for handling reconstruction of complex images achieves the best performances. For the reference methods we use the pFISTA method from [31] with the Shift Invariant Discrete Wavelet Transform (SIDWT) as a tight frame for sparse signal representation. Results are shown in Figure 6 with respect to amount of CPU (Intel(R) Core(TM) i7-4700MQ CPU @ 2.4GHz, 8GB of RAM memory) time needed for reconstruction. For the both trajectories, CLaTV and FCLaTV outperform pFISTA-SIDWT in terms of RLNE measure. FCLaTV in less than 50s reaches the lowest RLNE for both trajectories, 0.0826 for radial and 0.0709 for random, while CLaTV needs around 70s to reach RLNE of 0.0846 for radial and 0.0711 for random trajectory.
A comparison of FCLaTV with the P-LORAKS [40] method is conducted by reconstructing the single channel T2-weighted complex brain image axial-3 from 50% of measurements sampled by random and uniform trajectory. The results are presented in Table 1 where for both sampling strategies FCLaTV achieved lower RLNE value than the reference P-LORAKS method.
For the multi-coil reconstruction scenario, we used a T1 weighted brain image axial-2 from [38], shown in Figure 2, and measurements gathered from 4 coils with known given sensitivity profiles. We incorporate random and uniform sampling trajectories with sampling rate of 14% per each coil. For both trajectories, the proposed FCLaTV achieved lower RLNE values reported in Table 1. Figure 7 shows visual comparison of reconstructions between FCLaTV and P-LORAKS methods.

Data Sets Acquired on Non-Cartesian Grid
In the following we present the reconstruction of a pomelo, acquired with radial sampling in the k-space. The data consist of 1608 radial lines, each with 1024 samples. We form undersampled versions by leaving out some of the radial lines. In particular, we aim to implement undersampling based on the golden ratio profile spacing [41], which guarantees a nearly uniform coverage of the space for an arbitrary number of the remaining radial lines. Starting from an arbitrary selected radial line, each next line is chosen by skipping an azimuthal gap of 111.246 • until we reach desired sampling rate. In practice we cannot always achieve this gap precisely (since we have a finite, although large, number of lines to start with). Therefore we choose the nearest available radial line relative to the position obtained after moving. Since we deal here with non-uniformly sampled k-space data, we need to employ the non-uniform FFT procedures [41], which are commonly used in MRI reconstruction and readily available. For the reference method we used LaSAL2 [29]. During the reconstruction we add different amount of Gaussian noise on undersampled measurements with zero mean and the following standard deviations: 0.01, 0.02, 0.03, 0.04 and 0.05. For each amount we perform 10 experiments of reconstruction with different noise realizations. From the obtained average performances of reconstruction in terms of SSIM for different standard deviation of noise, we calculate average across all suggested standard deviations and present results in the form of mean line with standard deviation bars in Figure 8. We empirically set τ 2 to 140 and 130 for CLaTV and FCLaTV respectively due to the different dynamic range of reference image [0,1]. Regularization parameters for LaSAL2 method are taken from [29]. As expected, standard deviation of performances reduce as the sampling rate increase and the mean SSIM becomes greater for all methods. At relatively small sampling rates, up to 60% CLaTV and FCLaTV outperforms LaSAL2 which is presented in visual comparison shown in the same Figure 8. CLaTV and FCLaTV much better recover structure in the image than LaSAL2 which is demonstrated visually in the images of reconstruction error.

Conclusions
We considered the problem of MRI image reconstruction and formulated it as an optimization problem with a quadratic least-squares type fidelity term and a composite regularization-the sum of a total variation and a MRF regularizer. We propose a novel efficient iterative method to solve the problem that involves proximal maps and Nesterov-like acceleration. The major technical novelty is in a computationally efficient method for approximating the MRF-regularizer proximal map; the method exhibits a novel soft-thresholding rule that is different from standard 1 -norm-induced soft-thresholding. Another major novelty is an efficient method for estimating the MRF model parameters on the fly, during the algorithm iterations. Significant improvements in the reconstruction performance are achieved compared to the related methods. Further research direction will include an extension of evaluation metrics with perceptual quality based and semantic-based metrics [42] and involving more undersampling trajectories from the real clinical scans.