Application of Dynamically Constrained Interpolation Methodology to the Surface Nitrogen Concentration in the Bohai Sea

Observations of ocean pollutants are usually sparse and scattered in space and time, whereas a continuous distribution of pollutants over a given area is often required. In this paper, a dynamically constrained interpolation methodology (DCIM) is proposed to interpolate the surface nitrogen concentration (SNC) in the Bohai Sea. The DCIM takes the pollutant transport (advection-diffusion) equation as a dynamic constraint when interpolating SNCs and optimizes the interpolation results with the adjoint method. The feasibility and validity of the DCIM are tested in ideal twin experiments. In the ideal experiments, the mean absolute gross errors between the interpolated observations and the final interpolated SNCs are all no more than 0.03 mg/L, demonstrating that the DCIM can provide convincing results. In the practical experiment, the observed SNCs are interpolated to obtain the final surface nitrogen distribution. The correlation coefficient between the interpolated and observed SNCs is 0.77. In addition, the distribution of the final interpolated SNCs agrees well with the observed one.


Introduction
The Bohai Sea is the largest and the only semiclosed inland sea in China, surrounded by land on three sides. It connects to the northern Yellow Sea through the Bohai Strait. The weak water exchange gives the Bohai Sea a poor self-purification ability, so the marine ecosystem is difficult to restore in a short time once it is severely damaged. According to statistics, the sewage of over 40 rivers flows into the Bohai Sea with a volume of nearly 890 × 10⁸ m³ [1], which inevitably aggravates environmental problems such as ocean eutrophication (the accumulation of nutrients: nitrogen, phosphorus, etc.). Moreover, the severe deterioration of the marine environment has badly affected the development of fisheries, and the Bohai Sea is gradually losing its function as a fishing ground [2]. To maintain sustainable development, research on marine pollutants has been conducted. Mathematical models are considered the most direct and effective means of quantification [3], and using such a model to learn more about the temporal and spatial distributions of pollutants in the Bohai Sea plays an important role in environmental restoration.
Activities in coastal oceans help to speed up economic development, but they also cause serious pollution of marine ecosystems. A number of numerical studies have been carried out to simulate pollutant dispersion [4][5][6][7]. A two-dimensional water quality model was developed and applied to analyze and optimize the ecological programs, and it can simulate key model

The Dynamical Model
Considering the convection and diffusion processes, the governing equation of the marine pollutant transport model is [17,26,27]

$$\frac{\partial C}{\partial t} + u\frac{\partial C}{\partial x} + v\frac{\partial C}{\partial y} + w\frac{\partial C}{\partial z} = \frac{\partial}{\partial x}\left(A_H\frac{\partial C}{\partial x}\right) + \frac{\partial}{\partial y}\left(A_H\frac{\partial C}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_H\frac{\partial C}{\partial z}\right) - rC \tag{1}$$

where C denotes the concentration of pollutants; t and x, y, z denote time and the spatial coordinates, respectively; u and v are the horizontal velocities (in the x and y directions, respectively) and w is the vertical velocity (in the z direction); A_H and K_H denote the horizontal and vertical diffusion coefficients (A_H = 100 m²/s, K_H = 0.00001 m²/s), respectively; r is the pollutant attenuation coefficient, and r = 0, which means that the pollutant is treated as a conservative substance [17]. The finite difference scheme is given in Appendix A.
The open boundary of the model is set at 122.5°E, where a no-gradient condition is used at the outflow boundary and a constant condition is used at the inflow boundary.
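To make the structure of the transport equation concrete, the following is a minimal sketch of one explicit time step for a horizontal (two-dimensional) simplification of the model. It is not the scheme of Appendix A: the first-order upwind advection, the centred diffusion, the zero-gradient treatment of every boundary, and the function name `step_advection_diffusion` are all assumptions made for illustration only.

```python
import numpy as np

def step_advection_diffusion(C, u, v, dx, dy, dt, A_H=100.0, r=0.0):
    """Advance a 2D concentration field C (axis 0 = x, axis 1 = y) by one step.

    Illustrative explicit scheme (an assumption, not the paper's scheme):
    upwind advection, centred diffusion, zero-gradient boundaries.
    u and v may be scalars or arrays with the same shape as C (m/s).
    """
    Cp = np.pad(C, 1, mode="edge")            # edge padding gives zero-gradient boundaries
    c = Cp[1:-1, 1:-1]
    cxm, cxp = Cp[:-2, 1:-1], Cp[2:, 1:-1]    # neighbours in x
    cym, cyp = Cp[1:-1, :-2], Cp[1:-1, 2:]    # neighbours in y
    # First-order upwind differences for u dC/dx and v dC/dy
    adv_x = np.where(u > 0, u * (c - cxm), u * (cxp - c)) / dx
    adv_y = np.where(v > 0, v * (c - cym), v * (cyp - c)) / dy
    # Centred second differences for the horizontal diffusion term
    diff = A_H * ((cxp - 2 * c + cxm) / dx**2 + (cyp - 2 * c + cym) / dy**2)
    return c + dt * (-adv_x - adv_y + diff - r * c)
```

With a grid spacing of a few kilometres, A_H = 100 m²/s, and a 6 h time step, the diffusion number A_H Δt/Δx² stays well below the explicit stability limit, which is why such a simple scheme is usable for a sketch.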

Observations and Model Setting
The marine environmental monitoring data for the Bohai Sea and the northern Yellow Sea, provided by the North China Sea Environmental Monitoring Center, State Oceanic Administration, include data for February, May, August, and October of each year. Nitrate, phosphate, pH, etc. are monitored in order to investigate the spatiotemporal distribution of different pollutants and then diagnose marine pollution [16]. The distribution of observation points is shown in Figure 1 and the dates of observations are given in Table 1 [16]. The monitoring data from 2009 are analyzed in the practical experiments in this paper. The computational domain is the Bohai Sea (37°N-41°N, 117.5°E-122.5°E) with a 4′ × 4′ grid resolution. The simulation period is 30 days and the time step is 6 h. The three-dimensional Regional Ocean Model System (ROMS) provides the hydrodynamic flow field used in the numerical experiments of the present study [27].

Dynamically Constrained Interpolation Methodology
According to Yaremchuk and Sentchev [28], interpolation methods fall into two classes: dynamically unconstrained interpolation and dynamically constrained interpolation. The dynamically constrained interpolation methodology (DCIM) is used in the model to interpolate the observations, and the adjoint method is used to optimize the interpolation results.

The Adjoint Methods
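To optimize the interpolation results, the misfit between the interpolation results and the observations should be gradually reduced. This misfit is described by the cost function, defined as [17]

$$J = \frac{1}{2}\sum_{i,j,k} K_C\left(C_{i,j,k} - \hat{C}_{i,j,k}\right)^2 \tag{2}$$

where $C_{i,j,k}$ and $\hat{C}_{i,j,k}$ denote the interpolation results and the observation data at the point (i,j,k), respectively; K_C represents the weighting matrix, whose element equals 1 where observations are available and 0 otherwise. The governing equation of the marine pollutant transport model (1) can be written as

$$F(C) \equiv \frac{\partial C}{\partial t} + u\frac{\partial C}{\partial x} + v\frac{\partial C}{\partial y} + w\frac{\partial C}{\partial z} - \frac{\partial}{\partial x}\left(A_H\frac{\partial C}{\partial x}\right) - \frac{\partial}{\partial y}\left(A_H\frac{\partial C}{\partial y}\right) - \frac{\partial}{\partial z}\left(K_H\frac{\partial C}{\partial z}\right) + rC = 0 \tag{3}$$

Based on the Lagrange multiplier method, the Lagrange function can be written as

$$L = J + \int_{\Omega} C^{*}\,F(C)\,d\Omega \tag{4}$$

where C* represents the adjoint variable of C and Ω denotes the computational domain. The adjoint model of the pollutant transport model is obtained from Equation (5):

$$\frac{\partial L}{\partial C} = 0 \tag{5}$$

The gradients of the cost function with respect to the model parameters can be calculated from Equation (6):

$$\frac{\partial L}{\partial p} = 0 \tag{6}$$

where p stands for the model parameters.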
In this paper, the adjoint equation and the gradient can be written as Equations (7) and (8), respectively.
where the superscript 1 denotes the SNC at the first iteration step.
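As a worked illustration of how an adjoint equation and a gradient follow from a Lagrange function of this kind, the derivation below uses a continuous-in-time analogue of the cost function. It is a sketch under stated assumptions, not a reproduction of Equations (7) and (8): the observation symbol $\hat{C}$, the final condition $C^{*}(T) = 0$, the neglect of all spatial boundary terms, and the choice of the initial field $C(t=0)$ as the control variable are assumptions made here.

```latex
% Lagrange function (continuous analogue; T is the length of the time window)
L = \frac{1}{2}\int_0^T\!\!\int_\Omega K_C\,(C-\hat{C})^2\,d\Omega\,dt
  + \int_0^T\!\!\int_\Omega C^{*}\Big[
      \partial_t C + u\,\partial_x C + v\,\partial_y C + w\,\partial_z C
      - \partial_x(A_H\,\partial_x C) - \partial_y(A_H\,\partial_y C)
      - \partial_z(K_H\,\partial_z C) + rC \Big]\,d\Omega\,dt

% Integrating by parts in t, x, y, z (spatial boundary terms dropped, C^{*}(T)=0)
% and requiring the variation with respect to C to vanish in the interior gives
-\partial_t C^{*} - \partial_x(uC^{*}) - \partial_y(vC^{*}) - \partial_z(wC^{*})
  - \partial_x(A_H\,\partial_x C^{*}) - \partial_y(A_H\,\partial_y C^{*})
  - \partial_z(K_H\,\partial_z C^{*}) + rC^{*} + K_C\,(C-\hat{C}) = 0

% The remaining term of the time integration by parts is
% -\int_\Omega C^{*}(t=0)\,\delta C(t=0)\,d\Omega, so the gradient of the
% cost function with respect to the initial field is
\frac{\delta J}{\delta C(t=0)} = -\,C^{*}(t=0)
```

The adjoint equation is integrated backward in time from $C^{*}(T) = 0$, forced by the misfit $K_C(C-\hat{C})$, and the value of the adjoint variable at the initial time supplies the descent direction under this sign convention.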

The Process of DCIM
The DCIM consists of the following steps [25].
Step 1. Propose a guess value of the parameters in the model.
Step 2. Acquire the interpolation of observations through forward model.
Step 3. Calculate the cost function and obtain the Lagrange multiplier (the adjoint variable) through the adjoint model.
Step 4. Based on Equation (6), acquire the gradients of the cost function with respect to the parameters of the model and adjust the parameters along the opposite direction of the gradient.
Step 5. Stop calculating when the preset ending condition is satisfied; otherwise, go to step 2 and continue iterating.
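The five steps above can be sketched on a toy problem. The code below is a minimal illustration, not the model used in this paper: it assumes a one-dimensional periodic diffusion model written as a matrix `M`, takes the initial field as the only control parameter, and uses plain gradient descent with a fixed step. All function names (`diffusion_matrix`, `forward`, `cost`, `adjoint_gradient`, `dcim_toy`) are hypothetical.

```python
import numpy as np

def diffusion_matrix(n, k=0.25):
    """Toy linear model c_{n+1} = M c_n: periodic 1D diffusion (an assumption)."""
    I = np.eye(n)
    return I + k * (np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0) - 2 * I)

def forward(c0, M, n_steps):
    """Step 2: integrate the forward model and return the states at all time levels."""
    states = [c0]
    for _ in range(n_steps):
        states.append(M @ states[-1])
    return np.array(states)

def cost(states, obs, mask):
    """Step 3: misfit between model states and observations where mask == 1."""
    return 0.5 * np.sum(mask * (states - obs) ** 2)

def adjoint_gradient(states, obs, mask, M):
    """Steps 3-4: backward sweep with M^T; returns dJ/dc0 for the toy model."""
    lam = mask[-1] * (states[-1] - obs[-1])
    for n in range(len(states) - 2, -1, -1):
        lam = M.T @ lam + mask[n] * (states[n] - obs[n])
    return lam

def dcim_toy(obs, mask, M, first_guess, step=None, max_iter=500, tol=1e-10):
    """Iterate steps 1-5; returns the optimized initial field and its trajectory."""
    n_steps = obs.shape[0] - 1
    if step is None:
        step = 1.0 / obs.shape[0]        # safe step when the norm of M is <= 1
    c0 = np.array(first_guess, dtype=float)          # Step 1: first guess
    for _ in range(max_iter):
        states = forward(c0, M, n_steps)             # Step 2
        if cost(states, obs, mask) < tol:            # Step 5: ending condition
            break
        grad = adjoint_gradient(states, obs, mask, M)   # Step 3
        c0 -= step * grad                            # Step 4: move against the gradient
    return c0, forward(c0, M, n_steps)
```

In a twin experiment with this toy, observations are generated by running `forward` from a prescribed field, masked to a subset of grid points, and then recovered from a constant first guess; the cost decreases monotonically for the default step.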

Verification of the DCIM
In this part, we tested the feasibility and validity of the DCIM with ideal experiments. The observations used in the ideal experiments were generated by integrating the model forward in time from a given distribution of nitrogen. For generality, the initial guess values were set to half of the maximum nitrogen concentration. Similar conclusions were drawn when other initial guess values were used. The statistical results for the other initial guess values are given in Appendix B.

Application of the DCIM in Ideal Twin Experiments
As mentioned by Elbern et al. [29], the validity of assimilated or interpolated results can only be tested against observations that were not assimilated or interpolated. Therefore, one-fifth of the observations were randomly selected and withheld from the interpolation to be used only for verification; these are called checking observations. The remaining observations, called interpolated observations, were interpolated with the DCIM. This cross-validation makes it possible to determine whether the interpolated observations were overfitted. If they were overfitted, there would be a large misfit between the simulation results and the checking observations [30].
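The random separation into interpolated and checking observations can be sketched as follows. This is an illustrative helper, not the authors' code: the function names, the fixed random seed, and the use of index arrays are assumptions. Rotating the `fold` argument gives one experiment per subset.

```python
import numpy as np

def checking_folds(n_obs, n_folds=5, seed=0):
    """Randomly partition observation indices into n_folds disjoint subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_obs), n_folds)

def split_observations(n_obs, fold, n_folds=5, seed=0):
    """Return (interpolated_idx, checking_idx) when subset `fold` is withheld."""
    folds = checking_folds(n_obs, n_folds, seed)
    checking = np.sort(folds[fold])
    interpolated = np.sort(
        np.concatenate([f for i, f in enumerate(folds) if i != fold])
    )
    return interpolated, checking
```

Only the interpolated indices enter the cost function; the checking indices are used afterwards to compute error statistics on data the optimization never saw.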
To eliminate the chance effects of a particular selection of checking observations, all idealized observations were randomly divided into five subsets and each subset was used in turn as the checking observations. This gave five twin experiments, named IE_11-IE_15. The statistics of these twin experiments are listed in Table 2. The Cressman interpolation method [31] was also applied in the ideal twin experiments IE_11-IE_15 so that the quality of the results could be assessed; the comparison is presented in Appendix C. To quantify the difference between the interpolated SNCs and the observations, the mean absolute gross error (MAGE) and the mean normalized gross error (MNGE) were calculated as follows:

$$\mathrm{MAGE} = \frac{1}{N}\sum_{i=1}^{N}\left|I_i - O_i\right|$$

$$\mathrm{MNGE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|I_i - O_i\right|}{O_i} \times 100\%$$

where N is the number of observations and I and O are the interpolated SNCs and the observations, respectively.
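The two error measures can be computed with a few lines of NumPy. The sketch below assumes the usual definitions, MAGE as the mean of |I − O| and MNGE as the mean of |I − O|/O expressed in percent, and strictly positive observations; the function names, including the helper `decline_rate` for the percentage reduction between the first and final iteration, are hypothetical.

```python
import numpy as np

def mage(interp, obs):
    """Mean absolute gross error, in the units of the data (e.g. mg/L)."""
    interp, obs = np.asarray(interp, float), np.asarray(obs, float)
    return float(np.mean(np.abs(interp - obs)))

def mnge(interp, obs):
    """Mean normalized gross error in percent; assumes obs > 0."""
    interp, obs = np.asarray(interp, float), np.asarray(obs, float)
    return float(np.mean(np.abs(interp - obs) / obs) * 100.0)

def decline_rate(initial_error, final_error):
    """Percentage reduction of an error measure relative to its initial value."""
    return (initial_error - final_error) / initial_error * 100.0
```

For example, interpolated values of 1.1 and 1.8 against observations of 1.0 and 2.0 give a MAGE of 0.15 and an MNGE of 10%.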
In the five twin experiments, the rates of decline in the MAGEs between the checking observations and the corresponding interpolated SNCs (K3) were 66.7%, 70.8%, 63.6%, 57.1%, and 71.4%, respectively, and the errors were no more than 0.12 mg/L. Moreover, the MNGEs between the checking observations and the interpolated SNCs (K4) were all reduced by at least 50%. At the first iteration step, the MNGEs between the interpolated observations and the corresponding interpolated SNCs (K2) were all larger than 120%, whereas after applying the DCIM they were all less than 6%. Because the errors at the checking observations decreased together with those at the interpolated observations, the interpolated observations were not overfitted. Figure 2 shows that most dots lie near the 1:1 line, whether they represent interpolated observations or checking observations, which indicates that the DCIM is an effective tool for interpolating observations.

To make the fullest use of all observations, another twin experiment, IE_21, was conducted. In IE_21, all observations were interpolated; the final MAGE was 0.02 mg/L, a reduction of 92.9%, and the final MNGE was 6.06% (see Table 2). The comparison between the interpolated SNCs and the prescribed observations is shown in Figure 2f. The correlation coefficient was close to 1.00, meaning that the final interpolated results were almost equal to the artificial observations. Thus, the DCIM is a feasible and effective method for interpolating SNCs.

Sensitivity to Observational Errors
In the real ocean environment, observations can be contaminated by noise. Therefore, another three twin experiments, named IE_31, IE_32, and IE_33, were conducted, in which random perturbations were added to the prescribed observations. The maximum percentages of observation errors were 10%, 20%, and 30% in the three experiments, respectively. The comparison between the interpolated SNCs and the observations is shown in Figure 2g. Table 2 also demonstrates that, although the observations contained noise, the DCIM still performed well when used to interpolate the SNCs. This suggests that the interpolation results can remain convincing when the DCIM is applied in practical situations.
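One simple way to generate perturbed observations with a bounded relative error is sketched below. The text does not state the distribution of the perturbations, so the uniform distribution, the multiplicative form, and the function name `perturb_observations` are assumptions for illustration.

```python
import numpy as np

def perturb_observations(obs, max_fraction, seed=0):
    """Multiply each observation by (1 + e), with e uniform in [-max_fraction, max_fraction].

    max_fraction = 0.1, 0.2, 0.3 corresponds to maximum errors of 10%, 20%, 30%.
    The uniform distribution is an assumption, not taken from the paper.
    """
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=obs.shape)
    return obs * (1.0 + max_fraction * noise)
```

Because the perturbation is multiplicative, the relative error of every perturbed observation is bounded by `max_fraction`, which matches the notion of a maximum percentage error.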

Practical Applications
In this section, the observed SNCs were used to carry out the practical experiment. The final MAGE and MNGE were 0.21 mg/L and 47.9%, respectively (Table 3), both reduced by more than 55%. The mean value and the standard deviation of the observed SNCs were 0.69 and 0.55 mg/L, respectively, while those of the interpolated SNCs were 0.69 and 0.46 mg/L, respectively. These results indicate that the interpolated SNCs reproduce the mean of the observed SNCs and most of their variability. Figure 3 shows a scatterplot comparing the interpolated and observed SNCs. The 2:1, 1.25:1, 1:1, 0.85:1, and 1:2 lines are shown for reference. For 84.3% of the observations, the ratio of the interpolated SNC to the observed SNC was between 0.5 and 2; for 11.6%, the ratio was over 2, and for 4.1%, the ratio was below 0.5. The closer the ratio is to 1, the closer the interpolated SNC is to the observed SNC. For 53.7% of the observations, the ratio was between 0.85 and 1.25. Moreover, the correlation coefficient between the interpolated and observed SNCs was 0.77.
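The ratio statistics and the correlation coefficient quoted above can be obtained as in the following sketch. The function name `ratio_summary` and the dictionary keys are hypothetical, and the input arrays in any usage would be the interpolated and observed SNCs at the observation points, which are not reproduced here.

```python
import numpy as np

def ratio_summary(interp, obs):
    """Percentages of points by interpolated/observed ratio, plus Pearson correlation."""
    interp, obs = np.asarray(interp, float), np.asarray(obs, float)
    ratio = interp / obs                      # assumes obs > 0
    pct = lambda cond: float(np.mean(cond) * 100.0)
    return {
        "within_0.5_2": pct((ratio >= 0.5) & (ratio <= 2.0)),
        "over_2": pct(ratio > 2.0),
        "below_0.5": pct(ratio < 0.5),
        "within_0.85_1.25": pct((ratio >= 0.85) & (ratio <= 1.25)),
        "correlation": float(np.corrcoef(interp, obs)[0, 1]),
    }
```

The first three percentages always sum to 100, which provides a quick consistency check on reported values such as 84.3%, 11.6%, and 4.1%.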
The statistical results above indicate that the SNCs interpolated with the DCIM are consistent with the observed SNCs. The final distribution of the interpolated surface nitrogen concentration is given in Figure 4. The absolute error between each interpolated observation and the corresponding interpolated SNC is shown in Figure 5, and the statistics of these errors are shown in Figure 6. Of the errors, 46.3% (56/121) were no more than 0.1 mg/L and only 20.7% (25/121) were over 0.3 mg/L. Figure 4 shows that high concentrations appear in the three bays, while the concentration is low in the central Bohai Sea; compared with Figure 1, this agrees well with the observed nitrogen concentration distribution.
K1 is the MAGE between the interpolated observations and the interpolated SNCs (mg/L); K2 is the MNGE between the interpolated observations and the interpolated SNCs (%).

Conclusions
In this paper, we interpolated the surface nitrogen concentration with the dynamically constrained interpolation methodology (DCIM). The pollutant transport model was taken as dynamic constraint and the interpolated results were optimized iteratively with the adjoint method.
The feasibility and validity of the DCIM were tested with prescribed observations in ideal twin experiments. The statistics and the scatterplots of the twin experiments illustrate that the SNCs interpolated with the DCIM were close to the prescribed observations and that the interpolated results remained convincing when noise was added to the prescribed observations. In the practical experiment, the observed data were used to interpolate the surface nitrogen concentration with the DCIM. The correlation coefficient between the interpolated and observed SNCs was 0.77. The distribution of the final interpolated surface nitrogen concentration agrees well with the observations. The results of the ideal and practical experiments demonstrate that the DCIM can be an effective method for interpolating observations that are scattered in space and time.

Appendix A
For Equation (1), the difference form was as follows
For Equation (5), the difference form can be described as

Appendix B. Statistic Results of Other Initial Guess Values
Three initial guess values were tested: the minimum, half of the maximum, and the maximum of the nitrogen concentration. The simulation results are given in Table A1. K1 is the MAGE between the interpolated observations and the interpolated SNCs (mg/L); K2 is the MNGE between the interpolated observations and the interpolated SNCs (%).
Table A1 shows that, after iterative optimization, the final MAGE was 0.02 mg/L for all three initial guess values. This means that the final simulation results were almost the same regardless of the initial guess value. The final percentages of K2 differed because different initial guess values produce different initial errors. In this paper, half of the maximum nitrogen concentration was therefore taken as the initial guess value.

Appendix C. Comparison between the DCIM and Cressman Interpolation Method
To assess the quality of the results, the Cressman interpolation (CI) method was applied in the ideal twin experiments IE_11-IE_15. The comparison of the MAGEs of the CI and the DCIM is shown in Table A2. From IE_11 to IE_15, the MAGEs were reduced by 52.2%, 68.1%, 58.6%, 67.9%, and 77.1%, respectively, and the errors were reduced by almost an order of magnitude. From this comparison, we conclude that the DCIM is a much more effective interpolation method than the CI.
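For reference, a distance-weighted interpolation with the standard Cressman weight function, w = (R² − d²)/(R² + d²) for d < R and 0 otherwise, can be sketched as follows. This is a single-pass weighted average written for illustration; the classical Cressman scheme applies successive corrections to a background field with decreasing radii, and the configuration used for Table A2 is not described here. The function names and the influence radius `R` are assumptions.

```python
import numpy as np

def cressman_weights(d, R):
    """Cressman weight for distance d and influence radius R (zero outside R)."""
    w = (R**2 - d**2) / (R**2 + d**2)
    return np.where(d < R, w, 0.0)

def cressman_interpolate(xg, yg, xo, yo, vo, R):
    """Single-pass Cressman-weighted average of observations vo at target points.

    xg, yg: 1D arrays of target coordinates; xo, yo, vo: 1D arrays of observation
    coordinates and values. Targets with no observation within R get NaN.
    """
    xg, yg = np.asarray(xg, float), np.asarray(yg, float)
    xo, yo, vo = (np.asarray(a, float) for a in (xo, yo, vo))
    d = np.hypot(xg[:, None] - xo[None, :], yg[:, None] - yo[None, :])
    w = cressman_weights(d, R)
    wsum = w.sum(axis=1)
    out = np.full(xg.shape, np.nan)
    ok = wsum > 0
    out[ok] = (w[ok] @ vo) / wsum[ok]
    return out
```

Unlike the DCIM, this kind of interpolation uses only distances between points and ignores the flow field, which is the contrast the comparison in this appendix is meant to highlight.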