Comparison of nonparametric methods for static visual field interpolation

Visual field testing with standard automated perimetry produces a sparse representation of a sensitivity map, sometimes called the hill of vision (HOV), for the retina. Interpolation or resampling of these data is important for visual display, clinical interpretation, and quantitative analysis. Our objective was to compare several popular interpolation methods in terms of their utility to visual field testing. We evaluated nine nonparametric scattered data interpolation algorithms and compared their performances in normal subjects and patients with retinal degeneration. Interpolator performance was assessed by leave-one-out cross-validation accuracy and high-density interpolated HOV surface smoothness. Radial basis function (RBF) interpolation with a linear kernel yielded the best accuracy, with an overall mean absolute error (MAE) of 2.01 dB and root-mean-square error (RMSE) of 3.20 dB that were significantly better than all other methods (p ≤ 0.003). Thin-plate spline RBF interpolation yielded the best smoothness results (p < 0.001) and scored well for accuracy with overall MAE and RMSE values of 2.08 and 3.28 dB, respectively. Natural neighbor interpolation, which may be a more readily accessible method to some practitioners, also performed well. While no interpolator will be universally optimal, these interpolators are good choices among nonparametric methods.


Introduction
Interpolation of static visual field sensitivity data offers several benefits. By resampling examination data onto a single common grid, it can unify disparate perimetry protocols. This facilitates the joint analysis of examinations conducted under different studies or with different test grid patterns, which otherwise would be difficult to compare. Interpolation is also useful for transforming data from irregular or sparse grids into a uniformly sampled format. Resampling onto a high-density regular grid produces a surface representation of the hill of vision (HOV) which offers more flexibility in visualizing examination results than conventional display methods that show only discrete numerical or coarsely quantized information [15,26]. Surface renderings of the HOV are helpful to clinicians not only in disease monitoring and treatment, but also in providing a visual model to aid in discussions of examination results with patients [13]. Furthermore, densely sampled HOV surfaces enable new quantitative analysis techniques and new clinical trial end points. For example, topographic interpretation of interpolated HOV surfaces supports contour-based and volumetric approaches to visual field analysis [27].
Here, we compare various interpolators to help identify the best methods for static visual field data. Similar interpolator comparisons have been made in other areas of study (for example, in mathematics [10], geology [4], and climatology [23]). So far, however, only preliminary work has been done regarding visual fields [22]. In this study, we extend that work to include performance assessments Abstract Visual field testing with standard automated perimetry produces a sparse representation of a sensitivity map, sometimes called the hill of vision (HOV), for the retina. Interpolation or resampling of these data is important for visual display, clinical interpretation, and quantitative analysis. Our objective was to compare several popular interpolation methods in terms of their utility to visual field testing. We evaluated nine nonparametric scattered data interpolation algorithms and compared their performances in normal subjects and patients with retinal degeneration. Interpolator performance was assessed by leave-one-out cross-validation accuracy and high-density interpolated HOV surface smoothness. Radial basis function (RBF) interpolation with a linear kernel yielded the best accuracy, with an overall mean absolute error (MAE) of 2.01 dB and root-mean-square error (RMSE) of 3.20 dB that were significantly better than all other methods (p ≤ 0.003). Thinplate spline RBF interpolation yielded the best smoothness results (p < 0.001) and scored well for accuracy with overall MAE and RMSE values of 2.08 and 3.28 dB, respectively. Natural neighbor interpolation, which may be a more readily accessible method to some practitioners, also performed well. While no interpolator will be universally optimal, these interpolators are good choices among nonparametric methods.

3
of smoothness in addition to accuracy. Furthermore, the interpolators in this study were chosen by their suitability for clinical research and development. We limited our investigation to scattered data techniques, which can accommodate any test grid pattern. Also, each method was either strictly nonparametric or, if not, implemented with a fixed, standardized parameter value to eliminate complications associated with parameter estimation. We compared their performances with visual field data from a group of healthy, normal volunteers and patients with retinitis pigmentosa (RP), a family of inherited retinal degenerations that is characterized by progressive visual field loss [14]. The best-performing interpolators are identified, and their relative merits are discussed in relation to the performance metrics. As one of the first quantitative interpolation comparisons for visual fields, the results from this study will be useful to clinicians and researchers looking for validated interpolation strategies for visual field data.

Static perimetry data
Full-field standard automated perimetry was performed on 10 normal subjects and 10-RP patients with an Octopus 101 perimeter (Haag-Streit AG, Köniz, Switzerland). The testing protocol used a 10 cd/m 2 background luminance, the GATE-i fast thresholding strategy [20], and a size V (64 mm 2 ) stimulus with 200-ms duration. Visual fields were tested with a binocularly symmetric, radially oriented, and centrally condensed grid pattern consisting of 164 test points (Fig. 1). The maximum allowable reliability factor (the percentage of catch trials that generated either a false-positive or a false-negative response) was 15 % for each normal examination and 25 % for each patient examination [27]. Replicate testing was performed within 90 days of the first test to obtain repeated measurements  for each eye. Table 1 presents the mean age in each subject group and the distributions of repeated examinations. Approval adhering to the tenets of the Declaration of Helsinki was obtained from the Oregon Health and Science University Institutional Review Board, and written informed consent was provided by all subjects.

Interpolation methods
Each perimetry examination produced a 3-D point cloud with N = 164 test points where each point was represented by a retinal location (x i , y i ) in angular coordinates and differential luminance sensitivity (z i ), which was the light sensitivity in dB measured at that location. As seen in Fig. 1, the test points were irregularly spaced, necessitating scattered data interpolation methods.
We evaluated the 9 interpolation algorithms listed in Table 2. Each algorithm is a pure interpolator in that it reproduced the original data exactly, with no modeling, fitting, or approximation. Four of the interpolators were implemented with the built-in grid data function in MAT-LAB 8.5 (MathWorks, Natick, MA, USA). The other five interpolators, including both radial basis function methods, were implemented with custom software in MATLAB.
The grid data-based methods are each related to the Delaunay triangulation [9] of the grid pattern (Fig. 1c), which produces a piecewise triangular network with nodes set to the original data point cloud. From this network, the interpolated value can be computed from either: the nearest node in the network for nearest neighbor (NN) interpolation, the linear plane containing the three nearest nodes for linear (Lin) interpolation, or the bivariate cubic polynomial passing through those three nodes for cubic (Cub) interpolation [1]. The fourth method was natural neighbor (NatN) interpolation, which is based on the Voronoi tessellation of the test points-the geometric dual to the Delaunay triangular network. Of these four methods, only cubic and natural neighbor interpolations are guaranteed to be continuous across the boundaries of the triangular network [1]; consequently, the linear and nearest neighbor interpolators will tend to have lower smoothness characteristics.
Two of the methods we analyzed were based on inverse distance weighting, also known as Shepard's method. Here, each interpolated value is a weighted combination of every examination data point. In the classical formulation (IDW), the weights are inversely proportional to power of the distances to the data points [21]. In this study, we used the traditional square distance (power = 2) to balance the influence of points far away (emphasized with a small power) and nearby (emphasized with a large power). We also included the modified formulation of this interpolator (IDW2), which improves the localization by only including data points if they are within a specified distance from the interpolation location [11]. For this distance, we used the mean separation between all locations in the test grid, which was 48.3°.
Two of the methods are radial basis function (RBF) interpolators, which form each interpolated value through a linear combination of kernel functions. The kernels have the form φ(r) and are dependent only on the (radial) distance between the output location and all examination data locations. The linear weights are determined by solving a data-dependent system of equations. We considered two types of nonparametric kernels: linear, φ(r) = r [7], and thin-plate spline, φ(r) = r 2 log r [24].
In ordinary Kriging (Krig) or Gaussian process regression, each interpolated value is a weighted combination where the weights are derived from spatial covariances [17]. This stochastic approach treats the data as a random field consisting of mean and residual values at each location. This method is natively parametric, but the parameters can be estimated automatically through optimization to fit a covariance function that models the residuals in each examination. Here, we considered an exponential model, which is appropriate for data with an exponentially decaying spatial autocorrelation [25].
There are many other scattered data interpolation methods available. These 9 methods were chosen for this analysis because they are commonly used and frequently appear in other comparison studies [4,8,10,23]. They are likely to be more easily available to researchers in the field than other esoteric methods, and they run relatively quickly on modern computers.

Performance evaluation
We evaluated the interpolators in terms of accuracy and the smoothness of the high-density interpolated HOV surfaces. Greater smoothness indicates a lower prevalence of undesired behaviors such as surface discontinuities and peakiness. To assess accuracy, we generated a "truth" data set Inverse distance weighting IDW [17] Improved inverse distance weighting IDW2 [9] Linear radial basis function RBFlin [6] Thin-plate spline radial basis function RBFtps [20] Ordinary Kriging Krig [13] {(x i , y i ,z i )} N i=1 for each eye by averaging the sensitivity values across all examinations for that eye. Here, where z ij is the measurement at the ith location from the jth examination of an eye, and N e is the total number of examinations for the eye.
We estimated interpolation accuracy with leave-oneout cross-validation (LOOCV). In LOOCV, a set of residual errors is produced at each grid location. Traditionally, the error at the kth location is ,i� =k , the examination data subset after removal of the kth point [24]. However, we are more interested in the interpolator's ability to recover the true value from the set of measurements because this is a better indicator of accuracy between the grid locations where new values are being synthesized. Thus, in this study, the residual error was e k =ẑ k −z k . For each examination, we computed the LOOCV residual error at all N locations.
We summarized the LOOCV errors with three standard metrics: the mean absolute error (MAE), the root-meansquare error (RMSE), and Willmott's modified index of agreement (d 1 ) [28], which are given by and Each metric was evaluated with the N b points, remaining after automatic detection and removal of data points in the blind spot [2,27], which would otherwise dominate the results. Here, is the mean truth value for the examination. The RMSE is based on squared differences and is more sensitive to outliers than MAE or d 1 , which are based on the absolute values of differences. RMSE and MAE are unbounded metrics and have units matching those of the data values (dB). In comparison, d 1 is dimensionless and takes values between 0 and 1, with 1 indicating perfect agreement between the interpolated values and the truth. We also measured the smoothness of HOV surfaces created by interpolation of each examination onto a highdensity uniform grid. These surfaces are plausible products that might be created for HOV visualization, generated as intermediates for further analysis and clinical trial end point computation, or for adaptive perimetric grid location strategies [5]. The high-density N x × N y grid spanned ±90° with N x = N y = 501 points. Prior to interpolation, we . In each boxplot, the interpolators are ordered left to right from best to worst performance. The linear and thin-plate spline RBF interpolators had the best performance according to all three accuracy metrics 1 3 inserted boundary points to the examination data: 72 points with z = 0 around a circle with radius 120° to represent the lack of visual sensitivity beyond the peripheral field. Surface smoothness was assessed with first-order (TV 1 , introduced in [19]) and second-order isotropic total variation (TV 2 , derived from [18]). Denoting h i,j as the interpolated surface and Ω as the set of points (i, j) within the convex hull of the test grid, with a similar structure for y. For both TV 1 and TV 2 , smaller values indicate a smoother surface. While both metrics are sensitive to edges, slopes, and transitions in the surface, TV 2 is more indicative for undesired peakiness because the first derivative is zero at local apexes. Statistical testing was performed in MATLAB. Due to the small sample size and lack of certainty about the data distributions, nonparametric hypothesis tests were used instead of standard t tests. First, one-tailed Wilcoxon signed-rank tests [12] determined whether the LOOCV performance for one interpolator was significantly better than the others, and whether the smoothness results were better than others. To account for the comparison among multiple interpolators, the significant level was decreased from 0.05 to 0.0055 (a factor of 9) via Bonferroni correction [6]. Second, two-tailed Wilcoxon rank-sum tests [12] determined whether the LOOCV performance for one interpolator was significantly different between the normal and patient subject groups. For these tests, a significance level of 0.05 was used.

Results
Boxplots with basic statistical summary measures for the LOOCV-based accuracy metrics (MAE, RMSE, and d 1 ) are presented in Fig. 2 for the normal and patient groups. For each metric, the interpolators are ordered by mean performance across all subjects. For all three accuracy metrics, the RBFlin, RBFtps, and NatN interpolators were the top three performers. Table 3 shows the mean values of each metric for all interpolators. Values are shown for three groups: all subjects, normal only, and patients only. In terms of the three accuracy metrics, the RBFlin consistently had the best result for all groups, with the only exception being d 1 performance in patients, where RBFlin has the second-best result. In general, the second-and third-most accurate interpolators were RBFtps and NatN respectively. Table 4 shows the results from determining whether the performance of RBFlin was significantly better than the other interpolators. It can be seen that RBFlin was significantly better in nearly all cases, with a few exceptions for comparisons with RBFtps, NatN, and Krig.
Boxplots of the smoothness metrics are shown in Fig. 3, where again the interpolators are ordered by mean performance across all subjects. Both IDW interpolators yielded the smallest TV 1 scores, but did not do nearly as well in terms of TV 2 . For the TV 2 metric, RBFtps yielded the smoothest surfaces overall. Considering both TV 1 and TV 2 , the RBFtps and RBFlin interpolators were the smoothest. Smaller values indicate more smoothness. In each boxplot, the interpolators are ordered left to right from most smooth to least. The thinplate spline RBF interpolator had the best performance according to the TV 2 metric, which is a more robust smoothness measure than TV 1 Visualizing the interpolated HOV surfaces helps assess the relative merit of the two smoothness metrics. Example HOV surfaces from several interpolators are shown in Fig. 4 for two RP patients. This figure illustrates the differences in smoothness characteristics that are summarized in Fig. 3. Nearest neighbor interpolation produces piecewise constant surfaces that yield large total variation scores. With IDW, significant dimpling with local peaks and valleys is apparent around the test grid locations, especially near the center where the test grid pattern is denser. The dimpling and peakiness also occurs with IDW2 (not shown). Surface TV 2 smoothness from Kriging was highly variable among patients; the Kriging surface is smooth and almost indistinguishable from RBFtps for subject A, yet for subject B, Kriging exhibited peakiness similar to IDW2albeit at the periphery instead of the center. The two RBF methods produced very similar surfaces, with differences only apparent around ridges and isolated rises. Figure 4 demonstrates the importance of TV 2 as the better metric for differentiating smoothness performance. The dimpled and peaky features of the IDW surfaces are not reflected in the TV 1 scores. The visually smoothest surfaces are created by the RBFtps and RBFlin interpolators, and these interpolators have the best TV 2 scores. As seen in Table 3, RBFtps had the best TV 2 scores for all groups. Table 5 shows the results of statistical testing of RBFtps against the other interpolators for each group. In all cases, RBFtps had significantly better TV 2 smoothness. Table 6 presents the significance levels from comparing the performance metrics between the normal and patient groups. For the accuracy metrics, there were significant differences between groups in most cases. For the smoothness metrics, values were significantly different between groups for every interpolator.

Discussion
The results of this analysis are consistent with those from other reviews of interpolators for different fields of study. For example, in a study of numerous interpolators on closed-form analytic functions, Franke found that the thinplate spline have the best scores for accuracy and visual representation [10]. Comparing RBF, IDW, and Kriging interpolators for digital terrain modeling applications, Erdogan found that the thin-plate spline RBF was the overall best performer, while IDW showed the largest errors.
Accuracy assessments with LOOCV require careful interpretation. The residual error reflects how well one data point can be recovered from its neighbors, and thus, it is proportional not only to the fidelity of the interpolator, but also to the spatial autocorrelation of the data. To isolate only the effect of the interpolator, we have focused on how the algorithms compare with one another instead of absolute error performance.
There is no practical method to obtain reliable ground truth for a visual field examination, especially in patients with progressive disease and unstable eye fixation. We constructed "truth" data by averaging data from replicate examinations under the assumption that the measurement errors are zero mean. However, this is only an estimate of the true sensitivity, which itself can change depending on patient fatigue and anxiety levels, and on the time of day [3]. True accuracy can only be gauged after the development of a realistic, full-scale visual field simulator. In terms of accuracy and smoothness, the linear and thin-plate spline RBF interpolators yielded the best results with the latter providing slightly smoother HOV surfaces. This is reflective of the properties of the RBF kernels: the thin-plate spline kernel has a much smaller discontinuity at r = 0 and is infinitely differentiable everywhere else. Natural neighbor interpolation, which is available in many data analysis software packages such as MATLAB, was the next best performer. Although nominally providing a good balance between the speed of linear interpolation and the complexity of cubic interpolation, these results show natural neighbor interpolation is superior to both in terms of accuracy and smoothness.
Among the best-performing interpolators, accuracy tended to be better with RP patients than normal subjects. However, the variance was often larger in patients, meaning the accuracy performance was less consistent. This behavior is likely due to the diversity of visual field patterns in this group and the high degree of correlation present in the scotomas and low-sensitivity regions. Smoothness was significantly better in normals than in patients for every interpolator. This result is not surprising given the lack of scotomas and other visual field defects among normals that would otherwise interfere with the stability and continuity of the HOV surface.
Here, we have analyzed only a small set of interpolation techniques. We specifically chose these methods because they are relatively simple and readily available, and make good initial candidates for clinical implementation. However, we recognize that incorporating rigorous parameter estimation could improve both IDW methods and allow additional interpolators into the study including parametric RBF approaches. Also, although we did not include techniques based on approximation or regularization, we believe that developing appropriate models for these techniques will help mitigate measurement error and variability and should be the focus of future research.
There is likely no unconditionally optimal method for visual field interpolation, only a best method for a specific set of circumstances. Machine learning and algorithm portfolio optimization techniques [16] could be employed to select which interpolator or interpolator combination is best suited for a particular instance.

Conclusion
Interpolation can be beneficial for both clinical and research analysis of static visual field data. Interpolation facilitates biomarker and clinical trial end point computation, multi-modal data alignment, and quantitative analyses that require higher sampling density. In this study of nine nonparametric methods, interpolation with linear and thin-plate spline radial basis functions yielded the best accuracy and smoothness performance. If these methods are unavailable or too difficult to implement, natural neighbor interpolation is the next best choice at a cost of reduced smoothness of interpolated surfaces.