Evaluation of Chlorophyll-a Estimation Approaches Using Iterative Stepwise Elimination Partial Least Squares (ISE-PLS) Regression and Several Traditional Algorithms from Field Hyperspectral Measurements in the Seto Inland Sea, Japan

Harmful algal blooms (HABs) occur frequently in the Seto Inland Sea, bringing significant economic and environmental losses for the area, which is well known as one of the world’s most productive fisheries. Our objective was to develop a quantitative model using in situ hyperspectral measurements in the Seto Inland Sea to estimate chlorophyll a (Chl-a) concentration, which is a significant parameter for detecting HABs. We obtained spectra and Chl-a data at six stations from 12 ship-based surveys between December 2015 and September 2017. In this study, we used an iterative stepwise elimination partial least squares (ISE-PLS) regression method along with several empirical and semi-analytical methods such as ocean chlorophyll, three-band model, and two-band model algorithms to retrieve Chl-a. Our results showed that ISE-PLS using both the water-leaving reflectance (RL) and the first derivative reflectance (FDR) had a better predictive ability with higher coefficient of determination (R2), lower root mean squared error (RMSE), and higher residual predictive deviation (RPD) values (R2 = 0.77, RMSE = 1.47 and RPD = 2.1 for RL; R2 = 0.78, RMSE = 1.45 and RPD = 2.13 for FDR). However, in this study the ocean chlorophyll (OC) algorithms had poor predictive ability and the three-band and two-band model algorithms did not perform well in areas with lower Chl-a concentrations. These results support ISE-PLS as a potential coastal water quality assessment method using hyperspectral measurements.


Introduction
The Seto Inland Sea is an approximately 23,000 km 2 semi-enclosed coastal sea in western Japan, with an average depth of 38 m. This sea is well-known as one of the world's most productive fisheries due to its abundance of fish and variety of fish species [1]. Approximately 35 million people live around the Seto Inland Sea, bringing increased industrialization and urbanization that have made the Seto Inland Sea one of Japan's most industrialized regions [2]. However, productivity of fisheries is sensitive and thus vulnerable to anthropogenic stress. Eutrophication of coastal waters has affected fishing and other activities by contributing to harmful algal blooms (HABs), also known as red tides. HABs frequently occurred in the Seto Inland Sea during a period of high economic growth in the 1970s [3]. Although HABs have decreased from about 300 cases per year in 1976 to about 100 cases per

Study Area
The study area is in the central part of the Seto Inland Sea near the city of Fukuyama as shown in Figure 1. We selected six sites as sampling stations, which are described in Table 1. The study area has an average water depth of 17.3 m and water temperatures range from 7.3 • C (winter) to 28.4 • C (summer). The Seto Inland Sea is rich in fishery resources, with more than 50% of the total fish production contributed by aquaculture production. Additionally, there are approximately 17 fish farms being operated in the Tashima and Yokota areas near our study area [26]. Consequently, there are a range of organic compounds released by fish farm waste that may affect eutrophication due to dissolved nitrogen [27].

Data Collection and Pre-Processing
We conducted 12 ship-based surveys between 16 December 2015, and 7 September 2017 at six stations ( Figure 1). We performed in situ measurements of water-leaving reflectance (RL) using a MS-720 (Eiko Co. Ltd., Tokyo, Japan) spectrometer, with a spectral range of 350-1050 nm and a spectral interval of 3.3 nm. We interpolated the spectral interval to 1 nm when exporting data. We gathered spectral readings approximately 1 m above the water surface, with a probe field angle of 25°, between 9:00 a.m. and 11:00 a.m. under clear sky conditions. We measured Chl-a using a Hydrolab DS5 (Hach, Loveland, CO, USA) multiparameter data sonde with sensors for measuring Chl-a (range from 0 to 500 µg/L) and other water quality parameters (e.g., temperature, salinity, dissolved oxygen, etc.). In this study we used Chl-a data from just beneath the water surface.
With respect to spectral data, we identified reflectance ranges of 325-399 nm and 901-1075 nm as noise and removed them. We then smoothed the spectral data using a Savitzky-Golay filter with 15 smoothing points. To compare the performance for Chl-a estimation with the original waterleaving reflectance, we also computed the first derivative reflectance (FDR), which is calculated as the difference of water-leaving reflectance for two adjacent wavebands.

OC Algorithms
In this study we used the newest ocean chlorophyll (OC) algorithms (version 6) [28], which was formulated as a fourth-order polynomial with five coefficients. The newest OC algorithms yielded better statistical agreement between model data and Chl-a than the first version OC algorithm [12], which was a modified cubic polynomial relationship between Chl-a and a ratio of remote sensing reflectance Rrs.
Remote sensing reflectance Rrs(λ) can be represented by the relationship with water-leaving reflectance RL(λ) as follows [29]:

Data Collection and Pre-Processing
We conducted 12 ship-based surveys between 16 December 2015, and 7 September 2017 at six stations ( Figure 1). We performed in situ measurements of water-leaving reflectance (R L ) using a MS-720 (Eiko Co. Ltd., Tokyo, Japan) spectrometer, with a spectral range of 350-1050 nm and a spectral interval of 3.3 nm. We interpolated the spectral interval to 1 nm when exporting data. We gathered spectral readings approximately 1 m above the water surface, with a probe field angle of 25 • , between 9:00 a.m. and 11:00 a.m. under clear sky conditions. We measured Chl-a using a Hydrolab DS5 (Hach, Loveland, CO, USA) multiparameter data sonde with sensors for measuring Chl-a (range from 0 to 500 µg/L) and other water quality parameters (e.g., temperature, salinity, dissolved oxygen, etc.). In this study we used Chl-a data from just beneath the water surface.
With respect to spectral data, we identified reflectance ranges of 325-399 nm and 901-1075 nm as noise and removed them. We then smoothed the spectral data using a Savitzky-Golay filter with 15 smoothing points. To compare the performance for Chl-a estimation with the original water-leaving reflectance, we also computed the first derivative reflectance (FDR), which is calculated as the difference of water-leaving reflectance for two adjacent wavebands.

OC Algorithms
In this study we used the newest ocean chlorophyll (OC) algorithms (version 6) [28], which was formulated as a fourth-order polynomial with five coefficients. The newest OC algorithms yielded better statistical agreement between model data and Chl-a than the first version OC algorithm [12], which was a modified cubic polynomial relationship between Chl-a and a ratio of remote sensing reflectance R rs .
Remote sensing reflectance R rs (λ) can be represented by the relationship with water-leaving reflectance R L (λ) as follows [29]: R rs (λ) = πR L (λ) (1) where λ is wavelength. The version 6 OC algorithms use a fourth-order polynomial equation that can be written as: where R = log 10 (R rs (λ 1 )/R rs (λ 2 )) and the coefficients a 0 , a 1 , a 2 , a 3, and a 4 are different in the OC2, OC3, and OC4 algorithms. OC2 uses the blue/green ratio R rs (blue)/R rs (green) and R is described as R = log 10 ( R rs (green) ). OC3 uses a three-band formulation with a maximum of R rs band ratios R rs (blue1)/R rs (green) and R rs (blue2)/R rs (green), and R is expressed as R = log 10 ( ). Similarly, OC4 uses the maximum of three R rs ratios R rs (blue1)/R rs (green), R rs (blue2)/R rs (green) and R rs (blue3)/R rs (green)-to build the formulation, with R expressed as R = log 10 ( ). The OC4 algorithm has been considered a standard method for satellite detection of HABs over global waters [30,31].
In view of the sensor differences between the hyperspectral spectrometer used for in situ measurements and the SeaWiFS satellite sensor, we recalculated the parameters for Equation (2) by model recalibration using in situ Chl-a and R rs , which was in accordance with specified OC algorithm wavebands.

Three-Band Model
The three-band model uses the NIR and red wavebands and is formulated as [19,32]: where R λi is the reflectance at a wavelength of λi nm. Previous study found the optimal spectral ranges for these wavelengths to be, λ1 = 660-670 nm, which is maximally sensitive to absorption by Chl-a; λ2 = 690-720 nm, which is minimally sensitive to absorption by Chl-a; and λ3 = 720-750 nm, which is minimally affected by absorption by any constituents (e.g., Chl-a, suspended solids, etc.) [32,33]. We expected to find the optimal spectral ranges of λ1, λ2, and λ3 for Chl-a estimation by spectrally tuning the conceptual model using a stepwise technique [34]. First, we set λ2 and λ3 to 700 nm and 750 nm, respectively, and then linearly regressed using all available bands and Chl-a to obtain the first estimate of λ1, with which there was a high correlation coefficient (r). After we fixed λ1, we set λ2 as an unknown waveband and linearly regressed to find an optimal λ2 based on the best r value using the reflectance corresponding to a fixed λ1 and an assumed λ3. Analogously, we confirmed the optimal λ3 using the reflectance corresponding with fixed λ1 and λ2 values.

Two-Band Model
The NIR/red two-band model has been widely used to retrieve Chl-a concentrations in turbid productive waters to identify phytoplankton blooms [35]. This model is formulated as follows: where λ1 is in the red region and λ2 is in NIR region. According to the band tuning method [34], we tuned the model to select the optimal NIR and red bands for Chl-a retrieval in this research area and compared its accuracy with the previous model, i.e., NIR/red model using wavelengths of 705 nm in the NIR region and 670 nm in the red region [18].

ISE-PLS
Partial least square (PLS) is useful for handling many descriptors even when co-linearity and noise in the model building regression are present [36]. The standard PLS regression equation can be expressed as follows: where y is the response variable that represents Chl-a, x i is the predictor variable representing spectral data such as R L or FDR values for spectral bands 1 to i (400-900 nm), β i is the estimated weighted regression coefficient, and ε is the error vector. In the PLS model, the original predictor variables (x) are projected onto a small number of orthogonal latent variables to simplify their relationships with response variables (y) [37]. We selected the optimal number of latent variables (NLV) in the final model using the leave-one-out (LOO) cross-validation method with a minimum value of the root mean squared error (RMSE), which is calculated as follows: where y i and y p represent sample i's measured and predicted Chl-a, respectively, and n is the number of samples in the dataset (n = 59). The iterative stepwise elimination PLS (ISE-PLS) uses a model-wise elimination technique [24] that permits the removal of less useful descriptors to improve predictive performance. This process is based on the importance of the predictor z i , which is defined as: where I is the maximum number of variables, s i is the standard deviation of predictor x i (each predictor includes 59 samples). PLS modeling uses all available wavebands (501 bands between 400 and 900 nm).
Predictors are then evaluated based on the value of the importance of predictor z i . The predictor with minimum importance (i.e., the minimum z i ) is eliminated in each elimination cycle and the remaining predictors are used to recalibrate the model [38]. Finally, a model with maximum predictive ability is selected using the minimum RMSE value from the cross-validation.

Evaluation of Predictive Ability
We used the coefficient of determination (R 2 ), RMSE, and bias to evaluate the predictive ability of empirical and semi-analytical algorithms such as OC, three-band, and two-band model algorithms. Higher R 2 values and a lower RMSE indicate better Chl-a estimation performance, and bias represents systematic difference between actual and predicted values. To evaluate the ISE-PLS predictive ability, we used R 2 and RMSE from the LOO cross-validation in the final model. Additionally, we introduced the residual predictive deviation (RPD), which is defined as the ratio of the standard error of the prediction to the standard deviation, as the evaluating indicator. RPD can be expressed as RPD = SD/RMSE [39]. As shown in a previous study by Chang and Laird (2002) [40], an RPD > 2 indicates a model with good predictive ability, 1.4 < RPD < 2 indicates moderately good model in need of some improvement, and an RPD < 1.4 means the model has poor predictive ability. Finally, a method for NLV selection was used in the final PLS model, which has been reported can lower the risk of over-fitting [41]. The evaluation was basing on the sum of RMSE in cross-validation and Jaggedness (J), defined as: where j is NLV and β ji represents the regression coefficient when using j latent variables. At first, the RMSE in cross-validation and J were calculated from each NLV model (maximum of 10 latent variables were set) basing on ISE-PLS selected variables, then each value was rescaled to the range [0-1] as follows: where the subscript r refers to rescaled. The lowest value of RMSE r + J r indicates the optimal NLV for the final PLS model. We performed all data handling and regression analyses using Matlab software version 8.6 (MathWorks, Sherborn, MA, USA). Table 2 shows Chl-a concentration descriptive statistics from this study, including stations, number of samples, minimum (Min), maximum (Max), mean, standard deviation (SD) and coefficient of variation (CV).

Performance of Models for All Dataset
We used several empirical and semi-analytical models for Chl-a retrieval, the results of which are shown in Table 3. We initially used three standard empirical algorithms, OC2, OC3, and OC4. The first row of Figure 2 shows scatter plots between in situ measured Chl-a and Chl-a derived from OC models. The results show a linear relationship between measured and modelled Chl-a for all three OC algorithms, with poor R 2 values (0.36, 0.31, and 0.30, respectively for OC2, OC3, and OC4). In addition, results of all three OC algorithms underestimate Chl-a, indicated by the bias (−2.32, −2.71, and −2.70, respectively for OC2, OC3, and OC4). The second row of Figure 2 shows scatter plots between recalibrated OC algorithms and Chl-a. The R 2 values for all three OC algorithms were slightly improved (0.39, 0.36, and 0.35 respectively for OC2, OC3, and OC4), and scattered points were closer to the 1:1 line with a smaller bias (−0.46, −0.49, and −0.49, respectively for OC2, OC3, and OC4). For both the standard and recalibrated OC models, the OC2 algorithm performed better than OC3 and OC4 for Chl-a retrieval in this study; however, its predictive ability remains poor due to its low R 2 value.
The three-band and two-band algorithms were both based on the NIR region, which has high absorption by water, and the red region, which has high absorption by Chl-a. Figure 3 shows the three-band algorithm tuning process. The optimal λ1 appeared at 664 nm where the r value is highest when using assumed λ2 and λ3 values of 700 nm and 750 nm, respectively. λ2 and λ3 appeared at 695 nm and 736 nm when using the tuning method. These results showed a linear relationship between the three-band algorithm and Chl-a concentration with a R 2 value of 0.46, as shown in Figure 4.             The two-band NIR red model results showed an incompact linear relationship between the reflectance ratios of 705 nm and 670 nm and measured Chl-a, with a poor R 2 value of 0.17, as shown in Figure 5a. As with the three-band model, we tuned the spectral position to obtain the optimal NIR and red wavebands. We initially set the NIR waveband to 705 nm and then selected the optimal red region waveband, which we set from 620 nm to 680 nm based on the highest r value. Figure 6a shows that 666 nm was the optimal red waveband with a r value of 0.43. After fixing the optimal red waveband, we selected the optimal NIR region waveband, which we set from 680 nm to 740 nm. As shown in Figure 6b, we selected 693 nm as the best NIR waveband with a r value of 0.63. Figure 5b shows a linear relationship between the reflectance ratios of 693 nm and 666 nm and measured Chl-a, with a R 2 of 0.39.  The two-band NIR red model results showed an incompact linear relationship between the reflectance ratios of 705 nm and 670 nm and measured Chl-a, with a poor R 2 value of 0.17, as shown in Figure 5a. As with the three-band model, we tuned the spectral position to obtain the optimal NIR and red wavebands. We initially set the NIR waveband to 705 nm and then selected the optimal red region waveband, which we set from 620 nm to 680 nm based on the highest r value. Figure 6a shows that 666 nm was the optimal red waveband with a r value of 0.43. After fixing the optimal red waveband, we selected the optimal NIR region waveband, which we set from 680 nm to 740 nm. As shown in Figure 6b, we selected 693 nm as the best NIR waveband with a r value of 0.63. Figure 5b shows a linear relationship between the reflectance ratios of 693 nm and 666 nm and measured Chl-a, with a R 2 of 0.39.

Performance of Models for Separated Dataset
Remote sensing methods for retrieving water quality parameters contain spatial and temporal variations because the water body components that affect reflection properties vary in space and time. To further clarify the most fitted Chl-a retrieval method in the research area, we analysed algorithms using a separated dataset of six stations. Figure 7 shows the RL and average RL of each station for the research period. As we can see, the average RL of each station shows little difference those of the others. Especially at station 4 ( Figure 7d), there is an obvious reflectance peak around 580 nm, which is a result of minimum absorption by all pigments [42]. Figure 1 shows that station 4 is near a river, which could bring various nutrients from land to the coastal area. Consequently, the highest max Chl-a and SD values were obtained at station 4, as shown in Table 2.

Performance of Models for Separated Dataset
Remote sensing methods for retrieving water quality parameters contain spatial and temporal variations because the water body components that affect reflection properties vary in space and time.
To further clarify the most fitted Chl-a retrieval method in the research area, we analysed algorithms using a separated dataset of six stations. Figure 7 shows the R L and average R L of each station for the research period. As we can see, the average R L of each station shows little difference those of the others. Especially at station 4 (Figure 7d), there is an obvious reflectance peak around 580 nm, which is a result of minimum absorption by all pigments [42]. Figure 1 shows that station 4 is near a river, which could bring various nutrients from land to the coastal area. Consequently, the highest max Chl-a and SD values were obtained at station 4, as shown in Table 2.

Performance of Models for Separated Dataset
Remote sensing methods for retrieving water quality parameters contain spatial and temporal variations because the water body components that affect reflection properties vary in space and time. To further clarify the most fitted Chl-a retrieval method in the research area, we analysed algorithms using a separated dataset of six stations. Figure 7 shows the RL and average RL of each station for the research period. As we can see, the average RL of each station shows little difference those of the others. Especially at station 4 (Figure 7d), there is an obvious reflectance peak around 580 nm, which is a result of minimum absorption by all pigments [42]. Figure 1 shows that station 4 is near a river, which could bring various nutrients from land to the coastal area. Consequently, the highest max Chl-a and SD values were obtained at station 4, as shown in Table 2.  We analysed regressions using all possible band ratios in the 400 to 900 nm range and analysed Chl-a concentration for each station, as shown in Figure 8. Two-dimensional correlation matrixes indicate the R 2 distribution for all band ratios (250,000 combinations). The yellow regions indicate high R 2 values for calibration between band ratios and Chl-a concentration, with most figures indicating that high R 2 values appear in the NIR and red regions (near 680-710 nm) and green region (near 500-600 nm). However, Figure 8a shows no correlation between NIR/red ratio and Chl-a concentration, which may indicate that the NIR/red ratio doesn't fit for water areas with lower and narrower Chl-a concentration ranges, as indicated the lowest mean and SD values shown in Table 2, which is consistent with a previous study on band ratio analysis [18]. We analysed regressions using all possible band ratios in the 400 to 900 nm range and analysed Chl-a concentration for each station, as shown in Figure 8. Two-dimensional correlation matrixes indicate the R 2 distribution for all band ratios (250,000 combinations). The yellow regions indicate high R 2 values for calibration between band ratios and Chl-a concentration, with most figures indicating that high R 2 values appear in the NIR and red regions (near 680-710 nm) and green region (near 500-600 nm). However, Figure 8a shows no correlation between NIR/red ratio and Chl-a concentration, which may indicate that the NIR/red ratio doesn't fit for water areas with lower and narrower Chl-a concentration ranges, as indicated the lowest mean and SD values shown in Table 2, which is consistent with a previous study on band ratio analysis [18]. We conducted calibrations between the three-band algorithm and Chl-a concentration at each station, the results of which are shown in Figure 9. We selected three optimal wavebands using a tuning method before conducting calibration for each station. It is apparent that station 4 performed better than other stations (1, 2, and 3) with the same dataset number (N = 12), with a R 2 value of 0.66, using wavebands of 674, 705, and 750 nm. However, we obtained a poor R 2 at station 1, which had the lowest Chl-a concentration in this study, using wavebands of 664, 689, and 750 nm. These results may indicate that the three-band algorithm performs well in water with relatively higher Chl-a concentrations, which is consistent with several previous studies [19,22]. Figures 9e,f also show better R 2 values (0.63 at station 5 and 0.81 at station 6) with calibration between the three-band algorithm and Chl-a concentration. This provides a possibility of using the three-band algorithm to estimate We conducted calibrations between the three-band algorithm and Chl-a concentration at each station, the results of which are shown in Figure 9. We selected three optimal wavebands using a tuning method before conducting calibration for each station. It is apparent that station 4 performed better than other stations (1, 2, and 3) with the same dataset number (N = 12), with a R 2 value of 0.66, using wavebands of 674, 705, and 750 nm. However, we obtained a poor R 2 at station 1, which had the lowest Chl-a concentration in this study, using wavebands of 664, 689, and 750 nm. These results may indicate that the three-band algorithm performs well in water with relatively higher Chl-a concentrations, which is consistent with several previous studies [19,22]. Figure 9e,f also show better R 2 values (0.63 at station 5 and 0.81 at station 6) with calibration between the three-band algorithm and Chl-a concentration. This provides a possibility of using the three-band algorithm to estimate Chl-a in these areas; nevertheless, a shortness of data (N = 6 at station 5, N = 5 at station 6) may also provide uncertainty to the results. Chl-a in these areas; nevertheless, a shortness of data (N = 6 at station 5, N = 5 at station 6) may also provide uncertainty to the results.

ISE-PLS Calibration and Validation
ISE-PLS selected optimal variables both for RL and FDR data due to the iterative stepwise elimination function, to select the optimal NLV in the final model, the relationship between RMSEr + Jr and NLV (from 1 to 10) were analysed, as shown in Figure 10. The minimum value showed the optimal NLV (6 for RL and 4 for FDR), larger NLV indicated over-fitting and lower NLV indicated under-fitting. Table 4 summarizes ISE-PLS calibration and validation results using RL and FDR for Chl-a retrieval. As Table 4 shows, ISE-PLS had the same R 2 values (0.83 for both RL and FDR) and slightly different RMSE values (1.29 for RL and 1.28 for FDR) for calibration. We also found that ISE-PLS using both datasets had better Chl-a retrieval performance than other algorithms, which was indicated by R 2 (0.77 for RL and 0.78 for FDR) and RPD (2.10 for RL and 2.13 for FDR) values in the validation results. ISE-PLS using FDR performed marginally better than ISE-PLS using RL because of the higher R 2 and RPD and lower RMSE (1.47 for RL and 1.45 for FDR) values for validation. Figure 11a,c show validation plots for ISE-PLS using RL and FDR, respectively. Both figures show a

ISE-PLS Calibration and Validation
ISE-PLS selected optimal variables both for R L and FDR data due to the iterative stepwise elimination function, to select the optimal NLV in the final model, the relationship between RMSE r + J r and NLV (from 1 to 10) were analysed, as shown in Figure 10. The minimum value showed the optimal NLV (6 for R L and 4 for FDR), larger NLV indicated over-fitting and lower NLV indicated under-fitting. Table 4 summarizes ISE-PLS calibration and validation results using R L and FDR for Chl-a retrieval. As Table 4 shows, ISE-PLS had the same R 2 values (0.83 for both R L and FDR) and slightly different RMSE values (1.29 for R L and 1.28 for FDR) for calibration. We also found that ISE-PLS using both datasets had better Chl-a retrieval performance than other algorithms, which was indicated by R 2 (0.77 for R L and 0.78 for FDR) and RPD (2.10 for R L and 2.13 for FDR) values in the validation results. ISE-PLS using FDR performed marginally better than ISE-PLS using R L because of the higher R 2 and RPD and lower RMSE (1.47 for R L and 1.45 for FDR) values for validation. Figure 11a,c show validation plots for ISE-PLS using R L and FDR, respectively. Both figures show a close linear relationship between predicted and observed Chl-a with the exception of a few scatter points.
close linear relationship between predicted and observed Chl-a with the exception of a few scatter points. Because of the iterative stepwise elimination function, we selected the optimal wavebands using ISE-PLS for both RL and FDR datasets based on the lowest RMSE for validation, as shown in Figure  11b,d. Selected wavebands for RL ranged from 495 to 496 nm [43], 589 to 593 nm [44], and 660 to 667 nm [45], which had been proven related to phytoplankton absorption, 544 to 549 nm [43], and 689 to 696 nm [32], which indicated relationship with Chl-a fluorescence, and 730 nm which is also sometimes used for Chl-a retrieval [45]. We selected a total of 30 (6%) informative wavebands from all 501 wavebands. And for FDR, we selected 10 (2%) informative wavebands from all 501 wavebands.  Because of the iterative stepwise elimination function, we selected the optimal wavebands using ISE-PLS for both R L and FDR datasets based on the lowest RMSE for validation, as shown in Figure 11b,d. Selected wavebands for R L ranged from 495 to 496 nm [43], 589 to 593 nm [44], and 660 to 667 nm [45], which had been proven related to phytoplankton absorption, 544 to 549 nm [43], and 689 to 696 nm [32], which indicated relationship with Chl-a fluorescence, and 730 nm which is also sometimes used for Chl-a retrieval [45]. We selected a total of 30 (6%) informative wavebands from all 501 wavebands. And for FDR, we selected 10 (2%) informative wavebands from all 501 wavebands.  Because of the iterative stepwise elimination function, we selected the optimal wavebands using ISE-PLS for both RL and FDR datasets based on the lowest RMSE for validation, as shown in Figure  11b,d. Selected wavebands for RL ranged from 495 to 496 nm [43], 589 to 593 nm [44], and 660 to 667 nm [45], which had been proven related to phytoplankton absorption, 544 to 549 nm [43], and 689 to 696 nm [32], which indicated relationship with Chl-a fluorescence, and 730 nm which is also sometimes used for Chl-a retrieval [45]. We selected a total of 30 (6%) informative wavebands from all 501 wavebands. And for FDR, we selected 10 (2%) informative wavebands from all 501 wavebands.   Table 4. The coefficient of determination (R 2 ) and root mean square error (RMSE) for calibration of iterative stepwise elimination partial least squares (ISE-PLS) and leave-one-out (LOO) cross-validation using the entire dataset (N = 59), with residual predictive deviation (RPD), number of wavebands, and percent ratio in the full spectrum (i = 501). In this study, station 4 may have been affected by river nutrients, we carried out ISE-PLS regressions using the R L dataset except for at station 4 to decrease the impact of different water types. Figure 12 shows the validation plot between observed and predicted Chl-a, which was obtained using the LOO method in the ISE-PLS regression. As we can see, the maximum Chl-a concentration (from 14.33 to 8.74 µg/L) decreased after removing the station 4 dataset. Results shows a close linear relationship between observed and predicted Chl-a; however, compared to the R 2 value (0.77) obtained by ISE-PLS validation using all datasets (N = 59), a relatively lower R 2 value (0.72) was obtained using the datasets except station 4 (N = 47), which may indicate that ISE-PLS performed better in water areas with a wide range of Chl-a concentrations. Table 4. The coefficient of determination (R 2 ) and root mean square error (RMSE) for calibration of iterative stepwise elimination partial least squares (ISE-PLS) and leave-one-out (LOO) crossvalidation using the entire dataset (N = 59), with residual predictive deviation (RPD), number of wavebands, and percent ratio in the full spectrum (i = 501). In this study, station 4 may have been affected by river nutrients, we carried out ISE-PLS regressions using the RL dataset except for at station 4 to decrease the impact of different water types. Figure 12 shows the validation plot between observed and predicted Chl-a, which was obtained using the LOO method in the ISE-PLS regression. As we can see, the maximum Chl-a concentration (from 14.33 to 8.74 µg/L) decreased after removing the station 4 dataset. Results shows a close linear relationship between observed and predicted Chl-a; however, compared to the R 2 value (0.77) obtained by ISE-PLS validation using all datasets (N = 59), a relatively lower R 2 value (0.72) was obtained using the datasets except station 4 (N = 47), which may indicate that ISE-PLS performed better in water areas with a wide range of Chl-a concentrations.

Empirical and Semi-Analytical Models Performance
In the present study, several empirical and semi-analytical models have been established for Chl-a estimation in the Seto Inland Sea. Results show all the standard empirical OC algorithms have underestimated the in situ Chl-a, which turns out to be lower RMSE values. The underestimated Chla calculated from standard satellite algorithms possibly due to the uncertainty in the performance of atmospheric-correction algorithms [46]. The coefficients for OC algorithms have been recalculated using in situ Chl-a and hyperspectral dataset, however, OC algorithms using recalculated coefficients also show lower R 2 and higher RMSE values. The three-band model utilized in this study shows a better R 2 than all OC algorithms, and the optimal spectral bands (664, 695, and 736 nm) selected from band tuning are in accord with previous study [19]. However, the low accuracy may indicate the model is unstable as the RMSE shown, which may attribute to the lower Chl-a concentration (maximum value of 14.33 µg/L) in this study, because other optically active constituents (CDOM,

Empirical and Semi-Analytical Models Performance
In the present study, several empirical and semi-analytical models have been established for Chl-a estimation in the Seto Inland Sea. Results show all the standard empirical OC algorithms have underestimated the in situ Chl-a, which turns out to be lower RMSE values. The underestimated Chl-a calculated from standard satellite algorithms possibly due to the uncertainty in the performance of atmospheric-correction algorithms [46]. The coefficients for OC algorithms have been recalculated using in situ Chl-a and hyperspectral dataset, however, OC algorithms using recalculated coefficients also show lower R 2 and higher RMSE values. The three-band model utilized in this study shows a better R 2 than all OC algorithms, and the optimal spectral bands (664, 695, and 736 nm) selected from band tuning are in accord with previous study [19]. However, the low accuracy may indicate the model is unstable as the RMSE shown, which may attribute to the lower Chl-a concentration (maximum value of 14.33 µg/L) in this study, because other optically active constituents (CDOM, tripton etc.) may affect the water reflection properties [47]. Similarly, the NIR/red tuning model also shows a better R 2 than OC algorithms, while a poor accuracy, the optimal spectral bands (666 and 693 nm) are in accord with previous study [19]. For the NIR/red model using bands 670 and 705 nm, result shows a poor accuracy, which may attribute to the red shift property [48], that is, there is a red shift in the fluorescence peak to the longer wavelength in higher Chl-a concentration waters.

ISE-PLS Performance
Through the above analyses, it has been shown that ISE-PLS using both R L and FDR performed better than other algorithms, including the OC, three-band model, and NIR/red two-band model algorithms, indicated by higher R 2 and lower RMSE values. This finding is consistent with previous studies in which PLS method can be used as useful method in retrieval of water quality parameters [22,25]. In addition, ISE-PLS using FDR performed better than ISE-PLS using R L as indicated by higher R 2 and RPD values and lower RMSE for validation. This may have resulted from derivative analysis reducing random noise and removing the effects of suspended matter on Chl-a concentration estimates [22]. Using ISE-PLS, a total of 30 (6%) wavebands for R L and 10 (2%) wavebands for FDR have been selected as optimal bands from all 501 wavebands, separately, which indicate that 94% wavebands when using R L and 98% wavebands when using FDR are redundant for Chl-a estimation in the Seto Inland Sea. This result provide an evidence that ISE-PLS can be used as an approach for optimal wavebands selection, especially when using hundreds wavebands of hyperspectral data.

Applications of ISE-PLS Method
This study established a potential model for Chl-a estimation in the Seto Inland Sea, which provide the possibility for detecting HABs, since Chl-a concentrations can be used as indices of HABs [30]. In general, HABs detection using Chl-a algorithms involve the generalized relationship between a high chlorophyll content and HABs occurrences [6]. However, due to the sampling data show lower Chl-a concentrations and short of HABs occurrences data, it is difficult to build the relationship between HABs and Chl-a concentrations in this study. Nevertheless, the ISE-PLS is a useful method to estimate Chl-a concentration, which can be used to evaluate the water quality, so as to management the aquaculture in the Seto Inland Sea.

Conclusions
In this study, we developed various models for estimating water Chl-a concentration in the Seto Inland Sea, including ISE-PLS using both R L and FDR and other methods such as OC, three-band model, and two-band model algorithms. Our results showed that the ISE-PLS method is effective for predicting Chl-a concentration in the Seto Inland Sea using in situ measured spectral data. With a higher prediction accuracy, ISE-PLS also selects important wavebands that match previously published studies. Additionally, ISE-PLS using FDR is marginally enhanced compared to using R L for Chl-a retrieval. However, OC algorithms are not robust in this present study, and three-band and two-band model algorithms did not perform well in water areas with lower Chl-a concentration. Our results also indicate that the ISE-PLS method can perform better when used in water areas with a wide range of Chl-a concentrations. These results provide potential insights into coastal water quality assessment by using a Chl-a estimation method with hyperspectral measurements.
Author Contributions: Y.S., Z.W. and K.K. designed this study and the fieldwork; Z.W., K.K. and S.O. performed the fieldwork; Z.W. analyzed the data and wrote the manuscript; Z.W. and Y.S. revised the paper.

Conflicts of Interest:
The authors declare no conflict of interests.