Deep Learning Convolutional Neural Network for the Retrieval of Land Surface Temperature from AMSR2 Data in China

A convolutional neural network (CNN) algorithm was developed to retrieve the land surface temperature (LST) from Advanced Microwave Scanning Radiometer 2 (AMSR2) data in China. Reference data were selected using the Moderate Resolution Imaging Spectroradiometer (MODIS) LST product to overcome the problem related to the need for synchronous ground observation data. The AMSR2 brightness temperature (TB) data and MODIS surface temperature data were randomly divided into training and test datasets, and a CNN was constructed to simulate passive microwave radiation transmission to invert the surface temperature. The twelve V/H channel combinations (7.3, 10.65, 18.7, 23.8, 36.5, 89 GHz) resulted in the most stable and accurate CNN retrieval model. Vertical polarizations performed better than horizontal polarizations; however, because CNNs rely heavily on large amounts of data, the combination of vertical and horizontal polarizations performed better than a single polarization. The retrievals in different regions indicated that the CNN accuracy was highest over large bare land areas. A comparison of the retrieval results with ground measurement data from meteorological stations yielded R2 = 0.987, RMSE = 2.69 K, and an average relative error of 2.57 K, which indicated that the accuracy of the CNN LST retrieval algorithm was high and the retrieval results can be applied to long-term LST sequence analysis in China.


Introduction
Land surface temperature (LST) is an important factor for measuring the energy balance of Earth's surface. LST is one of the most influential factors in climate, ecology and agriculture [1]. The timely and effective acquisition of large-scale LST information is of great practical significance for regional climate and agricultural monitoring [2][3][4]. As the most effective method to obtain land surface information over large areas, remote sensing can rapidly acquire LST information. After years of research and development, thermal infrared LST retrieval algorithms, including MODIS, and AVHRR data, have been developed and achieved high precision [5][6][7][8][9]. However, thermal infrared remote sensing is greatly influenced by moisture in the atmosphere and cannot penetrate the cloud layer, making it difficult to apply in all weather conditions [10,11]. Previous researchers analyzed the thermal infrared LST products provided by NASA, and most of the LST products in 60% of the middle and low latitude areas were affected by clouds and missing data, which resulted in certain limitations to

Remote Sensing Data
Advanced Microwave Scanning Radiometer 2 (AMSR2) L3 TB data are used as an independent variable for training the surface temperature retrieval algorithm in this research. The time span of the AMSR2 sensor is from July 2012 to February 2019. There are 14 horizontal/vertical polarization observation channels, which are 6.9, 7.3, 10.65, 18.7, 23.8, 36.5 and 89.0 GHz. The equatorial crossing times are at 13:30 and 1:30 local time. To generate a long-term data product using the trained CNN retrieval model, it is necessary to input a long-term sequence of passive microwave remote sensing TB data to the retrieval model, thereby retrieving the LST. Therefore, AMSR-E L3 TB data are required. The time span of the AMSR-E sensor is from July 2002 to September 2011, and the transit times are also 13:30 and 1:30 local time. The Japan Aerospace Exploration Agency (JAXA) provides AMSR2 and AMSR-E L3 TB data (http://gportal.jaxa.jp) with a resolution of 10 km.
To overcome the difficulties caused by the synchronization of the measured sample data, the MODIS LST product was used as a sample-dependent variable as an input for the CNN. The Aqua satellite is equipped with two sensors, MODIS and AMSR-E. The AMSR-E sensor has the same transit times as AMSR-2 at 13:30 (ascending) and 1:30 (descending). Therefore, MODIS LST can correspond to the TB of AMSR-2. For this purpose, the MYD11A1 data was used. Previous studies have shown that the accuracy of the product in the verification area is within 1 K under clear sky conditions [28]. To match the resolution of the AMSR2 TB pixels, the MODIS data were resampled to spatial resolutions of 1 km to 10 km, which is consistent with the spatial resolution of the AMSR2 data.
Due to the existence of the satellite launch window between the passive microwave AMER-E sensor and the AMSR2 sensor (October 2011 to June 2012), the corresponding passive microwave TB data could not be obtained for CNN retrieval. The data from the empty window period are replaced by the surface temperature product synthesized by MYD11C3 from MODIS data, and the spatial resolution is 5.6 km. The MYD11C3 data were resampled to 10 km, and the seasonal average of the surface temperature was calculated from October 2011 to June 2012 using the monthly synthetic product. MODIS products are made available to the public by the U.S. Land Processes Distributed Active Archive Center (lpdaac.usgs.gov).

Remote Sensing Data
Advanced Microwave Scanning Radiometer 2 (AMSR2) L3 TB data are used as an independent variable for training the surface temperature retrieval algorithm in this research. The time span of the AMSR2 sensor is from July 2012 to February 2019. There are 14 horizontal/vertical polarization observation channels, which are 6.9, 7.3, 10.65, 18.7, 23.8, 36.5 and 89.0 GHz. The equatorial crossing times are at 13:30 and 1:30 local time. To generate a long-term data product using the trained CNN retrieval model, it is necessary to input a long-term sequence of passive microwave remote sensing TB data to the retrieval model, thereby retrieving the LST. Therefore, AMSR-E L3 TB data are required. The time span of the AMSR-E sensor is from July 2002 to September 2011, and the transit times are also 13:30 and 1:30 local time. The Japan Aerospace Exploration Agency (JAXA) provides AMSR2 and AMSR-E L3 TB data (http://gportal.jaxa.jp) with a resolution of 10 km.
To overcome the difficulties caused by the synchronization of the measured sample data, the MODIS LST product was used as a sample-dependent variable as an input for the CNN. The Aqua satellite is equipped with two sensors, MODIS and AMSR-E. The AMSR-E sensor has the same transit times as AMSR-2 at 13:30 (ascending) and 1:30 (descending). Therefore, MODIS LST can correspond to the TB of AMSR-2. For this purpose, the MYD11A1 data was used. Previous studies have shown that the accuracy of the product in the verification area is within 1 K under clear sky conditions [28]. To match the resolution of the AMSR2 TB pixels, the MODIS data were resampled to spatial resolutions of 1 km to 10 km, which is consistent with the spatial resolution of the AMSR2 data.
Due to the existence of the satellite launch window between the passive microwave AMER-E sensor and the AMSR2 sensor (October 2011 to June 2012), the corresponding passive microwave TB data could not be obtained for CNN retrieval. The data from the empty window period are replaced by the surface temperature product synthesized by MYD11C3 from MODIS data, and the spatial resolution is 5.6 km. The MYD11C3 data were resampled to 10 km, and the seasonal average of the surface temperature was calculated from October 2011 to June 2012 using the monthly synthetic product. MODIS products are made available to the public by the U.S. Land Processes Distributed Active Archive Center (lpdaac.usgs.gov).

Ground-Measured Data
The ground-measured data are used for accuracy verification. The China National Meteorological Administration (http://data.cma.cn/data/) site provides daily and hourly ground temperature monitoring data, where the ground surface temperature (GST) is measured by detectors contacting the ground surface. The time interval between measurements hourly, and the distribution of the monitoring sites is shown in Figure 2. The equatorial crossing times of the AMSR2 are 13:30 and 1:30 local time. Therefore, the average values of the LST data between 13:00 and 14:00 and between 1:00 and 2:00 from the ground monitoring site are selected. The observation times for the ground observation site data are longer, and the coverage is wider. However, there are still many sites that are missing measurements, and the values obviously deviate without quality control. Therefore, 33 sites (Table 1) with flat terrain, uniform terrain types, no obvious deviations and null values from April 2016 to October 2016 were sorted and selected, and 385 data were extracted. These data were used to verify the CNN surface temperature retrieval model.

Microwave Radiation Transmission Theory
A microwave radiometer measures the thermal emissions from the ground and their transfer from the ground through the atmosphere to the remote sensor on the satellite. In the microwave band, the atmospheric radiation transfer Equation for the thermal radiation balance can be described as follows [29]: In Equation (1), Bf(T) is the radiation intensity on the satellite, Tf is the luminance temperature on the satellite, τf(θ) is the transmittance of the frequency f, εf(θ) is the emissivity of the frequency f, Bf(Ta ↓ ) is the total downward radiation intensity of the atmosphere, Bf(Ta ↑ ) is the total upward radiation intensity of the atmosphere, Bf(Ts) is the surface radiation intensity, and Ts is the surface temperature. Bf(T) is the Planck function: In Equation (2), Bf(T) is the spectral brightness of a black body, and its units are Wm -2 sr -1 Hz −1 . T is the absolute temperature, its units are K. h is the Planck constant 6.63×10 −34 J. The units for the frequency f are Hz. k is the Boltzmann constant 1.38 × 10 −23 J•K −1 , and c is the speed of light of 2.992458 × 10 8 ms −1 . Since the approximation of the microwave band is based on Rayleigh-Jones law, Equation (2) can be simplified as: 2 Therefore, by substituting Equation (3) into Equation (1), the following Equation can be obtained:

Microwave Radiation Transmission Theory
A microwave radiometer measures the thermal emissions from the ground and their transfer from the ground through the atmosphere to the remote sensor on the satellite. In the microwave band, the atmospheric radiation transfer Equation for the thermal radiation balance can be described as follows [29]: In Equation (1), B f (T) is the radiation intensity on the satellite, T f is the luminance temperature on the satellite, τ f (θ) is the transmittance of the frequency f, ε f (θ) is the emissivity of the frequency f, B f (T a ↓ ) is the total downward radiation intensity of the atmosphere, B f (T a ↑ ) is the total upward radiation intensity of the atmosphere, B f (T s ) is the surface radiation intensity, and T s is the surface temperature. B f (T) is the Planck function: In Equation (2), B f (T) is the spectral brightness of a black body, and its units are Wm −2 sr −1 Hz −1 . T is the absolute temperature, its units are K. h is the Planck constant 6.63×10 −34 J. The units for the frequency f are Hz. k is the Boltzmann constant 1.38 × 10 −23 J·K −1 , and c is the speed of light of 2.992458 × 10 8 ms −1 . Since the approximation of the microwave band is based on Rayleigh-Jones law, Equation (2) can be simplified as: Therefore, by substituting Equation (3) into Equation (1), the following Equation can be obtained: The above Equation shows that the surface temperature retrieval algorithm constructed by a linear combination of TB data exhibits clear physical averaging. However, the single-channel radiative transfer Equation contains at least two unknown LSTs (T s ) and surface emissivity (ε f ) values, as well as atmospheric influence factors (T a ). In the process of surface temperature retrieval, the microwave observation radiation signal depends on the surface soil moisture and is affected by surface roughness, attenuation and extinction characteristics of the vegetation layer, as well as surface and vegetation temperature. It is therefore difficult to explain from the physical mechanism of microwave radiation transmission that the change in TB received by a satellite is caused by the combination of one or more factors [30]. To this end, it is necessary to introduce a CNN to construct a nonlinear relationship between TB and LST.

CNN
The CNN method uses a gradient descent back-propagating algorithm to train the weights in the network and achieves a global training algorithm with an improved learning rate to create a network structure with a multilayer filter in deep learning [31]. This approach reduces the complexity of the network model. In contrast to previous machine learning models, a CNN has two characteristics: local connections and weight sharing [32]. CNN algorithms use local connections to effectively extract information from data and reduce the number of parameters by sharing the weights. A typical CNN is shown in Figure 3. A CNN is mainly composed of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. Among these layers, several convolutional layers and pooled layers are alternately arranged to form multiple CNNs. and vegetation temperature. It is therefore difficult to explain from the physical mechanism of microwave radiation transmission that the change in TB received by a satellite is caused by the combination of one or more factors [30]. To this end, it is necessary to introduce a CNN to construct a nonlinear relationship between TB and LST.

CNN
The CNN method uses a gradient descent back-propagating algorithm to train the weights in the network and achieves a global training algorithm with an improved learning rate to create a network structure with a multilayer filter in deep learning [31]. This approach reduces the complexity of the network model. In contrast to previous machine learning models, a CNN has two characteristics: local connections and weight sharing [32]. CNN algorithms use local connections to effectively extract information from data and reduce the number of parameters by sharing the weights. A typical CNN is shown in Figure 3. A CNN is mainly composed of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. Among these layers, several convolutional layers and pooled layers are alternately arranged to form multiple CNNs. The input layer for a CNN ensures that the pixel size is uniform at 14 × 1, which means each AMSR2 brightness temperature pixels have 14 bands; that is, the size of each sample is 14 × 1, and the data are regularized. The convolutional layer is used to extract the sample features from the data and is the most central part of a CNN. Each neuron node that is input into the second convolutional layer is only part of the output from the previous layer, i.e., the output is followed by a layer of outputs for local connections [33]. To further extract the data features, the convolution layer obtains the input values of the neurons by weighted summation of the corresponding connection weights and local input values plus the offset value. The convolution operation is calculated by the following Equation (4): In Equation (5), l is the number of layers where the convolution layer is located, k is a convolution kernel, b is a bias term, f is an activation function, and Mj is the input characteristic data for a previous layer. The convolution kernel is a weight matrix whose size and number are manually specified. Considering that the total size of the input data is only 14, the convolution kernel size set in this study is 4 × 1, which results in a total of 30 kernels. The convolution kernel moves in steps of 1. To reduce the number of operations based on the retention of useful information and accelerate the training speed of the network, the CNN is actually equivalent to the convolution operation of the data and the convolution kernel [34]. The kernel is convolved, and then, a bias is applied to the activation function of the neuron to obtain the convolution output layer. The output of each neuron in the convolutional layer is delinearized by an activation function [35]. Commonly used activation functions include sigmoid functions, tanh functions, and rectified linear unit (ReLU) functions. Both the sigmoid function and the tanh function suffer from saturation problems, and it is easy to make the gradient disappear. The ReLU function is an unsaturated nonlinear function that can solve the problem of neural network gradient explosion and gradient disappearance and can accelerate the convergence speed of the The input layer for a CNN ensures that the pixel size is uniform at 14 × 1, which means each AMSR2 brightness temperature pixels have 14 bands; that is, the size of each sample is 14 × 1, and the data are regularized. The convolutional layer is used to extract the sample features from the data and is the most central part of a CNN. Each neuron node that is input into the second convolutional layer is only part of the output from the previous layer, i.e., the output is followed by a layer of outputs for local connections [33]. To further extract the data features, the convolution layer obtains the input values of the neurons by weighted summation of the corresponding connection weights and local input values plus the offset value. The convolution operation is calculated by the following Equation (4): In Equation (5), l is the number of layers where the convolution layer is located, k is a convolution kernel, b is a bias term, f is an activation function, and M j is the input characteristic data for a previous layer. The convolution kernel is a weight matrix whose size and number are manually specified. Considering that the total size of the input data is only 14, the convolution kernel size set in this study is 4 × 1, which results in a total of 30 kernels. The convolution kernel moves in steps of 1. To reduce the number of operations based on the retention of useful information and accelerate the training speed of the network, the CNN is actually equivalent to the convolution operation of the data and the convolution kernel [34]. The kernel is convolved, and then, a bias is applied to the activation function of the neuron to obtain the convolution output layer. The output of each neuron in the convolutional layer is delinearized by an activation function [35]. Commonly used activation functions include sigmoid functions, tanh functions, and rectified linear unit (ReLU) functions. Both the sigmoid function and the tanh function suffer from saturation problems, and it is easy to make the gradient disappear. The ReLU function is an unsaturated nonlinear function that can solve the problem of neural network gradient explosion and gradient disappearance and can accelerate the convergence speed of the network training [36]. Therefore, the ReLU function is used as the activation function and is calculated according to the following Equation: For ReLU, if the input is greater than 0, the output is equal to the input; otherwise, 0 is the output. With the ReLU function, the output does not tend to saturate as the input gradually increases. The CNN adds a pooling layer after each convolution operation to down-sample the output of the convolutional layer. The pooling layer can very effectively reduce the size of the matrix, thereby reducing the parameters in the last fully connected layer, accelerating the calculation speed, and preventing overfitting problems [37]. The forward propagation process of the pooling layer is also accomplished by moving a kernel function. The commonly used kernel functions have a maximum operation or an average operation. In this paper, the largest pooling method is used. The Equation is as follows: In Equation (7), l represents the number of layers, β represents the maximum pooling coefficient, b is the bias term, and f is the activation function. The pooling layer also uses ReLU as an activation function. The pooling kernel is set to a matrix size of 2. In a CNN structure, one or two fully connected layers are connected after multiple convolution and pooling layers. The CNN described above uses two sets of convolution and pooling layers and finally a fully connected layer. Each neural node in the fully connected layer is fully connected to all neural nodes in the previous layer. The role of the fully connected layer is to integrate local information with different features in the convolutional layer and the pooled layer [38]. To maintain the stability of the CNN network, the neural network node activation function at the fully connected level also uses the ReLU function. The fully connected layer performs information transformation and calculation processing based on the high-order invariant features obtained after convolution and pooling and completes a forward propagation learning process. The CNN output layer uses the average square error (MSE) as a loss function, and the layers' functional Equation is as follows: In Equation (8), y i is the target answer of the i-th data in a training batch, and y i ' is the retrieval value given by the neural network. CNN optimization processes include two stages: the forward propagation stage and the backpropagation stage. The forward propagation stage is inputted from the training sample and transmitted to the output layer on a layer-by-layer basis through the convolution kernel pool [39]. The predicted value in the forward propagation stage is calculated and compared with the target value to obtain the error between the two. The backpropagation stage is used to calculate the gradient of the loss function for each parameter, and then, the gradient descent algorithm is used to update each parameter according to the gradient and learning rate so that the CNN loss function on the training data is as small as possible [40]. With massive amounts of training data, it takes a long time to calculate the loss function for all the training data. To accelerate the training process, a stochastic gradient descent (SGD) algorithm is used. The traditional SGD algorithm maintains a single parameter learning rate to update the ownership weight. The learning rate does not change during the training process. The Adam (adaptive moment estimation) algorithm was used as a gradient descent algorithm for the CNN backpropagation stage. The method was proposed by Diederik Kingma of OpenAI and Jimmy Ba of the University of Toronto in 2014 [41]. The Adam algorithm is an extension of the SGD algorithm. Compared with the traditional SGD algorithm, Adam calculates an independent adaptive parameter learning rate for different parameters by calculating the first moment estimation and the second moment estimation of the gradient. Therefore, Adam has higher computational efficiency and lower memory requirements than the traditional SGD algorithm [42]. To build a CNN, we use the Keras interface in Python and the algorithm framework provided by the TensorFlow deep learning library.

Results
To evaluate the accuracy and practicability of CNN LST retrieval, we selected the classic AMSR2 TB and MODIS LST products to establish training and test datasets. The CNN is used to evaluate and analyze the training and test datasets acquired in different ways. We perform a comprehensive CNN retrieval verification analysis using a multisource training database. Finally, we analyze the temporal and spatial variation characteristics of surface temperature in different climate regions in China over the past 15 years.

Analysis of CNN Retrieval Model Based on Different Channel Combinations
To analyze the applicability of CNN retrieval of surface temperature, 23275 sets of TB-surface temperature samples were collected, and the dataset was randomly divided into a training dataset with 20947 samples and a test dataset with 2328 samples. The CNNs were then trained using three different channel combinations, 6.9, 7.3, 10 Table 2 shows that in the case of 3000 iterations, the first combined retrieval error correlation coefficient (R 2 ) is 0.942, RMSE=3.84, and the average relative error is 3.41 K; the second combined retrieval error R 2 is 0.976, RMSE=3.16 K, and the average relative error is 2.82 K; the third combined retrieval error R 2 is 0.935, RMSE=3.64 K, and the average relative error is 3.29 K. As shown in Figure 4, after 6.9 GHz is excluded, the combination of six frequencies (7.3, 10.65, 18.7, 23.8, 36.5, and 89 GHz, twelve V/H channels) results in the most stable and accurate CNN retrieval model. Previous studies [27] have shown that the low-frequency data of AMSR2 are poorly related to MODIS surface temperature products. The lower the frequency is, the more different the contribution of energy from different surface layers, so the combined retrieval error decreases after the low-frequency data (e.g., 6.9 GHz) are excluded. Therefore, the combination of six frequencies (twelve V/H channels) is best suited for retrieval. analyze the training and test datasets acquired in different ways. We perform a comprehensive CNN retrieval verification analysis using a multisource training database. Finally, we analyze the temporal and spatial variation characteristics of surface temperature in different climate regions in China over the past 15 years.

Analysis of CNN Retrieval Model Based on Different Channel Combinations
To analyze the applicability of CNN retrieval of surface temperature, 23275 sets of TB-surface temperature samples were collected, and the dataset was randomly divided into a training dataset with 20947 samples and a test dataset with 2328 samples. The CNNs were then trained using three different channel combinations, 6.9, 7.3, 10 Table 2 shows that in the case of 3000 iterations, the first combined retrieval error correlation coefficient (R 2 ) is 0.942, RMSE=3.84, and the average relative error is 3.41 K; the second combined retrieval error R 2 is 0.976, RMSE=3.16 K, and the average relative error is 2.82 K; the third combined retrieval error R 2 is 0.935, RMSE=3.64 K, and the average relative error is 3.29 K. As shown in Figure 4, after 6.9 GHz is excluded, the combination of six frequencies (7.3, 10.65, 18.7, 23.8, 36.5, and 89 GHz, twelve V/H channels) results in the most stable and accurate CNN retrieval model. Previous studies [27] have shown that the low-frequency data of AMSR2 are poorly related to MODIS surface temperature products. The lower the frequency is, the more different the contribution of energy from different surface layers, so the combined retrieval error decreases after the low-frequency data (e.g., 6.9 GHz) are excluded. Therefore, the combination of six frequencies (twelve V/H channels) is best suited for retrieval.    To analyze the error effects of the vertical and horizontal polarization modes on the CNN LST retrieval model, we divided the above datasets into two parts, vertical polarization and horizontal polarization, and then trained the CNN. With the combination of the seven channels of 6.9, 7.3, 10.65, 18.7, 23.8, 36.5, and 89 GHz, Table 2 and Figure 5 show that the CNN after 3000 iterations exhibits the best retrieval results and highest correlation. For the vertical polarization combination, the R 2 is 0.894, the RMSE is 4.08, and the average relative error is 3.57 K. For the horizontal polarization combination, the R 2 is 0.853, the RMSE is 4.85 K, and the average relative error is 4.37. The vertical polarization is better than the horizontal polarization, but because the CNN relies heavily on a large amount of data, training with only one polarization will reduce the number of channels. Therefore, the combination of the vertical and horizontal polarization modes is better than the combination with a single polarization mode. To analyze the error effects of the vertical and horizontal polarization modes on the CNN LST retrieval model, we divided the above datasets into two parts, vertical polarization and horizontal polarization, and then trained the CNN. With the combination of the seven channels of 6.9, 7.3, 10.65, 18.7, 23.8, 36.5, and 89 GHz, Table 2 and Figure 5 show that the CNN after 3000 iterations exhibits the best retrieval results and highest correlation. For the vertical polarization combination, the R 2 is 0.894, the RMSE is 4.08, and the average relative error is 3.57 K. For the horizontal polarization combination, the R 2 is 0.853, the RMSE is 4.85 K, and the average relative error is 4.37. The vertical polarization is better than the horizontal polarization, but because the CNN relies heavily on a large amount of data, training with only one polarization will reduce the number of channels. Therefore, the combination of the vertical and horizontal polarization modes is better than the combination with a single polarization mode.

Analysis of CNN Retrieval Model based on Different Regions
To further analyze the influence of samples from different regions on the retrieval error of the CNN, the accuracies in three regions were verified, including Xinjiang (10456 training data samples and 1134 test data samples), midwest Inner Mongolia (10598 training data samples and 1061 test data samples) and Northeast China (10327 training data samples and 1033 test data samples). According to the results of the analysis of different channel combinations, the three regions were trained with twelve V/H channels (7.3, 10.65, 18.7, 23.8, 36.5, 89 GHz V/H). The results of the CNN retrieval of the surface temperature for the test sample data are shown in Table 3. In Xinjiang, R 2 = 0.981, RMSE = 2.84 K, and the average relative error is 2.61 K. In midwest Inner Mongolia, R 2 = 0.979, RMSE = 3.16 K, and the average relative error is 2.82 K. In Northeast China, R 2 = 0.969, RMSE = 3.25 K, and the average relative error is 3.03 K. These results indicate that the CNN surface temperature retrieval model has the highest accuracy in Xinjiang. The retrieval results of the three regions ( Figure 6) indicate that the LST from the CNN retrieval model can adequately represent the actual overall spatial variation, indicating that a CNN model can be applied to surface temperature retrieval, especially over large areas of bare land (such as Xinjiang), where the model precision is high.

Analysis of CNN Retrieval Model based on Different Regions
To further analyze the influence of samples from different regions on the retrieval error of the CNN, the accuracies in three regions were verified, including Xinjiang (10456 training data samples and 1134 test data samples), midwest Inner Mongolia (10598 training data samples and 1061 test data samples) and Northeast China (10327 training data samples and 1033 test data samples). According to the results of the analysis of different channel combinations, the three regions were trained with twelve V/H channels (7.3, 10.65, 18.7, 23.8, 36.5, 89 GHz V/H). The results of the CNN retrieval of the surface temperature for the test sample data are shown in Table 3. In Xinjiang, R 2 = 0.981, RMSE = 2.84 K, and the average relative error is 2.61 K. In midwest Inner Mongolia, R 2 = 0.979, RMSE = 3.16 K, and the average relative error is 2.82 K. In Northeast China, R 2 = 0.969, RMSE = 3.25 K, and the average relative error is 3.03 K. These results indicate that the CNN surface temperature retrieval model has the highest accuracy in Xinjiang. The retrieval results of the three regions ( Figure 6) indicate that the LST from the CNN retrieval model can adequately represent the actual overall spatial variation, indicating that a CNN model can be applied to surface temperature retrieval, especially over large areas of bare land (such as Xinjiang), where the model precision is high.

Analysis of the CNN Retrieval model Based on Different Regions.
Ground verification is the key to the practical application of the LST retrieval model. At present, the actual accuracy verification of the LST retrieval model is a point of difficulty in the research on retrieval methods. The main reason for this difficulty is that the ground-point measured data and the large-scale remote sensing retrieval results cannot be accurately combined on the spatial scale of expression. Ground verification is very important for the practical application and promotion of the method. Therefore, it is necessary to verify the analysis by comparing the LST data from a measurement site with the CNN retrieval results. The AMSR2 TB image corresponding to the time of the trained CNN is input to retrieve the LST. Then, the retrieval result pixel is extracted according to the latitude and longitude of the station. Finally, the extracted data are compared with the groundmeasured data for verification.

Analysis of the CNN Retrieval model Based on Different Regions.
Ground verification is the key to the practical application of the LST retrieval model. At present, the actual accuracy verification of the LST retrieval model is a point of difficulty in the research on retrieval methods. The main reason for this difficulty is that the ground-point measured data and the large-scale remote sensing retrieval results cannot be accurately combined on the spatial scale of expression. Ground verification is very important for the practical application and promotion of the method. Therefore, it is necessary to verify the analysis by comparing the LST data from a measurement site with the CNN retrieval results. The AMSR2 TB image corresponding to the time of the trained CNN is input to retrieve the LST. Then, the retrieval result pixel is extracted according to the latitude and longitude of the station. Finally, the extracted data are compared with the ground-measured data for verification.
The results are shown in Figure 7. The average relative error between the value retrieved by the CNN model and the ground-measured data is 2.57 K, and the R 2 is 0.987, RMSE = 2.69 K. Although the data representation is as close as possible to the sample collection, the CNN retrieval results are somewhat lower than the site data, which may be related to the MODIS LST product being one of the sources of the dependent variable sample. Because the TB values from passive microwave and thermal infrared data are somewhat different, the passive microwave TB is deeper than the thermal infrared TB. In addition, it is difficult to accurately verify the surface temperature on the ground. Ground-based observation sites generally use measurements from gridded points, which is different from the large-area surface measurements of microwave radiometers. Therefore, this difference requires an appropriate correction of the retrieval results in the actual CNN retrieval. When selecting the CNN training sample data, the synchronization problem between the passive microwave TB and the ground observation data is overcome as much as possible, and some pixels with relatively large error are removed. These pixels may be affected by large areas of rainfall or other factors. The advantage of a CNN is that it increases the retrieval accuracy by adding enough ground training data with high precision to the MODIS LST product, which is applicable to more complex surface types.
To further analyze the retrieval results, the CNN LST retrieval model was applied to the ascending and descending AMSR2 TB data on August 13, 2016. The LST values of China were inferred, and the spatial variations were generated (Figure 8). Figure 8 shows that the retrieval results are basically consistent with the dry and wet distribution in China. The LST values around the coast, Yangtze River, Yellow River and some large lakes are low, which is consistent with the actual situation. The LST of the CNN retrieval is consistent with the spatial distribution characteristics of surface temperature in summer in China. The retrieval results of the ascending pass (Figure 8a) show that during the day (13:30), the surface temperatures of the Badain Jaran Desert, Tengger Desert, and Taklimakan Desert in Xinjiang are the highest. The temperature south of the Yangtze River is generally higher than that in North China and Northeast China. In the daytime LST retrieval results, Shanxi and Shaanxi Provinces have small pixel values, mainly because the input TB pixels are affected by rain during the day. The results of descending pass in central and western Inner Mongolia at night (1:30) (Figure 8b) show that the surface temperatures of the Badain Jaran Desert, Tengger Desert, and Taklimakan Desert in Xinjiang are low, which is in line with the geographical characteristics of the abovementioned day and night temperature differences. The surface temperatures in the Qinghai-Tibet Plateau are the lowest during the day and night, which is consistent with the characteristics of the plateau climate. Some of the pixel values are low due to lakes and alpine ice. In general, the CNN surface temperature retrieval algorithm based on passive microwave remote sensing performs well. Of course, in the future, it will be necessary to supplement more training sample data with high precision over representative regions to further improve the retrieval accuracy of the algorithm. from the large-area surface measurements of microwave radiometers. Therefore, this difference requires an appropriate correction of the retrieval results in the actual CNN retrieval. When selecting the CNN training sample data, the synchronization problem between the passive microwave TB and the ground observation data is overcome as much as possible, and some pixels with relatively large error are removed. These pixels may be affected by large areas of rainfall or other factors. The advantage of a CNN is that it increases the retrieval accuracy by adding enough ground training data with high precision to the MODIS LST product, which is applicable to more complex surface types. To further analyze the retrieval results, the CNN LST retrieval model was applied to the ascending and descending AMSR2 TB data on August 13, 2016. The LST values of China were inferred, and the spatial variations were generated (Figure 8). Figure 8 shows that the retrieval results are basically consistent with the dry and wet distribution in China. The LST values around the coast, Yangtze River, Yellow River and some large lakes are low, which is consistent with the actual situation. The LST of the CNN retrieval is consistent with the spatial distribution characteristics of surface temperature in summer in China. The retrieval results of the ascending pass (Figure 8a) show that during the day (13:30), the surface temperatures of the Badain Jaran Desert, Tengger Desert, and Taklimakan Desert in Xinjiang are the highest. The temperature south of the Yangtze River is generally higher than that in North China and Northeast China. In the daytime LST retrieval results, Shanxi and Shaanxi Provinces have small pixel values, mainly because the input TB pixels are affected by rain during the day. The results of descending pass in central and western Inner Mongolia at night (1:30) (Figure 8b) show that the surface temperatures of the Badain Jaran Desert, Tengger Desert, and Taklimakan Desert in Xinjiang are low, which is in line with the geographical characteristics of the abovementioned day and night temperature differences. The surface temperatures in the Qinghai-Tibet Plateau are the lowest during the day and night, which is consistent with the characteristics of the plateau climate. Some of the pixel values are low due to lakes and alpine ice. In general, the CNN surface temperature retrieval algorithm based on passive microwave remote sensing performs well. Of course, in the future, it will be necessary to supplement more training sample data with high precision over representative regions to further improve the retrieval accuracy of the algorithm.

Spatiotemporal Variations in Summer Soil Moisture in China.
The temporal and spatial changes are analyzed at the seasonal scale in this research. First, according to meteorology, spring is from March to May, summer is from June to August, autumn is September to November, and winter is from December to February of the next year. The LST dataset is derived from the retrieval results obtained from the CNN model trained in the previous section. The precision of the model is verified, and the retrieval accuracy is high. The TB data of the ascending and descending passes are input to the CNN retrieval model, and then the average of the diurnal results is calculated to obtain the daily average LST. Finally, the seasonal average is calculated from the daily average value.
Based on the seasonal average of LST, the annual variations in LST in the six major regions of China were calculated by means of raster pixel subregional statistics (Figure 9). The change in LST from 2003 to 2018 in China was small. There were significant seasonal differences in the magnitude of the change, with the spring and autumn averages being close to the annual average, but the summer and winter averages are the highest and lowest, respectively. Over the past 15 years

Spatiotemporal Variations in Summer Soil Moisture in China.
The temporal and spatial changes are analyzed at the seasonal scale in this research. First, according to meteorology, spring is from March to May, summer is from June to August, autumn is September to November, and winter is from December to February of the next year. The LST dataset is derived from the retrieval results obtained from the CNN model trained in the previous section. The precision of the model is verified, and the retrieval accuracy is high. The TB data of the ascending and descending passes are input to the CNN retrieval model, and then the average of the diurnal results is calculated to obtain the daily average LST. Finally, the seasonal average is calculated from the daily average value.
Based on the seasonal average of LST, the annual variations in LST in the six major regions of China were calculated by means of raster pixel subregional statistics (Figure 9). The change in LST from 2003 to 2018 in China was small. There were significant seasonal differences in the magnitude of the change, with the spring and autumn averages being close to the annual average, but the summer and winter averages are the highest and lowest, respectively. Over the past 15 years, the areas with large fluctuations in LST were mainly in Northeast China in the winter and the Qinghai-Tibet Plateau. After the seasonal average of LST data was calculated, least squares linear regression analysis was used to analyze the LST trends from 2003 to 2018 in China.
The slope reflects the trend of the variable during the study period, and a slope greater than 0 indicates that the variable is increasing; otherwise, it is decreasing. To effectively analyze the significance level of the change trend, the slope of each season was subjected to an F test. At the significance test levels of 0.01 and 0.1, the results were divided into three categories: basically unchanged, significant change, and very significant change. Using the grid calculator in ArcGIS, the classification result is multiplied by the positive and negative slope values, and the five types of changes are significantly decreasing, significantly decreasing, substantially unchanged, significantly increasing, and very significantly increasing. After the seasonal average of LST data was calculated, least squares linear regression analysis was used to analyze the LST trends from 2003 to 2018 in China.
The slope reflects the trend of the variable during the study period, and a slope greater than 0 indicates that the variable is increasing; otherwise, it is decreasing. To effectively analyze the significance level of the change trend, the slope of each season was subjected to an F test. At the significance test levels of 0.01 and 0.1, the results were divided into three categories: basically unchanged, significant change, and very significant change. Using the grid calculator in ArcGIS, the classification result is multiplied by the positive and negative slope values, and the five types of changes are significantly decreasing, significantly decreasing, substantially unchanged, significantly increasing, and very significantly increasing. For the regional average surface temperature from 2003-2018 in China, the linear change slope, correlation coefficient and trend classification are calculated. Figure 10 shows that the slope of the annual average LST value has been concentrated in the range of (−0.015, 0.015) over the past 15 years.  Figure 10 shows that the slope of the annual average LST value has been concentrated in the range of (−0.015, 0.015) over the past 15 years. Table 4 shows that the area where the LST remains basically unchanged accounts for a large proportion, reaching 59.33% of the area, corresponding to 5.69 million km 2 . The area where the LST has risen significantly covers 19.73% of the country, with an area of 1.89 million km 2 , and these trends are mainly distributed in Guangdong Province, Heilongjiang Province and parts of the Qinghai-Tibet Plateau. The significantly decreasing area accounts for 13.14% of the country, with an area of 1.26 million km 2 , and these trends are mainly distributed in Northwest and Northeast China. The region with a very significant reduction accounts for 5.13% of the country, and these trends are scattered in eastern Inner Mongolia, Sichuan Province and Guizhou Province. The areas that are significantly rising are concentrated in the southern part of the Qinghai-Tibet Plateau and the Hulunbuir grassland in Inner Mongolia, accounting for only 1.30% of the country.   To further evaluate the trend of the annual spatial variations in LST in various seasons, we calculated the slope ( Figure 11) for the pixel-by-pixel LST data of China from 2003 to 2018. The slope ranges between -0.015 and 0.015 in the four seasons, indicating that the average LST in China over the past 15 years exhibits a small change. In most areas in spring, the LST showed a downward trend (slope s < 0, correlation coefficient r < 0, which are the blue and green parts in the figure, respectively), and a small part (e.g., Northeast China and the west of the Qinghai-Tibet Plateau) showed an upward trend (s > 0, which is the red part of the figure). In addition to Inner Mongolia and northern Xinjiang, the summer LST exhibits an overall upward trend, with the most obvious trends in the Tarim Basin and the Kunlun Mountains (slope s < ×0.015). In autumn, the area where the slope is less than 0 is widely distributed, indicating that the LST in autumn generally shows a downward trend. The declining trends in Heilongjiang, central Inner Mongolia, Qinghai and northern Xinjiang are obvious.
In winter, LST generally shows an increasing trend. The slopes in North China, Southeast China and Southwest China were all greater than 0, and a small number of regions show downward trends, such as central Inner Mongolia, Liaoning Province and Jilin Province.  To further evaluate the trend of the annual spatial variations in LST in various seasons, we calculated the slope (Figure 11) for the pixel-by-pixel LST data of China from 2003 to 2018. The slope ranges between -0.015 and 0.015 in the four seasons, indicating that the average LST in China over the past 15 years exhibits a small change. In most areas in spring, the LST showed a downward trend (slope s < 0, correlation coefficient r < 0, which are the blue and green parts in the figure, respectively), and a small part (e.g., Northeast China and the west of the Qinghai-Tibet Plateau) showed an upward trend (s > 0, which is the red part of the figure). In addition to Inner Mongolia and northern Xinjiang, the summer LST exhibits an overall upward trend, with the most obvious trends in the Tarim Basin and the Kunlun Mountains (slope s < ×0.015). In autumn, the area where the slope is less than 0 is widely distributed, indicating that the LST in autumn generally shows a downward trend. The seasonal average LST slopes in China from 2003-2018 were classified (Figure 12), and the area ratios of the seasonal trend types in spring, summer, autumn and winter were calculated ( Table  5). The results indicate that the LST in China in spring has generally decreased over the past 15 years. In southern China, North China, Northwest China, and the Qinghai-Tibet Plateau, there were significant decreases. The areas with significant decreases and extremely significant decreases covered 4.78 million km 2 and 2.94 million km 2 , respectively, accounting for 49.77% and 30.69% of the national area, respectively. The regions with a significant increase in LST in spring accounted for only 0.39% of the country, which were mainly in the southwestern region, Gansu Province, Qinghai Province and northern Xinjiang, while significant increases occurred in the northeastern region, northeastern Inner Mongolia, and the border of the Qinghai-Tibet Plateau. In contrast to spring, the winter surface temperatures in southern China, North China, Northwest China and the Qinghai-Tibet Plateau showed significant increasing trends, with significant increases over 6.19 million km 2 , accounting for 64.46% of the country. The regions with downward trends in winter accounted for only 10.81% of the country, which were mainly distributed in Jilin, Liaoning, central Inner Mongolia and northern Xinjiang. In the summer, 10.14% of the regions showed very significant increases in LST, with an area of 0.97 million km 2 , which was the highest of the four seasons. The areas that significantly increased were mainly distributed in the Tarim Basin, the Qaidam Basin, Yunnan Province, and the original region of the rivers in Qinghai Province. The area where the LST in the autumn remained basically unchanged covered 54.35% of the country, corresponding to an area of 5.21 million km 2 , which is the highest of the four seasons, indicating that the LST changes in China were the most stable in autumn.  (Figure 12), and the area ratios of the seasonal trend types in spring, summer, autumn and winter were calculated ( Table 5).
The results indicate that the LST in China in spring has generally decreased over the past 15 years. In southern China, North China, Northwest China, and the Qinghai-Tibet Plateau, there were significant decreases. The areas with significant decreases and extremely significant decreases covered 4.78 million km 2 and 2.94 million km 2 , respectively, accounting for 49.77% and 30.69% of the national area, respectively. The regions with a significant increase in LST in spring accounted for only 0.39% of the country, which were mainly in the southwestern region, Gansu Province, Qinghai Province and northern Xinjiang, while significant increases occurred in the northeastern region, northeastern Inner Mongolia, and the border of the Qinghai-Tibet Plateau. In contrast to spring, the winter surface temperatures in southern China, North China, Northwest China and the Qinghai-Tibet Plateau showed significant increasing trends, with significant increases over 6.19 million km 2 , accounting for 64.46% of the country. The regions with downward trends in winter accounted for only 10.81% of the country, which were mainly distributed in Jilin, Liaoning, central Inner Mongolia and northern Xinjiang. In the summer, 10.14% of the regions showed very significant increases in LST, with an area of 0.97 million km 2 , which was the highest of the four seasons. The areas that significantly increased were mainly distributed in the Tarim Basin, the Qaidam Basin, Yunnan Province, and the original region of the rivers in Qinghai Province. The area where the LST in the autumn remained basically unchanged covered 54.35% of the country, corresponding to an area of 5.21 million km 2 , which is the highest of the four seasons, indicating that the LST changes in China were the most stable in autumn.

Discussion and Conclusions
This research applies the deep learning CNN method to the passive microwave LST retrieval problem. The retrieval accuracy of deep learning CNNs mainly depends on the training and test datasets. However, the spatial resolution of passive microwave data is relatively low, and it is difficult to obtain ground-measured data that are synchronized with satellite data. Therefore, we chose to use the MODIS surface temperature products as a reference to obtain ground synchronization data, thus overcoming the problem of synchronous ground observation data. The AMSR2 TB data and MODIS LST data were randomly divided into training and test datasets; a CNN was constructed to simulate the passive microwave radiation transmission process, and the LST was inverted. The main conclusions are as follows: First, the combination of twelve V/H channels of 7.3, 10.65, 18.7, 23.8, 36.5, and 89 GHz results in the most stable and accurate CNN retrieval model. Vertical polarization is better than horizontal polarization. However, because the CNN rely heavily on large amounts of data, training with only one polarization reduces the number of channels by half; thus, the combination of both vertical and horizontal polarizations is better than the use of a single polarization.
Second, the influence of different regional samples on the retrieval errors of the CNNs was further analyzed, and the training results in Xinjiang, Inner Mongolia, Central and Western China were verified. The results showed that in Xinjiang, the R 2 = 0.981, RMSE = 2.84 K, and the average relative error is 2.61 K. These results indicate that the CNN surface temperature retrieval model exhibits the highest accuracy in Xinjiang, indicating that the CNN surface temperature retrieval exhibits the highest accuracy over large areas of bare land.
Moreover, the fitted line of the CNN retrieval data and ground station data is generally close to 1:1, indicating that the CNN surface temperature retrieval can maintain the overall spatial variations presented by the actual values. This result shows that the CNN method can be applied to LST retrieval. After ground precision verification, the R 2 = 0.987, RMSE = 2.69 K, and the average relative error is 2.57 K, which indicates that the accuracy of the CNN surface temperature retrieval algorithm is high.
Finally, according to retrieval results based on a long time series, we analyze the temporal and spatial variations in surface temperature in different climate regions in China over the past 15 years. Over the past 15 years, the average surface temperature in China has generally changed slightly. From a spatial perspective, the surface temperatures are generally decreasing in spring, and most areas of China (including the southern, northern, northwest, and Qinghai-Tibet Plateau regions) exhibit significantly decreasing trends. In contrast to the temperatures in spring, most of the winter surface temperatures in China show a significant increasing trend, indicating that the surface temperatures in winter and spring in China have changed in the same regions over the past 15 years, but the changes are generally small. Because passive microwave sensors are able to penetrate the atmosphere, microwave radiation energy comes from different levels than visible light and thermal infrared radiation. Therefore, the LST retrieval results of passive microwave sensors are different from other ground reanalysis data and thermal infrared LST products. In addition, because of the inevitable mixed pixel problem of satellite remote sensing, the spatial resolution of passive microwave remote sensing is currently low. When the microwave radiometer on the satellite passes over the study area, the microwave radiation is instantaneously measured. Therefore, it is ver difficult to obtain the measured data at the surface at this exact moment. The ground conditions must be balanced with the corresponding atmospheric conditions, topography and land cover types. This requirement results in difficulty in verifying the accuracy of remote sensing surface parameter retrieval. With the further improvement of the surface measurement network in the future by adding sufficient ground high-precision training data, the deep learning advantages of CNNs can be used to adapt to more complex surface types and further improve the retrieval accuracy of the model.