Geostatistical merging of ground-based and satellite-derived data of surface solar radiation

In this paper, we demonstrate the benefit of using observations from Meteosat Second Generation (MSG) satellites in addition to in-situ measurements to improve the spatial resolution of solar radiation data over Belgium. This objective has been reached thanks to geostatistical methods able to merge heterogeneous data types. Two geostatistical merging methods are evaluated against the interpolation of ground-data only and the single use of satellite-derived information. It results from our analysis that merging both data sources provides the most accurate mapping of surface solar radiation over Belgium.


Introduction
Knowledge of the local solar radiation is essential for many applications, including design, planning and operation of solar energy systems, architectural design, crop growth models and evapotranspiration estimates.Traditionally, solar radiation is observed by means of networks of meteorological stations.Costs for installation and maintenance of such networks are very high and national networks comprise only few stations.Consequently the availability of observed solar radiation measurements has proven to be spatially inadequate for many applications.Mapping solar radiation by interpolation/extrapolation of measurements is possible but usually leads to large errors, except for dense networks (Zelenka et al., 1992;Hay, 1981Hay, , 1984;;Hay and Hanson, 1985;WMO, 1981;Perez et al., 1997).
Because several authors have shown the potentialities of the images of the Earth taken by polar-orbiting and geostationary satellites for mapping the global irradiation impinging on a horizontal surface at the ground level (e.g., Zelenka et al., 1992Zelenka et al., , 1999;;Perez et al., 1997Perez et al., , 2002;;Pinker et al., 1995), we evaluate in the present paper the benefit of using space-based observations as an additional information source when interpolating the ground measurements.More specifically, we consider surface incoming global short-wave radiation products derived from Meteosat Second Generation (MSG, Schmetz et al., 2002) in order to improve the spatial resolution of daily surface solar radiation data over Belgium.
Correspondence to: M. Journée (michel.journee@oma.be) To reach that objective, we implemented two geostatistical methods able to merge heterogeneous data types (i.e., kriging with external drift and regression kriging) and evaluate these methods against mappings derived from a single source of data (i.e., either in-situ or satellite data).

Ground-based solar radiation measurements
The Royal Meteorological Institute of Belgium (RMIB) is currently performing measurements of global solar irradiance (in Wm −2 ) by means of CNR1 and CM11 pyranometers of Kipp & Zonen at 13 sites well-distributed over Belgium (see Fig. 3).Measurements are made with a 5 s time step and time-integrated on a 10 min basis in the RMIB data warehouse.The 10-min solar irradiation data (in Whm −2 ) are then subject to a set of semi-automatic quality assessment tests and gaps in the time series are filled by model estimations (Journée and Bertrand, 2011).

MSG-derived surface solar irradiance
Within the Satellite Application Facility (SAF) network, the down-welling short-wave irradiance at the Earth's surface is operationally retrieved from MSG imageries by three decentralized SAFs: the Ocean and Sea Ice SAF (OSI-SAF, www.osi-saf.org),the Land Surface Analysis SAF (LSA-SAF, land-saf.meteo.pt)and the SAF on Climate Monitoring (CM-SAF, www.cmsaf.eu).To retrieve the same parameter, the different SAFs use their own algorithms and different ancillary input data.
Published by Copernicus Publications.
Sea surface being out of the scope of this study, we focused our investigation on the LSA-SAF and CM-SAF products.The LSA-SAF surface solar radiation product (Geiger et al., 2008) is generated every 30 min and distributed to the users in near real-time at the pixel spatial resolution of the MSG spectral imager (i.e., about 6 km in NS direction and 3.3 km in EW direction over Belgium).The operational CM-SAF surface solar radiation product (Mueller et al., 2009) is an off-line product provided on a 15 × 15 km sinusoidal grid in daily and monthly average.The monthly mean diurnal cycle is also provided.Because of the relatively coarse spatial and temporal resolution of the CM-SAF operational product, intermediate CM-SAF values (R. Mueller, personal communication, 2010) were considered in the present study, namely instantaneous hourly CM-SAF solar surface irradiance remapped onto 3 × 3 km, 9 × 9 km and 15 × 15 km grids.

Geostatistical mapping methods
In this study, we considered three geostatistical mapping methods: ordinary kriging (OK), kriging with external drift (KED) and regression kriging (RK).The aim of these methods is to interpolate a random field, G, (e.g., the spatial distribution of surface solar radiation) from observations, G(x i ), at selected locations x i | i=1,...,N (e.g., the ground measurements).Based on the knowledge of the spatial dependence of the random field, the kriging estimation Ĝ(x 0 ) at an unobserved location x 0 is computed as a linear combination of the observations, Ĝ(x 0 ) = N i=1 w i G(x i ), where the weights, w i , are chosen in such a way that the estimator is unbiased and the error variance is minimized.The spatial correlation of the random field between two locations x i and x j is described by means of the variogram γ(x i , x j ) = E[(G(x i ) −G(x j )) 2 ], which is often chosen as isotropic, meaning that it is function only of the distance d between x i and x j , i.e., γ(x i , x j ) = γ(d).When the number of observation locations is sufficient, a model can be fitted to the empirical variogram derived from the observed data.The variogram has otherwise to be assumed.
While in OK the random field G is assumed to have a constant, albeit unknown, mean at all locations, the KED and RK techniques rely on the knowledge of a densely sampled auxiliary variable g (e.g., the SAFs' products) to model G as a non-stationary random field of the form E[G(x)] = a 0 + a 1 g(x).Although the KED and RK methods are aimed toward a similar objective, they differ in the way to compute the parameters a i and the weights w i (Hengl et al., 2003).In KED, a i and w i are derived together by forcing an exact interpolation of the auxiliary variable, g(x 0 ) = N i=1 w i g(x i ).In RK, the parameters a i are first computed by linear regression from data at the observed locations.The regression residuals are then interpolated by OK.From a computational point of view, all three kriging methods require to solve linear systems of equations.We refer to Wackernagel (1995), Hengl et al. (2003) and references therein for more details on these techniques.

Cross validation analysis
This study is focused on the mapping of surface solar radiation data over Belgium on a daily basis.Daily totals of the 10-min RMIB ground data are obtained by simple summation, while the instantaneous satellite data are integrated by trapezoidal integration.The considered interpolation and merging methods are evaluated by leave-one-out cross-validation (CV) on the basis of two years of qualitycontrolled data (2008 and 2009).In total, we used a set of 491 days for which the ground data and both SAFs' satellite data were available at all stations and over the entire diurnal cycle.The performance of the different methods is assessed by the average on these 491 instances of three indices derived from the bias between the cross-validation prediction Ĝ and the actual measurement G at the N locations x i | i=1,...,N : -the cross-validation mean bias error -the cross-validation mean absolute error -the cross-validation root mean square error where ) is the average solar radiation over all stations.
Since surface solar radiation is measured at only 13 stations, variograms can hardly be estimated from the ground data.Hence, we chose a fixed variogram model, e.g., an exponential variogram γ(d) = 1 − exp(−d/ d).The evolution of RMSE cv as a function of the range parameter d indicates that the best performance is reached when d = 500 km for OK and d = 50 km for KED and RK (see Fig. 1).This difference in optimal values of d results from the high spatial resolution of the satellite data used by KED and RK.Even if the variogram is expected to vary with the sky conditions, we used these values of d for all the 491 days.
Because of the possible non-uniformity of the surface global radiation within the MSG pixel, the SAFs' satellite products have been used at various spatial resolutions as auxiliary information for the KED and RK methods (i.e., from 1 pixel to 3 × 3 pixels aggregates for the LSA-SAF product and from 3 × 3 km to 15 × 15 km areas for the CM-SAF product).The geostatistical interpolation of ground data and the geostatistical merging of ground data with satellite information are compared against the SAFs' estimations in Table 1.Since the MBE cv error is in overall very small for all methods, performance comparison relies essentially on the MAE cv and RMSE cv indices.First, the largest RMSE cv and MAE cv are found for OK of the ground measurements and  the unmerged LSA-SAF mappings, respectively, while the best performance is observed once ground and satellite data are merged together.The KED and RK merging methods exhibit virtually identical results.Using both ground-based and satellite-derived information provides a significant improvement with respect to mappings based solely on one of these data sources (e.g., in case of the LSA-SAF product, the RMSE cv is reduced by about 25% and 30% with respect to the satellite and the OK mappings, respectively).Regarding the satellite information used as auxiliary information for KED and RK, the both SAF products provide comparable results although the best scores are obtained by KED with the LSA-SAF data.Finally, the spatial resolution of the satellite products appears to have little impact on the resulting performance, albeit that slight improvements are obtained with larger resolutions.In Fig. 2, we investigate the impact of sky conditions on the mapping performance.Sky-type classification is made upon both the mean and standard deviation over all stations of the daily clearness index (i.e., the ratio of the daily totals of surface and top-of-the-atmosphere incoming solar radiation).First, as far as the average sky condition over Belgium is concerned (see Fig. 2, left panel), the OK interpolation of ground data outperforms the single use of SAFs' data for overcast and very clear skies.In overcast conditions, it is well-known that the SAF's data overestimate the surface incoming solar radiation, while the exact mechanism that causes this overestimation is still unclear (Ineichen et al., 2009;Journée and Bertrand, 2010).The geostatistical merging of ground and satellite data exhibits the best performance for all types of sky.The improvement is however less pronounced for very clear skies.Second, regarding the influence of the spatial variability in sky conditions, the benefit of using the SAF's products is the largest for sky conditions that are highly inhomogeneous over the country (see Fig. 2, right panel).variability in sky conditions, the benefit of using the SAF's products is the largest for sky conditions that are highly inhomogeneous over the country (see Fig. 2, right panel).The daily clearness index at a specific location is inferred by means of the geostatistical interpolation and merging methods or directly from the satellite data for each of the 491 days selected in this study.Averages on these 491 instances are then computed.All maps clearly highlight the global south-east to northwest positive gradient in clearness index.Satellite-based information is however needed to capture more regional features.The values derived from the LSA-SAF data only (Figure 3, middle panel) are at most stations slightly below those derived from ground measurements only (Figure 3, left panel).Hence, merging the two data sources (Figure 3, right panel) enables to take advantage of both the accuracy of ground measurements and the global spatial coverage of the high spatial resolution satellite data are involved or not in the interpolation process.

Maps of surface solar radiation
The best performance has been observed once ground observations are merged with the LSA-SAF product.No significant difference has been noted between the two implemented geostatistical merging methods (kriging with external drift and regression kriging).This reflects the fact that both methods are conceptually very close.Concerning the impact of sky conditions, mappings inferred by merging ground and SAFs' data were systematically more accurate than when using each data source separately, whatever the sky-type, while the benefit of the method is less apparent in case of very clear skies.Finally, the benefit of using satellite information appeared to be the largest in case of highly variable sky conditions over the studied area.In overall, merging ground and satellite data enables to take advantage of both the high accuracy of ground data and the global spatial coverage of satellite information.
Acknowledgements.The authors are grateful to Richard Mueller (German Meteorological Service, Germany) for providing us with 2 years of hourly CM-SAF solar surface irradiance remapped onto regular latitude-longitude grids at various spatial resolutions (3 × 3 km, 9 × 9 km and 15 × 15 km).This study was supported by the Belgian Science Policy under the research project: Contribution de (Fig. 3, middle panel) are at most stations slightly below those derived from ground measurements only (Fig. 3, left panel).Hence, merging the two data sources (Fig. 3, right panel) enables to take advantage of both the accuracy of ground measurements and the global spatial coverage of satellite observations.

Conclusions
In this paper, we demonstrated the benefit of using MSG satellites imageries in addition to in-situ measurements to improve the spatial resolution of solar radiation data over Belgium.Regarding the MSG-derived information, we considered two products delivered by the LSA-SAF and the CM-SAF, respectively.The variograms used in the implemented geostatistical interpolation/merging methods were estimated to provide the best cross-validation scores over 13 sites in Belgium.Differences by a factor of 10 were observed for the optimal variogram range parameter, depending on whether the high spatial resolution satellite data are involved or not in the interpolation process.
The best performance has been observed once ground observations are merged with the LSA-SAF product.No significant difference has been noted between the two implemented geostatistical merging methods (kriging with external drift and regression kriging).This reflects the fact that both methods are conceptually very close.Concerning the impact of sky conditions, mappings inferred by merging ground and SAFs' data were systematically more accurate than when using each data source separately, whatever the sky-type, while the benefit of the method is less apparent in case of very clear skies.Finally, the benefit of using satellite information appeared to be the largest in case of highly variable sky conditions over the studied area.In overall, merging ground and satellite data enables to take advantage of both the high accuracy of ground data and the global spatial coverage of satellite information.

Figure 1 .
Figure 1.Distribution of the cross-validation root mean square error (RMSE cv ) as a function of the variogram range parameter d for the three kriging methods (left panel: OK; center panel: KED; right panel: RK).The SAFs' products are used at the finest spatial resolution as auxiliary information for KED and RK (i.e., MSG pixel resolution for the LSA-SAF and 3 × 3 km resolution for the CM-SAF).

Figure 3 Figure 2 .Figure 3 .
Figure3compares the spatial distribution over Belgium of the average daily clearness index as computed by the OK interpolation of ground data, by the single use of LSA-SAF data, and by the KED merging of ground and LSA-SAF data.The daily clearness index at a specific location is inferred by means of the geostatistical interpolation and merging methods or directly from the satellite data for each of the 491 days selected in this study.Averages on these 491 instances are then computed.All maps clearly highlight the global south-east to northwest positive gradient in clearness index.Satellite-based information is however needed to capture more regional features.The values derived from the LSA-SAF data only

Figure 3
Figure3compares the spatial distribution over Belgium of the average daily clearness index as computed by the OK interpolation of ground data, by the single use of LSA-SAF data, and by the KED merging of ground and LSA-SAF data.The daily clearness index at a specific location is inferred by means of the geostatistical interpolation and merging methods or directly from the satellite data for each of the 491 days selected in this study.Averages on these 491 instances are then computed.All maps clearly highlight the global south-east to northwest positive gradient in clearness index.Satellite-based information is however needed to capture more regional features.The values derived from the LSA-SAF data only (Figure3, middle panel) are at most stations slightly below those derived from ground measurements only (Figure3, left panel).Hence, merging the two data sources (Figure3, right panel) enables to take advantage of both the accuracy of ground measurements and the global spatial coverage of

Figure 3 .
Figure 3. Spatial distribution over Belgium of the average daily clearness index as computed (1) by the OK interpolation of ground data, (2) by the single use of satellite-based estimations, and (3) by the KED merging of ground and satellite data.The displayed values of the daily clearness index are averages on the 491 days selected in this study.Dots indicate the ground stations' locations.

Table 1 .
Cross validation mean bias error (MBE cv ), mean absolute error (MAE cv ) and root mean square error (RMSE cv ) for the geostatistical interpolation by OK of the RMIB ground data, the SAFs' products used at various spatial resolutions, and the geostatistical merging by KED and RK of the RMIB ground data with the SAFs' products.