the Creative Commons Attribution 3.0 License. Advances in

Abstract. The climatic reference values for monthly and annual average air temperature and total precipitation in Catalonia, in the northeast of Spain, are calculated using a combination of statistical methods and geostatistical interpolation techniques. In order to estimate the uncertainty of the method, the initial dataset is split into two parts that are used, respectively, for estimation and validation. The resulting maps are then used for the automatic detection of outliers in meteorological datasets.


Introduction
In climatology, as well as in geography, biology and other fields, climatic reference values are often needed for a specific spot or region. Quite often there is no meteorological station in the area of interest. Moreover, there is no guarantee that data supplied by the nearest station describe its climatic conditions accurately enough. There are numerous methods to build a map representing the spatial distribution of climatic parameters: deterministic, probabilistic or physical methods, artificial neural networks, etc. (Lam, 1983; Demyanov et al., 1998; Ninyerola et al., 2007).
On the other hand, studies that analyse meteorological data usually require a laborious preliminary cleaning of the original datasets in order to purge wrong values. In most cases, this task has been carried out by comparing the dataset under analysis with others recorded at nearby stations. Nevertheless, the method is not straightforward, especially when it is applied to magnitudes with strong spatial variations. Such variations are usually more evident in areas with complex physiographic features.
The problem can be mitigated by using the difference from the reference value instead of the original value. Section 3 describes the method used to obtain the spatial distribution of the climatic magnitudes. Section 4 describes the attempts to automatically identify possible outliers from the spatial distribution of the differences between meteorological data and climatic reference values.

Geographic framework and data
The geographic framework for the present study is Catalonia, which is located in the north-eastern corner of the Iberian Peninsula. It has a surface area of 31 895 km² and a significant geographic diversity, from the Mediterranean Sea to peaks over 3000 m above sea level in the Pyrenees. The climatological dataset contains 1961-1990 averages of monthly data (INM, 2000) recorded at 144 thermometric and 302 pluviometric stations. Fifteen thermometric and 55 pluviometric stations are located in surrounding regions and have been included in the study in order to smooth border effects. Auxiliary physiographic parameters, such as altitude, distance to sea or land slope, have been retrieved from a digital elevation model (DEM) using GIS techniques. The DEM, with a 200 m resolution, has been provided by the Institut Cartogràfic de Catalunya (ICC).

Calculation of the climatic reference values
The climatic magnitudes, average monthly temperature and monthly precipitation, are represented using a combination of statistical methods and geostatistical interpolation techniques. First, a multiple regression analysis yields a model that relates the climatic variable to several physiographic parameters. Then the residuals, that is, the differences between the observed values and the predictions of the linear model, are spatially interpolated using ordinary kriging. The result, a map of residuals, shows the amount of variability that is not explained by the regression model. This variability can be attributed either to errors in the original datasets or to relationships with physiographic or meteorological magnitudes that have not been considered in the model.
The physiographic parameters used in the regression are: altitude, latitude, longitude, distance to sea, and average and minimum altitudes in circles of 5 to 40 km radius, which are considered, following Dyras et al. (2005), in order to account for topography at different scales. The best predictive parameters for a given variable are then chosen by stepwise regression (Hocking, 1976).
Published by Copernicus Publications.
For average monthly temperature, altitude and distance to sea turn out to be the best predictive physiographic variables. Altitude explains most of the variance and is significant in all months. Distance to sea, on the other hand, is only significant during the cold season, from October to March. The coefficients of the regression equation are displayed in Table 1.
For total monthly precipitation, the best predictive variables are those related to altitude. Between May and September, the average altitude in a circle of 40 km radius explains most of the variance. In the other months, the most significant variables are the average and/or minimum altitudes in circles of 10 or 15 km radius. Latitude and longitude have also proved significant in some cases. The coefficients of the regression equation are shown in Table 2.
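As an illustration of the regression step for temperature, the following minimal sketch fits a model of the form T = C + A·altitude + B·distance to sea by ordinary least squares and returns the residuals that would later be interpolated. The function names, the normal-equations solver and the station values are assumptions made for this example; they are not the code or the data used in the study, and the stepwise selection of predictors is not reproduced.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_temperature_model(altitude, dist_sea, temperature):
    """Least-squares fit of T = C + A*altitude + B*dist_sea (normal equations)."""
    rows = [[1.0, z, d] for z, d in zip(altitude, dist_sea)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, temperature)) for i in range(3)]
    return solve(xtx, xty)  # [C, A, B]


def residuals(coeffs, altitude, dist_sea, temperature):
    """Observed minus modelled temperature at each station."""
    c, a, b = coeffs
    return [t - (c + a * z + b * d)
            for z, d, t in zip(altitude, dist_sea, temperature)]


# Synthetic stations (made-up values, not taken from the paper).
altitude = [10.0, 250.0, 600.0, 1200.0, 1900.0, 80.0]   # m
dist_sea = [1.0, 80.0, 20.0, 150.0, 60.0, 110.0]        # km
temperature = [16.0 - 0.006 * z - 0.01 * d for z, d in zip(altitude, dist_sea)]

coeffs = fit_temperature_model(altitude, dist_sea, temperature)
res = residuals(coeffs, altitude, dist_sea, temperature)
```

Because the synthetic temperatures are exactly linear in the two predictors, the fit recovers the generating coefficients and the residuals vanish; with real observations the residuals carry the variability that the regression does not explain.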
Ordinary kriging is used to interpolate both temperature and precipitation residuals. The empirical semivariogram, built following Journel and Huijbregts (1978), is modelled with an exponential curve, with the range set to 40 km. The values of the nugget for the different months are in the range [...].
The initial meteorological dataset is split into two parts in order to estimate the uncertainty of the method. The full process is first carried out using 70% of the initial data, which are selected at random. Then, the remaining 30% of the data are compared with the values retrieved from the resulting maps. A linear regression between these two sets of values yields adjusted coefficients of determination that are over 0.80 for average temperature (they range from 0.81 in July to 0.89 in November) and at least 0.75 for total precipitation (they range from 0.75 in September to 0.90 in July). The highest errors are found in areas of great orographic complexity and sparse observational coverage, where both the fit of the data to the regression model and the interpolation of residuals are deficient. Since the overall results are acceptable, the process is finally carried out using the whole meteorological dataset.
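The ordinary kriging of residuals described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the exponential model is written with the practical-range convention (factor 3 in the exponent), the nugget and sill are free parameters with made-up values, and the station coordinates and residuals are synthetic. The helper names `solve`, `exp_semivariogram` and `ordinary_kriging` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def exp_semivariogram(h, nugget, sill, range_km=40.0):
    """Exponential semivariogram; gamma(0) = 0 by definition.

    Assumption: 'range' is the practical range, hence the factor 3.
    """
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / range_km))


def ordinary_kriging(points, values, target, nugget, sill, range_km=40.0):
    """Return (estimate, weights) at `target` from residuals at `points` (km)."""
    n = len(points)

    def gamma(p, q):
        return exp_semivariogram(math.dist(p, q), nugget, sill, range_km)

    # Kriging system: semivariances between stations, bordered by the
    # unbiasedness constraint (weights sum to one, Lagrange multiplier last).
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values)), weights


# Synthetic residuals at four stations on a 30 km square (made-up values).
points = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0), (30.0, 30.0)]
values = [0.4, -0.2, 0.1, -0.5]
estimate, weights = ordinary_kriging(points, values, (15.0, 15.0),
                                     nugget=0.1, sill=1.0)
```

In this symmetric layout the target is equidistant from all four stations, so each station receives the same weight and the estimate is the plain average of the residuals. Evaluating the function at a station location returns the station value itself, since the semivariogram is defined as zero at zero distance.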

Application of the reference values in debugging meteorological datasets
At the Instituto Nacional de Meteorología, a method has been implemented to automatically detect possible errors in the monthly average temperature and total precipitation datasets. It is based on GIS techniques and makes use of the climatic reference values previously computed. The first step of the method is the calculation of the so-called monthly anomalies. The thermal anomalies are defined as the difference between a monthly averaged temperature and its reference value. The precipitation anomalies are defined in a similar way, but they are expressed as a percentage of the reference value.
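The two anomaly definitions can be written down in a few lines. The sketch below assumes that the precipitation anomaly is the difference from the reference value divided by the reference value, which is one reading of "defined in a similar way"; the function names and the sample values are illustrative only.

```python
def thermal_anomaly(observed_c, reference_c):
    """Monthly mean temperature minus its climatic reference value (degrees C)."""
    return observed_c - reference_c


def precipitation_anomaly(observed_mm, reference_mm):
    """Difference from the reference, as a percentage of the reference.

    Assumption: the anomaly is 100 * (observed - reference) / reference.
    """
    if reference_mm == 0:
        raise ValueError("reference precipitation must be non-zero")
    return 100.0 * (observed_mm - reference_mm) / reference_mm


# Made-up monthly values for one station.
t_anom = thermal_anomaly(21.5, 20.0)        # 1.5 degrees C warmer than the reference
p_anom = precipitation_anomaly(30.0, 40.0)  # 25 % below the reference
```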
Then, two different filters are applied in order to identify those stations whose data are suspicious and require further investigation. The first filter simply detects the stations whose anomalies exceed a predefined threshold, which is specific for every variable. The second filter is a little more complex, since it detects groups of nearby stations with incompatible data. Areas with sharp spatial variations of anomalies are assumed to indicate the existence of incompatible data.

Table 1. Coefficients of the regression equation for monthly average temperature ($T$): $T = C + A x_1 + B x_2$, where $x_1$ is the altitude of the station and $x_2$ is the distance to sea. The adjusted coefficient of regression and the root mean square error are also displayed.

Table 2. Coefficients of the regression equation for monthly total precipitation, where $x_1$ is the altitude of the station (retrieved from the DEM), $x_2$ is the difference between the altitude of the station and the average altitude in a circle of 10 km radius around it, $x_3$ is the difference between the altitude of the station and the minimum altitude in a circle of 15 km radius around it, $x_4$ is the difference between the altitude of the station and the average altitude in a circle of 40 km radius around it, $x_5$ is the eastern longitude of the station and $x_6$ is the northern latitude (minus 40 degrees). The adjusted coefficient of regression and the root mean square error are also displayed.
To analyse the spatial variations of anomalies, they are first interpolated to grid points spaced 400 m apart using an inverse-distance weighting method that avoids excessive smoothing. Then, the slopes in this grid are calculated and the second filter detects the areas with the steepest slopes. The thresholds for suspicious values have been initially set to 0.7 °C km⁻¹ for thermal anomalies and 7% km⁻¹ for anomalies in precipitation, although they can be changed according to the particular data distribution.
Figure 2 illustrates the application of the method to monthly averaged temperatures recorded in June 2007. It shows a zoom into the two areas where the slope in the distribution of thermal anomalies exceeds the above-defined threshold. Further investigation revealed the presence of a wrong value in each of these areas.
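The second filter can be sketched as an inverse-distance interpolation of the anomalies onto a 400 m grid, followed by a finite-difference estimate of the slope and a comparison with the threshold. The power of 2 in the weighting, the finite-difference scheme, the function names and the station values are assumptions made for illustration; only the grid spacing and the 0.7 °C km⁻¹ threshold come from the text.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance weighted anomaly at (x, y); stations = [(x_km, y_km, anomaly)]."""
    num = den = 0.0
    for sx, sy, anomaly in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return anomaly  # grid node coincides with a station
        w = d ** -power
        num += w * anomaly
        den += w
    return num / den


def anomaly_grid(stations, nx, ny, spacing_km=0.4):
    """Interpolate anomalies to an nx-by-ny grid; grid[j][i] is at (i, j) * spacing."""
    return [[idw(stations, i * spacing_km, j * spacing_km) for i in range(nx)]
            for j in range(ny)]


def slope_grid(grid, spacing_km=0.4):
    """Gradient magnitude per km: central differences, one-sided at the edges.

    Requires at least two nodes in each direction.
    """
    ny, nx = len(grid), len(grid[0])

    def diff(get, k, n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        return (get(hi) - get(lo)) / ((hi - lo) * spacing_km)

    slopes = []
    for j in range(ny):
        row = []
        for i in range(nx):
            gx = diff(lambda k: grid[j][k], i, nx)
            gy = diff(lambda k: grid[k][i], j, ny)
            row.append(math.hypot(gx, gy))
        slopes.append(row)
    return slopes


def flag_steep(slopes, threshold):
    """Grid indices (i, j) whose slope exceeds the threshold."""
    return [(i, j) for j, row in enumerate(slopes)
            for i, s in enumerate(row) if s > threshold]


# Made-up thermal anomalies (degrees C); the central station is deliberately suspicious.
stations = [(1.0, 1.0, 0.2), (3.0, 1.0, 0.3), (2.0, 2.0, 4.0),
            (1.0, 3.0, 0.1), (3.0, 3.0, 0.2)]
grid = anomaly_grid(stations, nx=11, ny=11)
flagged = flag_steep(slope_grid(grid), threshold=0.7)  # 0.7 degrees C per km
```

In this synthetic case the grid nodes around the central station are flagged, which points at the station whose anomaly disagrees with its neighbours. A field of identical anomalies produces no flags, since its slope is zero everywhere.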

Conclusions
A reliable map of climatic reference values allows precise estimates of the represented climatic magnitudes in any area, even if there is no meteorological station in it. It also allows meteorological data gathered at a station to be compared with the corresponding climatic value, even if no long and homogeneous time series of such data is available at that station. A method has been implemented to automatically detect possible errors in meteorological datasets. It performs very well for monthly precipitation in months with a smooth spatial distribution of anomalies. Sharper spatial distributions of anomalies (occurrence of heavy local showers) lead to higher false alarm rates. The method also performs well for average monthly temperature. Nevertheless, the skill of the method is expected to improve when it is applied separately to monthly averages of daily minimum and maximum temperatures.
Edited by: M. Dolinar Reviewed by: L. de Salas, J. A. Fernandez, and another anonymous referee

Figure 1. Reference values for the average temperature in October.

Figure 2. Distribution of the thermal anomaly, that is, the difference between the monthly-averaged temperature and its reference value, in June 2007. On both sides, zoom into areas where the slope exceeds the threshold value of 0.7 °C km⁻¹.