Wind observations are important for a wide range of domains, including meteorology, agriculture and extreme wind engineering, among others. To ensure the provision of high-quality surface wind data over Belgium, a new semi-automated data quality control (QC) procedure has been developed and applied to wind observations from the automated weather stations operated by the Royal Meteorological Institute of Belgium. This new QC applies to records of 10 m 10 min averaged wind speed and direction, 10 m gust speed and direction, 2 m 10 min averaged wind speed and 30 m 10 min averaged wind speed. After an existence test, automated procedures check the data for limit consistency, internal consistency, temporal consistency and spatial consistency. At the end of the automated QC, a decision algorithm assigns a flag to each data point. Each day, the QC staff analyzes the preceding day's observations in the light of the assigned quality flags.
This paper describes the quality control (QC) procedures developed at the Royal Meteorological Institute of Belgium (RMI) to ensure the accuracy and reliability of the wind observations performed within the Automatic Weather Station (AWS) network operated by RMI. While high-quality wind measurements are critical for many fields of science and engineering, little attention has been paid in the literature to the QC of wind-related variables. DeGaetano (1997) was the first to introduce complex automated QC checks for hourly wind records. In a subsequent study (DeGaetano, 1998), the author went a step further by proposing a distinct treatment of calm and non-calm wind speed values in the detection of wind speed bias. Later, Graybeal (2006) proposed to evaluate the reliability of extreme wind values using a relationship between daily wind speed and daily peak wind gusts (Weggel, 1999). More recently, Jiménez et al. (2010) extended the automated QC procedures for wind speed and wind direction to data collected at higher temporal resolutions of 10 or 30 min. Lastly, Chávez-Arroyo and Probst (2015) presented a set of eleven QC procedures applied to the wind velocity records of the automated surface observation network of the Mexican National Weather Service. In the present approach, the automated QC functions are embedded in a larger QC protocol involving manual inspections.
Wind data are evaluated daily by automated screening and manual inspection. Every morning, a comprehensive suite of QC algorithms is applied to the previous day's data, and a report summarizing the results for each station is produced for the RMI QC staff. The purpose of the automated data screening is to objectively identify anomalous data values for subsequent review by an experienced data analyst. Note that false positives (i.e. type I errors) increase the burden on the manual QC, while false negatives (i.e. type II errors) reduce the data quality. The review is necessary to determine whether an anomaly results from a problem with the instrument (and what maintenance action may be necessary) or whether it accurately reflects unusual meteorological conditions.
The paper is organized as follows. In Sect. 2, we briefly describe the wind measurements performed within the AWS network operated by RMI. In Sect. 3, the automated quality control procedures are presented. The manual QC is discussed in Sect. 4. Finally, conclusions and perspectives are given in Sect. 5.
Location of wind measurements performed within the
RMI's AWS network and the associated QC groups (see Table
List of the 14 RMI Automatic Weather Stations performing wind observations and the associated measurements. Groups of stations with similar measurements are defined by a "QC group".
Wind speed and direction are recorded at 14 AWSs operated by RMI (see
Fig.
RMI's AWS are built around a programmable data logger that acquires the sensors' measurements, then processes, stores and transmits the data to the central RMI database (DB) in Uccle, Brussels. Once the signals are converted to digital values, a first processing step is performed at the raw-data level: 10 min wind speed and direction averages are calculated from the 1 s measurements, together with the gust speed and direction (the gust speed being defined as the maximum 3 s running-average wind speed over the 10 min time period).
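As an illustration, the 10 min reduction described above can be sketched as follows. This is a minimal sketch: the function name is ours and the snippet is not part of the operational logger software.

```python
import numpy as np

def wind_10min_stats(speed_1s):
    """Reduce one 10 min block of 1 s wind speed samples (600 values)
    to the 10 min mean and the gust speed, the gust being defined as
    the maximum 3 s running average within the block."""
    speed_1s = np.asarray(speed_1s, dtype=float)
    mean_10min = speed_1s.mean()
    # 3 s running average via a length-3 moving window
    running_3s = np.convolve(speed_1s, np.ones(3) / 3.0, mode="valid")
    gust = running_3s.max()
    return mean_10min, gust
```

The same reduction is applied independently to the direction channel to obtain the gust direction (the direction recorded at the time of the gust).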
Similarly to what is done for the air and soil temperature measurements from the RMI's AWS (Bertrand et al., 2013, 2015), a first basic QC is performed on all wind records once they are acquired centrally, to ensure that gross errors are caught before the data are further transmitted to the central DB. Automated procedures monitor the data to make sure they are collected and that the system performance is acceptable. After an existence test, a module checks for physical limits and flags the data violating these limits (erroneous when the data lie outside physical limits, and suspect when they lie outside basic long-term climatological extremes that do not take into account the time of year and location). A list of missing and flagged data is automatically produced after each control cycle and transmitted to the AWS network maintenance team for further intervention (see Fig.
Flowchart of the wind quality assurance process implemented at RMI. (Dashed arrow only applies to 30 m wind measurements in Uccle – AWS6447.)
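The existence and limit tests described above can be sketched as follows. The flag names and the numerical limits used in the usage example are illustrative, not the operational RMI thresholds.

```python
def range_check(value, phys_min, phys_max, clim_min, clim_max):
    """Flag a single record: 'erroneous' if outside physical limits,
    'suspect' if outside long-term climatological extremes, 'valid'
    otherwise. A missing value (None) fails the existence test."""
    if value is None:
        return "missing"
    if not (phys_min <= value <= phys_max):
        return "erroneous"
    if not (clim_min <= value <= clim_max):
        return "suspect"
    return "valid"
```

For example, with hypothetical limits of 0 to 75 m/s (physical) and 0 to 40 m/s (climatological), `range_check(45.0, 0, 75, 0, 40)` returns `"suspect"`.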
Second, each night, automated procedures check the previous day's wind records for more subtle errors. Based on previous works by, e.g., DeGaetano (1997), Graybeal (2006), Liljegren et al. (2009), Jiménez et al. (2010) and Chávez-Arroyo and Probst (2015), the developed data quality tests fall into four categories, ranging from simple to complex and from less restrictive to more restrictive. More specifically, the quality assessment process (see Fig.
To interpret the results of the automated tests, a decision algorithm has been developed that is applicable to all wind parameters and all sites. The algorithm proceeds sequentially through each step until a failure mode is identified. If no failure mode is identified, the measurement is judged to be valid. At the end of the process, a report is automatically generated for each AWS and sent to the QC staff.
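A minimal sketch of such a sequential decision step is given below. The ordering and the test names are hypothetical and serve only to illustrate the "first failure mode found wins" logic; the operational algorithm combines the test results in a more elaborate way.

```python
def decide(flags):
    """Walk through the test results in a fixed order and return the
    first failure mode found; otherwise the record is judged valid.
    `flags` maps a test name to a bool (True = test failed)."""
    # Hypothetical ordering: range errors first, then consistency tests.
    for mode in ("range", "internal", "temporal", "spatial"):
        if flags.get(mode, False):
            return mode
    return "valid"
```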
Implausible values are first defined according to the ranges specified by the manufacturer of the measurement equipment. Here, the fixed range is from 0 to 360°
The internal consistency check consists of three different stages. In the first stage, gust consistency is verified, while in the second stage it is required that zero wind (gust) speed records have zero and non-changing associated wind (gust) direction records. The third stage involves vertical comparisons of the 10 min wind speed measurements at different heights on the same AWS/site. This test provides a more stringent constraint than simple valid maximum/minimum limit tests by requiring consistency among the measurements as well as consistency with historical data. To implement such a QC procedure, the 10 min wind speed record measured at a given height is related to the 10 min wind speed value at another height using a simple linear regression model. At each station location, the parameters of the regression model were estimated using the resistant least trimmed squares (LTS) regression method (e.g., Rousseeuw, 1984), owing to the expected presence of outliers in the historical station data used to fit the model. The biweight mean and standard deviation (Lanzante, 1996) were then used to calculate the confidence intervals around the regression line. Note that the prediction intervals were constructed on the basis of a target flagging rate of 1 per 1000 (i.e. a 99.9 % interval) for erroneous values and of 10 per 1000 (i.e. a 99 % interval) for suspicious values, respectively (e.g., Eischeid et al., 1995; Graybeal et al., 2004; Graybeal, 2006; Liljegren et al., 2009). Because two comparisons are necessary to unambiguously identify which level is problematic, at least two vertical tests (comparing three levels) must fail for the decision algorithm to report an anomaly. Consequently, the decision algorithm will never report a vertical anomaly for an AWS where only two wind speed measurement levels are available. However, a single vertical test is still valuable because a single failed vertical test can confirm a range test failure and cause the decision algorithm to report a range anomaly. For this reason, wind speed vertical comparisons are performed not only at stations of QC group 1 (where the test is the most efficient, as it involves three measuring levels) but also at stations of QC groups 2 and 4 (see Table
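The vertical consistency test can be sketched as follows. Note that this is a simplified stand-in for the published method: an ordinary least squares fit refitted on the best-fitting points replaces the LTS regression, and a MAD-based scale estimate replaces the biweight standard deviation. The 2.58 and 3.29 factors correspond to two-sided 99 % and 99.9 % intervals under a normality assumption, matching the stated target flagging rates.

```python
import numpy as np

def fit_vertical_model(v_low, v_high, trim=0.9):
    """Fit v_high ~ a + b * v_low with a crude trimmed least squares:
    an OLS fit, then a refit on the fraction `trim` of points with the
    smallest residuals (a simple stand-in for LTS regression)."""
    b, a = np.polyfit(v_low, v_high, 1)
    res = np.abs(v_high - (a + b * v_low))
    keep = res <= np.quantile(res, trim)
    b, a = np.polyfit(v_low[keep], v_high[keep], 1)
    res = np.abs(v_high[keep] - (a + b * v_low[keep]))
    sigma = 1.4826 * np.median(res)  # MAD-based scale, stand-in for biweight std
    return a, b, sigma

def vertical_flag(v_low, v_high, a, b, sigma):
    """Flag per the target rates: outside the ~99.9 % band -> erroneous,
    outside the ~99 % band -> suspect."""
    dev = abs(v_high - (a + b * v_low))
    if dev > 3.29 * sigma:   # two-sided 99.9 % under normality
        return "erroneous"
    if dev > 2.58 * sigma:   # two-sided 99 %
        return "suspect"
    return "valid"
```

Operationally, the model parameters are fitted once per station on historical data and then applied to each incoming pair of simultaneous measurements.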
The temporal consistency check aims at detecting abnormally low and high variations in the wind speed and direction records. Because the frequency distribution of repetitive readings under calm conditions at a given AWS has a far heavier tail than the distribution under non-calm conditions (i.e. both true calms and total sensor failures produce a sequence of repetitive values, while a similar situation is quite improbable for valid non-calm values), a distinct treatment is applied to calm and non-calm wind speed records, respectively. Ideally, the limit between calm and non-calm wind speeds should be given by the anemometer's cut-in wind speed (typically of the order of 0.3 m s⁻¹).
Frequency of occurrence of different consecutive 10 m wind speed repetitions at a given AWS (non-calm wind speed), gathered over five years (2010–2014) of 10 min wind speed records (see Table
To identify the maximum number of consecutive unchanging records that can be assumed to be valid, an analysis of the frequency counts for different numbers of consecutive repetitions was performed at each station's location, for each wind parameter and each recording height, using manually quality-controlled historical data. For short durations, constant wind periods are reported with a high frequency of occurrence. As the duration increases, an abrupt decrease in the frequency of constant wind periods appears at all stations, although each site has a different decay rate. As an example, Fig.
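A sketch of the resulting persistence check for non-calm wind speeds is given below. The station-specific repetition limit `max_run` and the 0.3 m s⁻¹ calm threshold are illustrative parameters; operationally, the limit comes from the frequency analysis described above.

```python
def persistence_flag(speeds, max_run, calm_threshold=0.3):
    """Flag a sequence of 10 min wind speed records as suspect when
    non-calm values repeat more times in a row than the station-specific
    limit. Calm values (below the assumed ~0.3 m/s cut-in speed) are
    excluded, since long runs of repeated calms are plausible."""
    run, prev = 0, None
    for v in speeds:
        if v >= calm_threshold and v == prev:
            run += 1
            if run >= max_run:
                return "suspect"
        else:
            run = 1
        prev = v
    return "valid"
```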
Besides the persistence test, a spike/step test compares the magnitude of change between 10 min wind speed records with the maximum probable change for 10 min, 1 h, 2 h, 3 h and 6 h time steps. As for the persistence test, the maximum probable change is based on the 99th (99.9th) percentile change computed over several years of quality-controlled prior data. The maximum probable change for wind speed depends on the location and on the sensor mounting height. To minimize the possibility of a false positive identification, a given 10 min wind speed record must fail in at least three of the five tested time steps before being flagged as suspect or erroneous. Moreover, because wind speeds associated with thunderstorms can produce large changes in successive data values, the decision algorithm does not report a spike/step anomaly if more than one sensor mounted at different heights at the same location fails the spike/step test. Note that the wind speed spike/step test is performed at all our AWS. Indeed, even if a failure at only one height is insufficient to report an anomaly, it can support other detected failure modes during the algorithm's decision-making process.
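The multi-time-step logic can be sketched as follows. The lag-to-limit mapping in the usage example is hypothetical; operationally, the limits come from the percentile analysis of historical data.

```python
def spike_step_flag(series, idx, limits):
    """Compare the 10 min wind speed record series[idx] with earlier
    records over several look-back lags. `limits` maps a lag in 10 min
    steps (1 = 10 min, 6 = 1 h, 12 = 2 h, 18 = 3 h, 36 = 6 h) to the
    maximum probable change for that lag. The record is flagged only
    when at least three lags fail, limiting false alarms."""
    failures = 0
    for lag, limit in limits.items():
        j = idx - lag
        if j < 0 or series[j] is None or series[idx] is None:
            continue  # difference not computable across missing data
        if abs(series[idx] - series[j]) > limit:
            failures += 1
    return "suspect" if failures >= 3 else "valid"
```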
The step/spike test is, however, not well suited when missing values are found in the time series, as the difference between records cannot be calculated. Using different time steps makes it possible to partly overcome such a limitation, but it does not solve the handling of abnormally high wind speed records surrounded by missing values (which tend to be systematically detected as invalid; Jiménez et al., 2010). Therefore, based on Graybeal (2006), an additional QC procedure evaluates the reliability of extreme wind values using the empirical relation of Weggel (1999) between the daily mean wind speed,
Log daily peak-gust factor plotted against daily mean
wind speed, for an arbitrary sample of
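Since the exact form of Weggel's relation is not reproduced here, the following sketch only assumes that the logarithm of the daily peak-gust factor (daily peak gust divided by daily mean speed) is compared with a linear function of the daily mean wind speed fitted to historical data; the coefficients `a`, `b` and the tolerance `tol` are hypothetical station-specific parameters.

```python
import math

def gust_factor_flag(daily_mean, daily_peak_gust, a, b, tol):
    """Hedged sketch of a Weggel-style extreme-value check: the log of
    the daily peak-gust factor is compared with an assumed linear
    function a + b * daily_mean fitted to historical data."""
    if daily_mean <= 0:
        return "not testable"  # calm day: gust factor undefined
    log_gf = math.log10(daily_peak_gust / daily_mean)
    expected = a + b * daily_mean
    return "suspect" if abs(log_gf - expected) > tol else "valid"
```

Because this check uses daily aggregates, it remains applicable on days where isolated extreme records are surrounded by missing 10 min values.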
It is worth pointing out that when abnormally low or high variations (except for the spike/step test) are detected by the automated procedures, a daily warning is sent to the QC staff, requesting a visualization of the entire daily time evolution of the identified problematic wind parameter during the manual QC.
In the horizontal test, the differences between a measurement and the corresponding measurements at other locations are compared. Here, the spatial check only applies to the 10 min averaged wind direction records (in non-calm wind speed situations). It compares the station's direction to the mean wind direction within a radius of 75 to 100 km around the analyzed station. Basically, the station fails the neighboring test if the difference between the recorded station direction and the mean direction computed within the radius is larger than 100°.
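The spatial direction check can be sketched as follows. The selection of neighboring stations within the 75 to 100 km radius is assumed to have been done beforehand; the circular mean avoids the 0°/360° wrap-around problem that a plain arithmetic mean of directions would introduce.

```python
import math

def circular_mean_deg(directions):
    """Mean of wind directions in degrees, computed on the unit circle."""
    s = sum(math.sin(math.radians(d)) for d in directions)
    c = sum(math.cos(math.radians(d)) for d in directions)
    return math.degrees(math.atan2(s, c)) % 360.0

def spatial_direction_flag(station_dir, neighbor_dirs, max_diff=100.0):
    """Flag the 10 min direction when it departs from the mean direction
    of the neighboring stations by more than `max_diff` degrees,
    accounting for angle wrap-around."""
    mean_dir = circular_mean_deg(neighbor_dirs)
    diff = abs(station_dir - mean_dir) % 360.0
    diff = min(diff, 360.0 - diff)
    return "suspect" if diff > max_diff else "valid"
```

For example, with neighbors reporting 350° and 10°, the circular mean is 0°, so a station reporting 120° fails the test while one reporting 30° passes.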
Illustration of the spatial consistency check applied to the 10 min averaged direction records. Date: 2015-08-23 15:00 UTC. Station flagged: Zelzate (AWS 6431). Station's speed: 2.5 m s⁻¹
Each day, the QC staff analyzes the preceding day's wind records in the light of the quality flags assigned by the automated system. The results of the automated system can be graphically plotted on the operator's terminal screen. In that case, all the analyzed wind speed records (including the gust speed) of the inspected day at a given station are presented in a graphic window, with erroneous or suspect values indicated in the corresponding parameter's daily time series. Similarly, the daily time series of the wind and gust directions are reported in a separate window, together with the wind directions recorded at neighboring stations (e.g. the daily time series of wind direction measurements from stations included in a domain surrounding the analyzed station, the domain being delimited by the operator). Visual inspection of all records flagged by the automated decision-making algorithm is performed to distinguish instrumental problems from plausible behaviors. Whether or not a value is accepted remains a human decision. When errors are verified or visually detected, faulty records are eliminated and "trouble tickets" are issued, where needed, to the maintenance team so that sensors can be replaced or repaired. Rather than simply deleting erroneous measurements, the human operators supply corrections and estimations (i.e., when values are missing) where possible. They have the opportunity to visualize different automated corrections of the problematic time series in order to determine the most appropriate one in their specific case, while it always remains possible for individuals to apply their own corrections. When the correction/estimation process is completed, all modifications introduced by the operator are automatically recorded in the central RMI DB. Note that the original parameter values are kept in the database and remain accessible to the QC staff if required.
Automation of the RMI's AWS data quality control is in progress. After the automated quality control of the 10 min air and soil temperature records (Bertrand et al., 2013, 2015), automated quality assurance procedures devoted to wind records have been operationally implemented to support the QC staff in their work. Validation exercises have revealed that, unsurprisingly, the automatic QC system performs better for the stations of QC group 1 than for those of QC group 3, as the additional wind speed recording heights allow the algorithm to refine its final decision. Nevertheless, it has been found that the automated QC is able to correctly identify problematic parameters at a particular station on a given day irrespective of the AWS QC group. However, the spatial consistency check applied to the 10 min wind direction tends to produce type I errors (i.e. false positives) at some stations (located in the north-eastern part of Belgium). This occurs when the station's direction, while being close to the direction recorded at the nearest neighboring station, differs by more than 100°
Finally, while the validation exercise has not revealed a particular weakness in the step/spike test, we are planning to investigate whether it could be relevant to adapt the procedure using the daily mean wind speed and the gust factor to detect abnormally high variations in the presence of missing data records (Eq.