Convection-permitting weather forecasting models allow for prediction of rainfall events with increasing levels of detail. However, the high resolutions used introduce the so-called “double penalty” problem when attempting to verify the forecast accuracy. Post-processing within an ensemble prediction system can help to overcome this issue. In this paper, two new up-scaling algorithms based on machine learning and statistical approaches are proposed and tested. The aim of these tools is to enhance the skill and value of the forecasts and to provide a better tool for forecasters to predict severe weather.

Since the dawn of Numerical Weather Prediction (NWP), the need for accurate prediction has grown as fast as advances in technology and modern high-performance computing

As outlined in

One common post-processing approach for EPS forecasts is to use a neighbourhood processing technique called up-scaling

Machine learning, deep learning algorithms and neural networks are becoming increasingly popular and widespread throughout the scientific world for their natural quality of self-adjustment and self-learning

In this paper, up-scaling methods will be applied to a number of rainfall cases, using forecasts from the operational ensemble system in use at Met Éireann, the Irish national meteorological service. This system, along with the cases and observational data, will be introduced in Sect.

The Irish Regional Ensemble Prediction System (IREPS) used for daily operations at Met Éireann is based on the system described in

The test cases introduced below and reported on later in the paper's results focus on the rainfall forecast by IREPS in the 24 h from the 00:00 Z cycle. In order to reduce data sizes a domain cut-out over the island of Ireland was used (see Fig.

The operational IREPS domain (green) with the “Island of Ireland” sub-domain used in this study (blue).

This case study was used during the initial testing of the statistical and machine learning tools. It concerns an episode of strong convective activity over south-west Ireland on 9 May 2020. The IREPS forecast reveals a rapid development of convective phenomena over the south-west of Ireland starting from 13:00 Z.
Shown in Fig.

Three-hour rainfall accumulations from 15:00–18:00 Z on 9 May 2020, when convective rainfall was triggered particularly in the south-west and south-east of the country. The top left panel shows observed accumulations from Met Éireann's operational radar. The remaining panels show the predicted amounts from the 11 members of the 00:00 Z IREPS cycle on 9 May.

June 2020 presented a rich variety of meteorological scenarios in terms of rainfall. Eight 24 h periods from June 2020 were investigated. A short description of the rainfall characteristics of the period is given in Table

The verification metrics in Sect.

Concise description of the 24 h rainfall forecasts from the 00:00 Z cycles of IREPS on the selected days in June 2020.

Met Éireann synoptic and automatic climate stations used.

In this section, different up-scaling methodologies are presented. The idea behind neighbourhood up-scaling is to reduce the double-penalty error by substituting each grid point with a reweighting that takes into account the forecasts at a number of neighbouring points

The up-scaling procedure may be neatly described by a matrix convolution as follows. Let

It should be noted that each algorithm returns a two-dimensional array smaller than the input, because a sufficient margin from the borders is needed to perform the convolution. Since the matrix covers only a sub-section of the entire IREPS domain, points at its edge can be treated like any others, with no need for further manipulation.
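The border effect described above can be illustrated with a minimal sketch. It assumes a uniform square kernel of half-width `radius`. Both the kernel shape and the helper name `upscale_valid` are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np
from scipy.signal import convolve2d

def upscale_valid(field, radius):
    """Average each point with its neighbours inside a square window.

    Hypothetical helper: a uniform kernel is assumed purely for illustration.
    """
    size = 2 * radius + 1
    kernel = np.full((size, size), 1.0 / size**2)  # weights sum to 1
    # mode="valid" keeps only points whose full window lies inside the input,
    # so the output loses `radius` rows/columns on every side.
    return convolve2d(field, kernel, mode="valid")

field = np.random.default_rng(0).random((20, 30))
out = upscale_valid(field, radius=2)
print(field.shape, "->", out.shape)
```

If the input is cut from the larger domain with a margin of `radius` points around the region of interest, the output covers exactly that region and no special boundary handling is required.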

This subsection describes the basic up-scaling post-processing currently used by Met Éireann, which aims to better classify the occurrence of rain in a region rather than by expecting a perfectly precise geographic match. A fraction-like matrix is defined, here called the Fraction Probability Matrix (FPM). It is obtained as follows. First, the original precipitation field is mapped onto a Boolean form, allocating a logical value of

There are two main ways to proceed at this stage. The convolution could be applied to each binary matrix, followed by a summation. This approach requires some further manipulation, whereby a clipping function has to be applied to restrain values above the probabilistic range. However, there are a few disadvantages that must be taken into account. If working with an EPS with a higher number of members (greater than

Hence, a second approach is more convenient, in which a simple averaging is first applied before the up-scaling procedure is performed. This average defines the FPM as
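A minimal sketch of this averaging step is given here, under stated assumptions: the ensemble is held as an array of shape `(n_members, ny, nx)`, and a member counts as "rain" at a point when its value is greater than or equal to the threshold (whether the comparison is strict is an assumption). The function name and the sample values are hypothetical.

```python
import numpy as np

def fraction_probability_matrix(members, threshold):
    """Fraction of ensemble members exceeding `threshold` at each grid point.

    members: array of shape (n_members, ny, nx) holding rainfall amounts.
    """
    binary = members >= threshold   # Boolean field for every member
    return binary.mean(axis=0)      # average over the member axis

# Four hypothetical members on a 2 x 2 grid (values in mm)
members = np.array([
    [[0.0, 1.2], [0.3, 2.0]],
    [[0.6, 0.1], [0.0, 3.1]],
    [[0.9, 0.0], [0.0, 0.7]],
    [[0.0, 0.0], [0.1, 1.5]],
])
fpm = fraction_probability_matrix(members, threshold=0.5)
print(fpm)
```

Because the average already lies between 0 and 1, a single up-scaling with normalised weights keeps the result in the probabilistic range, and only one convolution is needed regardless of the number of members.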

Any up-scaling technique should ideally have a dynamical configuration in order to adapt itself effectively to extreme, highly variable situations. The following approach endeavours to model the inhomogeneity of the rainfall distribution and seeks to recognise those regions where changes take place.
Areas with equal values of fraction probability are up-scaled by the same kernel, while a gradient highlights an edge between zones with different levels of agreement among members, which makes the radius settings more delicate. Thus, it is reasonable to assume that the value of

Variability is evaluated by an indicator. Among the several metrics available, the standard deviation is chosen, as it is widely used to measure the dispersion from the mean within a group of numbers

A window of odd size is defined, within which the spread metric is calculated (let us call it

A double nested loop running over rows and columns selects each point of the FPM in turn, which becomes the center of the pre-set

Up-scaling is performed, using a Gaussian kernel (further details can be found in Appendix

Implementation requires setting appropriate ranges of standard deviation and related radii.
The list of radii is given as an input argument, while the standard deviation ranges are fixed. In such a way all possible configurations can be analysed, a total of
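The steps of the spread-based procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: the window size, the spread bounds, the direction of the spread-to-radius mapping and all function names are assumptions. A uniform mean stands in for the Gaussian kernel to keep the example short, and edge padding replaces the margin taken from the larger domain.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spread_matrix(fpm, window=5):
    """Standard deviation of the FPM in an odd-sized window centred on each point."""
    pad = window // 2
    padded = np.pad(fpm, pad, mode="edge")
    windows = sliding_window_view(padded, (window, window))
    return windows.std(axis=(-2, -1))

def radius_map(spread, bounds, radii):
    """Pick radii[i] where the spread falls in the i-th range set by `bounds`."""
    return np.asarray(radii)[np.digitize(spread, bounds)]

def spread_upscale(fpm, bounds, radii, window=5):
    """Up-scale each point with the radius selected by its local spread."""
    rmap = radius_map(spread_matrix(fpm, window), bounds, radii)
    rmax = int(max(radii))
    padded = np.pad(fpm, rmax, mode="edge")
    out = np.empty(fpm.shape, dtype=float)
    for i in range(fpm.shape[0]):          # loop over rows
        for j in range(fpm.shape[1]):      # loop over columns
            r = int(rmap[i, j])
            ci, cj = i + rmax, j + rmax    # position in the padded array
            out[i, j] = padded[ci - r:ci + r + 1, cj - r:cj + r + 1].mean()
    return out

# Hypothetical FPM from an 11-member ensemble (fractions k/11)
rng = np.random.default_rng(1)
fpm = rng.integers(0, 12, size=(12, 12)) / 11.0
smoothed = spread_upscale(fpm, bounds=[0.1, 0.3], radii=[3, 2, 1])
```

Here `radii` must contain one more entry than `bounds`, since two bounds define three spread ranges. The spread is computed with a vectorised sliding window rather than inside the loop, which gives the same values.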

Figure

Spread matrix for the 9 May test case, constructed by replacing each point of the domain with the respective value of standard deviation.

These three images allow for a qualitative comparison between the unupscaled FPM

A second up-scaling approach is now proposed based on a machine learning tool in which a pattern recognition ability is employed, i.e. the capability of arranging data or information in regular and well defined structures. Further information on the underlying strategy can be found in

A hierarchical aggregation framework initially treats each object in a given set as an individual group and, by step-wise comparison, progressively discovers the best way to group them together based on their level of similarity. The resulting groups are jointly exhaustive (each point must be in one subset) and mutually exclusive (the same point cannot be found in more than one subset). Data are stored in column arrays, which are fed to the algorithm. The hierarchical clustering then starts by treating every observation as a cluster, and at each iteration two groups (each containing one or more observations) are merged.
When all remaining elements are aggregated into a single cluster, the algorithm stops.
Specifications regarding the step-wise aggregation are chosen as follows: similarity between grid values (inner-elements distance) is estimated using a standard

A first testing phase was conducted on the case-study described in Sect.

From a meteorological perspective, latitude and longitude are less relevant, given the small scale of Ireland, although topography is certainly relevant to a convective rainfall field at the boundary and could be interesting to explore.
It can be seen in the right panel of Fig.

In a similar manner to the dynamical approach in Sect.

As previously mentioned, hierarchical clustering requires one-dimensional arrays. Therefore, the FPM is reshaped from a 2D matrix into a single column of values, so that it can be treated as a feature by the linkage operation.
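The reshaping and linkage step might look like the following sketch, which uses SciPy's `linkage` function. The Ward criterion with Euclidean distance is an assumption made for illustration, and the small FPM is made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical 2 x 3 FPM
fpm = np.array([[0.0, 0.1, 0.9],
                [0.0, 0.2, 1.0]])

# One row per grid point, one feature column (the fraction probability)
features = fpm.reshape(-1, 1)

# Agglomerative linkage; method and metric are assumptions for this sketch
Z = linkage(features, method="ward", metric="euclidean")
print(features.shape, Z.shape)
```

Each row of `Z` records one merge, so `n` grid points produce `n - 1` rows, and the last row describes the single cluster that contains every point.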

The number of clusters is obtained via a dendrogram, a graphical representation of the cumulative hierarchical progression to display relationships of data. A dendrogram is a tree-like plot, where data are listed along the

Once the number of clusters is established, each grid point has to be associated with a group. Therefore, agglomerative clustering is performed to obtain the aggregation prediction. When dealing with the FPM, points having similar fraction probability are grouped together as is shown in Fig.

Each point within a cluster is up-scaled using the usual convolution operation and the assigned radius. A further scaling is required to maintain values in the range
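A minimal sketch of the per-cluster up-scaling is given here under several assumptions: clusters are cut from a Ward linkage with SciPy's `fcluster`, radii are assigned to clusters in order of increasing mean fraction probability, a uniform kernel with symmetric boundary handling is used, and a clip to the probability interval [0, 1] stands in for the final scaling step. None of these choices, nor the function names, are confirmed by the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.signal import convolve2d

def box_upscale(field, radius):
    """Uniform-kernel up-scaling that keeps the input shape."""
    size = 2 * radius + 1
    kernel = np.full((size, size), 1.0 / size**2)
    return convolve2d(field, kernel, mode="same", boundary="symm")

def cluster_upscale(fpm, n_clusters, radii):
    """Up-scale every cluster of similar FPM values with its own radius."""
    Z = linkage(fpm.reshape(-1, 1), method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust").reshape(fpm.shape)
    # Hypothetical rule: radii are matched to clusters sorted by mean FPM
    order = sorted(np.unique(labels), key=lambda c: fpm[labels == c].mean())
    out = fpm.astype(float)
    for radius, label in zip(radii, order):
        mask = labels == label
        out[mask] = box_upscale(fpm, radius)[mask]
    return np.clip(out, 0.0, 1.0), labels

# Hypothetical FPM with a dry region (left) and a wet region (right)
fpm = np.tile(np.array([0.0, 0.1, 0.05, 0.9, 1.0, 0.95]), (6, 1))
upscaled, labels = cluster_upscale(fpm, n_clusters=2, radii=[2, 1])
```

The label numbers returned by `fcluster` carry no meaning in themselves, which is why the sketch sorts clusters by their mean value before assigning radii.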

On the left side of the panel, a snapshot of the total rainfall field forecast at

Result from step 3 of the hierarchical clustering algorithm. The aggregation prediction associates each point of the domain with one cluster. Each color represents a cluster.

Clustering based up-scaling, with a combination of

Flowchart representation of the schemes of action for the spread-based up-scaling

Table

The precipitation thresholds [0.2, 0.5 and 1.0–5.0 mm in 0.5 mm increments] were chosen so as to ensure usable statistics from each of the days in the dataset. This gives in total a sample size of 11 thresholds over the nine 24 h periods. Statistics are calculated over the island of Ireland domain illustrated in blue in Fig.

As illustrated by both the BS and AUC scores, the machine learning based clustering approach gives the most satisfactory results, with a mean BS of 0.111 and an AUC of 0.845. The improvement in AUC score in particular between the “Original FPM” method and the “Clustering” method is noteworthy. This suggests that IREPS rainfall forecasts could benefit from a post-processing technique based on a clustering technique. The “Fixed”, “Spread” and “Clustering” methodologies all illustrate improvements over the “Original FPM”.
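For readers unfamiliar with the two scores, the following sketch shows how they can be computed from forecast probabilities and binary observations at a set of stations. The sample values are invented and the function names are hypothetical. The AUC is obtained from the rank (Mann-Whitney) form, which equals the area under the empirical ROC curve; the paper does not state which implementation was used.

```python
import numpy as np

def brier_score(prob, obs):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    prob, obs = np.asarray(prob, float), np.asarray(obs, float)
    return np.mean((prob - obs) ** 2)

def roc_auc(prob, obs):
    """Probability that an event receives a higher forecast than a non-event."""
    prob, obs = np.asarray(prob, float), np.asarray(obs, bool)
    pos, neg = prob[obs], prob[~obs]
    diff = pos[:, None] - neg[None, :]   # every event/non-event pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Invented example: up-scaled probabilities at four stations, and whether the
# observed rainfall exceeded the threshold at each one
prob = [0.9, 0.6, 0.4, 0.1]
obs = [1, 0, 1, 0]
bs = brier_score(prob, obs)
auc = roc_auc(prob, obs)
print(bs, auc)
```

A lower Brier Score and a higher AUC both indicate a better forecast, which is the sense in which the values quoted above should be read.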

Figures

The improvement in both BS and AUC scores for each of the up-scaling techniques is encouraging. The results clearly demonstrate the need for the up-scaling of EPS convective rainfall forecasts, as has been reported by many others

It is important to note that these results aim to show a general improvement in the post-processed skill scores by the various statistical and machine learning algorithms, not to prove their supremacy in all scenarios. Indeed there are some scenarios where the basic fixed up-scaling performs better (not shown). The BS and AUC scores presented here were calculated using synoptic and climate stations. A more thorough and robust verification of the various algorithms could be performed by calculating skill scores using radar data to ensure better spatial coverage and a greater sample size. It must also be noted that the advantage of the “Spread” and “Clustering” methodologies is that the control parameters of each method can be modified in order to reach a better verification score.

Mean value of BS and AUC calculated for 9 May and case studies in Table

Brier Score

Post-processing undoubtedly plays a fundamental role in addressing issues affecting NWP products, which still rely on approximate theories or parametrizations to reproduce behaviour at spatial scales too small for NWP model resolutions to handle.

In this paper, two neighbourhood approaches for convection-permitting models based on a statistical post-processing and a machine learning technique were proposed. Objective verification results demonstrated that the machine learning approach gave an improvement over more traditional up-scaling approaches.

However, a number of aspects were not explored in this work. A more robust dataset with a more diverse set of meteorological scenarios would need to be investigated in order to make concrete statements about the supremacy of the machine learning technique.

Furthermore, the dependency on the weather scenario should be taken into account: a deeper exploratory analysis is required so that the currently arbitrary hyper-parameters of the chosen algorithms can adapt as weather patterns evolve. Other machine learning techniques, e.g. convolutional neural networks

The results described here have focused solely on convective rainfall in the summer period. Testing the methodologies on rainfall episodes during Ireland's winter may uncover a preference for other methodologies, as could an investigation into the up-scaling of other meteorological variables (e.g. 10 m wind speed, 2 m temperature). Finally, a more global solution could be to base the choice of up-scaling algorithm on the atmosphere's large-scale dynamics

Widely used in image filtering processes to blur images
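A two-dimensional Gaussian kernel of the kind used for blurring can be built as in the sketch below. The truncation at `radius` grid points, the separate `sigma` parameter and the function name are assumptions made for illustration; the paper's own parametrisation may differ.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """Square (2*radius + 1) Gaussian kernel, normalised so the weights sum to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

kernel = gaussian_kernel(radius=3, sigma=1.5)
print(kernel.shape, kernel.sum())
```

Normalising the weights ensures that convolving a field of probabilities with this kernel cannot produce values outside the range of the input.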

3D plots showing three different configurations of the Gaussian kernel with respect to the change of boundary

The code used in this paper is open-source and available at the following web address:

The data used to obtain the results in this study are available upon request from the lead author.

TC developed the up-scaling algorithms and the code for implementing them. CC, CD and AH planned the cases and prepared the forecast data and observations. All authors analysed the results. TC primarily prepared the manuscript with contributions from CC, CD and AH.

The authors declare that they have no conflict of interest.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the special issue “Applied Meteorology and Climatology Proceedings 2020: contributions in the pandemic year”.

We thank Rónán Darcy for his introductory work on the fixed up-scaling, Luca Ruzzola for valuable discussion and very useful brainstorming, and two anonymous reviewers who helped to improve the presentation of this work. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Science Foundation Ireland.

This publication has emanated from research supported in part by a Grant from Science Foundation Ireland under grant no. 18/CRT/6049.

This paper was edited by Andrea Montani and reviewed by three anonymous referees.