Hostname: page-component-7dd5485656-s6l46 Total loading time: 0 Render date: 2025-10-28T19:01:20.067Z Has data issue: false hasContentIssue false

From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data

Published online by Cambridge University Press:  27 October 2025

Frederick Iat-Hin Tam*
Affiliation:
Faculty of Geosciences and Environment, University of Lausanne, Lausanne, VD, Switzerland Expertise Center for Climate Extremes, University of Lausanne, Lausanne, VD, Switzerland
Fabien Augsburger
Affiliation:
Faculty of Geosciences and Environment, University of Lausanne, Lausanne, VD, Switzerland
Tom Beucler
Affiliation:
Faculty of Geosciences and Environment, University of Lausanne, Lausanne, VD, Switzerland Expertise Center for Climate Extremes, University of Lausanne, Lausanne, VD, Switzerland
*
Corresponding author: Frederick Iat-Hin Tam; Email: ft21894@gmail.com

Abstract

Reliably identifying and understanding temporal precursors to extreme wind gusts is crucial for early warning and mitigation. This study proposes a simple data-driven approach to extract key predictors from a dataset of historical extreme European winter windstorms and derive simple equations linking these precursors to extreme gusts over land. A major challenge is the limited training data for extreme events, increasing the risk of model overfitting. Testing various mitigation strategies, we find that combining dimensionality reduction, careful cross-validation, feature selection, and a nonlinear transformation of maximum wind gusts informed by Generalized Extreme Value distributions successfully reduces overfitting. These measures yield interpretable equations that generalize across regions while maintaining satisfactory predictive skill. The discovered equations reveal the association between a steady drying low-troposphere before landfall and wind gust intensity in Northwestern Europe.

Information

Type
Application Paper
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact Statement

Data-driven equation discovery is a growing field in scientific machine learning, yet its application to extreme events remains underexplored, particularly for small, noisy, multidimensional datasets. Using European extreme winter storm data, we demonstrate that careful cross-validation, nonlinear transformations informed by extreme value theory, and multi-objective optimization yield interpretable equations that generalize well across regions and unseen storms. This paves the way for objective knowledge discovery for weather extremes.

1. Introduction

Damaging windstorms associated with extratropical cyclones are among the leading weather-related disasters in the mid-latitudes winters, especially in Europe (Schwierz et al., Reference Schwierz, Köllner-Heck, Zenklusen Mutter, Bresch, Vidale, Wild and Schär2010). From a weather forecasting standpoint, early detection of reliable precursors leading to extreme windstorms helps with timely disaster mitigation. Beyond horizontal temperature differences strengthening storms (Laurila et al., Reference Laurila, Gregow, Cornér and Sinclair2021), studies have highlighted how lower-atmosphere humidity fuels wind jets (Martínez-Alvarado et al., Reference Martínez-Alvarado, Gray, Clark and Baker2022) and transports high-wind air toward the surface (Browning et al., Reference Browning, Smart, Clark and Illingworth2015), yet simple models linking winterstorm characteristics to wind gusts are lacking.

Interpretable machine learning has enabled trustworthy data-driven models of thunderstorms (Hilburn, Reference Hilburn2023) and tropical cyclones (Tam et al., Reference Tam, Beucler and Ruppert2024), improving process understanding by reducing the attribution uncertainty of post-hoc explainable AI methods (Mamalakis et al., Reference Mamalakis, Barnes and Ebert-Uphoff2023). This motivates us to apply data-driven methods to uncover novel temporal patterns in environmental precursors to extreme European windstorms. A key challenge is the limited availability of extreme storm data, which restricts the use of complex nonlinear models prone to overfitting in small-sample conditions. However, this limitation also presents an opportunity to derive a simple, interpretable equation approximating complex physical processes, paving the way for data-driven discovery.

Data-driven equation discovery distills simple laws from empirical data (Schmidt and Lipson, Reference Schmidt and Lipson2009) with methods ranging from sparse selection algorithms to symbolic regression (Song et al., Reference Song, Jiang, Camps-Valls, Williams, Zhang, Reichstein, Vereecken, He, Hu and Shi2024). While data-driven approaches have successfully identified physical laws from controlled fluid dynamics experiments (Brunton et al., Reference Brunton, Proctor and Kutz2016; Cheng and Alkhalifah, Reference Cheng and Alkhalifah2024), finding interpretable equations from noisy, multidimensional weather and climate data remains under-explored. Balancing model performance and algorithmic complexity, Grundner et al. (Reference Grundner, Beucler, Gentine and Eyring2024) uncovered simple analytical cloud cover equations from high-resolution model outputs. We similarly implement a hierarchical modeling framework to track the trade-off between the model complexity and generalization capability, identifying a set of sparse yet descriptive equations that achieves satisfactory prediction skills for unseen storm cases across European regions. Additionally, we explore the best strategy for dimensionality reduction and feature selection for reliable discovery of simple equations from small yet high-dimensional datasets.

2. Data

2.1. Raw datasets description

Winter windstorm tracks for Europe. The winter storm data analyzed in this study is based on the ECMWF Winter Windstorm Indicator dataset (C3S CDS, 2022). This dataset documents historical severe windstorms between 1979 and 2021, and their footprints. The tracking of storms is performed by applying a tracking algorithm on the ERA5 reanalysis data from October through March. The tracking algorithm tracks local maxima of 850 hPa relative vorticity, with a minimal 10 m wind speed threshold of 25 m s−1. A total of 118 storms is identified in the 42-year period. This study analyzes storms in the last 30 years (96 storms). Since we focus on the impact of severe windstorms over land in this study, short-lived or non-landfalling storms are filtered out, which leaves us with 63 storms.

ERA5 meteorological reanalysis. We use the hourly ERA5 reanalysis dataset with a horizontal resolution of 0.25°–0.25° to characterize the environmental conditions for all tracked storms. Our analysis considers 28 potential environmental drivers for wind gust formation and intensity (Schulz and Lerch, Reference Schulz and Lerch2022) listed in Supplementary Information Table 6. To find the driver conditions local to each storm, we adopt a system-following 4°–4° grid box centered around the storm track of all 28 environmental drivers. These drivers include measurements of the winterstorm kinematics (e.g., 10 m wind speeds), precipitation characteristics (e.g., mean precipitation rates), and the thermodynamic characteristics (e.g., latent heat flux; K-index) near the windstorms. A full list of the ERA5 variables and their units is provided in Supplementary Information Table 6.

2.2. Data preprocessing: dimensionality reduction and partitioning

Focusing on extreme events limits the training data for data-driven models. To ensure reliable equation discovery, we reduce the dimensionality of storm history and post-landfall gust data, turning our dataset into one that can be analyzed with simple data-driven models without overfitting.

Features: prescribed spatial dimensionality reduction. To encode the 4D storm history fields as time series, we compute four spatial statistics: maximum (max), minimum (min), mean, and standard deviation (std). Using predefined statistics instead of learned filters facilitates physical interpretability. For example, a higher temperature standard deviation may indicate the presence of weather fronts (Lagerquist et al., Reference Lagerquist, Allen and McGovern2022). To further reduce dimensionality, we apply Principal Component Analysis (PCA) to compress time series into orthogonal temporal modes ( $ {\Pi}_X(t) $ ) and PC loadings ( $ {\mathrm{PC}}_X $ ), which are scalars denoting the projection of the time series onto $ {\Pi}_X(t) $ :

(1) $$ X(t)-\overline{X(t)}=\sum \limits_{i=1}^N{\mathrm{PC}}_{X,i}{\Pi}_{X,i}(t),\hskip1em \mathrm{where}\hskip1em {\left\langle {\Pi}_{X,i}(t)|X(t)\right\rangle}_X={\mathrm{PC}}_{X,i}. $$

Here, $ N $ is the number of retained PC modes indexed by $ i $ , and $ {\left\langle \cdot |\cdot \right\rangle}_X $ denotes the inner product associated with the PCA decomposition of $ X $ . The modes $ {\Pi}_{X,i}(t) $ are normalized such that $ {\left\langle {\Pi}_{X,i}(t)|{\Pi}_{X,j}(t)\right\rangle}_X={\delta}_{ij} $ where $ {\delta}_{ij} $ is the Kronecker delta, ensuring orthonormality of the PC modes.

Targets: geographical clustering. In a low-sample-size regime, predicting wind gusts for every pixel across Europe is impractical and difficult to interpret. We use K-means++ clustering (Arthur and Vassilvitskii, Reference Arthur and Vassilvitskii2007) to partition the landmass into $ K $ subregions, incorporating elevation, latitude, longitude, and maximum gusts from 63 severe windstorms. This ensures clusters reflect windstorm gust characteristics while remaining spatially coherent. To enforce geographical contiguity, we apply k-nearest neighbors (kNN; Enas and Choi, Reference Enas and Choi1986), reassigning edge pixels based on the most frequent cluster label. Although silhouette scores suggest an optimal $ K=4 $ –5 (Supplementary Figure S1), these clusters group disjointed regions with distinct wind gust characteristics (Supplementary Figure S2). We therefore consider three additional clustering metrics (e.g., Xie and Beni, Reference Xie and Beni1991), which identify local optima at $ K=9 $ , 14, and 15 (Supplementary Figure S3). Among these, $ K=15 $ best preserves geographical compactness and aligns with established European climatic zones (Jylhä et al., Reference Jylhä, Tuomenvirta, Ruosteenoja, Niemi-Hugaerts, Keisu and Karhu2010) (Supplementary Figure S4). With clusters defined, we extract, for each storm, the highest instantaneous 10 m wind gust within each cluster during the 15 hours post-landfall. This window ensures short-lived but intense storms like Lothar (Wernli et al., Reference Wernli, Dirren, Liniger and Zillig2002) are included. Our target is the maximum 15-hour wind gust within each cluster ( $ {U}_{\mathrm{gust},i} $ ), where $ i $ indexes the 15 clusters.

Cross-validation procedure. To ensure data-driven models select gust predictors applicable to severe windstorms, we always leave out the same seven storms for testing. The test set is randomly chosen, except for the Lothar storm, which is explicitly withheld to assess performance on an extreme, unseen case. The remaining 56 storms are split into 7 training-validation folds by randomly selecting 7 sets of 8 storms without replacement for validation (“Random split”). Alternatively, as a step toward invariant causal prediction (Peters et al., Reference Peters, Bühlmann and Meinshausen2016), we identify features robust to distributional shifts by clustering the output vector $ {\mathbf{U}}_{\mathbf{gust}} $ into seven groups via K-means++ (Supplementary Table S2). We then train on six clusters and validate on the remaining one, repeating this process seven times (“ICP split”), following Häfner et al. (Reference Häfner, Gemmrich and Jochum2023). Sweet et al. (Reference Sweet, Müller, Anand and Zscheischler2023) show that such an “ICP split” outperforms naive k-fold cross-validation for regression.

2.3. Extreme value theory for wind gust distributions

Assessing wind gust severity relative to local climatology and distinguishing moderate from extreme gusts are key to risk assessment. We describe extreme gusts using the Generalized Extreme Value (GEV) distribution, a family of continuous distributions designed for block maxima such as extreme sea levels (Méndez et al., Reference Méndez, Menéndez, Luceño and Losada2022). The cumulative distribution function (CDF) and its functional inverse (quantile function) are

(2) $$ {G}_{\mu, \sigma, \xi }(x)=\left\{\begin{array}{cc}\exp \left[-{\left(1+\xi \frac{x-\mu }{\sigma}\right)}^{-\frac{1}{\xi }}\right],& \xi \ne 0,\\ {}\exp \left[-\exp \left(-\frac{x-\mu }{\sigma}\right)\right],& \xi =0,\end{array}\right.\hskip1em {G}_{\mu, \sigma, \xi}^{-1}(p)=\left\{\begin{array}{cc}\mu +\frac{\sigma }{\xi}\left[{\left(-\ln p\right)}^{-\xi }-1\right],& \xi \ne 0,\\ {}\mu -\sigma \ln \left(-\ln p\right),& \xi =0.\end{array}\right. $$

where $ \mu \in \mathrm{\mathbb{R}} $ (location), $ \sigma \in {\mathrm{\mathbb{R}}}_{+}^{\ast } $ (scale), and $ \xi \in \mathrm{\mathbb{R}} $ (shape). Unlike empirical CDFs, the GEV enables extrapolation beyond observed data to estimate return periods for extreme events. Due to the high spatial variability in land surface and topography across Europe, we fit a separate GEV for each cluster (Figure 1ce; Supplementary Table S1). After testing multiple definitions and durations for block maxima sampling, we opted for daily maximum gusts over all cluster grid points during winter months (Supplementary Figure S5). This approach differentiates regions frequently impacted by strong gusts from those less affected (Figure 1a,d) and allows modeling of unseen extremes, which empirical distributions cannot capture (Figure 1b).

Figure 1. (a) We derive geographically contiguous regions of the European continent via K-means++ and kNN clustering to define maximum wind gust targets. (b) Extreme gust distributions are modeled using the GEV distribution, with shape (c), location (d), and scale (e) parameters fitted for each region.

3. Methodology

Feature selection. We aim to derive interpretable equations linking storm characteristics to post-landfall gusts. Even with prescribed spatial statistics and PCA for dimensionality reduction, the number of features (28 predictors $ \times $ 4 spatial statistics $ \times $ 1–12 PC loadings) still far exceeds the number of training and validation samples (56). To address this underdetermined problem, we apply sequential forward feature selection, minimizing the mean or maximum validation root-mean-squared error (RMSE) across our seven cross-validation splits. Features are added until the RMSE objective no longer decreases.

Model hierarchy. We construct a hierarchy of data-driven models expressible as simple equations by training multiple linear regressions (MLR) with PC loadings as features while varying three hyperparameters: feature smoothing, retained feature variance, and target transformation. For feature smoothing, we apply a low-pass rectangular filter to the PC loading time series, testing cutoff frequencies from 0.52 rad/hr to 1.92 rad/hr. For retained variance, we set four thresholds to determine the number of retained PC components per predictor: [75%, 80%, 85%, 90%]. The number of retained PC components per predictor acts as a regularization hyperparameter controlling the predictor variance fed to the ML framework. Finally, we explore a nonlinear, data-adaptive transformation of maximum wind gusts to encourage out-of-distribution generalization for extremes (Buriticá and Engelke, Reference Buriticá and Engelke2024). For each sample, we first obtain its percentile value from the CDF fitted in each region, then apply a nonlinear transformation to better distinguish extreme values in the GEV distribution tails: $ {Z}_i=-\ln \left[1-{\mathrm{CDF}}_i\left({U}_{\mathrm{gust},\mathrm{i}}\right)\right] $ , where $ Z $ is the transformed target, and $ {\mathrm{CDF}}_i $ is the cumulative distribution function of the GEV fitted in region $ i $ . All MLRs are trained using the least squares algorithms from the Scikit-Learn Python package (Pedregosa et al., Reference Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel and Duchesnay2011).

Pareto optimality for equation discovery. Balancing performance and complexity is a multi-objective optimization task that can be addressed through Pareto optimization (Jin and Sendhoff, Reference Jin and Sendhoff2008). Following sequential feature selection, we select optimal models from the empirical “Pareto front,” corresponding to the lowest error achieved for a given complexity before the mean (or max) validation RMSE across seven cross-validation splits plateaus. We define complexity as the number of unique features (predictor + spatial statistic) selected, as this better reflects equation simplicity than, for instance, the number of trainable parameters.

Model evaluation. Pareto optimization seeks models with the lowest error while maintaining simplicity. We evaluate model performance using the mean validation RMSE across seven folds and measure complexity by the number of selected unique features (predictor + spatial statistic). Using the max validation RMSE yields different Pareto-optimal models, which are also evaluated. For models predicting $ \mathbf{Z} $ , we convert predictions back to wind gust speed in physical units using the per-cluster GEV quantile function ( $ {G}_{\mu, \sigma, \xi}^{-1} $ ) before computing RMSE, which ensures consistency across model hierarchies.

4. Results

4.1. Pareto-optimal models for wind gust prediction

We explore the (complexity, error) space of trained MLRs by varying hyperparameters—feature smoothing, retained variance, and target transformation—as well as the cross-validation strategy (mean or max validation RMSE). This allows us to identify Pareto-optimal maximum wind gust equations in low sample size conditions, forming the empirical Pareto front (dashed black line, Figure 2b). When features are selected by minimizing the max validation RMSE, the minimum achievable RMSE stabilizes after five selected unique variables, suggesting that additional variables may not improve generalization to unseen cases. The steady decline in validation RMSE beyond five features sets an upper complexity limit, beyond which models lose their ability to generalize across regions and storms with different gust spatial patterns. Additionally, the Pareto front (Figure 2b) indicates that models perform best when retaining 90% of PC variance and applying moderate smoothing (removing oscillations with frequency $ >1.5 $ rad/hr). These hyperparameters yield features most likely to generalize well across all storm cases.

Figure 2. (a) Training and (b) validation (complexity, error) plots for MLR models predicting $ {U}_{\mathrm{gust}} $ , with features selected to minimize the maximum validation RMSE across seven randomly split cross-validation folds. Each colored symbol represents a model with different hyperparameters: cutoff frequency for smoothing is indicated by transparency levels, and retained feature variance by shape and color. The Pareto front (black dashed line in b) shows improved validation skills up to five unique features, suggesting that sparser linear models generalize better. In contrast, training RMSE decreases linearly with additional features, suggesting overfitting beyond 5 features. All models outperform the climatological baseline, which predicts the training mean $ {U}_{\mathrm{gust}} $ and is shown as a red dashed line.

As expected, the performance table for the best sparse models from different feature selection methods (Table 1) shows that models targeting the transformed gusts ( $ \boldsymbol{Z} $ ) generally underperform models directly targeting $ {\boldsymbol{U}}_{\mathrm{gust}} $ on the validation set when RMSE is measured in $ {\boldsymbol{U}}_{\mathrm{gust}} $ units. Additionally, cross-validation using the “ICP split” yields no validation RMSE improvement over naive random splitting. However, generalizable equations must perform robustly on unseen storms. While the “ICP split” sacrifices some training and validation skill, it successfully identifies sparse equations that generalize well to extreme cases like Storm Lothar, as evidenced by its low test RMSE. In contrast, equations discovered using random splits consistently perform worse on test sets, suggesting implicit overfitting to validation sets by selecting variables optimized for specific gust spatial distributions. The smaller error spread of $ {\boldsymbol{U}}_{\mathrm{gust}} $ models confirms their tendency to select sparse equations suboptimal for unseen cases, regardless of hyperparameters. Thus, splitting storms by their wind gust spatial distribution for cross-validation (“ICP split”), transforming the target using extreme value theory, and leveraging this information for feature selection leads to more reliable models, revealing features that better anticipate unseen cases.

Table 1. RMSE for the best models in the hierarchy, along with their unique feature count. We report the mean and the standard deviation (in parentheses) across feature selection methods and use bold font for the best trained, validation and test RMSE for all trained models (smallest mean RMSE and spread). Model performance is compared to the climatology mean baseline to assess skill.

4.2. Reducing geographical bias through GEV-informed wind gust transformations

Why do the $ \boldsymbol{Z} $ models outperform the $ {\boldsymbol{U}}_{\mathrm{gust}} $ models on the test set while underperforming on training and validation sets? To address this, we examine whether this performance gap varies across geographical regions. We compare the best $ {\boldsymbol{U}}_{\mathrm{gust}} $ and $ \boldsymbol{Z} $ models trained with the “ICP split” feature selection method to assess regional differences in model skill. The $ {\boldsymbol{U}}_{\mathrm{gust}} $ model exhibits a distinct west–east RMSE gradient (Figure 3a), with higher errors in Northwestern Europe, where severe windstorms frequently make landfall. This suggests that the model prioritizes inland clusters with weaker winds, potentially at the expense of distinguishing differences in strong coastal gusts (Figure 3a). It may also struggle to anticipate rare inland windstorms, such as Storm Aila (2020) (Rantanen et al., Reference Rantanen, Laurila, Sinclair and Gregow2021). Therefore, the main difference between the $ {\boldsymbol{U}}_{\mathrm{gust}} $ and $ \boldsymbol{Z} $ error maps (Figure 3) is that the nonlinear gust transformation imposes stronger spatial regularization. The $ \boldsymbol{Z} $ model shows decreased skill in Eastern Europe and the Balkans but improved performance in Northwestern Europe, suggesting reduced geographical bias. These results indicate that the GEV-informed gust transformation mitigates regional biases in MLR predictions, leading to a fairer model despite its slightly higher overall RMSE compared to $ {\boldsymbol{U}}_{\mathrm{gust}} $ MLRs.

Figure 3. Validation error maps for (a) the best $ {\boldsymbol{U}}_{\mathrm{gust}} $ model and (b) the best $ \boldsymbol{Z} $ model in physical units (m s $ {}^{-1} $ ). The nonlinear, GEV-informed transformation reduces the west–east error gradient in $ {\boldsymbol{U}}_{gust} $ and improves gust predictions in windstorm-prone Northwestern Europe.

4.3. Leveraging interpretable, pareto-optimal equations for scientific insight

Both the best 4-feature $ {\boldsymbol{U}}_{\mathbf{gust}} $ and $ \boldsymbol{Z} $ models are linear functions of the standardized PC loadings, defined as $ {\overset{\sim }{\mathrm{PC}}}_{X,j}=\frac{{\mathrm{PC}}_{X,j}-{\mathrm{PC}}_{X,j}^{\mathrm{mean}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}} $ , where the mean and standard deviation are calculated over the training set and $ j $ indexes the selected features. After fitting, these standardized PC loadings are multiplied by region-specific weights $ {\boldsymbol{a}}_{\boldsymbol{X}}\in {\mathrm{\mathbb{R}}}^{15\times 4} $ and then added to biases $ \boldsymbol{b}\in {\mathrm{\mathbb{R}}}^{15} $ for each of the 15 regions of interest. Using the inner product from Eq. 1, we express the models as scaled projections of winterstorm fields onto data-driven temporal patterns $ {\Pi}_{X,j}(t) $ . The equation for the $ {\boldsymbol{U}}_{\mathrm{gust}} $ model is:

(3) $$ {\boldsymbol{U}}_{\mathbf{gust}}^{\boldsymbol{U}}={\boldsymbol{\beta}}^{\boldsymbol{U}}+\sum \limits_{j=1}^4{\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}{\left\langle {\Pi}_{X,j}(t)|{X}_j(t)\right\rangle}_{X_j}\;\mathrm{where}:\hskip1em {\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}=\frac{{\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}},{\boldsymbol{\beta}}^{\boldsymbol{U}}={\boldsymbol{b}}^{\boldsymbol{U}}-\sum \limits_{j=1}^4\frac{{\mathrm{PC}}_{X,j}^{\mathrm{mean}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}}\mathbf{1}. $$

For the $ \boldsymbol{Z} $ model, using the definition of $ Z $ and the GEV quantile function from Eq. 2, we obtain:

(4) $$ {\boldsymbol{U}}_{\mathbf{gust}}^{\boldsymbol{Z}}={\mathbf{G}}_{\boldsymbol{\mu}, \boldsymbol{\sigma}, \boldsymbol{\xi}}^{-1}\left(1-\exp \left[-{\boldsymbol{\beta}}^{\boldsymbol{Z}}-\sum \limits_{j=1}^4{\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}}{\left\langle {\Pi}_{X,\hskip0.35em j}(t)|{X}_j(t)\right\rangle}_{X_j}\right]\right)\;\mathrm{with}\left({\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}},{\boldsymbol{\beta}}^{\boldsymbol{Z}}\right)\;\mathrm{similarly}\ \mathrm{defined}. $$

$ {\boldsymbol{U}}_{\mathrm{gust}}^{\boldsymbol{U}} $ and $ {\boldsymbol{U}}_{\mathrm{gust}}^{\boldsymbol{Z}} $ are the predicted maximum wind gusts for each region using the direct and transformed models, respectively; $ {\boldsymbol{b}}^{\boldsymbol{U}},{\boldsymbol{b}}^{\boldsymbol{Z}}\in {\mathrm{\mathbb{R}}}^{15} $ are region-specific biases; $ {\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}},{\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}}\in {\mathrm{\mathbb{R}}}^{15} $ are region-specific regression weights for the selected PC loadings; $ {\Pi}_{X,j}(t) $ are the data-driven temporal patterns associated with each winterstorm feature; $ {\left\langle {\Pi}_{X,j}(t)|{X}_j(t)\right\rangle}_{X_j} $ represents the projection of the storm field $ {X}_j(t) $ onto the temporal mode $ {\Pi}_{X,j}(t) $ ; $ {\boldsymbol{G}}_{\boldsymbol{\mu}, \boldsymbol{\sigma}, \boldsymbol{\xi}}^{-1} $ is the GEV quantile function, mapping transformed values back to physical wind gust units; and $ \mathbf{1}\in {\mathrm{\mathbb{R}}}^{15} $ is the identity vector (a column vector of ones).

For a region $ i $ and a feature $ j $ , models predict higher $ {U}_{\mathrm{gust},i} $ when the product of $ {a}_{X,j,i} $ and $ {\overset{\sim }{PC}}_{X,j} $ is positive, and vice versa. Assuming $ {a}_{X,j,i}>0 $ , time series that project positively onto the PC eigenvectors lead to higher wind gust predictions. The regression coefficients $ {\boldsymbol{a}}_{\boldsymbol{X},\boldsymbol{i}} $ in the best $ \boldsymbol{Z} $ model exhibits significant geographical variability (Figure 4a-d), indicating that the influence of temporal patterns on gusts varies across regions. For example, two low-tropospheric humidity PCs in the $ \boldsymbol{Z} $ model are positively correlated with gusts in region 4 (France and the Netherlands; Figure 4a-b), while the other two PC terms are negatively correlated (Figure 4c-d). The temporal evolution of the first PC mode of $ {RH}_{\mathrm{max}}^{850} $ and the third PC mode of $ {RH}_{\mathrm{max}}^{975} $ (Figure 4e) suggests that higher gusts in France occur when drying persists within or at the top of the boundary layer (850 hPa) before storm landfall. A drier lower troposphere promotes the descent of high-speed winds via evaporation (Browning et al., Reference Browning, Smart, Clark and Illingworth2015) or through downward mixing (Pantillon et al., Reference Pantillon, Lerch, Knippertz and Corsmeier2018). In contrast, a larger standard deviation of 500 hPa geopotential height ( $ \Phi $ ) before landfall (Figure 4f) may indicate more perturbed west–east winds, which can intensify winterstorms. To summarize, Figure 4 demonstrates that we can extract important storm predictors leading to extreme wind gusts, such as low-tropospheric humidity, and quantify their relative importance. Finally, the temporal variability encoded in the PCs reveals additional insights into the timing of the drying or moistening tendency relative to landfall, which increases the likelihood of strong gusts.

Figure 4. A visual representation of the learned weights for the 4 features in equation of our best $ \boldsymbol{Z} $ model: (a) $ {RH}_{max}^{1000} $ , (b) $ {RH}_{max}^{850} $ , (c) $ {RH}_{max}^{975} $ , and (d) $ {\Phi}_{std}^{500} $ . Comparing these weights across clusters to the time series of the four selected PC modes (e-f) highlights key temporal patterns in storm history that promote or suppress extreme winds after landfall in different regions.

5. Conclusion

We derived interpretable equations linking storm history to post-landfall extreme wind gusts through a PC-regression-based framework. By combining dimensionality reduction and temporal smoothing, we trained a hierarchy of equations from small datasets while mitigating overfitting via careful cross-validation, sequential feature selection, and Pareto optimization. Our results suggest that hierarchical modeling improves generalizability in small-sample settings, with the “Invariant Causal Prediction split” providing the greatest benefit for unseen test cases. Pareto optimization indicates that equations should contain at most 3–4 features to balance generalizability and predictive skill. While the best-performing $ {\boldsymbol{U}}_{\mathrm{gust}} $ model performs well globally (mean validation RMSE), spatial analysis reveals a west–east error structure, as the model prioritizes weak-wind regions at the expense of capturing gust variations in strong-wind regions. A GEV-informed nonlinear transformation reduces this spatial bias by accentuating distinctions in gust percentiles at the distribution tail. The discovered equations provide insights into how the timing of low-tropospheric drying is associated with wind gusts in Northwestern Europe, underscoring the importance of storm thermodynamic history in predicting post-landfall wind gusts. Future extensions could apply this framework to larger datasets and explore alternative interpretable modeling approaches, including nonlinear methods such as symbolic regression.

Supplementary material

The supplementary material for this article can be http://doi.org/10.1017/eds.2025.10008.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.10008.

Acknowledgements

We are grateful for the technical assistance of M. Gomez and the advice of E. Koch.

Author contribution

All authors contributed to the conceptualization, methodology, writing, and approved the final submitted draft. Data curation and visualization: F.I.T., F.A.

Competing interests

The authors declare none.

Data availability statement

The released version of the Github repository containing code to reproduce the main figures of this work can be accessed through Zenodo and has been assigned the following DOI: 10.5281/zenodo.14766994. The ECMWF Windstorm Indicator Data can be downloaded from the Copernicus Windstorm Indicator data archive. ERA5 datasets were downloaded from the Copernicus website (multiple pressure levels as well as single pressure levels).

Ethical statement

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Financial support

This research was supported by the Canton of Vaud, Switzerland, through the base funding of UNIL.

Footnotes

This research article was awarded Open Data and Open Materials badges for transparent practices. See the Data Availability Statement for details.

References

Arthur, D and Vassilvitskii, S (2007) k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 10271035.Google Scholar
Browning, KA, Smart, DJ, Clark, MR and Illingworth, AJ (2015) The role of evaporating showers in the transfer of sting-jet momentum to the surface. Quarterly Journal of the Royal Meteorological Society 141(693), 29562971.10.1002/qj.2581CrossRefGoogle Scholar
Brunton, SL, Proctor, JL and Kutz, JN (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15), 39323937.10.1073/pnas.1517384113CrossRefGoogle ScholarPubMed
Buriticá, G and Engelke, S (2024) Progression: An extrapolation principle for regression. Preprint, arXiv:2410.23246.Google Scholar
Cheng, S and Alkhalifah, T (2024) Discovery of physically interpretable wave equations. Preprint, arXiv:2404.17971.Google Scholar
Copernicus Climate Change Service, Climate Data Store (2022) Winter windstorm indicators for Europe from 1979 to 2021 derived from reanalysis. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). https://doi.org/10.24381/cds.9b4ea013Google Scholar
Enas, GG and Choi, SC (1986) Choice of the smoothing parameter and efficiency of K-nearest neighbor classification. Statistical Methods of Discrimination and Classification, 235244.10.1016/B978-0-08-034000-5.50011-3CrossRefGoogle Scholar
Grundner, A, Beucler, T, Gentine, P and Eyring, V (2024) Data-driven equation discovery of a cloud cover parameterization. Journal of Advances in Modeling Earth Systems 16(3), e2023MS003763.10.1029/2023MS003763CrossRefGoogle Scholar
Häfner, D, Gemmrich, J and Jochum, M (2023) Machine-guided discovery of a real-world rogue wave model. Proceedings of the National Academy of Sciences 120(48), e2306275120. https://doi.org/10.1073/pnas.2306275120.CrossRefGoogle ScholarPubMed
Hilburn, KA (2023) Understanding spatial context in convolutional neural networks using explainable methods: Application to interpretable GREMLIN. Artificial Intelligence Earth Systems 2, 220093.10.1175/AIES-D-22-0093.1CrossRefGoogle Scholar
Jin, Y and Sendhoff, B (2008) Pareto-based multiobjective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38(3), 397415.Google Scholar
Jylhä, K, Tuomenvirta, H, Ruosteenoja, K, Niemi-Hugaerts, H, Keisu, K and Karhu, JA (2010) Observed and projected future shifts of climatic zones in Europe and their use to visualize climate change information. Weather, Climate, and Society 2, 148167.10.1175/2010WCAS1010.1CrossRefGoogle Scholar
Lagerquist, R, Allen, JT and McGovern, A (2022) Climatology and variability of warm and cold fronts over North America from 1979 to 2018. Journal of Climate 33, 65316554.10.1175/JCLI-D-19-0680.1CrossRefGoogle Scholar
Laurila, TK, Gregow, H, Cornér, J and Sinclair, VA (2021) Characteristics of extratropical cyclones and precursors to windstorms in northern Europe. Weather and Climate Dynamics 2(4), 11111130.10.5194/wcd-2-1111-2021CrossRefGoogle Scholar
Mamalakis, A, Barnes, EA and Ebert-Uphoff, I (2023) Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience. Artificial Intelligence for the Earth Systems 2, e220058. https://doi.org/10.1175/AIES-D-22-0058.1.CrossRefGoogle Scholar
Martínez-Alvarado, O, Gray, SL, Clark, PA and Baker, LH (2022) Objective detection of sting jets in low-resolution datasets. Meteorological Applications 20(1), 4155.10.1002/met.297CrossRefGoogle Scholar
Méndez, FJ, Menéndez, M, Luceño, A and Losada, IJ (2022) Analyzing monthly extreme sea levels with a time-dependent GEV model. Journal of Atmospheric and Oceanic Technology 24, 894911.10.1175/JTECH2009.1CrossRefGoogle Scholar
Pantillon, F, Lerch, S, Knippertz, P and Corsmeier, U (2018) Forecasting wind gusts in winter storms using a calibrated convection-permitting ensemble. Quarterly Journal of the Royal Meteorological Society 144(715), 18641881.10.1002/qj.3380CrossRefGoogle Scholar
Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O and Duchesnay, É (2011) Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12, 28252830.Google Scholar
Peters, J, Bühlmann, P and Meinshausen, N (2016) Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78(5), 9471012. https://doi.org/10.1111/rssb.12167.CrossRefGoogle Scholar
Rantanen, M, Laurila, TK, Sinclair, V and Gregow, H (2021) Storm Aila: An unusually strong autumn storm in Finland. Weather 76(9), 306312.10.1002/wea.3943CrossRefGoogle Scholar
Schmidt, M and Lipson, H (2009) Distilling free-form natural laws from experimental data. Science 324(5923), 8185.10.1126/science.1165893CrossRefGoogle ScholarPubMed
Schulz, B and Lerch, S (2022) Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison. Monthly Weather Review 150, 235257.10.1175/MWR-D-21-0150.1CrossRefGoogle Scholar
Schwierz, C, Köllner-Heck, P, Zenklusen Mutter, E, Bresch, DN, Vidale, PL, Wild, M and Schär, C (2010) Modelling European winter wind storm losses in current and future climate. Climatic Change 101, 485514.10.1007/s10584-009-9712-1CrossRefGoogle Scholar
Song, W, Jiang, S, Camps-Valls, G, Williams, M, Zhang, L, Reichstein, M, Vereecken, H, He, L, Hu, X and Shi, L (2024) Towards data-driven discovery of governing equations in geosciences. Communications Earth & Environment 5(1), 589.10.1038/s43247-024-01760-6CrossRefGoogle Scholar
Sweet, L, Müller, C, Anand, M and Zscheischler, J (2023) Cross-validation strategy impacts the performance and interpretation of machine learning models. Artificial Intelligence for the Earth Systems 2, e230026.10.1175/AIES-D-23-0026.1CrossRefGoogle Scholar
Tam, FIH, Beucler, T and Ruppert, JH Jr (2024) Identifying three-dimensional radiative patterns associated with early tropical cyclone intensification, Journal of Advances in Modeling Earth Systems 16(12), e2024MS004401.Google Scholar
Wernli, H, Dirren, S, Liniger, MA and Zillig, M (2002) Dynamical aspects of the life cycle of the winter storm ‘Lothar’ (24–26 December 1999), Quarterly Journal of the Royal Meteorological Society 128, 405429.10.1256/003590002321042036CrossRefGoogle Scholar
Xie, XL and Beni, G (1991) A validity measure for fuzzy clustering. Transactions on Pattern Analysis and Machine Intelligence 13(08), 841847.10.1109/34.85677CrossRefGoogle Scholar
Figure 0

Figure 1. (a) We derive geographically contiguous regions of the European continent via K-means++ and kNN clustering to define maximum wind gust targets. (b) Extreme gust distributions are modeled using the GEV distribution, with shape (c), location (d), and scale (e) parameters fitted for each region.

Figure 1

Figure 2. (a) Training and (b) validation (complexity, error) plots for MLR models predicting $ {U}_{\mathrm{gust}} $, with features selected to minimize the maximum validation RMSE across seven randomly split cross-validation folds. Each colored symbol represents a model with different hyperparameters: cutoff frequency for smoothing is indicated by transparency levels, and retained feature variance by shape and color. The Pareto front (black dashed line in b) shows improved validation skills up to five unique features, suggesting that sparser linear models generalize better. In contrast, training RMSE decreases linearly with additional features, suggesting overfitting beyond 5 features. All models outperform the climatological baseline, which predicts the training mean $ {U}_{\mathrm{gust}} $ and is shown as a red dashed line.

Figure 2

Table 1. RMSE for the best models in the hierarchy, along with their unique feature count. We report the mean and the standard deviation (in parentheses) across feature selection methods and use bold font for the best trained, validation and test RMSE for all trained models (smallest mean RMSE and spread). Model performance is compared to the climatology mean baseline to assess skill.

Figure 3

Figure 3. Validation error maps for (a) the best $ {\boldsymbol{U}}_{\mathrm{gust}} $ model and (b) the best $ \boldsymbol{Z} $ model in physical units (m s$ {}^{-1} $). The nonlinear, GEV-informed transformation reduces the west–east error gradient in $ {\boldsymbol{U}}_{gust} $ and improves gust predictions in windstorm-prone Northwestern Europe.

Figure 4

Figure 4. A visual representation of the learned weights for the 4 features in equation of our best $ \boldsymbol{Z} $ model: (a) $ {RH}_{max}^{1000} $, (b) $ {RH}_{max}^{850} $, (c) $ {RH}_{max}^{975} $, and (d) $ {\Phi}_{std}^{500} $. Comparing these weights across clusters to the time series of the four selected PC modes (e-f) highlights key temporal patterns in storm history that promote or suppress extreme winds after landfall in different regions.

Supplementary material: File

Tam et al. supplementary material

Tam et al. supplementary material
Download Tam et al. supplementary material(File)
File 1.4 MB

Author comment: From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data — R0/PR1

Comments

Dear Editors of Environmental Data Sciences,

I am submitting our manuscript, “From Winter Storm Thermodynamics to Wind Gust Extremes: Discovering Interpretable Equations from Data,” for consideration for publication in Environmental Data Science (EDS).

Our work attempts to discover critical temporal patterns in environmental characteristics leading to extreme winds associated with European windstorms overland with data-driven methodologies. The discovery of such patterns facilitates scientific understanding and has implications for operational forecasting and disaster prevention. The proposed framework combines dimensionality reduction, novel data splitting strategies, and nonlinear target transformations with data-driven statistical learning models. Our results show that these novel additions to our model improve the generalizability of the equations to unseen cases, despite low-sample (<100) conditions. The discovered patterns are coherent and have well-defined physical meaning. As a result, we can identify the temporal change of boundary layer moisture as one of the leading factors to extreme wind generation.

Our manuscript is suitable for the journal as it showcases the essential steps in extracting useful equations from limited data on weather extremes. This has important implications for data-driven knowledge discovery in the weather and climate domains. The framework described in this manuscript can easily be adapted to different research topics, which should interest the broader readership of the EDS journal. Finally, our manuscript also introduces several novel mitigation strategies to improve the generalizability of data-driven models in low-sample conditions, which contributes to reliable equation discovery for various atmospheric science problems.

We look forward to hearing from you, and please do not hesitate to contact us should you have any questions.

Sincerely,

Frederick Iat-Hin Tam

Review: From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

1. Summary: In this section please explain in your own words what problem the paper addresses and what it contributes to solving it.

This study tackles the challenge of identifying and understanding the temporal precursors of extreme wind gusts associated with European winter storms, as accurately predicting these events is essential for early warning systems and risk mitigation.

The authors propose a data-driven approach that integrates Principal Component Analysis (PCA) to reduce the dimensionality of storm data and feature selection to extract key predictors. Extreme Value Theory (EVT) is applied to model the distribution of wind gusts, improving the ability to distinguish between moderate and extreme events. The study also compares two cross-validation methods—Random Split and Invariant Causal Prediction (ICP) Split—to assess model generalization. Additionally, Pareto optimization is used to balance model complexity and predictive performance.

The findings indicate that models with three to four key predictors achieve the optimal balance between accuracy and generalization. A drying low-troposphere before landfall is identified as a strong predictor of extreme wind gusts, supporting existing theories on storm-induced downward momentum transport. Models incorporating EVT transformations perform particularly well in distinguishing extreme gusts, especially in Northwestern Europe, where storms frequently make landfall. Additionally, EVT helps mitigate a west-east error gradient, reducing geographical bias and leading to more consistent predictions across different regions. The ICP cross-validation split enhances generalisation to unseen storms, outperforming simple random splits in test cases like Storm Lothar.

2. Please select a score of relevance to climate informatics which promotes the interdisciplinary research between climate science, data science, and computer science.

Highly relevant

3. Relevance and Impact: Is this paper a significant contribution to interdisciplinary climate informatics?

The study demonstrates the potential of machine learning in extreme weather prediction, addressing a previously unexplored gap in data-driven equation discovery.

The interpretable equations derived from the data align with existing literature, providing valuable insights for climate scientists. These insights may also be of significant use to policymakers in disaster risk management.

Given the increasing frequency of extreme weather events linked to climate change, this research has timely and far-reaching implications for climate informatics, meteorology, and emergency preparedness.

4. Overall recommendation of the submission.

Minor Revision: Borderline, require minor changes.

5. Detailed Comments

I appreciated the methodological rigor, particularly the use of two cross-validation techniques, as well as the decision to prioritize simplicity and interpretability while maintaining strong predictive performance. I found Figure 2 especially relevant in illustrating how you addressed both complexity and overfitting issues.

I have a few minor suggestions:

• Page 2, Line 37: I believe you meant Table S6? In that table, the formatting of units appears inconsistent (e.g., “m²”, “m^2”), and in the last row, I believe it should be “Total-Totals” instead.

• Page 3, Line 18: “Through trial and error, we find = 15 […]”—Since the Silhouette Score does not seem to justify your choice (as seen in Figure 1 of the Supplementary Material), perhaps another index (e.g., Davies-Bouldin or Calinski-Harabasz scores) could provide additional support for this decision?

• Figure 3 (Supplementary Material): Double-check for overlapping images and typos in the caption.

• Page 5, Line 40: “When retaining 95% of PC variance […]”—I believe you only discuss the case of 90% PC variance retained, not 95%.

• Figure 2: The cutoff levels are somewhat difficult to differentiate—perhaps adjusting the transparency levels or using more distinct colours could enhance readability?

• Page 6, Line 47: Are you sure Figure 1c is the correct reference for the error maps? Would Figure 3 be more appropriate?

6. Reviewer’s confidence

The reviewer’s evaluation is an educated guess (least confident)

Review: From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

1. Summary: In this section please explain in your own words what problem the paper addresses and what it contributes to solving it.

The paper described a hierarchical modeling framework to understand the wind gust extremes in Europe. This research specifically addresses the challenge in the limited sample size of historical observations for developing data-driven methods to study extreme events. The paper combines dimensionality reduction, cluster-based cross-validation, feature selection, and the GEV-based nonlinear transformation of maximum wind gusts data to minimize the overfitting issue for the limited sample size.

2. Please select a score of relevance to climate informatics which promotes the interdisciplinary research between climate science, data science, and computer science.

Highly relevant

3. Relevance and Impact: Is this paper a significant contribution to interdisciplinary climate informatics?

The research developed interpretable (multiple linear regression based) equations to study the extreme wind gusts in Europe, which is relevant to the community.

4. Overall recommendation of the submission.

Minor Revision: Borderline, require minor changes.

5. Detailed Comments

The research is well-designed and well-written. There are a few comments I have for the authors to address to improve the quality of the manuscript.

Data: The data used as the input feature include 28 environmental drivers from ERA5 reanalysis. It is important to include a description of these input features in the main text of the paper instead of the supplemental materials.

Data preprocessing: authors used geographical clustering efforts to solve the ill-posed issue and reduce the number of targets. Using the kNN clustering is relevant, but I am wondering if the authors have considered existing local climate zones that are predefined and have already taken into consideration various environmental factors.

Model hierarchy: For the low-pass rectangular filtering for the PC loading time series, how are the different frequencies selected in the experiment? Also, since the model has a feature selection step, does it matter to include the percentage of the variance retained in the PCs instead of just putting all PCs into the pool of features?

Evaluation: In Section 4.1, the authors show that the performance of the best models for Z with ICP split outperforms the best models for U on the test set. However, that is only for the mean values. The standard deviation of the Z models is often much larger than the U model. What is the explanation for this pattern?

I also want to acknowledge the authors' effort to interpret the model performance with physical-grounded explanations due to the simplicity of the model that is developed for this application.

6. Reviewer’s confidence

The reviewer has general research experience in the relevant field and is fairly confident for the evaluation

Recommendation: From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data — R0/PR4

Comments

This article was accepted into the Climate Informatics 2025 Conference after the authors addressed the comments in the reviews provided. It has been accepted for publication in Environmental Data Science on the strength of the Climate Informatics Review Process.

Decision: From winter storm thermodynamics to wind gust extremes: discovering interpretable equations from data — R0/PR5

Comments

No accompanying comment.