Impact Statement
Data-driven equation discovery is a growing field in scientific machine learning, yet its application to extreme events remains underexplored, particularly for small, noisy, multidimensional datasets. Using European extreme winter storm data, we demonstrate that careful cross-validation, nonlinear transformations informed by extreme value theory, and multi-objective optimization yield interpretable equations that generalize well across regions and unseen storms. This paves the way for objective knowledge discovery for weather extremes.
1. Introduction
Damaging windstorms associated with extratropical cyclones are among the leading weather-related disasters in the mid-latitudes winters, especially in Europe (Schwierz et al., Reference Schwierz, Köllner-Heck, Zenklusen Mutter, Bresch, Vidale, Wild and Schär2010). From a weather forecasting standpoint, early detection of reliable precursors leading to extreme windstorms helps with timely disaster mitigation. Beyond horizontal temperature differences strengthening storms (Laurila et al., Reference Laurila, Gregow, Cornér and Sinclair2021), studies have highlighted how lower-atmosphere humidity fuels wind jets (Martínez-Alvarado et al., Reference Martínez-Alvarado, Gray, Clark and Baker2022) and transports high-wind air toward the surface (Browning et al., Reference Browning, Smart, Clark and Illingworth2015), yet simple models linking winterstorm characteristics to wind gusts are lacking.
Interpretable machine learning has enabled trustworthy data-driven models of thunderstorms (Hilburn, Reference Hilburn2023) and tropical cyclones (Tam et al., Reference Tam, Beucler and Ruppert2024), improving process understanding by reducing the attribution uncertainty of post-hoc explainable AI methods (Mamalakis et al., Reference Mamalakis, Barnes and Ebert-Uphoff2023). This motivates us to apply data-driven methods to uncover novel temporal patterns in environmental precursors to extreme European windstorms. A key challenge is the limited availability of extreme storm data, which restricts the use of complex nonlinear models prone to overfitting in small-sample conditions. However, this limitation also presents an opportunity to derive a simple, interpretable equation approximating complex physical processes, paving the way for data-driven discovery.
Data-driven equation discovery distills simple laws from empirical data (Schmidt and Lipson, Reference Schmidt and Lipson2009) with methods ranging from sparse selection algorithms to symbolic regression (Song et al., Reference Song, Jiang, Camps-Valls, Williams, Zhang, Reichstein, Vereecken, He, Hu and Shi2024). While data-driven approaches have successfully identified physical laws from controlled fluid dynamics experiments (Brunton et al., Reference Brunton, Proctor and Kutz2016; Cheng and Alkhalifah, Reference Cheng and Alkhalifah2024), finding interpretable equations from noisy, multidimensional weather and climate data remains under-explored. Balancing model performance and algorithmic complexity, Grundner et al. (Reference Grundner, Beucler, Gentine and Eyring2024) uncovered simple analytical cloud cover equations from high-resolution model outputs. We similarly implement a hierarchical modeling framework to track the trade-off between the model complexity and generalization capability, identifying a set of sparse yet descriptive equations that achieves satisfactory prediction skills for unseen storm cases across European regions. Additionally, we explore the best strategy for dimensionality reduction and feature selection for reliable discovery of simple equations from small yet high-dimensional datasets.
2. Data
2.1. Raw datasets description
Winter windstorm tracks for Europe. The winter storm data analyzed in this study is based on the ECMWF Winter Windstorm Indicator dataset (C3S CDS, 2022). This dataset documents historical severe windstorms between 1979 and 2021, and their footprints. The tracking of storms is performed by applying a tracking algorithm on the ERA5 reanalysis data from October through March. The tracking algorithm tracks local maxima of 850 hPa relative vorticity, with a minimal 10 m wind speed threshold of 25 m s−1. A total of 118 storms is identified in the 42-year period. This study analyzes storms in the last 30 years (96 storms). Since we focus on the impact of severe windstorms over land in this study, short-lived or non-landfalling storms are filtered out, which leaves us with 63 storms.
ERA5 meteorological reanalysis. We use the hourly ERA5 reanalysis dataset with a horizontal resolution of 0.25°–0.25° to characterize the environmental conditions for all tracked storms. Our analysis considers 28 potential environmental drivers for wind gust formation and intensity (Schulz and Lerch, Reference Schulz and Lerch2022) listed in Supplementary Information Table 6. To find the driver conditions local to each storm, we adopt a system-following 4°–4° grid box centered around the storm track of all 28 environmental drivers. These drivers include measurements of the winterstorm kinematics (e.g., 10 m wind speeds), precipitation characteristics (e.g., mean precipitation rates), and the thermodynamic characteristics (e.g., latent heat flux; K-index) near the windstorms. A full list of the ERA5 variables and their units is provided in Supplementary Information Table 6.
2.2. Data preprocessing: dimensionality reduction and partitioning
Focusing on extreme events limits the training data for data-driven models. To ensure reliable equation discovery, we reduce the dimensionality of storm history and post-landfall gust data, turning our dataset into one that can be analyzed with simple data-driven models without overfitting.
Features: prescribed spatial dimensionality reduction.
To encode the 4D storm history fields as time series, we compute four spatial statistics: maximum (max), minimum (min), mean, and standard deviation (std). Using predefined statistics instead of learned filters facilitates physical interpretability. For example, a higher temperature standard deviation may indicate the presence of weather fronts (Lagerquist et al., Reference Lagerquist, Allen and McGovern2022). To further reduce dimensionality, we apply Principal Component Analysis (PCA) to compress time series into orthogonal temporal modes (
$ {\Pi}_X(t) $
) and PC loadings (
$ {\mathrm{PC}}_X $
), which are scalars denoting the projection of the time series onto
$ {\Pi}_X(t) $
:
$$ X(t)-\overline{X(t)}=\sum \limits_{i=1}^N{\mathrm{PC}}_{X,i}{\Pi}_{X,i}(t),\hskip1em \mathrm{where}\hskip1em {\left\langle {\Pi}_{X,i}(t)|X(t)\right\rangle}_X={\mathrm{PC}}_{X,i}. $$
Here,
$ N $
is the number of retained PC modes indexed by
$ i $
, and
$ {\left\langle \cdot |\cdot \right\rangle}_X $
denotes the inner product associated with the PCA decomposition of
$ X $
. The modes
$ {\Pi}_{X,i}(t) $
are normalized such that
$ {\left\langle {\Pi}_{X,i}(t)|{\Pi}_{X,j}(t)\right\rangle}_X={\delta}_{ij} $
where
$ {\delta}_{ij} $
is the Kronecker delta, ensuring orthonormality of the PC modes.
Targets: geographical clustering.
In a low-sample-size regime, predicting wind gusts for every pixel across Europe is impractical and difficult to interpret. We use K-means++ clustering (Arthur and Vassilvitskii, Reference Arthur and Vassilvitskii2007) to partition the landmass into
$ K $
subregions, incorporating elevation, latitude, longitude, and maximum gusts from 63 severe windstorms. This ensures clusters reflect windstorm gust characteristics while remaining spatially coherent. To enforce geographical contiguity, we apply k-nearest neighbors (kNN; Enas and Choi, Reference Enas and Choi1986), reassigning edge pixels based on the most frequent cluster label. Although silhouette scores suggest an optimal
$ K=4 $
–5 (Supplementary Figure S1), these clusters group disjointed regions with distinct wind gust characteristics (Supplementary Figure S2). We therefore consider three additional clustering metrics (e.g., Xie and Beni, Reference Xie and Beni1991), which identify local optima at
$ K=9 $
, 14, and 15 (Supplementary Figure S3). Among these,
$ K=15 $
best preserves geographical compactness and aligns with established European climatic zones (Jylhä et al., Reference Jylhä, Tuomenvirta, Ruosteenoja, Niemi-Hugaerts, Keisu and Karhu2010) (Supplementary Figure S4). With clusters defined, we extract, for each storm, the highest instantaneous 10 m wind gust within each cluster during the 15 hours post-landfall. This window ensures short-lived but intense storms like Lothar (Wernli et al., Reference Wernli, Dirren, Liniger and Zillig2002) are included. Our target is the maximum 15-hour wind gust within each cluster (
$ {U}_{\mathrm{gust},i} $
), where
$ i $
indexes the 15 clusters.
Cross-validation procedure.
To ensure data-driven models select gust predictors applicable to severe windstorms, we always leave out the same seven storms for testing. The test set is randomly chosen, except for the Lothar storm, which is explicitly withheld to assess performance on an extreme, unseen case. The remaining 56 storms are split into 7 training-validation folds by randomly selecting 7 sets of 8 storms without replacement for validation (“Random split”). Alternatively, as a step toward invariant causal prediction (Peters et al., Reference Peters, Bühlmann and Meinshausen2016), we identify features robust to distributional shifts by clustering the output vector
$ {\mathbf{U}}_{\mathbf{gust}} $
into seven groups via K-means++ (Supplementary Table S2). We then train on six clusters and validate on the remaining one, repeating this process seven times (“ICP split”), following Häfner et al. (Reference Häfner, Gemmrich and Jochum2023). Sweet et al. (Reference Sweet, Müller, Anand and Zscheischler2023) show that such an “ICP split” outperforms naive k-fold cross-validation for regression.
2.3. Extreme value theory for wind gust distributions
Assessing wind gust severity relative to local climatology and distinguishing moderate from extreme gusts are key to risk assessment. We describe extreme gusts using the Generalized Extreme Value (GEV) distribution, a family of continuous distributions designed for block maxima such as extreme sea levels (Méndez et al., Reference Méndez, Menéndez, Luceño and Losada2022). The cumulative distribution function (CDF) and its functional inverse (quantile function) are
$$ {G}_{\mu, \sigma, \xi }(x)=\left\{\begin{array}{cc}\exp \left[-{\left(1+\xi \frac{x-\mu }{\sigma}\right)}^{-\frac{1}{\xi }}\right],& \xi \ne 0,\\ {}\exp \left[-\exp \left(-\frac{x-\mu }{\sigma}\right)\right],& \xi =0,\end{array}\right.\hskip1em {G}_{\mu, \sigma, \xi}^{-1}(p)=\left\{\begin{array}{cc}\mu +\frac{\sigma }{\xi}\left[{\left(-\ln p\right)}^{-\xi }-1\right],& \xi \ne 0,\\ {}\mu -\sigma \ln \left(-\ln p\right),& \xi =0.\end{array}\right. $$
where
$ \mu \in \mathrm{\mathbb{R}} $
(location),
$ \sigma \in {\mathrm{\mathbb{R}}}_{+}^{\ast } $
(scale), and
$ \xi \in \mathrm{\mathbb{R}} $
(shape). Unlike empirical CDFs, the GEV enables extrapolation beyond observed data to estimate return periods for extreme events. Due to the high spatial variability in land surface and topography across Europe, we fit a separate GEV for each cluster (Figure 1c–e; Supplementary Table S1). After testing multiple definitions and durations for block maxima sampling, we opted for daily maximum gusts over all cluster grid points during winter months (Supplementary Figure S5). This approach differentiates regions frequently impacted by strong gusts from those less affected (Figure 1a,d) and allows modeling of unseen extremes, which empirical distributions cannot capture (Figure 1b).

Figure 1. (a) We derive geographically contiguous regions of the European continent via K-means++ and kNN clustering to define maximum wind gust targets. (b) Extreme gust distributions are modeled using the GEV distribution, with shape (c), location (d), and scale (e) parameters fitted for each region.
3. Methodology
Feature selection.
We aim to derive interpretable equations linking storm characteristics to post-landfall gusts. Even with prescribed spatial statistics and PCA for dimensionality reduction, the number of features (28 predictors
$ \times $
4 spatial statistics
$ \times $
1–12 PC loadings) still far exceeds the number of training and validation samples (56). To address this underdetermined problem, we apply sequential forward feature selection, minimizing the mean or maximum validation root-mean-squared error (RMSE) across our seven cross-validation splits. Features are added until the RMSE objective no longer decreases.
Model hierarchy.
We construct a hierarchy of data-driven models expressible as simple equations by training multiple linear regressions (MLR) with PC loadings as features while varying three hyperparameters: feature smoothing, retained feature variance, and target transformation. For feature smoothing, we apply a low-pass rectangular filter to the PC loading time series, testing cutoff frequencies from 0.52 rad/hr to 1.92 rad/hr. For retained variance, we set four thresholds to determine the number of retained PC components per predictor: [75%, 80%, 85%, 90%]. The number of retained PC components per predictor acts as a regularization hyperparameter controlling the predictor variance fed to the ML framework. Finally, we explore a nonlinear, data-adaptive transformation of maximum wind gusts to encourage out-of-distribution generalization for extremes (Buriticá and Engelke, Reference Buriticá and Engelke2024). For each sample, we first obtain its percentile value from the CDF fitted in each region, then apply a nonlinear transformation to better distinguish extreme values in the GEV distribution tails:
$ {Z}_i=-\ln \left[1-{\mathrm{CDF}}_i\left({U}_{\mathrm{gust},\mathrm{i}}\right)\right] $
, where
$ Z $
is the transformed target, and
$ {\mathrm{CDF}}_i $
is the cumulative distribution function of the GEV fitted in region
$ i $
. All MLRs are trained using the least squares algorithms from the Scikit-Learn Python package (Pedregosa et al., Reference Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel and Duchesnay2011).
Pareto optimality for equation discovery. Balancing performance and complexity is a multi-objective optimization task that can be addressed through Pareto optimization (Jin and Sendhoff, Reference Jin and Sendhoff2008). Following sequential feature selection, we select optimal models from the empirical “Pareto front,” corresponding to the lowest error achieved for a given complexity before the mean (or max) validation RMSE across seven cross-validation splits plateaus. We define complexity as the number of unique features (predictor + spatial statistic) selected, as this better reflects equation simplicity than, for instance, the number of trainable parameters.
Model evaluation.
Pareto optimization seeks models with the lowest error while maintaining simplicity. We evaluate model performance using the mean validation RMSE across seven folds and measure complexity by the number of selected unique features (predictor + spatial statistic). Using the max validation RMSE yields different Pareto-optimal models, which are also evaluated. For models predicting
$ \mathbf{Z} $
, we convert predictions back to wind gust speed in physical units using the per-cluster GEV quantile function (
$ {G}_{\mu, \sigma, \xi}^{-1} $
) before computing RMSE, which ensures consistency across model hierarchies.
4. Results
4.1. Pareto-optimal models for wind gust prediction
We explore the (complexity, error) space of trained MLRs by varying hyperparameters—feature smoothing, retained variance, and target transformation—as well as the cross-validation strategy (mean or max validation RMSE). This allows us to identify Pareto-optimal maximum wind gust equations in low sample size conditions, forming the empirical Pareto front (dashed black line, Figure 2b). When features are selected by minimizing the max validation RMSE, the minimum achievable RMSE stabilizes after five selected unique variables, suggesting that additional variables may not improve generalization to unseen cases. The steady decline in validation RMSE beyond five features sets an upper complexity limit, beyond which models lose their ability to generalize across regions and storms with different gust spatial patterns. Additionally, the Pareto front (Figure 2b) indicates that models perform best when retaining 90% of PC variance and applying moderate smoothing (removing oscillations with frequency
$ >1.5 $
rad/hr). These hyperparameters yield features most likely to generalize well across all storm cases.

Figure 2. (a) Training and (b) validation (complexity, error) plots for MLR models predicting
$ {U}_{\mathrm{gust}} $
, with features selected to minimize the maximum validation RMSE across seven randomly split cross-validation folds. Each colored symbol represents a model with different hyperparameters: cutoff frequency for smoothing is indicated by transparency levels, and retained feature variance by shape and color. The Pareto front (black dashed line in b) shows improved validation skills up to five unique features, suggesting that sparser linear models generalize better. In contrast, training RMSE decreases linearly with additional features, suggesting overfitting beyond 5 features. All models outperform the climatological baseline, which predicts the training mean
$ {U}_{\mathrm{gust}} $
and is shown as a red dashed line.
As expected, the performance table for the best sparse models from different feature selection methods (Table 1) shows that models targeting the transformed gusts (
$ \boldsymbol{Z} $
) generally underperform models directly targeting
$ {\boldsymbol{U}}_{\mathrm{gust}} $
on the validation set when RMSE is measured in
$ {\boldsymbol{U}}_{\mathrm{gust}} $
units. Additionally, cross-validation using the “ICP split” yields no validation RMSE improvement over naive random splitting. However, generalizable equations must perform robustly on unseen storms. While the “ICP split” sacrifices some training and validation skill, it successfully identifies sparse equations that generalize well to extreme cases like Storm Lothar, as evidenced by its low test RMSE. In contrast, equations discovered using random splits consistently perform worse on test sets, suggesting implicit overfitting to validation sets by selecting variables optimized for specific gust spatial distributions. The smaller error spread of
$ {\boldsymbol{U}}_{\mathrm{gust}} $
models confirms their tendency to select sparse equations suboptimal for unseen cases, regardless of hyperparameters. Thus, splitting storms by their wind gust spatial distribution for cross-validation (“ICP split”), transforming the target using extreme value theory, and leveraging this information for feature selection leads to more reliable models, revealing features that better anticipate unseen cases.
Table 1. RMSE for the best models in the hierarchy, along with their unique feature count. We report the mean and the standard deviation (in parentheses) across feature selection methods and use bold font for the best trained, validation and test RMSE for all trained models (smallest mean RMSE and spread). Model performance is compared to the climatology mean baseline to assess skill.

4.2. Reducing geographical bias through GEV-informed wind gust transformations
Why do the
$ \boldsymbol{Z} $
models outperform the
$ {\boldsymbol{U}}_{\mathrm{gust}} $
models on the test set while underperforming on training and validation sets? To address this, we examine whether this performance gap varies across geographical regions. We compare the best
$ {\boldsymbol{U}}_{\mathrm{gust}} $
and
$ \boldsymbol{Z} $
models trained with the “ICP split” feature selection method to assess regional differences in model skill. The
$ {\boldsymbol{U}}_{\mathrm{gust}} $
model exhibits a distinct west–east RMSE gradient (Figure 3a), with higher errors in Northwestern Europe, where severe windstorms frequently make landfall. This suggests that the model prioritizes inland clusters with weaker winds, potentially at the expense of distinguishing differences in strong coastal gusts (Figure 3a). It may also struggle to anticipate rare inland windstorms, such as Storm Aila (2020) (Rantanen et al., Reference Rantanen, Laurila, Sinclair and Gregow2021). Therefore, the main difference between the
$ {\boldsymbol{U}}_{\mathrm{gust}} $
and
$ \boldsymbol{Z} $
error maps (Figure 3) is that the nonlinear gust transformation imposes stronger spatial regularization. The
$ \boldsymbol{Z} $
model shows decreased skill in Eastern Europe and the Balkans but improved performance in Northwestern Europe, suggesting reduced geographical bias. These results indicate that the GEV-informed gust transformation mitigates regional biases in MLR predictions, leading to a fairer model despite its slightly higher overall RMSE compared to
$ {\boldsymbol{U}}_{\mathrm{gust}} $
MLRs.

Figure 3. Validation error maps for (a) the best
$ {\boldsymbol{U}}_{\mathrm{gust}} $
model and (b) the best
$ \boldsymbol{Z} $
model in physical units (m s
$ {}^{-1} $
). The nonlinear, GEV-informed transformation reduces the west–east error gradient in
$ {\boldsymbol{U}}_{gust} $
and improves gust predictions in windstorm-prone Northwestern Europe.
4.3. Leveraging interpretable, pareto-optimal equations for scientific insight
Both the best 4-feature
$ {\boldsymbol{U}}_{\mathbf{gust}} $
and
$ \boldsymbol{Z} $
models are linear functions of the standardized PC loadings, defined as
$ {\overset{\sim }{\mathrm{PC}}}_{X,j}=\frac{{\mathrm{PC}}_{X,j}-{\mathrm{PC}}_{X,j}^{\mathrm{mean}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}} $
, where the mean and standard deviation are calculated over the training set and
$ j $
indexes the selected features. After fitting, these standardized PC loadings are multiplied by region-specific weights
$ {\boldsymbol{a}}_{\boldsymbol{X}}\in {\mathrm{\mathbb{R}}}^{15\times 4} $
and then added to biases
$ \boldsymbol{b}\in {\mathrm{\mathbb{R}}}^{15} $
for each of the 15 regions of interest. Using the inner product from Eq. 1, we express the models as scaled projections of winterstorm fields onto data-driven temporal patterns
$ {\Pi}_{X,j}(t) $
. The equation for the
$ {\boldsymbol{U}}_{\mathrm{gust}} $
model is:
$$ {\boldsymbol{U}}_{\mathbf{gust}}^{\boldsymbol{U}}={\boldsymbol{\beta}}^{\boldsymbol{U}}+\sum \limits_{j=1}^4{\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}{\left\langle {\Pi}_{X,j}(t)|{X}_j(t)\right\rangle}_{X_j}\;\mathrm{where}:\hskip1em {\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}=\frac{{\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}},{\boldsymbol{\beta}}^{\boldsymbol{U}}={\boldsymbol{b}}^{\boldsymbol{U}}-\sum \limits_{j=1}^4\frac{{\mathrm{PC}}_{X,j}^{\mathrm{mean}}}{{\mathrm{PC}}_{X,j}^{\mathrm{std}}}\mathbf{1}. $$
For the
$ \boldsymbol{Z} $
model, using the definition of
$ Z $
and the GEV quantile function from Eq. 2, we obtain:
$$ {\boldsymbol{U}}_{\mathbf{gust}}^{\boldsymbol{Z}}={\mathbf{G}}_{\boldsymbol{\mu}, \boldsymbol{\sigma}, \boldsymbol{\xi}}^{-1}\left(1-\exp \left[-{\boldsymbol{\beta}}^{\boldsymbol{Z}}-\sum \limits_{j=1}^4{\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}}{\left\langle {\Pi}_{X,\hskip0.35em j}(t)|{X}_j(t)\right\rangle}_{X_j}\right]\right)\;\mathrm{with}\left({\boldsymbol{\alpha}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}},{\boldsymbol{\beta}}^{\boldsymbol{Z}}\right)\;\mathrm{similarly}\ \mathrm{defined}. $$
$ {\boldsymbol{U}}_{\mathrm{gust}}^{\boldsymbol{U}} $
and
$ {\boldsymbol{U}}_{\mathrm{gust}}^{\boldsymbol{Z}} $
are the predicted maximum wind gusts for each region using the direct and transformed models, respectively;
$ {\boldsymbol{b}}^{\boldsymbol{U}},{\boldsymbol{b}}^{\boldsymbol{Z}}\in {\mathrm{\mathbb{R}}}^{15} $
are region-specific biases;
$ {\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{U}},{\boldsymbol{a}}_{\boldsymbol{X},\hskip0.35em \boldsymbol{j}}^{\boldsymbol{Z}}\in {\mathrm{\mathbb{R}}}^{15} $
are region-specific regression weights for the selected PC loadings;
$ {\Pi}_{X,j}(t) $
are the data-driven temporal patterns associated with each winterstorm feature;
$ {\left\langle {\Pi}_{X,j}(t)|{X}_j(t)\right\rangle}_{X_j} $
represents the projection of the storm field
$ {X}_j(t) $
onto the temporal mode
$ {\Pi}_{X,j}(t) $
;
$ {\boldsymbol{G}}_{\boldsymbol{\mu}, \boldsymbol{\sigma}, \boldsymbol{\xi}}^{-1} $
is the GEV quantile function, mapping transformed values back to physical wind gust units; and
$ \mathbf{1}\in {\mathrm{\mathbb{R}}}^{15} $
is the identity vector (a column vector of ones).
For a region
$ i $
and a feature
$ j $
, models predict higher
$ {U}_{\mathrm{gust},i} $
when the product of
$ {a}_{X,j,i} $
and
$ {\overset{\sim }{PC}}_{X,j} $
is positive, and vice versa. Assuming
$ {a}_{X,j,i}>0 $
, time series that project positively onto the PC eigenvectors lead to higher wind gust predictions. The regression coefficients
$ {\boldsymbol{a}}_{\boldsymbol{X},\boldsymbol{i}} $
in the best
$ \boldsymbol{Z} $
model exhibits significant geographical variability (Figure 4a-d), indicating that the influence of temporal patterns on gusts varies across regions. For example, two low-tropospheric humidity PCs in the
$ \boldsymbol{Z} $
model are positively correlated with gusts in region 4 (France and the Netherlands; Figure 4a-b), while the other two PC terms are negatively correlated (Figure 4c-d). The temporal evolution of the first PC mode of
$ {RH}_{\mathrm{max}}^{850} $
and the third PC mode of
$ {RH}_{\mathrm{max}}^{975} $
(Figure 4e) suggests that higher gusts in France occur when drying persists within or at the top of the boundary layer (850 hPa) before storm landfall. A drier lower troposphere promotes the descent of high-speed winds via evaporation (Browning et al., Reference Browning, Smart, Clark and Illingworth2015) or through downward mixing (Pantillon et al., Reference Pantillon, Lerch, Knippertz and Corsmeier2018). In contrast, a larger standard deviation of 500 hPa geopotential height (
$ \Phi $
) before landfall (Figure 4f) may indicate more perturbed west–east winds, which can intensify winterstorms. To summarize, Figure 4 demonstrates that we can extract important storm predictors leading to extreme wind gusts, such as low-tropospheric humidity, and quantify their relative importance. Finally, the temporal variability encoded in the PCs reveals additional insights into the timing of the drying or moistening tendency relative to landfall, which increases the likelihood of strong gusts.

Figure 4. A visual representation of the learned weights for the 4 features in equation of our best
$ \boldsymbol{Z} $
model: (a)
$ {RH}_{max}^{1000} $
, (b)
$ {RH}_{max}^{850} $
, (c)
$ {RH}_{max}^{975} $
, and (d)
$ {\Phi}_{std}^{500} $
. Comparing these weights across clusters to the time series of the four selected PC modes (e-f) highlights key temporal patterns in storm history that promote or suppress extreme winds after landfall in different regions.
5. Conclusion
We derived interpretable equations linking storm history to post-landfall extreme wind gusts through a PC-regression-based framework. By combining dimensionality reduction and temporal smoothing, we trained a hierarchy of equations from small datasets while mitigating overfitting via careful cross-validation, sequential feature selection, and Pareto optimization. Our results suggest that hierarchical modeling improves generalizability in small-sample settings, with the “Invariant Causal Prediction split” providing the greatest benefit for unseen test cases. Pareto optimization indicates that equations should contain at most 3–4 features to balance generalizability and predictive skill. While the best-performing
$ {\boldsymbol{U}}_{\mathrm{gust}} $
model performs well globally (mean validation RMSE), spatial analysis reveals a west–east error structure, as the model prioritizes weak-wind regions at the expense of capturing gust variations in strong-wind regions. A GEV-informed nonlinear transformation reduces this spatial bias by accentuating distinctions in gust percentiles at the distribution tail. The discovered equations provide insights into how the timing of low-tropospheric drying is associated with wind gusts in Northwestern Europe, underscoring the importance of storm thermodynamic history in predicting post-landfall wind gusts. Future extensions could apply this framework to larger datasets and explore alternative interpretable modeling approaches, including nonlinear methods such as symbolic regression.
Supplementary material
The supplementary material for this article can be http://doi.org/10.1017/eds.2025.10008.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.10008.
Acknowledgements
We are grateful for the technical assistance of M. Gomez and the advice of E. Koch.
Author contribution
All authors contributed to the conceptualization, methodology, writing, and approved the final submitted draft. Data curation and visualization: F.I.T., F.A.
Competing interests
The authors declare none.
Data availability statement
The released version of the Github repository containing code to reproduce the main figures of this work can be accessed through Zenodo and has been assigned the following DOI: 10.5281/zenodo.14766994. The ECMWF Windstorm Indicator Data can be downloaded from the Copernicus Windstorm Indicator data archive. ERA5 datasets were downloaded from the Copernicus website (multiple pressure levels as well as single pressure levels).
Ethical statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Financial support
This research was supported by the Canton of Vaud, Switzerland, through the base funding of UNIL.











Comments
Dear Editors of Environmental Data Sciences,
I am submitting our manuscript, “From Winter Storm Thermodynamics to Wind Gust Extremes: Discovering Interpretable Equations from Data,” for consideration for publication in Environmental Data Science (EDS).
Our work attempts to discover critical temporal patterns in environmental characteristics leading to extreme winds associated with European windstorms overland with data-driven methodologies. The discovery of such patterns facilitates scientific understanding and has implications for operational forecasting and disaster prevention. The proposed framework combines dimensionality reduction, novel data splitting strategies, and nonlinear target transformations with data-driven statistical learning models. Our results show that these novel additions to our model improve the generalizability of the equations to unseen cases, despite low-sample (<100) conditions. The discovered patterns are coherent and have well-defined physical meaning. As a result, we can identify the temporal change of boundary layer moisture as one of the leading factors to extreme wind generation.
Our manuscript is suitable for the journal as it showcases the essential steps in extracting useful equations from limited data on weather extremes. This has important implications for data-driven knowledge discovery in the weather and climate domains. The framework described in this manuscript can easily be adapted to different research topics, which should interest the broader readership of the EDS journal. Finally, our manuscript also introduces several novel mitigation strategies to improve the generalizability of data-driven models in low-sample conditions, which contributes to reliable equation discovery for various atmospheric science problems.
We look forward to hearing from you, and please do not hesitate to contact us should you have any questions.
Sincerely,
Frederick Iat-Hin Tam