1. Introduction
A distortion risk measure
$\rho$
, such as value-at-risk (VaR) and expected shortfall (ES) – also known as tail value-at-risk (TVaR) or conditional value-at-risk (CVaR) – is an important tool used in insurance and finance for assessing the risk level of a loss random variable, or risk,
$X$
faced by a decision-maker. In practice, the ‘true’ distribution of
$X$
is often unknown or uncertain, and the possible distributions of
$X$
are often assumed to lie within a set of distributions
${\mathcal S}$
, referred to as an uncertainty set for
$X$
. When the distribution of
$X$
is uncertain, a decision-maker is interested in the supremum of the risk measure
$\rho(X)$
over the uncertainty set
${\mathcal S}$
, formulated as
\begin{align} \sup_{F \in {\mathcal S}} \rho\left(X^F\right), \tag{1.1} \end{align}
where
$X^F$
means that the distribution of the random variable
$X$
is
$F$
. The value of
$\sup_{F \in \mathcal S} \rho(X^F)$
is called the worst-case distortion risk measure for
$X$
. If there exists a distribution
$F^* \in \mathcal S$
such that
$\sup_{F \in \mathcal S} \rho(X^F) = \rho(X^{F^*})$
, then this distribution
$F^*$
is referred to as the worst-case distribution of
$X$
and represents the worst scenario for the loss or risk. Problem (1.1) and its applications have been well studied in the literature of insurance, finance, operations research, and many other fields. We refer to Hürlimann (Reference Hürlimann2002), Zhu and Fukushima (Reference Zhu and Fukushima2009), Cornilly et al. (Reference Cornilly, Rüschendorf and Vanduffel2018), Li (Reference Li2018), Li et al. (Reference Li, Shao, Wang and Yang2018), Liu et al. (Reference Liu, Cai, Lemieux and Wang2020), Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024), Cai et al. (Reference Cai, Li and Mao2025), and references therein for solutions to problem (1.1) under various uncertainty sets
${\mathcal S}$
and risk measures
$\rho$
.
In insurance, finance, operations research, and many other fields, when dealing with an underlying risk
$X$
, a decision-maker often needs to investigate the distribution of its transform
$\ell(X)$
and measure the risk level of
$\ell(X)$
, where
$\ell$
is a function. If the distribution of
$X$
is known, the distribution of
$\ell(X)$
is also certain. However, if the distribution of
$X$
is unknown and belongs to an uncertainty set
${\mathcal S}$
, then, in general, the distribution of
$\ell(X)$
is uncertain, and the possible distributions of
$\ell(X)$
may not have the same characteristics or structures as the uncertainty set
${\mathcal S}$
for
$X$
. Consequently, solutions to the problem (1.1) cannot be directly used to determine the worst-case risk measure
$\rho(\ell(X))$
over an uncertainty set
${\mathcal S}$
and the corresponding worst-case distributions (see Cai et al., Reference Cai, Liu and Yin2024 regarding this issue). Therefore, one is interested in the following optimization problem:
\begin{align} \sup_{F \in {\mathcal S}} \rho\left(\ell(X^F)\right). \tag{1.2} \end{align}
When
$\ell(x)=(x-d)_+$
and the uncertainty set
${\mathcal S}$
is defined by information on the mean, variance, or a finite number of moments of
$X$
, and the risk measure
$\rho$
is the expectation
$\mathbb{E}$
, the problem (1.2), or the optimization problem
$ \sup_{F \in {\mathcal S}} \mathbb{E}[(X^F-d)_+] $
, and its applications have been well studied in the literature. See, for instance, Jagannathan (Reference Jagannathan1977), De Vylder and Goovaerts (Reference De Vylder and Goovaerts1982), Kaas and Goovaerts (Reference Kaas and Goovaerts1986), Jansen et al. (Reference Jansen, Haezendonck and Goovaerts1986), Zuluaga et al. (Reference Zuluaga, Peña and Du2009), and references therein. Furthermore, Chen et al. (Reference Chen, He and Zhang2011) provide an analytical expression for
$ \sup_{F \in {\mathcal S}} \mathbb{E}[\ell(X^F)]$
when
$\ell(x)=(x-d)^2_+$
and only the first two moments of
$X$
are known. Tang and Yang (Reference Tang and Yang2023) investigate the worst-case value of
$ \sup_{F \in {\mathcal S}} \mathbb{E}[\ell(X^F)]$
when the distribution of
$X$
lies ‘partially’ in a Wasserstein ball and is ‘partially’ known, with
$\ell(x)=x^m$
or
$(x-d)^m_+$
. Recently, Cai et al. (Reference Cai, Liu and Yin2024) explore the problem (1.2) when
$\rho$
is a distortion risk measure and the set
$ {\mathcal S}$
includes distributions within a Wasserstein ball that satisfy certain constraints on the first two moments.
The studies of the problem (1.2) can further help researchers to investigate robust optimal insurance and reinsurance models under worst-case scenarios. These models aim to determine the optimal insurance and reinsurance policies that minimize the worst-case risk measure values of the risk exposures faced by insurers and reinsurers. For robust optimal insurance and reinsurance models, we refer to Hu et al. (Reference Hu, Yang and Zhang2015), Asimit et al. (Reference Asimit, Bignozzi, Cheung, Hu and Kim2017), Asimit et al. (Reference Asimit, Hu and Xie2019), Birghila and Pflug (Reference Birghila and Pflug2019), Liu and Mao (Reference Liu and Mao2022), Xie et al. (Reference Xie, Liu, Mao and Zhu2023), Landriault et al. (Reference Landriault, Liu and Shi2025), and references therein.
In this paper, we model distribution uncertainty using a Wasserstein ball, assuming that the distribution of a risk
$X$
is uncertain and belongs to the following uncertainty set:
\begin{align} {\mathcal S}(\hat{F};\, k, \varepsilon) \triangleq \big\{ F \,:\, W_k( F, \hat{F}) \leqslant \varepsilon \big\}, \tag{1.3} \end{align}
where
$ k \geqslant 1$
,
$ \varepsilon \geqslant 0$
,
$ \hat{F}$
is a reference distribution, and
$W_k$
is the Wasserstein distance with order
$k$
for distributions, as defined in Definition 2.2. We then study the worst-case distortion risk measure of a transform
$\ell (X)$
over the set
$ \mathcal S (\hat{F};\, k, \varepsilon)$
, formulated as
\begin{align} \sup_{F \in {\mathcal S}(\hat{F};\, k, \varepsilon)} \rho\left(\ell(X^F)\right). \tag{1.4} \end{align}
Here,
$\ell(x)$
can represent the stop-loss transform
$ \ell_d (x) = (x-d)_+$
, the limited loss transform
$\ell^m (x) = x \wedge m$
, or the limited stop-loss transform
$ \ell_d^m(x) = \min\{(x-d)_+, \, m\}.$
These transforms are widely used in insurance and finance applications. In particular, the stop-loss transform and the limited stop-loss transform are optimal indemnity or compensation functions for many insurance and reinsurance designs. For applications of these transforms in insurance and reinsurance, we direct readers to Albrecher et al. (Reference Albrecher, Beirlant and Teugels2017), Klugman et al. (Reference Klugman, Panjer and Willmot2019), Cai and Chi (Reference Cai and Chi2020), and the references therein.
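As a concrete numerical illustration (not part of the original derivation), the three transforms can be evaluated directly; the deductible $d=100$ and limit $m=50$ below are arbitrary sample choices.

```python
def stop_loss(x, d):
    """Stop-loss transform: (x - d)_+ ."""
    return max(x - d, 0.0)

def limited_loss(x, m):
    """Limited loss transform: x ∧ m ."""
    return min(x, m)

def limited_stop_loss(x, d, m):
    """Limited stop-loss transform: min{(x - d)_+, m}."""
    return min(max(x - d, 0.0), m)

# Example with an arbitrary deductible d = 100 and limit m = 50:
assert stop_loss(120, 100) == 20.0
assert limited_loss(120, 50) == 50
assert limited_stop_loss(120, 100, 50) == 20.0
assert limited_stop_loss(200, 100, 50) == 50.0
assert limited_stop_loss(80, 100, 50) == 0.0
```

In reinsurance terms, `limited_stop_loss` is the indemnity paid under a layer with retention $d$ and cover $m$.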
The set
$ \mathcal S ( \hat{F}, k, \varepsilon)$
defined in (1.3) is a popular uncertainty set used in modelling distribution uncertainty. The reference distribution
$\hat{F}$
is typically an estimate of
$F$
, such as the empirical distribution or a specified distribution fitted to the observed data of
$X$
. It is important to note that the Wasserstein distance used in Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024) and Cai et al. (Reference Cai, Liu and Yin2024) is a second-order Wasserstein distance. In this paper, the Wasserstein distance is a general
$k$
-order Wasserstein distance
$(k \geqslant 1)$
. With this generalization, we solve the optimization problem (1.4) when
$\rho$
is a spectral risk measure. We would like to emphasize that the results of Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024) and Cai et al. (Reference Cai, Liu and Yin2024) cannot be directly applied to problem (1.4). Furthermore, we point out that the mean-variance-constrained distribution set
$ {\mathcal S}(\mu,\sigma)\triangleq \big \{ F\,:\, \mathbb{E} [X^F ] = \mu, \, \mathrm{var} (X^F) =\sigma^2 \big \}$
can be recovered from the constrained Wasserstein ball
$ {\mathcal S}( \hat{F};\, k, \varepsilon,\mu,\sigma) \triangleq \big \{ F\,:\, W_k( F, \hat{F}) \leqslant \varepsilon, \,\, \mathbb{E} [X^F ] = \mu, \, \mathrm{var} (X^F) =\sigma^2 \big\}$
, which was considered in Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024) and Cai et al. (Reference Cai, Liu and Yin2024) when
$k=2$
, by taking the limit
$\varepsilon \to \infty$
. On the other hand, as shown in the proof of Lemma B.6 in Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024), for
$k=2$
, one has
${\mathcal S}( \hat{F};\, 2, \varepsilon)= \big \{ G \in \mathcal{S} (\hat{F} ;\, 2, \varepsilon, \mu,\sigma) \,:$
$(\hat{\mu} - \mu)^2 + (\hat{ \sigma} - \sigma)^2 \leqslant \varepsilon^2 , \sigma \gt 0 \big \},$ where $\hat{\mu}$ and $\hat{\sigma}$ denote the mean and the standard deviation of $\hat{F}$.
The rest of this paper is organized as follows. Section 2 introduces the notations and definitions used in this study. Section 3 solves the problem (1.4) when the transform is the stop-loss transform, the limited loss transform, and the limited stop-loss transform, respectively. Section 4 discusses the applications of the results obtained in Section 3 to limited stop-loss reinsurance premiums under distribution uncertainty. Finally, Section 5 presents concluding remarks, and the proofs of the main theorems are provided in the appendix.
2. Preliminaries
We assume that all random variables considered in this paper are defined on an atomless probability space
$(\Omega, \mathcal F, \mathbb{P})$
, ensuring that all distributions considered exist. Furthermore, for $p \geqslant 1 $, we use $L^p$ to denote the set of all random variables on the space $(\Omega, \mathcal F, \mathbb{P})$ with finite $L^p$-norm.
We represent a loss random variable of an insurance or investment portfolio by
$X \in L^p$
where a positive value of
$X$
indicates a loss and a negative value indicates a profit. A random variable
$X$
following a given distribution
$F$
is denoted by
$X^F$
. We use
$U$
to denote a uniform random variable on (0,1). It is well known that
$X^F$
and
$ F^{-1}(U)$
follow the same distribution
$F$
, denoted by
$ X^F \buildrel \mathrm{d} \over = F^{-1}(U)$
, where
$F^{-1}(u)=\inf\{x\,:\, F(x) \geqslant u\}$
for any
$0 \lt u \lt 1$
is the (left-continuous) quantile function of
$F$
.
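For an empirical distribution on a finite sample, this left-continuous quantile can be read off the order statistics. The following sketch (with a made-up sample) implements $F^{-1}(u)=\inf\{x\,:\,F(x)\geqslant u\}$:

```python
import math

def empirical_quantile(sample, u):
    """Left-continuous quantile F^{-1}(u) = inf{x : F(x) >= u} for the
    empirical distribution F putting mass 1/n on each sample point."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")
    xs = sorted(sample)
    n = len(xs)
    # F(x_(i)) = i/n, so the infimum is the ceil(n*u)-th order statistic.
    return xs[math.ceil(n * u) - 1]

sample = [4.0, 1.0, 3.0, 2.0]                    # made-up data
assert empirical_quantile(sample, 0.5) == 2.0    # F(2) = 0.5 >= 0.5
assert empirical_quantile(sample, 0.51) == 3.0
assert empirical_quantile(sample, 0.25) == 1.0
```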
For any set
$ A \in \mathcal F $
, we denote the corresponding indicator function by
$\mathbb{I}_A$
. For
$x,y \in \mathbb{R}$
, we define
$ (x)_+ = \max \{x, 0 \} $
,
$x \vee y =\max\{x, y\}$
, and
$x\wedge y =\min\{x,y\}$
.
Definition 2.1. A function
$g\,:\,[0,1]\mapsto[0,1]$
is said to be a distortion function if it is non-decreasing and satisfies
$g(0)=0$
and
$g(1)=1$
. The distortion risk measure of a random variable
$Y$
induced by a distortion function $g$ is defined as
\begin{align} \rho^g (Y) = \int_0^{\infty} g\left( 1- G_Y(x) \right) \mathrm{d} x - \int_{-\infty}^0 \left[ 1- g\left( 1- G_Y(x) \right) \right] \mathrm{d} x, \tag{2.1} \end{align}
provided that at least one of the two integrals in (2.1) is finite, where
$G_Y$
denotes the distribution function of
$Y$
.
In this paper, we assume that the distortion function
$g$
is absolutely continuous, with a weight function
$\gamma(u)=\partial_-g(x)|_{x=1-u}$
for
$0\lt u\lt 1$
, satisfying
$\int_0^1 \gamma(u) \mathrm{d} u=1$
, where
$\partial_-$
denotes the derivative from the left. Thus, the distortion risk measure
$\rho^g$
of
$X^F$
has an alternative representation:
\begin{align} \rho^g \left( X^F \right) = \int_0^1 \gamma(u) \, F^{-1}(u) \, \mathrm{d} u. \tag{2.2} \end{align}
If
$\gamma(u)$
is increasing, a distortion risk measure with expression (2.2) is a coherent risk measure and is referred to as a spectral risk measure (Acerbi, Reference Acerbi2002). In this paper, we make the following assumption about the weight function
$\gamma$
:
Assumption 2.1. For
$k \geqslant 1 $
, define
$\bar{k} = (1-\frac{1}{k})^{-1}$
. Specifically,
$\bar{k}=\infty$
when
$k=1$
, and
$\bar{k}=1$
when
$k=\infty$
. Assume that the weight function
$\gamma$
of the distortion function $g$ is non-decreasing and that
$ \Vert \gamma \Vert_k = (\!\int_0^1 |\gamma(u)|^k \mathrm{d} u )^{1/k}$
and
$ \Vert \gamma \Vert_{\bar{k}} = \big (\int_0^1 |\gamma(u)|^{\bar{k}} \, \mathrm{d} u \big )^{1/\bar{k}} $
are well-defined whenever they appear.
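These norms are easy to check numerically. As an illustration (not part of the original text), the sketch below evaluates $\Vert \gamma \Vert_1$ and $\Vert \gamma \Vert_2$ for the ES weight $\gamma(t)=\frac{1}{1-\alpha}\,\mathbb{I}_{[\alpha,1]}(t)$ that appears later in Example 3.1; the grid resolution is an arbitrary choice.

```python
def lp_norm(gamma, p, n=100000):
    """Numerical ||gamma||_p = (int_0^1 |gamma(u)|^p du)^(1/p), midpoint rule."""
    h = 1.0 / n
    s = sum(abs(gamma((i + 0.5) * h)) ** p for i in range(n))
    return (s * h) ** (1.0 / p)

alpha = 0.9
gamma_es = lambda u: 1.0 / (1.0 - alpha) if u >= alpha else 0.0

# ||gamma||_1 = 1 (gamma integrates to one); ||gamma||_2 = (1-alpha)^{-1/2}.
assert abs(lp_norm(gamma_es, 1) - 1.0) < 1e-3
assert abs(lp_norm(gamma_es, 2) - (1.0 - alpha) ** -0.5) < 1e-2
```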
Definition 2.2. Let
$ k \geqslant 1 $
be an integer. For two distributions
$F$
and
$G$
, the Wasserstein distance of order
$k$
between
$F$
and
$G$
is given by
\begin{align*} W_k (F, \, G) = \left( \int_0^1 \big\vert F^{-1}(u) - G^{-1}(u) \big\vert^k \, \mathrm{d} u \right)^{1/k}, \end{align*}
where
$ F^{-1} $
and
$ G^{-1}$
are the quantile functions of
$F$
and
$G$
, respectively.
According to the definition, the
$k$
-order Wasserstein distance between two distributions
$F$
and
$G$
is determined by their quantile functions
$F^{-1}$
and
$G^{-1}$
. Since a distribution and its quantile function are in one-to-one correspondence, we use
$W_k(F, \, G)$
or
$W_k(F^{-1}, \, G^{-1}) $
interchangeably to denote the Wasserstein distance between two distributions
$F$
and
$G$
. In addition, the quantile function of a worst-case distribution is called the worst-case quantile function.
Assume that a decision-maker has a reference distribution
$\hat{F}$
for a risk
$X$
, and all possible distributions for
$X$
belong to the uncertainty set
$\mathcal S ( \hat{F};\, k, \varepsilon)$
. Correspondingly, for the reference distribution
$\hat{F}$
or the reference quantile function
$\hat{F}^{-1}$
, we define the uncertainty set of quantile functions for
$X$
as
\begin{align*} \mathcal Q( \hat{F}^{-1};\, k, \varepsilon) \triangleq \big\{ F^{-1} \,:\, W_k( F^{-1}, \hat{F}^{-1}) \leqslant \varepsilon \big\}. \end{align*}
Thus,
$F \in \mathcal S ( \hat{F};\, k, \varepsilon) $
if and only if
$F^{-1} \in \mathcal Q( \hat{F}^{-1};\, k, \varepsilon)$
.
3. Worst-case distortion risk measures and worst-case distributions
If
$\ell (x) = x $
, then problem (1.4) reduces to
$ \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho(X^F)$
, which was solved in Liu et al. (Reference Liu, Mao, Wang and Wei2022) for the case where
$\rho$
is a coherent distortion risk measure. However, the results and proofs from Liu et al. (Reference Liu, Mao, Wang and Wei2022) cannot be directly applied to problem (1.4) when
$\ell$
is a non-degenerate stop-loss, limited loss, or limited stop-loss transform.
In this section, we address problem (1.4) for
$\ell= \ell_d$
,
$\ell^m$
, and
$\ell_d^m$
. To this end, we reformulate problem (1.4) with
$\ell= \ell_d$
or
$\ell^m$
as a sup-sup or sup-inf problem, thereby reducing it to a one-variable maximization problem. For the case when
$\ell= \ell_d^m$
, we can reformulate the problem and solve it using the solutions obtained for
$\ell= \ell^m$
.
3.1. Stop-loss transform
In this subsection, we solve problem (1.4) when
$\ell= \ell_d$
and
$\rho$
is the distortion risk measure
$ \rho^g$
, as represented in (2.2). Specifically, we solve the following optimization problem:
\begin{align} \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left( (X^F-d)_+ \right). \tag{3.1} \end{align}
To avoid discussing the trivial cases, we assume
$ \mathrm{ess\mbox{-}inf} X^{\hat{F}} \lt d \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}}$
. For any distribution
$F \in {\mathcal S }(\hat{F};\, k, \varepsilon) $
, by applying arguments similar to those in Lemma 4.1 of Cai et al. (Reference Cai, Liu and Yin2024), we have
\begin{align} \rho\left( (X^F-d)_+ \right) = \max_{\beta \in [0,1]} \int_0^1 \gamma_{1,\beta}(u) \left( F^{-1}(u) - d \right) \mathrm{d} u, \tag{3.2} \end{align}
where, for any given
$\beta \in [0, 1]$
and the weight function
$\gamma$
of the distortion function
$g$
, we define
\begin{align*} \gamma_{1,\beta}(u) = \gamma(u)\, \mathbb{I}_{(\beta, 1]}(u), \quad 0 \lt u \lt 1. \end{align*}
Note that the weight function
$\gamma$
itself is non-negative, with
$\Vert \gamma \Vert_1 = \int_0^1 \gamma(u) \mathrm{d} u = g(1)= 1 $
. Under Assumption 2.1,
$\gamma $
is non-decreasing on (0,1). Therefore, there exists
$ \delta \in (0,1) $
such that
$\gamma \gt 0 $
holds on the interval
$ ( \delta, 1 ) $
. Consequently,
$\gamma_{1,\beta} $
is non-negative, with
$ \Vert \gamma_{1,\beta }\Vert_1 \geqslant \int_{\beta \vee \delta}^1 \, \gamma (u) \, \mathrm{d} u \gt 0 $
for all
$ \beta\lt 1$
. Following (3.2), the worst-case risk measure value in (3.1) can be written as
\begin{align} \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left( (X^F-d)_+ \right) = \sup_{\beta \in [0,1]} \, \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \int_0^1 \gamma_{1,\beta}(u) \left( F^{-1}(u) - d \right) \mathrm{d} u. \tag{3.3} \end{align}
We can first solve the inner maximization problem of (3.3)
\begin{align} H(\beta) \triangleq \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \int_0^1 \gamma_{1,\beta}(u) \left( F^{-1}(u) - d \right) \mathrm{d} u \tag{3.4} \end{align}
by using Proposition 4 of Liu et al. (Reference Liu, Mao, Wang and Wei2022). Then, we reduce problem (3.1) to a tractable one-variable maximization problem. The following theorem characterizes the worst-case risk measure value and the worst-case distribution, if any, for the problem (3.1).
Theorem 3.1. Suppose that Assumption 2.1 holds and
$ \mathrm{ess\mbox{-}inf} X^{\hat{F}} \lt d \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}}$
in the problem (3.1).
-
(i) The worst-case distortion risk measure for the problem (3.1) is given by
\begin{align} \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left ( (X^F-d)_+\right ) = \max_{\beta \in [0, \, 1]} H(\beta), \tag{3.5} \end{align}
where $ H(\beta) $, defined in (3.4), is a continuous function of $\beta \in [0, 1]$ and has the following expression:
\begin{align} H(\beta) = \int_0^1 \gamma_{1,\beta} (u) \, \hat{F}^{-1}(u) \, \mathrm{d} u + \varepsilon \, \Vert \gamma_{1,\beta} \Vert_{\bar{k}} -d\, \Vert \gamma_{1,\beta} \Vert_1, \quad 0 \leqslant \beta \leqslant 1. \tag{3.6} \end{align}
-
(ii) For $k \gt 1 $, if problem (3.1) has a maximizer $ F^* \in \mathcal S(\hat{F};\, k, \varepsilon) $, then the quantile function of $F^*$ is given by
\begin{align*}(F^*)^{-1} (u) = \hat{F}^{-1}(u) + \varepsilon \, \frac{ \left (\gamma_{1,F^*(d)}(u) \right )^{\bar{k}-1}}{\Vert \gamma_{1,F^*(d)} \Vert^{\bar{k}/ k }_{\bar{k}}}, \quad 0 \lt u \lt 1 ,\end{align*}
where $F^*(d)$ is a maximizer of the problem (3.5).
-
(iii) For $ k \gt 1 $, let $\beta^*= \mathrm{arg\,max}_{\beta \in [0, \, 1]} H(\beta)$ and
\begin{equation} (F_{\beta^*})^{-1} (u) = \hat{F}^{-1}(u) + \varepsilon \, \frac{ \left (\gamma_{1,\beta^*}(u) \right )^{\bar{k}-1}}{\Vert \gamma_{1,\beta^*} \Vert^{\bar{k}/ k }_{\bar{k}}}, \quad 0 \lt u \lt 1 . \tag{3.7} \end{equation}
If $F_{\beta^*} (d{-}) \leqslant \beta^* \leqslant F_{\beta^*} (d) $, then $F_{\beta^*}$ is a maximizer of the problem (3.1).
Proof. The proof of this theorem is given in the appendix.
Remark 3.1. The function
$H(\beta)$
defined in (3.6) is continuous on [0, 1]. Therefore, there always exists a
$\beta^* \in [0, 1]$
such that
$\sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left ( (X^F-d)_+\right ) =H(\beta^*)$
. In Theorem 3.1, we first conclude that the supremum of problem (3.1) is finite. Then, we show the form a worst-case distribution must take if such a distribution exists. Finally, we provide sufficient conditions under which a worst-case distribution – or equivalently, a worst-case quantile function – for the problem (3.1) exists.
Furthermore, Theorem 3.1(ii) indicates that if the problem (3.1) has a worst-case distribution
$ F^* \in \mathcal S(\hat{F};\, k, \varepsilon) $
, then it holds that
$ (F^*)^{-1}(u) \geqslant \hat{F}^{-1}(u)$
for any
$0\lt u\lt 1$
. This implies that
$\Pr(X^{F^*}\gt x) \geqslant \Pr(X^{\hat{F}} \gt x)$
for all
$x$
, meaning that the worst-case distribution is riskier than the reference distribution in the sense of the first-order stochastic dominance (FSD). Additionally, if the reference distribution
$\hat{F}$
is a non-negative distribution, that is,
$\hat{F}(0{-})=\Pr(X^{\hat{F}} \lt 0)= 0$
, then the worst-case distribution
$ F^* $
for the problem (3.1) is also non-negative.
In the context of insurance, where the insurance loss
$X$
(such as the amount of a claim or the number of claims) is typically non-negative, it is natural for the insurer or reinsurer to use a non-negative distribution as a reference and to consider only non-negative distributions within a Wasserstein ball.
Specifically, the insurer would be interested in the following subset of the Wasserstein ball:
\begin{align*} {\mathcal S}^+ (\hat{F};\, k, \varepsilon) \triangleq \big\{ F \in {\mathcal S}(\hat{F};\, k, \varepsilon) \,:\, F(0{-}) = 0 \big\}. \end{align*}
Thus, Theorem 3.1 implies that if the reference quantile
$\hat{F}^{-1}$
is non-negative and if problem (3.1) has a worst-case distribution
$F^*$
, then
$(F^*)^{-1}$
is non-negative and also serves as a worst-case distribution
$F^*$
for
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, k, \varepsilon)} \rho ( ( X^F-d)_+ ).$
In the following example, as applications of Theorem 3.1, we give the worst-case expected shortfall or the worst-case TVaR of a stop-loss transform over a
$k$
-order Wasserstein ball.
Example 3.1. Let
$\rho = \mathrm{ES}_\alpha$
or
$\mathrm{TVaR}_\alpha$
with
$ 0 \lt \alpha \lt 1$
be the expected shortfall or TVaR, that is,
$\rho (X^F ) =\mathrm{ES}_\alpha(X^F) = \mathrm{TVaR}_\alpha(X^F) = \frac{1}{1-\alpha} \int_\alpha^1 F^{-1} (u) \mathrm{d} u. $
It is well known that the ES is a coherent distortion risk measure with distortion function
$g(t) = \min\{ \frac{1}{1-\alpha} \, t , 1 \} $
,
$0 \lt t \lt 1$
, and the corresponding weight function is
$\gamma (t) = \frac{1}{1-\alpha} \, \mathbb{I}_{ [\alpha , 1 ] }(t) $
,
$0 \lt t \lt 1$
. Consequently, for any
$0 \lt \beta \lt 1$
, we have

Then (3.5) in Theorem 3.1 implies

In particular, if we take
$\alpha=0$
, then
$\mathrm{ES}_0= \mathbb{E} $
. The expression (3.9) implies that

which recovers the result of Proposition 2 in Guan et al. (Reference Guan, Jiao and Wang2023).
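The one-variable reduction above can be explored numerically. The sketch below (purely illustrative, not part of the paper) maximizes $H(\beta)$ of (3.6) on a grid for $\rho=\mathrm{ES}_{0.9}$, $k=\bar{k}=2$, and a standard normal reference distribution; all parameter values are arbitrary choices, and the truncated weight $\gamma_{1,\beta}(u)=\gamma(u)\,\mathbb{I}_{(\beta,1]}(u)$ used in the code is our reading of the definition.

```python
from statistics import NormalDist

# Standard normal reference distribution (an arbitrary illustrative choice).
F_inv = NormalDist().inv_cdf

def H(beta, alpha, d, eps, n=2000):
    """H(beta) of (3.6) for the ES_alpha weight with k = k_bar = 2, reading
    gamma_{1,beta}(u) = gamma(u) * 1{u > beta} (our assumption)."""
    a = max(alpha, beta)
    g = 1.0 / (1.0 - alpha)                 # height of the ES_alpha weight
    h = (1.0 - a) / n
    integral = g * sum(F_inv(a + (i + 0.5) * h) for i in range(n)) * h
    norm1 = g * (1.0 - a)                   # ||gamma_{1,beta}||_1
    norm2 = g * (1.0 - a) ** 0.5            # ||gamma_{1,beta}||_2
    return integral + eps * norm2 - d * norm1

def worst_case(alpha, d, eps, grid=50):
    """max_{beta} H(beta) over a uniform grid on [0, 1)."""
    return max(H(j / grid, alpha, d, eps) for j in range(grid))

alpha, d = 0.9, 1.0
v0 = worst_case(alpha, d, eps=0.0)   # recovers ES_0.9((X - d)_+) under F_hat
v1 = worst_case(alpha, d, eps=0.5)
assert 0.7 < v0 < 0.8                # ES_0.9 of N(0,1) is about 1.755, minus d
assert v1 > v0                       # a larger radius cannot decrease the sup
```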
3.2. Limited loss transform
In this subsection, we solve the problem (1.4) when
$\ell= \ell^m$
and
$\rho$
is the distortion risk measure
$ \rho^g$
, as represented in (2.2). Specifically, we solve the following optimization problem:
\begin{align} \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left( X^F \wedge m \right). \tag{3.11} \end{align}
To avoid discussing the trivial case, we assume
$ m \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}}.$
To solve the problem (3.11), we denote
$ q_1 \triangleq \hat{F} (m) $
and
\begin{align} q_0^k \triangleq \inf \left\{ q \in [0, q_1] \,:\, \int_q^1 \big\vert m - \hat{F}^{-1}(u) \wedge m \big\vert^k \, \mathrm{d} u \leqslant \varepsilon^k \right\}. \tag{3.12} \end{align}
We point out that if
$ q_0^k = 0 $
, then
\begin{align*} W_k \big( m \vee \hat{F}^{-1}, \, \hat{F}^{-1} \big) = \left( \int_0^1 \big\vert m - \hat{F}^{-1}(u) \wedge m \big\vert^k \, \mathrm{d} u \right)^{1/k} \leqslant \varepsilon, \end{align*}
that is,
$m \vee \hat{F}^{-1} \in \mathcal Q (\hat{F}^{-1}, k,\varepsilon)$
. It is easy to see that
$ \rho \big ( ( m \vee \hat{F}^{-1} )\wedge m \big ) = \rho(m) = m $
. Meanwhile,
$F^{-1} \wedge m \leqslant m $
for any quantile function
$F^{-1}$
. By the monotonicity of
$ \rho $
, we know that
$ \rho ( X^F \wedge m ) \leqslant m $
for any
$ F \in \mathcal S(\hat{F};\, k,\varepsilon)$
. Therefore,
$ q_0^k = 0 $
implies
$ \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho(X^F \wedge m) =m$
, which is a trivial case in which the worst-case risk measure value equals the limit $m$ and is achieved at the worst-case quantile function $ m \vee \hat{F}^{-1} \in \mathcal Q (\hat{F}^{-1};\, k,\varepsilon) $
. If
$ q_0^k \gt 0$
, then
$ m \vee \hat{F}^{-1} \notin \mathcal Q (\hat{F}^{-1};\, k,\varepsilon) $
. In this case, since
$ \hat{F}^{-1} (u) $
is finite for any
$ 0 \lt u \lt 1 $
, the integral
$\int_q^1 \big \vert m - \hat{F}^{-1} (u) \wedge m \big \vert^k \, \mathrm{d} u $
is continuous in
$q$
. Therefore,
$ q_0^k $
satisfies the equation
\begin{align*} \int_{q_0^k}^1 \big\vert m - \hat{F}^{-1}(u) \wedge m \big\vert^k \, \mathrm{d} u = \varepsilon^k. \end{align*}
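Given a specific reference distribution, $q_0^k$ can be computed from this continuity property by bisection on $q \mapsto \int_q^1 \vert m - \hat{F}^{-1}(u)\wedge m \vert^k \mathrm{d} u$, which is decreasing in $q$. The sketch below (an illustration only) uses an Exp(1) reference, whose quantile function is $\hat{F}^{-1}(u)=-\log(1-u)$; the values of $m$, $k$, and $\varepsilon$ are arbitrary.

```python
from math import exp, log

def tail_gap(q, m, k, q1, n=5000):
    """int_q^{q1} (m - F_hat^{-1}(u))^k du for the Exp(1) reference,
    whose quantile function is F_hat^{-1}(u) = -log(1 - u)."""
    h = (q1 - q) / n
    return sum((m + log(1.0 - (q + (i + 0.5) * h))) ** k for i in range(n)) * h

def q0(m, k, eps, tol=1e-10):
    """Solve int_q^{q1} (m - F_hat^{-1}(u))^k du = eps^k for q by bisection,
    using that the left-hand side is decreasing in q."""
    q1 = 1.0 - exp(-m)                      # q_1 = F_hat(m)
    if tail_gap(0.0, m, k, q1) <= eps ** k:
        return 0.0                          # trivial case q_0^k = 0
    lo, hi = 0.0, q1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_gap(mid, m, k, q1) > eps ** k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = q0(m=2.0, k=2, eps=0.1)
assert 0.0 < q < 1.0 - exp(-2.0)            # nontrivial root inside (0, q_1)
```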
The problem (3.11) is more challenging than the problem (3.1) because
$\ell^m (x) = x \wedge m $
is a concave function. Additionally, using Lemmas 3.1 and 3.2 of Cai et al. (Reference Cai, Liu and Yin2024), we have

where

This means that the problem (3.11) reduces to a sup-inf problem, whereas the problem (3.1) reduces to a sup-sup problem. Additionally, note that the weight function
$ \gamma_{2,\beta} $
in
$ L(\beta,F^{-1})$
for the problem (3.14) is not a non-decreasing function. As a result, the distortion risk measure induced by
$ \gamma_{2,\beta} $
is not coherent. Therefore, the approach used for solving the problem (3.1) cannot be applied to the problem (3.11).
In this subsection, we first solve the problem (3.11) for
$k=1$
and
$k=2$
. To proceed, for any
$k \geqslant 1$
, we introduce two sets of quantile functions as follows:


In the following lemma, we show that the problem (3.11) reduces to an optimization problem over the set
$ \mathcal{A}_1$
or
$\mathcal{A}_2$
.
Lemma 3.2. Let $U$ be a uniform random variable on (0,1). The following equations hold:

Proof. For any quantile function
$F^{-1}$
and its distribution function
$F$
, we have
$ X^F \buildrel \mathrm{d} \over = F^{-1} (U) $
,
$ X^F \wedge m \buildrel \mathrm{d} \over = F^{-1} (U) \wedge m $
, and we can write
\begin{align*} \rho\left( X^F \wedge m \right) = \int_0^1 \gamma(u) \left( F^{-1}(u) \wedge m \right) \mathrm{d} u. \end{align*}
From the proof of Proposition 3 of Liu et al. (Reference Liu, Mao, Wang and Wei2022), we have

For any
$ F \in \mathcal S(\hat{F};\, k, \varepsilon)$
satisfying
$ \hat{F}^{-1} \leqslant F^{-1}$
, we see that
$ \hat{F}^{-1 } \wedge m \leqslant F^{-1} \wedge m \leqslant m $
, and

It follows that
$ W_k ( F^{-1}\wedge m, \hat{F}^{-1}\wedge m ) \leqslant W_k ( F^{-1}, \hat{F}^{-1} ) \leqslant \varepsilon$
. Thus,
$ F^{-1}\wedge m \in \mathcal A_2 $
. Together with (3.18) and (3.19), we have

For any
$ F^{-1} \in \mathcal A_2 $
, we define a quantile function
$ \tilde{F}^{-1} (u)$
as

It is easy to see that
$\tilde{F}^{-1} \geqslant \hat{F}^{-1} \geqslant \hat{F}^{-1} \wedge m $
,
$ \tilde{F}^{-1}\wedge m = F^{-1} \wedge m = F^{-1}$
, and

where the last inequality holds because
$ F^{-1} \in \mathcal A_2$
. Therefore,
$ \tilde{F}^{-1} \in \mathcal A_1 $
and
$ \rho(X^{\tilde{F}} \wedge m ) \geqslant \rho(X^{F} )$
. It implies that

From (3.18) and (3.19), write
$ \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho(X^F \wedge m) = \sup \Big \{ \int_0^1 \gamma (u) G^{-1} (u) \mathrm{d} u , G \in \mathcal{G} \Big \} $
where

Arbitrarily take
$ F^{-1} \in \mathcal{A}_1 $
. It is easy to see that
$\hat{F}^{-1} \leqslant \max\{F^{-1} , \hat{F}^{-1}\} $
and

Meanwhile,

implies
$ W_k \big (\max\big \{ F^{-1} , \hat{F}^{-1} \big \}, \hat{F}^{-1} \big ) \leqslant W_k ( F, \hat{F} ) \leqslant \varepsilon$
, where the second inequality is due to
$ F^{-1} \in \mathcal{A}_1 $
. Therefore,
$F^{-1} \wedge m = \max \big \{ F^{-1} , \hat{F}^{-1} \big \} \wedge m \in \mathcal{G}.$
It follows that

The above inequality, together with (3.20) and (3.21), implies that (3.17) holds, as desired.
Now, we first apply Lemma 3.2 to solve the problem (3.11) when
$k=1$
.
Theorem 3.3. Assume
$k=1$
and
$ m \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}} $
in the problem (3.11). Then, the quantile function of the worst-case distribution
$F^*$
for the problem (3.11) is

where
$q_1 = \hat{F}(m)$
and
$ q_0^1 $
is defined in (3.12). Furthermore, the worst-case risk measure under
$F^*$
is

Proof. The proof of this theorem is given in the appendix.
Remark 3.2. We make two noteworthy observations about Theorem 3.3. First, the worst-case distribution or, equivalently, the worst-case quantile function given in (3.22) depends only on the limit
$m$
, the reference distribution
$\hat{F}$
, and the radius
$\varepsilon$
, and is independent of the coherent distortion risk measure
$\rho$
. Specifically, when
$k=1$
, the Wasserstein distance between two distributions ordered by the first-order stochastic dominance (FSD) simplifies to the difference of their means. Consequently, if a law-invariant risk measure
$\rho$
preserves FSD and the convex order, similar arguments used in the proof of Theorem 3.3 can be applied to derive the worst-case distribution in (3.22). Thus, the result in Theorem 3.3 holds for any coherent distortion risk measure that preserves FSD and the convex order.
Second, from (3.23), we have
$ \sup_{F \in {\mathcal S }(\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m) = \rho (X^{\hat{F}} ) + \varepsilon $
if
$q_0^1 \gt 0 $
. According to Proposition 4 of Liu et al. (Reference Liu, Mao, Wang and Wei2022), it follows that
$ \sup\{\rho (X^F) \,:\, W_1 (F,\hat{F}) \leqslant \varepsilon \} = \rho (X^{\hat{F}} ) +\varepsilon $
. Therefore, regardless of the choice of
$\rho$
, when
$ m \gt \rho (X^{\hat{F}} ) +\varepsilon$
, the worst-case distribution should fully utilize the distance tolerance
$\varepsilon$
between the reference distribution and a candidate distribution in the left tail (before hitting
$m$
). This ensures that the difference between the worst-case risk measure value and
$\rho (X^{\hat{F} } ) $
remains
$ \varepsilon$
. Mathematically, this implies that
$ \sup_{F \in {\mathcal S }(\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m) = \rho (X^{\hat{F}} ) + \varepsilon $
is a constant for all
$m \in [ \rho (X^{\hat{F}} ) + \varepsilon, \, \mathrm{ess\mbox{-}sup} X^{\hat{F}}] $
.
Additionally, we note that (3.22) indicates that if the reference distribution
$\hat{F}$
is non-negative, then the worst-case distribution
$F^*$
for problem (3.11) is also non-negative. Furthermore, it serves as the worst-case distribution for the problem
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m)$
with
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m) =\sup_{F \in {\mathcal S} (\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m)$
, whose expression is given in (3.23).
Next, we solve the problem (3.11) when
$k=2$
. To do so, we introduce the concept of isotonic projection. Let
$\mathcal{K}= \big \{ h \,:\,(0,1)\mapsto\mathbb{R} \ \big \vert \ \int_0^1 h (u)^2\mathrm{d} u\lt \infty, \, h \, \text{is non-decreasing and left-continuous} \big\} $
be the space of square-integrable, non-decreasing and left-continuous functions on (0,1). Denote the isotonic projection of a function
$ f \in L^2(0,1) $
onto the space
$\mathcal{K} $
as
$ f^\uparrow=\mathrm{arg\,min}_{h \in\mathcal{K}}|| f - h ||_2.$
The properties of isotonic projection are discussed in Rüschendorf and Vanduffel (Reference Rüschendorf and Vanduffel2020) and the references therein.
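On a discrete grid, the isotonic projection is least-squares isotonic regression and can be computed with the classical pool-adjacent-violators algorithm; the following is a minimal equal-weight sketch for illustration.

```python
def isotonic_projection(f):
    """L^2 projection of the sequence f onto non-decreasing sequences
    (pool-adjacent-violators algorithm with equal weights)."""
    blocks = []                       # list of [block_mean, block_size]
    for v in f:
        blocks.append([float(v), 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    out = []
    for mean, size in blocks:
        out.extend([mean] * size)
    return out

assert isotonic_projection([3, 1, 2]) == [2.0, 2.0, 2.0]
assert isotonic_projection([1, 2, 3]) == [1.0, 2.0, 3.0]
```

Violating entries are pooled into blocks whose value is the block average, which is exactly the $L^2$-optimal non-decreasing fit.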
Theorem 3.4. Suppose that Assumption 2.1 holds with
$k=2$
and
$m \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}} $
in the problem (3.11), and
$q_0^2 \gt 0$
. Then, there exists a worst-case distribution
$F^* \in {\mathcal S}(\hat{F};\, 2, \varepsilon)$
such that
$ \sup_{F \in {\mathcal S }(\hat{F};\, 2, \varepsilon)} \rho(X^F \wedge m)= \rho \left ( X^{F^*} \wedge m \right ) $
, and

where
$\lambda^* \geqslant 0 $
and
$\theta^* \in (0,q_1) $
satisfy
$ W_2 ( F^* , \hat{F}) = \varepsilon$
. Furthermore, the worst-case risk measure value under
$F^*$
is

Proof. The proof of this theorem is given in the appendix.
Remark 3.3. Unlike the worst-case quantile in (3.22) for
$k=1$
, the worst-case quantile in (3.24) for
$k=2$
depends not only on the uncertainty set but also on the risk measure
$\rho$
with the weight function
$\gamma$
. However, similar to the case when
$k=1$
, (3.24) implies that if the reference distribution
$\hat{F}$
is non-negative, then the worst-case distribution
$F^*$
for the problem (3.11) with
$k=2$
is also non-negative. Furthermore,
$F^*$
serves as the worst-case distribution for the problem
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, 2, \varepsilon)} \rho(X^F \wedge m)$
, with
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, 2, \varepsilon)} \rho(X^F \wedge m) =\sup_{F \in {\mathcal S} (\hat{F};\, 2, \varepsilon)} \rho(X^F \wedge m)$
, whose expression is given in (3.25).
Additionally, we note that the worst-case quantile functions for
$k=1$
and
$k=2$
share a common feature in the right tail: there exists
$0 \lt p \leqslant q_1$
such that
$ (F^*)^{-1}(u) = m$
for
$ p \lt u \leqslant q_1$
and
$(F^*)^{-1}(u) = \hat{F}^{-1}(u) $
for
$ q_1 \lt u \lt 1$
. Indeed, we can show that this feature in the right tail of a worst-case distribution holds true for all
$k \geqslant 1$
, as presented in the following proposition.
Remark 3.4. For
$k=2$
, following the argument used in Theorem 4.8 of Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024), one has

The inner problem on the right-hand side of the equation, that is,
$ \sup_{F \in {\mathcal S}( \hat{F};\, 2, \varepsilon, \mu,\sigma) } \rho (\ell (X^F) ) $
, is studied in Cai et al. (Reference Cai, Liu and Yin2024) for stop-loss and limited loss
$\ell$
, and its optimal solutions have a relatively complex mathematical structure. Therefore, the two-step optimization problem is not mathematically tractable.
Proposition 3.5. Suppose that Assumption 2.1 holds with
$k \geqslant 1$
and
$m \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}} $
in the problem (3.11).
If the problem
$\sup \left \{ \rho(F^{-1} (U)) \,:\, F^{-1} \in \mathcal A_2 \right \} $
has a maximizer
$(\tilde{F}^*)^{-1}$
, that is,
$ (\tilde{F}^*)^{-1} \in \mathrm{arg\,max} \left \{ \rho(F^{-1} (U)) \,:\, F^{-1} \in \mathcal A_2 \right \},$
then the quantile function
$(F^*)^{-1}$
defined as

is the worst-case quantile function for the problem (3.11).
Proof. Define
$(F^*)^{-1} (u) = \max\{(\tilde{F}^*)^{-1} (u) , \hat{F}^{-1}(u) \} $
. Note that
$ \hat{F}^{-1} (u) \leqslant (\tilde{F}^*)^{-1} (u) \lt m $
for
$ u \lt \tilde{F}^*(m{-}) $
and
$(\tilde{F}^*)^{-1} (u) = m $
for
$ u\gt \tilde{F}^*(m{-}) $
. Then we can further write
$(F^*)^{-1}$
as in (3.26). It is easy to check that
$ W_k ((F^*)^{-1}, \hat{F}^{-1} ) =W_k ( (\tilde{F}^*)^{-1} , \hat{F}^{-1} \wedge m ) \leqslant \varepsilon $
and

By Lemma 3.2,
$F^* $
is a worst-case distribution for the problem (3.11).
Remark 3.5. Proposition 3.5 also means that, as in the cases when
$k=1$
and
$k=2$
, if the reference distribution
$\hat{F}$
is non-negative and the problem
$\sup \left \{ \rho(F^{-1} (U)) \,:\, F^{-1} \in \mathcal A_2 \right \} = \sup_{ F^{-1} \in \mathcal A_2 }\rho(F^{-1} (U))$
has a maximizer
$(\tilde{F}^*)^{-1} $
, then the worst-case distribution
$F^*$
for the problem (3.11) with
$k \geqslant 3$
is non-negative and is also the worst-case distribution for the problem
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, k, \varepsilon)} \rho(X^F \wedge m)$
with
$\sup_{F \in {\mathcal S}^+ (\hat{F};\, k, \varepsilon)} \rho(X^F \wedge m) =\sup_{F \in {\mathcal S} (\hat{F};\, k, \varepsilon)} \rho(X^F \wedge m)$
. However, it is difficult to solve problem (3.11) for
$k \geqslant 3$
or the problem
$\sup \left \{ \rho(F^{-1} (U)) \,:\, F^{-1} \in \mathcal A_2 \right \} = \sup_{ F^{-1} \in \mathcal A_2 }\rho(F^{-1} (U))$
for
$k \geqslant 3$
in Proposition 3.5. We expect that new approaches can be developed for solving problem (3.11) when
$k \geqslant 3$
.
3.3. Limited stop-loss transform
In this subsection, we solve the problem (1.4) when
$\ell= \ell_d^m$
and
$\rho$
is the distortion risk measure
$ \rho^g$
, as represented in (2.2). Specifically, we solve the following optimization problem:

To avoid discussing trivial cases, for the problem (3.27), we assume
$ \mathrm{ess\mbox{-}inf} (X^{\hat{F}} ) \lt d \lt m+d \lt \mathrm{ess\mbox{-}sup} (X^{\hat{F}}).$
Note that the stop-loss function
$ (x-d)_+$
is convex in
$x$
, and the limited loss function
$x \wedge m $
is concave in
$x$
, but the limited stop-loss function
$ (x-d)_+ \wedge m $
is neither convex nor concave in
$x$
. Hence, we cannot apply the arguments used in Sections 3.1 and 3.2 to solve problem (3.27). However, we can reformulate problem (3.27) so that the solutions in Section 3.2 can be applied. In Section 3.2, solutions for problem (3.11) are available only when
$k=1$
and
$k=2$
. Thus, for the problem (3.27), we only consider the cases when
$k=1$
and
$k=2$
.
Recall the set of quantile functions
$\mathcal Q_d (\hat{F}^{-1};\, k, \varepsilon)= \{\hat{G}^{-1}\,:\, \hat{G}^{-1}+d \in \mathcal Q ( \hat{F}^{-1};\, k, \varepsilon) \}$
defined in the proof of Theorem 3.1, and recall that
$\rho_{1,\beta_1}$
is a coherent distortion risk measure induced by distortion function
$g_{1,\beta_1}$
defined in (A1). Thus, we reduce the problem (3.27) to a problem involving the problem (3.11), as stated in the following proposition:
Proposition 3.6. Suppose that Assumption 2.1 holds,
$ m \gt 0$
, and
$ \mathrm{ess\mbox{-}inf} (X^{\hat{F}} ) \lt d \lt d+m \lt \mathrm{ess\mbox{-}sup} (X^{\hat{F}}) $
. Then, for
$k=1,2$
, we have

Proof. For any
$ F \in \mathcal S (\hat{F};\, k, \varepsilon)$
, we have

where
$\gamma_{1,\beta} = \gamma\cdot \mathbb{I}_{[\beta , 1]}$
with
$\Vert \gamma_{1, \beta} \Vert_1 = \int_{\beta}^1 \gamma(u) \mathrm{d} u = 1 - \int_0^{\beta} \gamma(u) \mathrm{d} u$
. Using the same argument as in Section 3.1, for any
$\beta \lt 1 $
, we can define a coherent distortion risk measure
$\rho_{1,\beta}$
induced by distortion function
$g_{1,\beta}$
defined in (A1). Denote the quantile function
$F^{-1}(u) - d$
by
$F_{-d}^{-1}(u)$
and denote
$ F(d+m)$
by
$ F_{-d} (m) $
, that is,
$F_{-d}^{-1} = F^{-1} - d$
and
$ F(d+m) = F_{-d} (m)$
. It is easy to check that

where the last equality is from (A6). If
$ \beta \geqslant F_{-d} (m) $
, then
$ \gamma_{1, \beta } (u) = \gamma (u) \cdot \mathbb{I}_{[\beta,1]} (u) = 0 $
for all
$ u \lt F_{-d} (m) $
and
$\int_0^{F_{-d}(m) } \gamma_{1, \beta} (u) \left ( F_{-d}^{-1} (u) - m \right ) \mathrm{d} u = 0$
. Therefore, for any
$\beta \geqslant F_{-d} (m)$
, we have

This shows that every probability level larger than
$ F_{-d} (m)$
is sub-optimal to
$ F_{-d} (m)$
for the maximization problem (3.29). As a consequence, we have

For any
$ \beta \in [0,1]$
, the function
$\gamma_{1,\beta} = \gamma \cdot \mathbb{I}_{[\beta,1]}$
is increasing and thus the risk measure
$\rho_{1, \beta }$
is coherent. The problem
$ \sup_{F^{-1} \in \mathcal Q_d } \rho_{1,\beta} ( X^{F} \wedge m) $
can be solved by Theorem 3.3 and Theorem 3.4 when
$k=1$
and
$k=2$
, respectively. Therefore, the expression (3.28) is obtained.
Using the solutions for (3.11) derived in Section 3.2, for
$k=1,2$
, we can solve the problem
$\max_{F^{-1} \in \mathcal Q_d (\hat{F}^{-1};\, k, \varepsilon) } \big \{ \rho_{1,\beta} ( X^{F} \wedge m) \big \} $
in (3.28) and reduce (3.27) to a feasible one-variable maximization problem.
Corollary 3.7. Suppose that Assumption 2.1 holds,
$ m \gt 0$
, and
$ \mathrm{ess\mbox{-}inf} (X^{\hat{F}} ) \lt d \lt d+m \lt \mathrm{ess\mbox{-}sup} (X^{\hat{F}}) $
.
-
(1) If
$k=1$ , then
\begin{align} \sup_{F \in \mathcal S(\hat{F};\, 1, \varepsilon) } \rho \left ( (X^F-d)_+ \wedge m \right ) & = \max_{0\leqslant \beta \leqslant 1} \big \{ \Vert \gamma_{1, \beta} \Vert_1 \cdot \rho_{1,\beta} ( X^{F^*} \wedge m) \big \}, \tag{3.30}\end{align}
where, for any
$\beta \in [0, 1]$ ,
\begin{align*}(F^*)^{-1} (u) = \begin{cases}\hat{F}^{-1}(u)-d , & 0 \lt u \leqslant \tilde{q}_0^1, \\ m-d , & \tilde{q}_0^1 \lt u \leqslant \tilde{q}_1,\\\hat{F}^{-1} (u) -d, & \tilde{q}_1 \lt u \lt 1, \end{cases}\end{align*}
$ \tilde{q}_1 = \hat{F} (m+d )$ , and
$ \tilde{q}_0^1 = \inf \big \{q \geqslant 0 \,:\, \int_q^{\tilde{q}_1} \big \vert m - \hat{F}^{-1} (u) + d \big \vert \, \mathrm{d} u \leqslant \varepsilon \big \}$ .
-
(2) If
$k=2$ , then
\begin{align} \sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon) } \rho \left ( (X^F-d)_+ \wedge m \right ) & = \max_{0\leqslant \beta \leqslant 1} \big \{ \Vert \gamma_{1, \beta} \Vert_1 \cdot \rho_{1,\beta} ( X^{F^*_{\beta}} \wedge m) \big \}, \tag{3.31}\end{align}
where, for any
$\beta \in [0, 1]$ ,
\begin{align*}(F^{*}_{\beta})^{-1} (u) = \begin{cases} \hat{F}^{-1}(u) + \lambda^* \, \gamma_{1,\beta_1}(u) -d , & 0 \lt u \leqslant \theta^* , \\ m-d , & \theta^* \lt u \leqslant \tilde{q}_1 , \\ \hat{F}^{-1}(u)-d , & \tilde{q}_1 \lt u \lt 1, \end{cases}\end{align*}
and
$\lambda^* \gt 0 $ and
$\theta^* \in (0,1) $ satisfy
$ W_2 ( F^*_{\beta} , \hat{F}-d) = \varepsilon$ .
Proof. For any
$\beta \in [0, 1]$
, (3.22) implies that
$(F^*)^{-1}$
is the maximizer of
$ \max_{F^{-1} \in \mathcal Q_d } \left \{ \rho_{1,\beta} ( X^{F} \wedge m ) \right \}$
when
$k=1$
, while (3.24) implies that
$ (F^{*}_{\beta_1})^{-1}$
is the maximizer of
$ \max_{F^{-1} \in \mathcal Q_d } \left \{ \rho_{1,\beta} ( X^{F} \wedge m) \right \}$
when
$k=2$
. Then the statement of the corollary follows consequently.
Both (3.30) and (3.31) are feasible one-variable maximization problems and can be easily solved numerically. We will illustrate their applications in the next section.
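Both (3.30) and (3.31) involve only a scalar search over $\beta \in [0,1]$. As a minimal sketch (the objective `H` below is a hypothetical placeholder for the map $\beta \mapsto \Vert \gamma_{1,\beta} \Vert_1 \cdot \rho_{1,\beta}(\cdot)$, which would be supplied by the formulas above), such a one-variable maximization can be done by grid search with bracket refinement:

```python
# Scalar maximization over beta in [0, 1] by grid search with bracket
# refinement. H is a hypothetical placeholder objective standing in for
# beta -> ||gamma_{1,beta}||_1 * rho_{1,beta}(...) in (3.30)/(3.31).

def maximize_on_unit_interval(H, n=1000, refinements=3):
    """Return (beta*, H(beta*)) approximately maximizing H on [0, 1]."""
    lo, hi = 0.0, 1.0
    best_b, best_v = 0.0, H(0.0)
    for _ in range(refinements):
        step = (hi - lo) / n
        for i in range(n + 1):
            b = lo + i * step
            v = H(b)
            if v > best_v:
                best_b, best_v = b, v
        # shrink the bracket around the current best point
        lo = max(0.0, best_b - step)
        hi = min(1.0, best_b + step)
    return best_b, best_v

# toy concave objective with known maximizer beta = 0.3
b_star, v_star = maximize_on_unit_interval(lambda b: -(b - 0.3) ** 2)
```

For the continuous objectives arising here, a few refinement passes locate the maximizer to high accuracy; any scalar optimizer (e.g., golden-section search) could be substituted.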
4. Numerical illustrations of worst-case distributions, worst-case risk measures, and their applications
In a reinsurance treaty, an insurer cedes part of its underlying insurance loss
$X$
to a reinsurer. The ceded loss, denoted by
$\ell(X)$
, is determined by the ceded loss function
$\ell(x)$
. The reinsurer charges a (reinsurance) premium based on a premium principle
$\pi$
, so the premium paid by the insurer is
$\pi(\ell(X))$
, where
$\pi$
is a mapping from random variables to real numbers. Specifically,
$\pi(\ell(X))= (1+\theta) \mathbb{E}[ \ell(X) ]$
is known as the expected value principle, while
$\pi(\ell(X))= (1+\theta) \rho(\ell(X))$
is called the (extended) distortion premium principle, where
$\theta \geqslant 0$
and
$\rho$
is a distortion risk measure. Additionally, if
$\ell(X)=(X-d)_+$
, the (net) premium
$\mathbb{E}[(X-d)_+] $
is referred to as the stop-loss premium. If
$\ell(X) = X \wedge m$
, the (net) premium
$\mathbb{E}[X \wedge m] $
is known as the limited expected value, representing the net premium of an insurance or reinsurance policy with a payment limit of
$m$
. For applications of the transforms of
$(x-d)_+$
,
$x \wedge m$
, and
$(x-d)_+ \wedge m$
in insurance, we refer to Albrecher et al. (Reference Albrecher, Beirlant and Teugels2017), Klugman et al. (Reference Klugman, Panjer and Willmot2019), and references therein.
In general, calculating the premium
$\pi(\ell(X))$
requires complete information about the distribution of the underlying insurance loss
$X$
. However, when the distribution of
$X$
is uncertain, the premium
$\pi(\ell(X))$
is often unknown. In such cases, the reinsurer needs to estimate the premium
$\pi(\ell(X))$
. Suppose that the reinsurer uses the uncertainty set
$ {\mathcal S }$
to represent all possible distributions of
$X$
and has an initial estimate
$\hat{F}$
for the distribution of
$X$
. Based on this estimate, the reinsurer might determine the premium as
$\pi(\ell(X^{\hat{F}}))$
. However, this approach may yield a premium that is lower than necessary, as it does not account for the distribution uncertainty.
In fact, when the distribution of
$X$
is uncertain, the reinsurer is more interested in understanding the highest premium that could potentially be required, given this uncertainty. Therefore, the reinsurer is concerned with the problem of finding
$\sup_{F \in {\mathcal S}}\pi(\ell(X^F))$
, which represents the sharp upper bound of the premium that accounts for all possible distributions within the uncertainty set
${\mathcal S }$
.
If
${\mathcal S}$
is the set of distributions with given mean and variance, or with given moments up to a higher order, the upper bounds for the stop-loss premium
$\mathbb{E}[(X-d)_+] $
have been well studied in the literature (see, for instance, Jansen et al., Reference Jansen, Haezendonck and Goovaerts1986; Kaas and Goovaerts, Reference Kaas and Goovaerts1986, and the references therein). When
${\mathcal S}$
consists of distributions lying within a Wasserstein ball and satisfying constraints on the first two moments, and
$\rho$
is a coherent distortion risk measure, sharp upper bounds for the distortion premiums associated with stop-loss reinsurance
$(X-d)_+ $
and limited loss reinsurance
$X \wedge m$
have been derived in Cai et al. (Reference Cai, Liu and Yin2024). Utilizing the results from Section 3 of this paper, we can extend this analysis to obtain sharp upper bounds for distortion premiums associated with limited stop-loss reinsurance. Specifically, when the distribution of
$X$
is uncertain and belongs to
$ \mathcal S(\hat{F};\, k, \varepsilon)$
, and
$\rho$
is a coherent distortion risk measure, we are able to provide sharp upper bounds for the distortion premiums associated with stop-loss reinsurance
$(X-d)_+$
, limited loss reinsurance
$X \wedge m$
and limited stop-loss reinsurance
$(X-d)_+ \wedge m$
.
To illustrate the results derived in Section 3 and their applications, in this section, we employ Wang’s premium principle along with numerical examples. This approach will help us analyze the effects of the reference distribution
$\hat{F}$
, the radius
$ \varepsilon $
in the Wasserstein ball
$ \mathcal S(\hat{F};\, k, \varepsilon)$
, and retention levels on the premium of limited stop-loss reinsurance.
Wang’s premium principle, introduced in Wang (Reference Wang2000), is a special distortion premium principle that utilizes a specific distortion risk measure
$\rho_\alpha$
that has the following distortion function:

where
$\Phi^{-1}(u)$
is the quantile function of the standard normal distribution function
$\Phi(x)$
. The distortion function
$g_\alpha(u)$
has a weight function
$\gamma_\alpha(u)=\partial_-g(x)|_{x=1-u} =g'(1-u)= e^{-\alpha \Phi^{-1}(1-u) - \frac{\alpha^2}{2}}$
, which is non-negative and increasing in
$u \in (0, 1)$
for any
$\alpha \in (0, 1)$
. As illustrated in Wang (Reference Wang2000), the distortion function or the transform defined in (4.1) has various interesting applications in pricing insurance and finance products.
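For a non-negative loss with survival function $S$, a distortion risk measure admits the standard representation $\rho_g(X)=\int_0^\infty g(S(x))\,\mathrm{d}x$, and Wang's distortion is $g_\alpha(u)=\Phi(\Phi^{-1}(u)+\alpha)$, consistent with the weight function quoted above. As a numerical sketch (the truncation point and grid size are illustrative choices), Wang's premium for an exponential loss can be evaluated by quadrature:

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal

def g_wang(u, alpha):
    """Wang's distortion g_alpha(u) = Phi(Phi^{-1}(u) + alpha)."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return N.cdf(N.inv_cdf(u) + alpha)

def wang_premium_exponential(mean, alpha, x_max=400.0, n=40_000):
    """rho_alpha(X) = int_0^inf g_alpha(S(x)) dx for X ~ Exp(mean),
    by the trapezoid rule on [0, x_max]."""
    h = x_max / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * g_wang(math.exp(-i * h / mean), alpha)
    return total * h

p0 = wang_premium_exponential(4.0, 0.0)   # alpha = 0 recovers E[X] = 4
p05 = wang_premium_exponential(4.0, 0.5)  # risk-loaded premium, > 4
```

Since $g_\alpha$ is concave with $g_\alpha(u) \geqslant u$ for $\alpha > 0$, the premium always exceeds the net premium $\mathbb{E}[X]$.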
In this section, we assume
$k=2$
in the Wasserstein ball
$ \mathcal S(\hat{F};\, k, \varepsilon)$
. We point out that the weight function
$\gamma_\alpha(u) = g'_\alpha(1-u) = e^{-\alpha \Phi^{-1}(1-u) - \frac{\alpha^2}{2}}$
satisfies Assumption 2.1. Specifically, for any
$\alpha \in (0,1) $

where
$Z$
follows the standard normal distribution
$ \Phi $
and
$ \mathbb{E} [e^{-2\alpha Z }] =e^{2\alpha^2}$
follows from the moment generating function of
$Z$
.
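This norm computation can be cross-checked numerically: since $\Phi^{-1}(1-u)=-\Phi^{-1}(u)$, substituting $u=\Phi(z)$ gives $\gamma_\alpha(\Phi(z))=e^{\alpha z - \alpha^2/2}$, so $\Vert \gamma_\alpha \Vert_2^2 = \mathbb{E}[e^{2\alpha Z - \alpha^2}] = e^{\alpha^2} < \infty$. A quick quadrature check (the truncation range and grid size are illustrative):

```python
import math

def sq_norm_gamma_wang(alpha, z_lo=-12.0, z_hi=12.0, n=48_000):
    """int_0^1 gamma_alpha(u)^2 du via the substitution u = Phi(z):
    gamma_alpha(Phi(z)) = exp(alpha*z - alpha^2/2), so the integral equals
    E[exp(2*alpha*Z - alpha^2)] with Z standard normal (trapezoid rule)."""
    h = (z_hi - z_lo) / n
    total = 0.0
    for i in range(n + 1):
        z = z_lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += w * math.exp(2.0 * alpha * z - alpha * alpha) * phi
    return total * h

val = sq_norm_gamma_wang(0.5)  # should match e^{alpha^2} = e^{0.25}
```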
In the first example of this section, we illustrate the worst-case distributions for the problems of
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha((X^F -d)_+)$
,
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha(X^F \wedge m)$
, and
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha((X^F-d)_+ \wedge m)$
.
Example 4.1. Assume that the reference distribution
$\hat{F}$
in the set
$ \mathcal S(\hat{F};\, k, \varepsilon)$
is a Pareto distribution given by
$\hat{F}(x)=1-(\frac{12}{x+12})^4$
for
$x\geqslant 0$
, with
$\varepsilon=2$
and
$k=2$
. Using Theorem 3.1(iii), Theorem 3.4, and Proposition 3.6, we can identify the quantile functions
$(F^*)^{-1}$
of the worst-case distributions
$F^*$
such that
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha((X^F -d)_+) =\rho_\alpha((X^{F^*} -d)_+)$
,
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha(X^{F} \wedge m)=\rho_\alpha(X^{F^*} \wedge m) $
, and
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha((X^{F} -d)_+ \wedge m)=\rho_\alpha((X^{F^*}-d)_+ \wedge m)$
. The quantile functions
$(F^*)^{-1}$
, with
$\alpha =0.5$
for different deductibles $d$ and limits $m$, are plotted in Figures 1–2, respectively.

Figure 1. The quantile functions of the worst-case distributions for stop-loss (
$d=5,10,15 $
and
$m = \infty$
) and limited loss (
$d=0$
and
$m=5,10,15$
) reinsurance.

Figure 2. The quantile function of the worst-case distributions for limited stop-loss reinsurance.
From these figures, we observe that the worst-case quantile function dominates the quantile function of the reference distribution in different ways. Specifically, for the stop-loss reinsurance, the worst-case quantile function exceeds the quantile function of the reference distribution in the right tails. In contrast, for the limited loss reinsurance, the worst-case quantile function dominates in the left tails. However, for the limited stop-loss reinsurance, the worst-case quantile function dominates in the middle range. These numerical results are consistent with the objectives of these reinsurances. In fact, for stop-loss reinsurance and limited loss reinsurance, the left-tail and right-tail risks of the reinsurer are eliminated, respectively. In the case of limited stop-loss reinsurance, both left-tail and right-tail risks of the reinsurer are eliminated. Therefore, the worst-case scenarios for the premiums of stop-loss reinsurance, limited loss reinsurance, and limited stop-loss reinsurance are expected to primarily involve right-tail, left-tail, and middle-range risks, respectively.
In the following example, we illustrate the effects of the radius
$\varepsilon$
and the reference distribution
$\hat{F}$
in
$\mathcal S(\hat{F};\, k, \varepsilon)$
, as well as the retention levels
$d$
and
$m$
on Wang’s premium for limited stop-loss reinsurance
$(X-d)_+ \wedge m$
. Specifically, we examine how these factors influence
$\sup_{F \in \mathcal S(\hat{F};\, k, \varepsilon)} \rho_\alpha((X^{F} -d)_+ \wedge m)$
, which represents the highest premium that could be expected when the distribution of the underlying insurance loss
$X$
is uncertain and belongs to the set
$ \mathcal S(\hat{F};\, k, \varepsilon)$
.
Example 4.2. For the uncertainty set
$ \mathcal S(\hat{F};\, k, \varepsilon)$
with
$k=2$
, we consider two different reference distributions. The first is an exponential reference distribution given by
$\hat{F}_1(x) = 1-e^{-x/4}$
for
$ x\geqslant 0$
, and the second is a Pareto reference distribution given by
$\hat{F}_2(x) = 1-\big (\frac{12}{x+12}\big )^4$
for
$ x\geqslant 0$
. Both distributions
$\hat{F}_1$
and
$\hat{F}_2$
have the same mean of 4, but they differ in variance:
$\hat{F}_1$
has a variance of 16, while
$\hat{F}_2$
has a variance of 32. In these senses,
$X^{\hat{F}_2}$
is riskier than
$X^{\hat{F}_1}$
. Indeed, in the context of insurance, the exponential distribution is typically used to model light-tail risks, whereas the Pareto distribution is used for heavy-tail risks.
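The stated moments can be checked against the closed-form Lomax formulas (mean $\lambda/(a-1)$ and variance $\lambda^2 a/((a-1)^2(a-2))$ for shape $a$ and scale $\lambda$) and a direct survival-function integration; a quick sketch (the truncation point and grid size are illustrative):

```python
# Cross-check of the stated moments: F1 ~ Exponential(mean 4) and F2 ~ Pareto
# with survival function (12/(x+12))^4, i.e. a Lomax distribution with
# shape a = 4 and scale lam = 12.

a, lam = 4.0, 12.0
mean_exp, var_exp = 4.0, 16.0                      # Exp(mean 4)
mean_par = lam / (a - 1)                           # 12/3 = 4
var_par = lam ** 2 * a / ((a - 1) ** 2 * (a - 2))  # 576/18 = 32

# numeric cross-check of the Pareto mean via mean = int_0^inf S(x) dx
h, x_max = 0.01, 2_000.0
n = int(x_max / h)
mean_par_num = h * sum(
    (0.5 if i in (0, n) else 1.0) * (lam / (i * h + lam)) ** a
    for i in range(n + 1)
)
```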
We consider a limited stop-loss reinsurance with the ceded loss function
$\ell_d^m(x) = (x - d)_+ \wedge m$
. To illustrate the effect of those factors on
$\sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon)} \rho_\alpha((X^{F} -d)_+ \wedge m)$
, we calculate this supremum
when
$\hat{F}=\hat{F}_1$
and
$\hat{F}=\hat{F}_2$
with
$\alpha=0.5$
for different values of
$\varepsilon$
,
$d$
, and
$m$
.
In Figure 3 and columns 1–3 in Table 1, we examine how the radius
$\varepsilon $
of the Wasserstein ball affects
$\sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon)} \rho_{0.5}((X^{F} -d)_+ \wedge m)$
. To do this, we vary
$\varepsilon $
from 0.1 to 1.9, while keeping
$d=5$
and
$m=5$
constant. In Table 1, the values in the column under the heading
$\text{EXP}^{w}$
represent
$\sup_{F \in \mathcal S(\hat{F}_1;\, 2, \varepsilon)} \rho_{0.5}((X^{F} -d)_+ \wedge m)$
with the exponential reference distribution
$\hat{F}_1$
. The values in the column under the heading
$\text{PAR}^{w}$
represent
$\sup_{F \in \mathcal S(\hat{F}_2;\, 2, \varepsilon)} \rho_{0.5}((X^{F} -d)_+ \wedge m)$
with the Pareto reference distribution
$\hat{F}_2$
. Additionally, the values in the column under the heading
$\text{EXP}^{r}$
represent
$ \rho_{0.5}((X^{{F}_1} -d)_+ \wedge m)$
with the exponential reference distribution
$\hat{F}_1$
. The values in the column under the heading
$\text{PAR}^{r}$
represent
$\rho_{0.5}((X^{{F}_2} -d)_+ \wedge m)$
with the Pareto reference distribution
$\hat{F}_2$
. From these calculations, we observe that as
$\varepsilon$
increases – indicating a larger uncertainty set for the distribution of
$X$
– the worst-case Wang's premium also rises, reflecting the intuitive notion that greater uncertainty about the underlying loss distribution necessitates a higher premium to safeguard against potential worst-case scenarios.

Figure 3.
${\small \sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon)} \rho_{0.5}((X^{{F}} -5)_+ \wedge 5)} \, \text{vs} \, \varepsilon$
.
Table 1. Wang’s premiums for the limited stop-loss reinsurance
$(X-d)_+ \wedge m$
.

In Figure 4 and columns 4–8 in Table 1, we analyze how the limit
$m$
affects
$\sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon)} \rho_{0.5}((X^{F} -d)_+ \wedge m)$
. We vary
$m$
from 4 to 13, while keeping
$d=5$
and
$\varepsilon =2 $
fixed. Additionally, in Figure 5 and columns 9–13 in Table 1, we examine how the deductible
$d$
affects
$\sup_{F \in \mathcal S(\hat{F};\, 2, \varepsilon)} \rho_{0.5}((X^{F} -d)_+ \wedge m)$
. We vary
$d$
from 0.5 to 9.5, while keeping
$m=5$
and
$\varepsilon =2 $
fixed. From these calculations, we observe that as
$m$
increases – indicating a larger amount of loss covered by the reinsurer – the worst-case Wang’s premium also increases. Conversely, as
$d$
increases – indicating that the reinsurer covers less of the underlying insurance loss – the worst-case Wang’s premium decreases.

Figure 4.
$ {\small \sup_{F \in \mathcal S(\hat{F};\, 2, 2)} \rho_{0.5}((X^{{F}} -5)_+ \wedge m) } \, \text{vs} \, m$
.

Figure 5.
${\small \sup_{F \in \mathcal S(\hat{F};\, 2, 2)} \rho_{0.5}((X^{{F}} -d)_+ \wedge 5)} \, \text{vs} \, d$
.
Furthermore, from Figures 3–5 and Table 1, we also observe that Wang's premiums for the limited stop-loss reinsurance with the exponential reference distribution are higher than those with the Pareto reference distribution. For instance,
$\rho_{0.5}((X^{\hat{F}_1}-5)_+\wedge 5)=1.5535$
is greater than
$\rho_{0.5}((X^{\hat{F}_2}-5)_+\wedge 5)=1.4748$
. These results are also reasonable. In fact, we have
$\mathbb{E}[(X^{\hat{F}_1}-5)_+\wedge 5] = 0.817679 $
, which is greater than
$\mathbb{E}[(X^{\hat{F}_2}-5)_+\wedge 5] =0.757744$
, and
$\text{var}((X^{\hat{F}_1}-5)_+\wedge 5) =2.58943 $
, which is greater than
$\text{var}((X^{\hat{F}_2}-5)_+\wedge 5) = 2.57043.$
Thus, in these senses, the exponential distribution is riskier than the Pareto distribution for the limited stop-loss reinsurance
$(X-5)_+ \wedge 5$
. These numerical results are consistent with the fact that the reinsurer, selling the limited stop-loss reinsurance
$(X-5)_+ \wedge 5$
, is mainly concerned with the middle-range risks. The exponential survival function dominates in the left tail, while the Pareto survival function dominates in the right tail. Indeed, for the two distributions
$\hat{F}_1$
and
$\hat{F}_2$
, it holds that
$\Pr ( X^{\hat{F}_1} \gt x)\gt \Pr(X^{\hat{F}_2} \gt x)$
for
$0 \lt x \lt 8.80321$
, while
$\Pr( X^{\hat{F}_1} \gt x ) \lt \Pr( X^{\hat{F}_2} \gt x)$
for
$x \gt 8.80321$
In these senses, the exponential distribution is riskier than the Pareto distribution for the limited stop-loss reinsurance
$(X-5)_+ \wedge 5$
.
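The two quoted net premiums follow from the identity $\mathbb{E}[(X-d)_+ \wedge m] = \int_d^{d+m} S(x)\,\mathrm{d}x$; a quick closed-form check:

```python
import math

# Net premiums E[(X - 5)_+ ^ 5] = int_5^10 S(x) dx for the two reference
# distributions, cross-checking the values quoted in the text.

# Exponential with mean 4: S(x) = exp(-x/4)
lsl_exp = 4.0 * (math.exp(-5.0 / 4.0) - math.exp(-10.0 / 4.0))

# Pareto: S(x) = (12/(x+12))^4, so int_5^10 S(x) dx = (12^4/3)(17^{-3} - 22^{-3})
lsl_par = (12.0 ** 4 / 3.0) * (17.0 ** -3 - 22.0 ** -3)
```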
However, if we consider the stop-loss reinsurance
$(X-5)_+$
, we have
$\mathbb{E}[((X^{\hat{F}_1}-5)_+)^2] = 9.16815$
, which is less than
$ \mathbb{E}[((X^{\hat{F}_2}-5)_+)^2] = 23.917$
, and
$\text{var}((X^{\hat{F}_1}-5)_+) =7.85479 $
, which is less than
$\text{var}((X^{\hat{F}_2}-5)_+) =21.9376.$
Thus, in these senses, the Pareto distribution is riskier than the exponential distribution for the stop-loss reinsurance
$(X-5)_+$
. Using the results derived in Section 3, we can also calculate Wang’s premiums
$\rho_{0.5}((X^{\hat{F}_1}-5)_+)$
,
$\rho_{0.5}((X^{\hat{F}_2}-5)_+)$
,
$\sup_{F \in \mathcal S(\hat{F}_1;\, 2, \varepsilon)} \rho_{0.5}((X^{{F}} -d)_+)$
and
$\sup_{F \in \mathcal S(\hat{F}_2;\, 2, \varepsilon)} \rho_{0.5}((X^{{F}} -d)_+)$
for the stop-loss reinsurance
$(X-5)_+$
and conclude that the Pareto distribution is riskier than the exponential distribution for the stop-loss reinsurance
$(X-5)_+$
. Detailed calculations for the stop-loss reinsurance
$(X-5)_+$
are omitted.
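The quoted stop-loss figures can be reproduced in closed form from $\mathbb{E}[(X-d)_+] = \int_d^\infty S(x)\,\mathrm{d}x$ and $\mathbb{E}[((X-d)_+)^2] = 2\int_0^\infty y\,S(d+y)\,\mathrm{d}y$; a quick check of the moments behind the variance comparison:

```python
import math

# Closed-form check of the moments of Y = (X - 5)_+ under both references,
# using E[Y] = int_5^inf S(x) dx and E[Y^2] = 2 int_0^inf y S(5 + y) dy.

# Exponential, mean 4: S(x) = exp(-x/4)
m1_exp = 4.0 * math.exp(-1.25)                # E[Y]   ~ 1.14602
m2_exp = 32.0 * math.exp(-1.25)               # E[Y^2] ~ 9.16815
var_exp = m2_exp - m1_exp ** 2                # ~ 7.85479

# Pareto: S(x) = (12/(x+12))^4
m1_par = 12.0 ** 4 / (3.0 * 17.0 ** 3)        # E[Y]   = 6912/4913 ~ 1.40688
m2_par = 2.0 * 12.0 ** 4 / (6.0 * 17.0 ** 2)  # E[Y^2] = 41472/1734 ~ 23.917
var_par = m2_par - m1_par ** 2                # ~ 21.9376
```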
In addition, from Figures 3–5 and Table 1, we also observe that the worst-case Wang's premiums, or the highest Wang's premiums, for the limited stop-loss reinsurance are significantly higher than those calculated based on the reference distributions. For instance,

and

Sharp upper bounds for the premiums of reinsurance contracts, or the worst-case premiums, are helpful for reinsurers in pricing reinsurance contracts when the distribution of the underlying insurance loss is uncertain or unknown.
Besides the applications illustrated above, at the end of this section, we point out other possible applications of the results derived in Sections 3 and 4, as well as a connection between the following two problems

and

where
$\ell(x)$
is a loss function,
$R_{\ell}(X)=X-\ell(X)$
is the retained loss for the insurer under the reinsurance contract
$\ell$
, and
$\pi(\ell(X))$
is the reinsurance premium.
The problem (4.3) is a popular topic in robust optimal reinsurance studies; see, for example, Landriault et al. (Reference Landriault, Liu and Shi2025) and references therein. If the regularity conditions in the min-max theorem (see, e.g., Fan, Reference Fan1953) are satisfied by the problem (4.3), then it is equivalent to the following problem:

For any fixed
$F \in \mathcal{S}$
, denote by
$I^*(x) = I^*(x, F)$
the optimal solution to the inner problem in (4.4), if such a solution exists. Thus, the problem (4.4) further reduces to:

which can be viewed as a generalization of the problem (4.2) with
$\ell(x) = R_{I^*}(x) + \pi(I^*(x))$
.
Using the above idea and framework (4.5), Boonen and Jiang (Reference Boonen and Jiang2025) solve the problem (4.3) for the case where
$\mathcal{S} = \mathcal{S}(\hat{F};\, k, \varepsilon)$
,
$\rho$
is a convex distortion risk measure,
$I$
satisfies the incentive compatibility condition,
$\mathcal I$
is a compact set, and
$X$
is a bounded random variable.
However, the equivalence between problems (4.3) and (4.4) holds only if the order of
$\min$
and
$\sup$
can be exchanged. Solutions to the problem (4.5) heavily rely on the solutions to the inner problem in (4.4). In general, the objective functions in problems (4.2) and (4.5) differ, requiring different methodologies to solve them. This distinction is illustrated in our paper and Cai et al. (Reference Cai, Liu and Yin2024) for the problem (4.2), compared to the approach used by Boonen and Jiang (Reference Boonen and Jiang2025) for the problem (4.5). We believe that investigating problems (4.2), (4.3), and (4.5) with various uncertainty sets
$\mathcal S$
and different functions
$\ell$
is a valuable topic for future research.
5. Conclusion
In this paper, we investigate the problem of finding the worst-case values of distortion risk measures for transformed loss random variables over a Wasserstein ball. We derive expressions for worst-case coherent distortion risk measures for stop-loss, limited loss, and limited stop-loss transforms when the underlying insurance loss is uncertain and lies within a general
$k$
-order Wasserstein ball. Additionally, we identify the worst-case distributions – equivalently, the worst-case quantile functions – under which these worst-case risk measures are attained. This problem presents significant challenges as it involves infinite-dimensional optimization; however, our approach simplifies it to a one-variable optimization problem. Specifically, we provide solutions for the stop-loss transform over a general
$k$
-order Wasserstein ball, while for the limited loss and limited stop-loss transforms, we focus on first- or second-order Wasserstein balls.
In the context of insurance, we concentrate on the subset of the Wasserstein ball that includes only non-negative distributions. Our findings show that the worst-case coherent risk measures for these transforms over the Wasserstein ball are equivalent to those over this subset of non-negative distributions. This result enhances the practical application of these worst-case values in the insurance industry.
Our numerical illustrations of the worst-case distributions and risk measures, as well as their applications in reinsurance premiums, demonstrate that the results are both intuitive and sensible. The limited stop-loss transforms, including their special cases of stop-loss and limited loss transforms, along with coherent risk measures, have extensive applications across insurance, finance, operations research, and beyond. We plan to explore further applications of the results derived in this paper in future research.
A. Appendix
Proof of Theorem 3.1: (i) First, note that by using arguments similar to those in the proof of Theorem 4.1 in Cai et al. (Reference Cai, Liu and Yin2024),
$H(\beta)$
, as defined in (3.4), can be written as

where
$ \mathcal Q_d ( \hat{F}^{-1};\, k, \varepsilon) \triangleq \{\hat{G}^{-1}\,:\, \hat{G}^{-1}+d \in \mathcal Q ( \hat{F}^{-1};\, k, \varepsilon) \}. $
If
$\beta= 1 $
, then
$\gamma_{1,\beta} = 0 $
and
$ H(1) = 0 $
for all
$G^{-1}$
. In the following, we focus on the case when
$ \beta \in [0,1)$
. Then,
$\gamma_{1,\beta } $
is not the constant zero, and
$\gamma_{1,\beta } \leqslant \gamma $
implies
$ \Vert \gamma_{1,\beta} \Vert_1$
,
$ \Vert \gamma_{1,\beta} \Vert_{\bar{k}} $
and
$ \Vert \gamma_{1,\beta} \Vert_k$
are well defined, where
$\bar{k} = (1-1/k)^{-1}$
. Note that the function
$ \frac{ \gamma_{1,\beta}(u)}{ \Vert \gamma_{1,\beta} \Vert_1} \geqslant 0 $
is non-decreasing in
$u\in (0,1)$
and
$ \int_0^1 \frac{ \gamma_{1,\beta}(u)}{ \Vert \gamma_{1,\beta} \Vert_1} \mathrm{d} u = 1.$
Therefore, the function
$g_{1,\beta} (q)$
defined as

is non-decreasing and concave with
$g_{1,\beta}(0)=0 $
and
$g_{1,\beta}(1)=1$
. That is,
$g_{1,\beta}$
is a concave distortion function, whose weight function is
$\frac{\gamma_{1,\beta}}{||\gamma_{1,\beta}||_1}$
. Denote
$\rho_{1,\beta}$
to be the convex distortion risk measure induced by
$g_{1,\beta}$
. Then, for any
$\beta \in [0,1) $
, we have

where the second-to-last equality is given by Proposition 4 in Liu et al. (Reference Liu, Mao, Wang and Wei2022). Combining the cases when
$\beta = 1 $
and
$\beta \in [0,1)$
, we verify that
$H(\beta) $
satisfies the expression (3.6) for all
$\beta \in [0,1]$
.
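To make the construction concrete, one can verify numerically that $g_{1,\beta}(q) = \Vert \gamma_{1,\beta}\Vert_1^{-1}\int_0^q \gamma_{1,\beta}(1-t)\,\mathrm{d}t$ is indeed a concave distortion function, here using Wang's weight $\gamma_\alpha(u)=e^{\alpha\Phi^{-1}(u)-\alpha^2/2}$ from Section 4 as an example of an increasing $\gamma$ (the grid size and the midpoint concavity test are illustrative):

```python
import math
from statistics import NormalDist

N = NormalDist()

def gamma_wang(u, alpha=0.5):
    """Increasing weight gamma_alpha(u) = exp(alpha * Phi^{-1}(u) - alpha^2/2)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # clamp away from {0, 1}
    return math.exp(alpha * N.inv_cdf(u) - 0.5 * alpha * alpha)

def g_1beta(q, beta, n=20_000):
    """g_{1,beta}(q) = int_0^q gamma_{1,beta}(1 - t) dt / ||gamma_{1,beta}||_1,
    where gamma_{1,beta} = gamma * 1_{[beta, 1]} (trapezoid quadrature)."""
    def tail_weight(t):                  # gamma_{1,beta}(1 - t)
        u = 1.0 - t
        return gamma_wang(u) if u >= beta else 0.0
    def integral(b):
        if b <= 0.0:
            return 0.0
        h = b / n
        return h * sum((0.5 if i in (0, n) else 1.0) * tail_weight(i * h)
                       for i in range(n + 1))
    return integral(q) / integral(1.0)

beta = 0.3
vals = [g_1beta(q, beta) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The values confirm $g_{1,\beta}(0)=0$, $g_{1,\beta}(1)=1$, monotonicity, and midpoint concavity, as claimed.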
Since
$ \vert \gamma (u) \vert \lt \infty $
and
$ \vert \hat{F}^{-1}(u)\vert\lt \infty $
for all
$ u \in (0,1)$
, all integrals
$\int_0^1 \gamma_{1,\beta} (u) \hat{F}^{-1}(u) \mathrm{d} u = \int_{\beta}^1 \gamma (u) \hat{F}^{-1}(u) \mathrm{d} u $
,
$ \Vert \gamma_{1,\beta} \Vert_{\bar{k}} = \big ( \int_{\beta}^1 \gamma(u)^{\bar{k}} \mathrm{d} u \big )^{1/\bar{k}}$
, and
$ \Vert \gamma_{1,\beta} \Vert_1 = \int_{\beta}^1 \gamma(u) \mathrm{d} u $
are continuous in
$\beta$
. Therefore,
$ H(\beta) $
is a continuous function. Hence, the supremum of
$ H (\beta) $
can be achieved on the compact set [0,1]. The expression (3.5) holds as a consequence. Furthermore, since
$d \lt \mathrm{ess\mbox{-}sup} X^{\hat{F}} $
, we have
$ \sup_{F \in {\mathcal S }(\hat{F};\, k, \varepsilon)} \rho\left ( (X^F-d)_+\right ) \geqslant \rho\big ( (X^{\hat{F}}-d)_+\big ) \gt 0$
. Therefore, the maximum of
$ H (\beta) $
can be achieved in [0,1) because
$ H ( 1 )= 0 $
.
(ii) Take
$k\gt 1$
. Suppose that
$ F^* \in \mathcal S(\hat{F};\, k, \varepsilon) $
is a maximizer of problem (3.1). For a fixed
$F^*(d)$
, the maximizer of the following problem

is given in the proof of Proposition 4 of Liu et al. (Reference Liu, Mao, Wang and Wei2022), and this maximizer can be expressed by its quantile function

Indeed, the supremum value of problem (A2) can be achieved if and only if
$F^{-1}(u) - \hat{F}^{-1} (u) \geqslant 0 $
for all
$0\lt u\lt 1$
, and the following Hölder’s inequality achieves its equal sign

which is further equivalent to the condition that
$ \gamma_{1,F^*(d)}^{\bar{k}} $
and
$( F^{-1}- \hat{F}^{-1} )^{k} $
are linearly dependent. That is,
$F_0^{-1} $
is the unique solution to the problem (A2). Then, by an argument similar to that used in Proposition 4.1-(i) in Cai et al. (Reference Cai, Liu and Yin2024), we have
$ (F^*)^{-1}= F_0^{-1}$
a.s., and equivalently,
$ F^* = F_0$
. Furthermore,

Therefore,
$F^*(d) $
is a maximizer for the problem (3.5).
(iii) The proof of the statement is similar to the argument used in Proposition 4.1-(ii) in Cai et al. (Reference Cai, Liu and Yin2024), and therefore, we omit the detailed proof here.
Proof of Theorem
3.3: For
$k=1$
, the Wasserstein distance between
$F$
and
$\hat{F}$
becomes
$W_1 ( F, \hat{F} ) = \int_0^1 \big \vert F^{-1} (u)- \hat{F}^{-1} (u) \big \vert \, \mathrm{d} u .$
Moreover, for any
$ F \in \mathcal S$
with
$ \hat{F}^{-1} \leqslant F^{-1}$
, we have
$W_1 ( F, \hat{F} ) = \int_0^1 ( F^{-1} (u)- \hat{F}^{-1} (u) ) \, \mathrm{d} u =\mathbb{E}[ X^F] - \mathbb{E}[X^{\hat{F}}] $
.
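A quick numerical illustration of this identity (the perturbation $F^{-1}(u)=\hat{F}^{-1}(u)+cu$ is a hypothetical example, chosen so that $\hat{F}^{-1}\leqslant F^{-1}$ and both sides equal $c/2$):

```python
import math

# Check of W_1(F, F_hat) = E[X^F] - E[X^F_hat] when F_hat^{-1} <= F^{-1}:
# take F_hat ~ Exp(mean 4) and F with quantile F^{-1}(u) = F_hat^{-1}(u) + c*u,
# so that F^{-1} >= F_hat^{-1} and both sides equal c/2 (Riemann sum on (0, 1)).

c, n = 0.7, 200_000
h = 1.0 / n
w1, mean_diff = 0.0, 0.0
for i in range(1, n):
    u = i * h
    qhat = -4.0 * math.log(1.0 - u)   # Exp(mean 4) quantile
    q = qhat + c * u                  # perturbed quantile, q >= qhat
    w1 += abs(q - qhat) * h
    mean_diff += (q - qhat) * h
```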
For any
$ F \in \mathcal A_2$
, it holds that
$\hat{F}^{-1}\wedge m \leqslant F^{-1} \leqslant m $
and

Define
$ h(p) \triangleq \int_0^p \hat{F}^{-1}(u) \mathrm{d} u + m (1-p) $
as a function of
$p \in [0,1]$
. Since
$ \hat{F}^{-1} (u) $
is bounded for all
$ u \in (0,1)$
, the function
$h(p)$
is continuous in
$p$
. With
$q_1 = \hat{F}(m)$
, we have

where the second equality holds because, for any
$0 \lt u \lt 1 $
and quantile function
$G^{-1}$
,
$ u \leqslant G(x) $
if and only if
$ G^{-1} (u) \leqslant x $
. On the other hand, if
$ q_0^1 $
defined in (3.12) is zero, then
$ h(0) = m \geqslant \mathbb{E} [F^{-1} (U) ] $
. If
$ q_0^1 \gt 0 $
, then we have
$ \hat{F}^{-1} (u) \leqslant m $
for
$ 0 \lt u \leqslant q_0^1 \leqslant q_1$
, and

where the third equality comes from (3.13) with
$k=1$
. From (A3), we have
$h(q_0^1) \geqslant \mathbb{E} [ F^{-1} (U) ] $
. Together with (A4) and the continuity of
$h(p)$
, there exists
$p_0 \in [q_0^1, q_1 ] $
such that
$\mathbb{E} [F^{-1} (U) ] = h(p_0) = \int_0^{p_0} \hat{F}^{-1} (u) \mathrm{d} u + m (1-p_0) $
. Then define quantile function
$G^{-1}$
as

satisfying
$ \mathbb{E} [G^{-1}(U) ] = h(p_0)= \mathbb{E} [F^{-1} (U)]$
. Since
$ p_0 \leqslant q_1 $
, we have
$ \hat{F}^{-1} (u) \leqslant m $
for
$ 0 \leqslant u \leqslant p_0 $
, and furthermore,
$ \hat{F}^{-1} (u)=\hat{F}^{-1} (u)\wedge m \leqslant F^{-1} (u) $
for all
$0 \lt u \lt p_0 $
. Therefore,
$ G^{-1} (u) = \hat{F}^{-1}(u) \leqslant F^{-1}(u) $
for
$ 0 \leqslant u \leqslant p_0 $
, and
$G^{-1} (u) = m \geqslant F^{-1}(u) $
for
$ p_0 \lt u \leqslant 1 $
. That is, the function
$ G^{-1} $
up-crosses the function
$ F^{-1}$
. Together with
$ \mathbb{E} [G^{-1}(U) ] = \mathbb{E} [F^{-1} (U) ]$
, by Lemma 3 of Ohlin (Reference Ohlin1969),
$ G^{-1} (U) $
is larger than
$ F^{-1}(U) $
in the sense of convex order. Since
$ \rho$
is a coherent distortion risk measure which preserves the convex order, we have
$ \rho (X^F ) = \rho ( F^{-1} (U) ) \leqslant \rho ( G^{-1} (U) ) = \rho (X^G)$
. Also note that, by the definition of
$q_0^1$
in (3.12), we have
$ W_1 (G^{-1}, \, \hat{F}^{-1}\wedge m) = \int_{p_0}^{1} \big \vert m - \hat{F}^{-1} (u) \wedge m \big \vert \mathrm{d} u \leqslant \varepsilon$
, that is,
$G\in \mathcal A_2$
. Since
$ F^{-1}$
is arbitrarily taken from
$\mathcal A_2$
, we conclude that
$\sup \left \{ \rho(X^F) \,:\, F \in \mathcal A_2 \right \} = \sup \left \{ \rho(X^G) \,:\, G \in \mathcal A_3 \right \} ,$
where
\begin{align*} \mathcal A_3 \,:\!=\, \left \{ G \,:\, G^{-1} = \hat{F}^{-1} \, \mathbb{I}_{[0, p]} + m \, \mathbb{I}_{(p, 1]}, \ p \in [ q_0^1, q_1 ] \right \} . \end{align*}
Thus, Lemma 3.2 further implies
$ \sup_{F \in {\mathcal S }(\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m) = \sup \left \{ \rho(X^G ) \,:\, G \in \mathcal A_3 \right \}.$
The set
$\mathcal A_3 $
can be viewed as a set indexed by a single parameter
$ p \in [ q_0^1, q_1 ]$
. As
$p$
increases, the associated quantile function
$G = \hat{F}^{-1} \mathbb{I}_{ [0, p] } + m\mathbb{I}_{( p, 1 ]} $
decreases in the sense of the first-order stochastic dominance (FSD). Since
$ \rho $
preserves FSD,
$ \rho (X^G) $
decreases as
$p$
increases. As a consequence, we conclude that
\begin{align*} (G^*)^{-1} \,:\!=\, \hat{F}^{-1} \, \mathbb{I}_{[0, q_0^1]} + m \, \mathbb{I}_{(q_0^1, 1]} , \end{align*}
which is the largest quantile in
$\mathcal A_3$
in the sense of FSD, maximizes the integral
$ \int_0^1 \gamma (u) G^{-1} (u) \mathrm{d} u$
on
$\mathcal A_3$
. It is easy to see that
$F^* $
defined in (3.22) satisfies
$(G^*)^{-1} (u) = (F^*)^{-1} (u) \wedge m $
for
$ 0 \lt u \lt 1 $
and
$ F^* \in \mathcal S$
. Therefore,
$ \rho ( X^{F^*} \wedge m ) = \sup_{F \in {\mathcal S }(\hat{F};\, 1, \varepsilon)} \rho(X^F \wedge m)$
, that is,
$ F^*$
is the worst-case distribution to the problem (3.11) with
$k=1$
. Equation (3.23) can be verified directly by calculating
$ \rho ( X^{F^*} \wedge m ) $
with (3.11).
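The argument above reduces the optimization over $\mathcal A_2$ to a one-parameter search: as $p$ increases, $G_p^{-1} = \hat{F}^{-1}\mathbb{I}_{[0,p]} + m\,\mathbb{I}_{(p,1]}$ decreases in the FSD sense, so any FSD-preserving $\rho$ is maximized at the smallest feasible $p$. A sketch of this monotonicity with ES standing in for $\rho$ (the baseline quantile $\hat{F}^{-1}(u)=2u$, cap $m=1.5$, and ES level are toy assumptions, not from the paper):

```python
# G_p^{-1}(u) = Fhat^{-1}(u) for u <= p, and m for u > p (the family A_3).
def q_hat(u): return 2 * u               # toy baseline quantile on (0, 2)
m, alpha, n = 1.5, 0.5, 20_000           # cap m, ES level alpha, grid size

def es(q):
    """ES_alpha(q) = (1/(1-alpha)) * \\int_alpha^1 q(u) du (midpoint rule)."""
    grid = [alpha + (1 - alpha) * (i + 0.5) / n for i in range(n)]
    return sum(q(u) for u in grid) / n

def G_inv(p):
    return lambda u: q_hat(u) if u <= p else m

# Here q_1 = Fhat(m) = 0.75, so p ranges up to 0.75; ES decreases in p,
# hence the worst case sits at the smallest admissible p.
vals = [es(G_inv(p)) for p in (0.55, 0.65, 0.75)]
assert vals[0] > vals[1] > vals[2]
```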
Proof of Theorem 3.4: Based on (3.14) and the well-known minimax theorem of Sion (Reference Sion1958), along with arguments similar to those used in the proof of Theorem 3.1 in Cai et al. (Reference Cai, Liu and Yin2024), we first obtain


Define

Since
$\gamma $
is non-negative, it is easy to see
$\gamma(u) = 0 $
for
$0 \leqslant u \leqslant \beta_0$
. For
$\beta \leqslant \beta_0 $
, we have
$ \gamma_{2,\beta } =0 $
which implies
$ \max_{F^{-1} \in \mathcal Q( \hat{F}^{-1};\, 2, \varepsilon)} L(\beta,F^{-1}) = 0$
.
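To make $\beta_0$ and $\Vert\gamma_{2,\beta}\Vert_1$ concrete, one can take the weight function of ES at level $\alpha$, namely $\gamma(u) = \frac{1}{1-\alpha}\mathbb{I}\{u \gt \alpha\}$, for which $\beta_0 = \alpha$. A quick numerical check (this specific $\gamma$ is a standard example used here only as an illustration):

```python
# Weight of ES at level alpha: gamma(u) = 1/(1-alpha) for u > alpha, else 0.
# For this gamma, beta_0 = alpha: the weight vanishes on [0, alpha].
alpha = 0.9
def gamma(u): return 1.0 / (1.0 - alpha) if u > alpha else 0.0

def norm1(beta, n=100_000):
    """||gamma_{2,beta}||_1 = \\int_0^beta gamma(u) du (midpoint rule)."""
    return sum(gamma(beta * (i + 0.5) / n) for i in range(n)) * beta / n

assert norm1(0.9) == 0.0                 # beta <= beta_0: gamma_{2,beta} = 0
assert abs(norm1(0.95) - 0.5) < 1e-3     # (0.95 - 0.9) / (1 - 0.9)
assert abs(norm1(1.0) - 1.0) < 1e-3
```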
In the following, we focus on the case when
$\beta_0 \lt \beta \lt 1 $
. It is easy to see that
$\gamma_{2, \beta } $
is not identically zero, so that
$\Vert\gamma_{2,\beta} \Vert_1 = \int_0^\beta \gamma(u) \mathrm{d} u \gt 0 $
is well defined, and therefore, we can further define

which is a non-decreasing function with
$ g_{2,\beta}(0) =0$
and
$ g_{2,\beta}(1) =1$
. We can use
$g_{2,\beta}$
as a distortion function to induce a distortion risk measure
$\rho_{2,\beta }$
. It should be pointed out that
$\rho_{2,\beta }$
may not be coherent since
$g_{2,\beta}$
may not be a concave distortion function. For
$\beta \in (\beta_0,1)$
,

For simplicity, define
$m_\beta$
as
$ m_\beta = \max_{F \in \mathcal Q( \hat{F}^{-1};\, 2, \varepsilon)} \rho_{2,\beta} (F^{-1} (U)) $
for a given
$\beta_0 \lt \beta \lt 1 $
. Let
$\mathcal M $
be the set of quantile functions with finite second moments. From Theorem 1 of Pesenti (Reference Pesenti2022), the problem
\begin{align*}\min_{F \in \mathcal{M} } W_2 (F^{-1}, \hat{F}^{-1}) \,\,\,\text{ s.t. } \rho_{2,\beta } (X^F) = m_\beta \end{align*}
has a unique solution given by
$(F_\beta^*)^{-1} = \big (\hat{F}^{-1} + \lambda_\beta \, \gamma_{2,\beta}\big )^\uparrow$
with
$\lambda_\beta \geqslant 0 $
such that
$ \rho_{2,\beta } ((F_\beta^*)^{-1} (U)) = m_\beta $
. Next, we are going to show
$ W_2 ((F_\beta^*)^{-1}, \hat{F}^{-1}) = \varepsilon$
.
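The non-decreasing rearrangement $(\,\cdot\,)^\uparrow$ appearing in $(F_\beta^*)^{-1} = \big(\hat{F}^{-1} + \lambda_\beta\,\gamma_{2,\beta}\big)^\uparrow$ can be computed on a discretized grid of $(0,1)$ via the pool-adjacent-violators algorithm (PAVA), which yields the $L^2$ isotonic projection. A self-contained sketch (the grid discretization and the toy input sequence are our own assumptions):

```python
def pava(y):
    """L2 projection of a sequence onto non-decreasing sequences (PAVA)."""
    blocks = []                           # each block: [mean value, weight]
    for v in y:
        blocks.append([float(v), 1.0])
        # Pool adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for v, w in blocks:
        out.extend([v] * round(w))
    return out

# Example: projecting a non-monotone sequence (think of Fhat^{-1} + lambda *
# gamma_{2,beta} sampled on a grid) onto non-decreasing sequences.
proj = pava([1.0, 3.0, 2.0, 2.0, 5.0])
assert proj == sorted(proj)               # result is non-decreasing
assert all(abs(a - b) < 1e-12 for a, b in zip(proj, [1.0, 7/3, 7/3, 7/3, 5.0]))
```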
-
(i) Suppose
$ W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) \gt \varepsilon$ . Then there exists
$\delta $ such that
$ 0 \lt \delta \lt W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) - \varepsilon $ . Take a sequence
$\{ F_n^{-1} , n=1,2,...\} \subset \mathcal Q ( \hat{F}^{-1};\, 2, \varepsilon) $ such that
$m_\beta = \lim_{n\rightarrow \infty } \rho_{2, \beta } ( F_n^{-1} (U) ) $ . For any
$n=1,2,...$ ,
$ W_2 (F_n^{-1} + \delta ,\hat{F}^{-1}) \leqslant \delta+ W_2 (F_n^{-1} ,\hat{F}^{-1}) \leqslant \delta + \varepsilon \lt W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) $ . Meanwhile,
$ \rho_{2, \beta } ( F_n^{-1} (U) + \delta ) = \rho_{2, \beta } ( F_n^{-1} (U) )+ \delta \rightarrow m_\beta + \delta$ as
$n \rightarrow \infty$ . There exists
$N$ such that
$ \rho_{2, \beta } ( F_n^{-1} (U) + \delta ) \gt m_\beta$ for all
$n \geqslant N $ . Meanwhile, we know
$W_2 (F_n^{-1} + \delta ,\hat{F}^{-1}) \lt W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) $ for all
$n\geqslant N$ . In particular, we take
$N$ and consider the problem
\begin{align*}\min_{F \in \mathcal{M} } W_2 (F^{-1}, \hat{F}^{-1}) \,\,\,\text{ s.t. } \rho_{2,\beta } (X^F) = \rho_{2, \beta } ( F_N^{-1} (U) + \delta )\end{align*}
which, again by Theorem 1 of Pesenti (Reference Pesenti2022), has a unique solution $\tilde{F}^{-1} = \big (\hat{F}^{-1} + \tilde{\lambda} \, \gamma_{2,\beta}\big )^\uparrow$ with
$\tilde{\lambda} \geqslant 0 $ such that
$ \rho_{2,\beta } (\tilde{F}^{-1} (U)) = \rho_{2, \beta } ( F_N^{-1} (U) + \delta ) $ . Since
$ \rho_{2,\beta } (\tilde{F}^{-1} (U)) = \rho_{2, \beta } ( F_N^{-1} (U) + \delta )\gt m_\beta = \rho_{2,\beta } ((F_\beta^*)^{-1} (U)) $ , we have
$ \lambda_\beta \lt \tilde{\lambda}$ . This implies
$ W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) \lt W_2 ( \tilde{F}^{-1},\hat{F}^{-1}) \leqslant W_2 (F_N^{-1} + \delta ,\hat{F}^{-1})$ , which contradicts the fact
$W_2 (F_n^{-1} + \delta ,\hat{F}^{-1}) \lt W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) $ for all
$n\geqslant N$ .
-
(ii) If
$W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) \lt \varepsilon$ , then take
$\delta$ such that
$ 0\lt \delta \lt \varepsilon-W_2 ((F_\beta^*)^{-1},\hat{F}^{-1}) $ . It is easy to see that
\begin{align*} W_2 ((F_\beta^*)^{-1} + \delta,\hat{F}^{-1}) \leqslant W_2 ((F_\beta^*)^{-1} + \delta,(F_\beta^*)^{-1} ) +W_2 ((F_\beta^*)^{-1} ,\hat{F}^{-1}) \lt \varepsilon ,\end{align*}
that is, $ (F_\beta^*)^{-1} + \delta \in \mathcal Q ( \hat{F}^{-1};\, 2, \varepsilon)$ . Meanwhile, we see that
$\rho_{2,\beta} ((F^*_\beta)^{-1} (U) + \delta ) = \rho_{2,\beta} ((F^*_\beta)^{-1} (U)) + \delta \gt \rho_{2,\beta} ((F^*_\beta)^{-1} (U) ) = m_\beta,$ which contradicts the definition that
$m_\beta = \max_{F^{-1} \in \mathcal Q ( \hat{F}^{-1};\, 2, \varepsilon)} \rho_{2,\beta} (F^{-1} (U)) $ .
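Both contradiction arguments above rely on the translation invariance of distortion risk measures, $\rho_{2,\beta}(X+\delta) = \rho_{2,\beta}(X) + \delta$. A quick numerical check with ES standing in for $\rho_{2,\beta}$ (the quantile function and parameter values are toy assumptions, illustration only):

```python
# Translation invariance: rho(X + delta) = rho(X) + delta for distortion
# risk measures; checked here for ES_alpha on a toy quantile function.
alpha, delta, n = 0.9, 0.3, 50_000
def q(u): return u ** 2

def es(qf):
    """ES_alpha as the average of the quantile function over (alpha, 1)."""
    grid = [alpha + (1 - alpha) * (i + 0.5) / n for i in range(n)]
    return sum(qf(u) for u in grid) / n

assert abs(es(lambda u: q(u) + delta) - (es(q) + delta)) < 1e-9
```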
Therefore,
$ (F_\beta^*)^{-1}$
satisfies
$ W_2 ((F_\beta^*)^{-1}, \hat{F}^{-1}) = \varepsilon$
and
$\rho_{2,\beta } (X^{F_\beta^*}) = m_\beta$
. Consequently, for any
$\beta\in (\beta_0,1)$
,

where
\begin{align*} (F_\beta^*)^{-1} = \big (\hat{F}^{-1} + \lambda_\beta \, \gamma_{2,\beta}\big )^\uparrow \end{align*}
with
$\lambda_\beta\geqslant 0 $
satisfying
$ W_2 ((F_\beta^*)^{-1}, \hat{F}^{-1}) = \varepsilon$
. Furthermore, we have

where
$ \beta^* \in \arg \min_{\beta\in[0,1]} \big \{ \Vert\gamma_{2,\beta} \Vert_1 \cdot \rho_{2,\beta} (X^{F_\beta^*}-m)\big \}.$
From its isotonic projection representation given in (A9) and arguments similar to those used in the proof of Proposition 1 of Bernard et al. (Reference Bernard, Pesenti and Vanduffel2024), the quantile function
$F_{\beta^*}^* $
can be expressed as
\begin{align*} (F_{\beta^*}^*)^{-1}(u) = \begin{cases} \hat{F}^{-1} (u) + \lambda_{\beta^*} \, \gamma_{2,\beta^*} (u), & 0 \lt u \leqslant \theta_{\beta^*}, \\ c, & \theta_{\beta^*} \lt u \leqslant p, \\ \hat{F}^{-1} (u), & p \lt u \leqslant 1, \end{cases} \end{align*}
for some
$\lambda_{\beta^*} \geqslant 0 $
,
$ \theta_{\beta^*} \leqslant \beta^* \leqslant p \leqslant 1 $
and constant
$c$
. Since
$ \theta_{\beta^*} \leqslant \beta^* $
,
$ \gamma_{2,\beta^*} (u) = \gamma (u)$
for
$0 \lt u \leqslant \theta_{\beta^*}$
. We can also write
$ (F_{\beta^*}^*)^{-1}(u) =\hat{F}^{-1} (u)+ \lambda_{\beta^*} \, \gamma (u) $
for
$0 \lt u\leqslant \theta_{\beta^*}$
.
We first claim that
$ \hat{F}(c{-}) \leqslant p \leqslant \hat{F}(c)$
. Suppose
$ \hat{F}(c{-}) \gt p$
. Then there exists a small
$\delta \gt 0$
such that
$ (F_{\beta^*}^*)^{-1}(u) = \hat{F}^{-1}(u) \lt c = (F_{\beta^*}^*)^{-1}({\kern1pt}p) $
for
$ u \in (p, \, p+\delta ) $
. This implies that
$(F_{\beta^*}^*)^{-1}$
is not non-decreasing, which contradicts the fact that
$ (F_{\beta^*}^*)^{-1}$
is a quantile function. Suppose
$ p \gt \hat{F}(c)$
. Then there exists a small
$\delta \gt 0$
such that
$ (F_{\beta^*}^*)^{-1}(u) = c \lt \hat{F}^{-1}(u) $
for
$ u \in (p-\delta, \, p ) $
. Since
$\hat{F}^{-1} ( u ) + \lambda_{\beta^*} \, \gamma_{2,\beta^*}(u) \geqslant \hat{F}^{-1} ( u ) $
for all
$u \in (0,1)$
, we can strictly decrease the
$L^2$
-norm between
$\hat{F}^{-1} + \lambda_{\beta^*} \, \gamma_{2,\beta^*}$
and
$ (F_{\beta^*}^*)^{-1}$
in (A10) by taking
$ (F_{\beta^*}^*)^{-1} (u) = \hat{F}^{-1} (u)$
for
$u \in (p-\delta,p ) $
. This contradicts the fact that
$ (F_{\beta^*}^*)^{-1}$
in (A10) is the isotonic projection of
$\hat{F}^{-1} + \lambda_{\beta^*} \, \gamma_{2,\beta^*}$
. Therefore,
$ \hat{F}(c{-}) \leqslant p \leqslant \hat{F}(c)$
. In particular, we can take
$ p = \hat{F}(c)$
.
Second, we verify that
$c=m$
. Indeed, if
$c \gt m $
, we can take
$ G^{-1} = \min\{ (F_{\beta^*}^*)^{-1}, \, \max\{m , \hat{F}^{-1} \}\} $
. Then,
$G^{-1} \wedge m = (F_{\beta^*}^*)^{-1} \wedge m $
and thus
$\rho (G^{-1} (U)\wedge m ) = \rho ( (F_{\beta^*}^*)^{-1}(U) \wedge m ) $
. Meanwhile,
$\hat{F}^{-1}(u) \leqslant G^{-1} (u) \leqslant (F_{\beta^*}^*)^{-1} (u) $
for all
$0 \lt u \lt 1 $
with
$G^{-1} (u) \lt (F_{\beta^*}^*)^{-1} (u) $
for some
$ u\in (\theta_{\beta^*},p] $
. Therefore,
$ G^{-1} (U) $
is strictly smaller than
$ (F_{\beta^*}^*)^{-1} (U) $
in the sense of FSD, and
$ W_2 ( G^{-1}, \hat{F}^{-1}) \lt W_2 ((F_{\beta^*}^*)^{-1} , \hat{F}^{-1}) \leqslant \varepsilon$
. Take
$ 0 \lt \delta \lt W_2 ((F_{\beta^*}^*)^{-1} , \hat{F}^{-1}) -W_2 ( G^{-1}, \hat{F}^{-1}) $
and construct
$ \tilde{G}^{-1} = G^{-1} + \delta$
. Then,
$ W_2 ( \tilde{G}^{-1}, \hat{F}^{-1} ) \leqslant W_2 (\tilde{G}^{-1}, G^{-1} ) + W_2 (G^{-1} ,\hat{F}^{-1} ) \leqslant \varepsilon$
, that is,
$ \tilde{G}^{-1} \in \mathcal Q ( \hat{F}^{-1};\, 2, \varepsilon)$
. Since it is assumed
$q_0^2 \gt 0 $
,
$ (F_{\beta^*}^*)^{-1}(u )\lt m $
for some
$u$
, and the same holds for
$G^{-1}$
. Then,
$\rho ( \tilde{G}^{-1}(U) \wedge m ) \gt \rho ( G^{-1} (U)\wedge m ) = \rho ( (F_{\beta^*}^*)^{-1}(U) \wedge m ) $
, which contradicts the optimality of
$ F_{\beta^*}^*$
. On the other hand, if
$c \lt m $
, then we have
$ \beta^* \leqslant p \lt q_1 \,:\!=\, \hat{F} (m)$
because
$ (F_{\beta^*}^*)^{-1} \geqslant \hat{F}^{-1}$
. Since
$(F_{\beta^*}^*)^{-1} (u) = \hat{F}^{-1} (u) $
for
$ u \gt p $
, we also have
$ q_1 = \hat{F} (m)=F_{\beta^*}^*(m) $
. Then,
$ \rho ( (F_{\beta^*}^*)^{-1} \wedge m ) = m + L( q_1 , (F_{\beta^*}^*)^{-1} ) \gt m + L( \beta^* , (F_{\beta^*}^*)^{-1} ) = \sup_{F \in \mathcal S } \rho (X^F\wedge m) $
, which is a contradiction. Therefore, we have proved that
$ c = m$
, and furthermore, we can take
$ p = \hat{F}(m) = q_1$
. In conclusion, we characterize the optimal quantile function as given in (3.24) by taking
$\lambda^*= \lambda_{\beta^*}$
and
$\theta^* = \theta_{\beta^*}$
.