1 Introduction
Machine learning has recently attracted enormous attention, with many real-world applications transforming society. Since the Dartmouth Conference in 1956, there have been efforts to develop a deeper theoretical understanding of learning. Several frameworks, such as PAC learning and Gold-style limit learning, have been proposed to define learning, explain it, and explore its capabilities and limits.
This article explores the theoretical limits of learning based on Solomonoff’s universal induction or algorithmic probability theory.
We consider the following problem: predict the next bit of an infinite binary sequence, knowing only that the sequence is sampled from the Cantor space according to an unknown computable probability measure.
In the standard setting of the theory of universal induction, the measure used for prediction is c.e.; that is, it is computably approximable from below but not computable in general. The reason for considering this broader class of measures, rather than the class of computable measures, is that an optimal prediction exists among c.e. measures, while no computable prediction is optimal. The theory of universal induction concerns the properties of optimal predictions. This theory is elegant from a theoretical standpoint and has succeeded in deepening our understanding of learning. However, optimal predictions cannot be implemented directly on a computer, and the theory's claims about machine learning algorithms used in practice are quite limited.
Even though there is no optimal computable prediction, can we prove that any sufficiently good prediction approximating the optimal one has specific properties? This article gives a positive answer to this question by introducing the concept of generality.
We call a measure more general than another measure if it dominates the other. We then prove that the prediction induced by a more general measure performs well for sample points of more computable measures. In other words, a more general prediction can solve more tasks. More precisely, the prediction induced by a more general measure has a smaller error sum when the error is measured by KL divergence (Theorem 3.2).
Furthermore, if we fix a computable measure from which samples are taken, the error sum of any sufficiently general prediction is a finite Martin-Löf random real (Theorem 4.1). This means the errors converge to zero more slowly than any monotone computable function. A sufficiently general prediction cannot converge quickly, and its convergence rate is uniquely determined up to a multiplicative constant (Theorem 4.2). While simple intuition suggests that good predictions should have small errors, general-purpose algorithms that can solve many tasks converge more slowly than specialized algorithms.
As special cases, we analyse the convergence speed using the $L^p$-norm when the model measure $\mu$ is either a Dirac measure (Proposition 4.9) or a separated measure (Proposition 4.16).
This article is a sequel to [17]. While the notion of generality has already been defined in [17], we consider this notion more carefully in this article. In particular, we give a necessary and sufficient condition for domination in Theorem 3.2. Theorem 4.1 strengthens [17, Theorem 3.1] and Proposition 4.13 strengthens [17, Theorems 4.3 and 4.4].
2 Preliminaries
In this section, we fix the notation and review notions from the relevant theories.
2.1 Notations
The sets of all positive integers, rational numbers, and reals are denoted by $\mathbb{N}=\{1,2,3,\dots\}$, $\mathbb{Q}$, and $\mathbb{R}$, respectively.
The set of all finite binary strings is denoted by $\{0,1\}^*$. We denote finite binary strings using $\sigma$ and $\tau$. The length of a string $\sigma$ is denoted by $|\sigma|$. For $\sigma,\tau\in\{0,1\}^*$, the concatenation of $\sigma$ and $\tau$ is denoted by $\sigma\tau$.
The set of all infinite binary sequences is denoted by $\{0,1\}^{\mathbb{N}}$. We use $X, Y, Z$ to denote infinite binary sequences. We write $X=X_1 X_2 X_3\dots$ and let $X_{<n}=X_1 X_2 \dots X_{n-1}$ and $X_{\le n}=X_1 X_2 \dots X_n$ for $n\in\mathbb{N}$.
The Cantor space, also denoted by $\{0,1\}^{\mathbb{N}}$, is the space of all infinite binary sequences equipped with the topology generated from the cylinder sets $[\sigma]=\{X\in\{0,1\}^{\mathbb{N}} : \sigma\prec X\}$ for $\sigma\in\{0,1\}^*$, where $\prec$ is the prefix relation.
2.2 Computability theory
We follow the standard notation and terminology in computability theory and computable analysis. For details, see, for instance, [6, 23, 28].
A partial function $f:\subseteq\{0,1\}^*\to\{0,1\}^*$ is a partial computable function if it can be computed using a Turing machine. A real $x\in\mathbb{R}$ is called computable if there exists a computable sequence $(q_n)_{n\in\mathbb{N}}$ of rationals such that $|x-q_n|<2^{-n}$ for all $n$. A real $x\in\mathbb{R}$ is called left-c.e. if there exists an increasing computable sequence $(q_n)_n$ converging to $x$. A real $x\in\mathbb{R}$ is called right-c.e. if $-x$ is left-c.e.
A function $f:\{0,1\}^*\to\mathbb{R}$ is called computable if $f(\sigma)$ is uniformly computable in $\sigma\in\{0,1\}^*$. A (probabilistic) measure $\mu$ on $\{0,1\}^{\mathbb{N}}$ is computable if the function $\sigma\mapsto\mu([\sigma])=:\mu(\sigma)$ is computable. For details on computable measure theory, see, for instance, [3, 27, 29].
2.3 Theory of inductive inference
Now, we review the theory of inductive inference initiated by Solomonoff. The primary references for this are [13, 15]. For a more philosophical discussion, see [20].
We use $\mu$ to denote a computable measure on the Cantor space $\{0,1\}^{\mathbb{N}}$. This $\mu$ represents an unknown model. We call this measure $\mu$ a model measure.
Suppose an infinite binary sequence is sampled from the Cantor space with this $\mu$. When given the first $n-1$ bits $X_{<n}$ of $X$, the next bit follows the conditional model measure on $\{0,1\}$ represented by
$$ \begin{align} k\mapsto\mu(k|X_{<n})=\frac{\mu(X_{<n}k)}{\mu(X_{<n})}. \end{align} $$
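To make the setting concrete, the following minimal sketch (our own illustration in Python; the function names are ours) represents a measure by its values on cylinder sets and computes the conditional prediction of equation (1) for a Bernoulli model measure.

```python
from typing import Callable

# A measure is represented by its values on cylinder sets, i.e.,
# by a function from finite binary strings to [0, 1].
Measure = Callable[[str], float]

def bernoulli(p: float) -> Measure:
    """The Bernoulli(p) measure: bits are i.i.d. with P(1) = p."""
    def mu(sigma: str) -> float:
        prob = 1.0
        for bit in sigma:
            prob *= p if bit == "1" else 1.0 - p
        return prob
    return mu

def conditional(mu: Measure, prefix: str, k: str) -> float:
    """The conditional probability mu(k | prefix) from equation (1)."""
    return mu(prefix + k) / mu(prefix)

mu = bernoulli(0.7)
print(conditional(mu, "110", "1"))  # 0.7 (up to floating-point error)
```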
Our ultimate goal is to construct a computable measure $\xi$ such that the prediction $\xi(\cdot|X_{<n})$ is close to $\mu(\cdot|X_{<n})$. We call this measure $\xi$ a prediction measure and call the measure $\xi(\cdot|\cdot)$ a conditional prediction.
Solomonoff’s celebrated result states that every optimal prediction behaves rather well. A semi-measure is a function $\xi:\{0,1\}^*\to[0,1]$ such that $\xi(\epsilon)\le1$ and $\xi(\sigma)\ge\xi(\sigma0)+\xi(\sigma1)$ for every $\sigma\in\{0,1\}^*$, where $\epsilon$ is the empty string. A function $f:\{0,1\}^*\to\mathbb{R}$ is called c.e. or lower semi-computable if $f(\sigma)$ is left-c.e. uniformly in $\sigma\in\{0,1\}^*$.
Let $\mu,\xi$ be semi-measures on $\{0,1\}^{\mathbb{N}}$. We say that $\xi$ (multiplicatively) dominates $\mu$ if there exists $c\in\mathbb{N}$ such that $\mu(\sigma)\le c\cdot\xi(\sigma)$ for all $\sigma\in\{0,1\}^*$. A c.e. semi-measure $\xi$ is called optimal if $\xi$ dominates every c.e. semi-measure. An optimal c.e. semi-measure exists, while no computable measure is optimal. The conditional prediction $\xi(\cdot|\cdot)$ induced by this optimal c.e. semi-measure is sometimes called algorithmic probability.
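Optimal c.e. semi-measures cannot be run directly, but the mechanism behind domination is easy to illustrate: a countable mixture with positive weights dominates each of its components, with domination constant the inverse weight. The sketch below (our own toy example, not the universal mixture itself) checks this on short strings for a finite family of Bernoulli measures; since the weights sum to less than $1$, the mixture is a semi-measure.

```python
from itertools import product

def bernoulli(p):
    def mu(sigma):
        out = 1.0
        for bit in sigma:
            out *= p if bit == "1" else 1.0 - p
        return out
    return mu

# A finite family of measures and the mixture xi = sum_i w_i * mu_i.
family = [bernoulli(p) for p in (0.1, 0.5, 0.9)]
weights = [2.0 ** -(i + 1) for i in range(len(family))]  # 1/2, 1/4, 1/8

def xi(sigma):
    return sum(w * mu(sigma) for w, mu in zip(weights, family))

# xi dominates each mu_i with constant c = 1/w_i, i.e., mu_i <= c * xi.
for i, mu in enumerate(family):
    c = 1.0 / weights[i]
    ok = all(
        mu("".join(s)) <= c * xi("".join(s))
        for n in range(8)
        for s in product("01", repeat=n)
    )
    print(f"mu_{i} <= {c:.0f} * xi on all strings of length < 8: {ok}")
```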
Theorem 2.1 [24], see also [13, Theorem 3.19].
Let $\mu$ be a computable measure on $\{0,1\}^{\mathbb{N}}$. Let $\xi$ be an optimal c.e. semi-measure. Then, for both $k\in\{0,1\}$, we have
$$\begin{align*}\xi(k|X_{<n})-\mu(k|X_{<n})\to0\end{align*}$$
as $n\to\infty$ almost surely when $X$ follows $\mu$.
The prediction semi-measure $\xi$ is arbitrary and lacks information about the model measure $\mu$. The prediction by $\xi$ inspects $X_{<n}$, which contains some information about $\mu$, and predicts the next bit $X(n)$. The theorem above states that the conditional predictions $\xi(\cdot|X_{<n})$ get close to the true conditional model measures $\mu(\cdot|X_{<n})$ almost surely.
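Theorem 2.1 concerns the optimal c.e. semi-measure, which is not computable; still, the flavour of the convergence can be seen with a computable stand-in. The following toy simulation (our own illustration) uses the Laplace mixture $\xi(\sigma)=\int_0^1 p^{a}(1-p)^{b}\,dp$, whose conditional prediction of the next bit being $1$ is $(a+1)/(n+2)$ for a string with $a$ ones among $n$ bits; for a Bernoulli model measure, this prediction converges to the true conditional probability almost surely.

```python
import random

random.seed(0)
p = 0.7    # true model measure: Bernoulli(p)
N = 2000

ones = 0
errors = []
for n in range(N):
    # Mixture (Laplace) prediction that the next bit is 1, given X_{<n}:
    xi_pred = (ones + 1) / (n + 2)
    errors.append(abs(xi_pred - p))  # |xi(1 | X_{<n}) - mu(1 | X_{<n})|
    ones += random.random() < p      # sample the next bit from mu

print(errors[10], errors[500], errors[1999])  # tending to 0
```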
The rate of convergence was briefly discussed in [14] but has yet to be established.
3 Generality
In this section, we introduce the concept of generality. Generality is a tool for comparing the well-behavedness of two measures. Just as optimality is defined by domination, generality is defined by domination. We expect that when one measure dominates another measure, the prediction induced by the former also behaves better than that induced by the latter. The question here is: what does it mean for one prediction to behave better than another? We answer this question by considering the sum of the prediction errors.
3.1 Definition of generality
Let $\nu,\xi$ be two measures on $\{0,1\}^{\mathbb{N}}$. We say that $\xi$ is more general than $\nu$ if $\xi$ dominates $\nu$; that is, there exists $c\in\mathbb{N}$ such that $\nu(\sigma)\le c\cdot\xi(\sigma)$ for all $\sigma\in\{0,1\}^*$.
The intuition is as follows. We are sequentially given a sequence $X\in\{0,1\}^{\mathbb{N}}$. The sequence may be a binary expansion of $e$ or $\pi$, or a sequence of independent random bits with $P(X_n=0)=P(X_n=1)=\frac{1}{2}$. The task is to find such regularity and make a good prediction. The regularity is expressed as (or identified with) the measure $\mu$ such that $X$ is random with respect to $\mu$. In the deterministic case, such as $e$ or $\pi$, the measure is a computable Dirac measure. In general, the measure need not be deterministic; it can be an arbitrary computable measure.
Essentially, a prediction $\xi$ is more general than another prediction $\nu$ if $\xi$ behaves well for every $\mu$ for which $\nu$ behaves well. Thus, $\xi$ performs well for a larger class of measures $\mu$ than $\nu$ does. As we will see in Theorem 3.2, this relation is formalized by domination. This is the reason for using the terminology ‘general’ for domination.
We are interested in the properties of sufficiently general computable predictions. We often say that a property $P$ holds for all sufficiently large natural numbers if there exists $N$ such that $P(n)$ holds for all natural numbers $n\ge N$. By analogy, we say that a property $P$ holds for all sufficiently general computable prediction measures if there exists a computable prediction measure $\nu$ such that the property $P(\xi)$ holds for all computable prediction measures $\xi$ dominating $\nu$. The author came up with this idea inspired by the study of Solovay functions, such as [2]. In particular, the computational complexity of computing such functions may be very low [12, Theorem 2].
In the theory of inductive inference, we discuss the properties of an optimal c.e. semi-measure and its induced prediction. Similarly, we will establish some properties of a sufficiently general computable measure and its induced prediction.
3.2 Domination and convergence
We justify the claim that domination means better behavior by giving a necessary and sufficient condition for the convergence of the sum of the prediction errors. Here, the error is measured by Kullback–Leibler divergence.
The Kullback–Leibler divergence is the primary tool for discussing the convergence of the predictions. For details, see any standard text on information theory, such as [7].
Let $\mu,\xi$ be measures on the discrete space $\{0,1\}$. The KL divergence of $\mu$ with respect to $\xi$ is defined by
$$\begin{align*}d(\mu||\xi)=\sum_{k\in\{0,1\}}\mu(k)\ln\frac{\mu(k)}{\xi(k)},\end{align*}$$
where $0\cdot\ln\frac{0}{z}=0$ for $z\ge0$, $y\ln\frac{y}{0}=\infty$ for $y>0$, and $\ln$ is the natural logarithm.
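For reference, here is a direct transcription of this definition into Python (our own sketch), including the conventions for zero probabilities.

```python
import math

def d(mu, xi):
    """KL divergence d(mu || xi) between distributions on {0, 1},
    given as (P(0), P(1)) pairs, with the conventions of the text:
    0 * ln(0/z) = 0 and y * ln(y/0) = +inf for y > 0."""
    total = 0.0
    for m, x in zip(mu, xi):
        if m == 0:
            continue              # 0 * ln(0/z) = 0
        if x == 0:
            return math.inf       # y * ln(y/0) = inf for y > 0
        total += m * math.log(m / x)
    return total

print(d((0.3, 0.7), (0.5, 0.5)))  # 0.0822...
print(d((0.3, 0.7), (1.0, 0.0)))  # inf
```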
Next, let $\mu,\xi$ be measures on the continuous space $\{0,1\}^{\mathbb{N}}$. We use the notation:

- $d_\sigma(\mu||\xi)=d(\mu(\cdot|\sigma)||\xi(\cdot|\sigma))$,
- $D_n(\mu||\xi)=\sum_{k=1}^n E_\mu[d_{X_{<k}}(\mu||\xi)]$,
- $D_\infty(\mu||\xi)=\lim_{n\to\infty}D_n(\mu||\xi)$,

where $\mu(\cdot|\sigma),\xi(\cdot|\sigma)$ are the measures on $\{0,1\}$ defined in (1). Thus, $d_\sigma(\mu||\xi)$ is the prediction error conditioned on $\sigma$, $D_n(\mu||\xi)$ is the expected sum of the prediction errors up to the $n$th round when $X$ follows $\mu$, and $D_\infty$ is its limit. Since KL divergence is non-negative, $D_n$ is non-decreasing in $n$. Note that the finiteness of the sum of the prediction errors is a stronger condition than the convergence of the errors to $0$.
Remark 3.1. The chain rule for KL divergence states that
$$\begin{align*}D_n(\mu||\xi)=E_\mu\left[\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\right].\end{align*}$$
See, for instance, Hutter [13, (3.18)] and Cover and Thomas [7, Theorem 2.5.3].
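The chain rule is easy to verify numerically for small $n$: the sketch below (our own illustration) computes $D_n(\mu||\xi)$ both as the sum of expected per-step divergences and as the expectation of the log-likelihood ratio over strings of length $n$, for two Bernoulli measures.

```python
import math
from itertools import product

def bernoulli(p):
    def mu(sigma):
        out = 1.0
        for bit in sigma:
            out *= p if bit == "1" else 1.0 - p
        return out
    return mu

mu, xi = bernoulli(0.7), bernoulli(0.5)
n = 5

# Left-hand side: sum_{k=1}^{n} E_mu[ d_{X_{<k}}(mu || xi) ].
lhs = 0.0
for k in range(1, n + 1):
    for s in product("01", repeat=k - 1):
        prefix = "".join(s)
        d_prefix = sum(
            (mu(prefix + b) / mu(prefix))
            * math.log((mu(prefix + b) / mu(prefix)) / (xi(prefix + b) / xi(prefix)))
            for b in "01"
        )
        lhs += mu(prefix) * d_prefix

# Right-hand side: E_mu[ ln(mu(X_{<=n}) / xi(X_{<=n})) ].
rhs = sum(
    mu("".join(s)) * math.log(mu("".join(s)) / xi("".join(s)))
    for s in product("01", repeat=n)
)

print(lhs, rhs)  # both equal n * d(Bernoulli(0.7) || Bernoulli(0.5))
```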
Theorem 3.2. For two measures $\xi,\nu$ on $\{0,1\}^{\mathbb{N}}$, the following are equivalent.

- (i) $\xi$ dominates $\nu$.
- (ii) There exists a constant $c\in\mathbb{N}$ such that for every measure $\mu$ on $\{0,1\}^{\mathbb{N}}$, we have $D_\infty(\mu||\xi)\le D_\infty(\mu||\nu)+c$.
By this theorem, domination means rapid convergence for a larger class of model measures. If $\xi$ dominates $\nu$ and $\nu$ behaves well for $\mu$ (the error sum is finite), then $\xi$ also behaves well for $\mu$ (the error sum is finite). Furthermore, the difference between the error sums is at most a constant, uniformly in $\mu$. Thus, if the error sum of $\nu$ is small, so is that of $\xi$.
Note that KL divergence can be infinite, and the finiteness of KL divergence is an essential aspect of the formulation of Theorem 3.2. Some other distances are discussed in [13, Section 3.2.5]. One example is the Hellinger distance, which plays a vital role in the proof of Theorem 2.1 but is bounded by $1$. Thus, KL divergence seems the right choice for this formulation.
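The direction (i)$\Rightarrow$(ii) can also be checked numerically. In the sketch below (our own illustration), $\nu$ is Bernoulli($1/2$), $\xi$ is an equal-weight mixture of $\nu$ with another Bernoulli measure, so that $\nu\le2\xi$, and the inequality $D_n(\mu||\xi)\le D_n(\mu||\nu)+\ln 2$ is confirmed for several $n$; as the proof below shows, it in fact holds for every $n$, not only in the limit.

```python
import math
from itertools import product

def bernoulli(p):
    def mu(sigma):
        out = 1.0
        for bit in sigma:
            out *= p if bit == "1" else 1.0 - p
        return out
    return mu

def D_n(mu, xi, n):
    """D_n(mu||xi) via the chain rule of Remark 3.1."""
    return sum(
        mu(s) * math.log(mu(s) / xi(s))
        for s in ("".join(t) for t in product("01", repeat=n))
        if mu(s) > 0
    )

mu = bernoulli(0.7)                 # model measure
nu = bernoulli(0.5)                 # a prediction measure
other = bernoulli(0.9)
xi = lambda s: 0.5 * nu(s) + 0.5 * other(s)   # nu <= 2 * xi, so c = 2

for n in (2, 5, 8):
    print(D_n(mu, xi, n) <= D_n(mu, nu, n) + math.log(2))  # True
```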
Proof. (i)$\Rightarrow$(ii). Suppose that
$$ \begin{align} \nu\le c\,\xi \end{align} $$
for some $c\in\mathbb{N}$.
Suppose that there exists a string $\sigma\in\{0,1\}^*$ such that $\mu(\sigma)>0$ and $\nu(\sigma)=0$. Then, there exist a string $\tau\in\{0,1\}^*$ and a bit $k\in\{0,1\}$ such that $\mu(\tau k)>0$, $\nu(\tau)>0$, and $\nu(\tau k)=0$. For this $\tau$, we have $d_\tau(\mu||\nu)=\infty$ and $D_\infty(\mu||\nu)=\infty$. Thus, the condition (ii) holds.
Now assume that
$$ \begin{align} \mu(\sigma)>0 \Rightarrow \nu(\sigma)>0 \end{align} $$
for all $\sigma\in\{0,1\}^*$. Fix an arbitrary $n\in\mathbb{N}$. For all $\sigma\in\{0,1\}^n$ such that $\mu(\sigma)>0$, we have
$$ \begin{align} \ln\frac{\mu(\sigma)}{\xi(\sigma)}\le\ln\frac{\mu(\sigma)}{\nu(\sigma)}+ \ln c \end{align} $$
by (2). Here note that $\xi(\sigma)>0$ by (3) and (2). By taking the integral of (4) with respect to $\mu$, we have
$$\begin{align*}D_n(\mu||\xi)\le D_n(\mu||\nu)+\ln c\end{align*}$$
by Remark 3.1. Since both $D_n$ are non-decreasing, this implies the condition (ii).
(ii)$\Rightarrow$(i). Let $\sigma\in\{0,1\}^*$ be an arbitrary string. We may assume $\nu(\sigma)>0$, since otherwise $\nu(\sigma)\le e^c\xi(\sigma)$ holds trivially. We construct a measure $\mu$ such that the condition (ii) for this $\mu$ implies $\nu(\sigma)\le e^c\xi(\sigma)$. We define the measure $\mu$ by
$$\begin{align*}\mu(\tau)= \begin{cases} \nu(\tau)/\nu(\sigma), &\text{ if }\sigma\preceq\tau,\\ 1, &\text{ if }\tau\preceq\sigma,\\ 0, &\text{ otherwise.} \end{cases}\end{align*}$$
In other words, $\mu$ is zero outside the cylinder $[\sigma]$ and is proportional to $\nu$ inside $[\sigma]$. Note that for any string $\rho\in\{0,1\}^*$ such that $|\rho|=|\sigma|$, the ratio $\mu(\rho\tau)/\nu(\rho\tau)$ is constant for all $\tau\in\{0,1\}^*$. Thus, $D_{|\sigma|}(\mu||\nu)=D_\infty(\mu||\nu)$. Hence,
$$\begin{align*}c\ge D_\infty(\mu||\xi)-D_\infty(\mu||\nu) \ge D_{|\sigma|}(\mu||\xi)-D_{|\sigma|}(\mu||\nu) =\ln\frac{\mu(\sigma)}{\xi(\sigma)}-\ln\frac{\mu(\sigma)}{\nu(\sigma)}, \end{align*}$$
where the last equality follows by Remark 3.1. Hence we have $\nu(\sigma)\le e^c\xi(\sigma)$.

Since $\sigma$ is arbitrary, the condition (i) holds.
3.3 Infinite chain rule for KL divergence
Here, as a result of independent interest, we show that $D_\infty(\mu||\xi)$ is nothing but the usual KL divergence.
Let us recall the KL divergence on a non-discrete space. Let $\mu,\xi$ be measures on $\{0,1\}^{\mathbb{N}}$. Then, the KL divergence of $\mu$ with respect to $\xi$ is defined by
$$\begin{align*}D(\mu||\xi)=\int \frac{d\mu}{d\xi}\ln\frac{d\mu}{d\xi}d\xi=\int \ln\frac{d\mu}{d\xi}d\mu,\end{align*}$$
where $0\cdot\ln 0=0$, $\ln$ is the natural logarithm, and $\frac{d\mu}{d\xi}$ is the Radon–Nikodym derivative of $\mu$ with respect to $\xi$. If the derivative $\frac{d\mu}{d\xi}$ does not exist, we let $D(\mu||\xi)=\infty$.
Proposition 3.3. Let $\xi,\mu$ be measures on $\{0,1\}^{\mathbb{N}}$. Then,
$$\begin{align*}D_\infty(\mu||\xi)=D(\mu||\xi).\end{align*}$$
This is an infinite version of the chain rule for KL divergence in Remark 3.1. The essential reason is that the Radon–Nikodym derivative $\frac{d\mu}{d\xi}$ can be approximated by $\frac{\mu(X_{\le n})}{\xi(X_{\le n})}$. For the proof, we use the following facts.
Lemma 3.4 (Theorem 5.3.3 in [11] in our terminology).
Suppose that $\xi(\sigma)=0\Rightarrow\mu(\sigma)=0$ for all $\sigma\in\{0,1\}^*$. Let $f(X)=\limsup_n\frac{\mu(X_{\le n})}{\xi(X_{\le n})}$. Then,
$$\begin{align*}\mu(A)=\int_A f\ d\xi+\mu(A\cap\{f(X)=\infty\})\end{align*}$$
for all measurable sets $A$.
Remark 3.5.

- (i) The sequence $(\frac{\mu(X_{\le n})}{\xi(X_{\le n})})_n$ is a non-negative martingale with respect to $\xi$ (see [11, Theorem 5.3.4]).
- (ii) Hence, $\xi(\{f(X)=\infty\})=0$ by Doob’s martingale maximal inequality.
- (iii) If $\mu\ll\xi$, then $f=\lim_n\frac{\mu(X_{\le n})}{\xi(X_{\le n})}=\frac{d\mu}{d\xi}$, $\xi$-almost surely.
Proof of Proposition 3.3. We divide the proof into four cases.

Case 1. $\frac{d\mu}{d\xi}$ exists and $D(\mu||\xi)<\infty$.
We will show that $(\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})})_n$ is uniformly integrable with respect to $\xi$. For $K\in\mathbb{N}$, let
$$\begin{align*}U_n^K=\left\{X\in\{0,1\}^{\mathbb{N}} : \frac{\mu(X_{\le n})}{\xi(X_{\le n})}>K\right\}.\end{align*}$$
It suffices to show that
$$\begin{align*}\sup_n\int_{U_n^K}\left|\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\right|d\xi\to0\ \ \text{ as }K\to\infty.\end{align*}$$
Let $A_n^K=\{\sigma\in\{0,1\}^n : \mu(\sigma)/\xi(\sigma)>K\}$. For $K>1$ and $\sigma\in A_n^K$, we have $\ln(\mu(\sigma)/\xi(\sigma))>\ln K>0$. Thus,
$$ \begin{align} \int_{U_n^K}\left|\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\right|d\xi =&\sum_{\sigma\in A_n^K}\xi(\sigma)\frac{\mu(\sigma)}{\xi(\sigma)}\ln\frac{\mu(\sigma)}{\xi(\sigma)} \notag \\ \le&\sum_{\sigma\in A_n^K}\int_{[\sigma]}\frac{d\mu}{d\xi}\ln\frac{d\mu}{d\xi}d\xi =\int_{U_n^K}\frac{d\mu}{d\xi}\ln\frac{d\mu}{d\xi}d\xi. \end{align} $$
Here, we used Jensen’s inequality on $[\sigma]$ with the convex function $g(x)=x\ln x$:
$$ \begin{align} g\left(\frac{1}{\xi(\sigma)}\int_{[\sigma]}\frac{d\mu}{d\xi}d\xi\right)\le\frac{1}{\xi(\sigma)}\int_{[\sigma]} g\left(\frac{d\mu}{d\xi}\right)d\xi. \end{align} $$
Since $\mu(X_{\le n})/\xi(X_{\le n})$ is a non-negative martingale with respect to $\xi$ by Remark 3.5, we have $\xi(U_n^K)\le\frac{1}{K}$. From the epsilon-delta type characterization of absolute continuity (see [18, Proposition 15.5] for a general measure space and [5, Theorem 2.5.7] for the Lebesgue integral), the supremum of the last term in (5) goes to $0$ as $K\to\infty$. This shows uniform integrability.
Finally, we use the Vitali convergence theorem to deduce
$$\begin{align*}D_\infty(\mu||\xi)=\lim_n E_\xi\left[\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\right] =E_\xi\left[\frac{d\mu}{d\xi}\ln\frac{d\mu}{d\xi}\right]=D(\mu||\xi)\end{align*}$$
by Remark 3.5(iii).

Case 2. $\frac{d\mu}{d\xi}$ exists and $D(\mu||\xi)=\infty$.
Then, $D_\infty(\mu||\xi)=\infty$ because, by the finite chain rule for KL divergence, we have
$$\begin{align*}D_\infty(\mu||\xi)=\lim_n E_\mu\left[\ln\frac{\mu(X_{\le n})}{\xi(X_{\le n})}\right] \ge E_\mu\left[\ln\frac{d\mu}{d\xi}\right]=D(\mu||\xi),\end{align*}$$
where we have used Fatou’s lemma in deducing the inequality.

Case 3. $\frac{d\mu}{d\xi}$ does not exist and $\xi(\sigma)=0\Rightarrow\mu(\sigma)=0$ for all $\sigma\in\{0,1\}^*$.
By Lemma 3.4, $\mu(\{f(X)=\infty\})=\epsilon>0$. Then, for each $K>0$, we have $\mu(\{\limsup_n\frac{\mu(X_{\le n})}{\xi(X_{\le n})}>K\})\ge\epsilon$, and thus, there exists $n\in\mathbb{N}$ such that $\mu(\{\frac{\mu(X_{\le n})}{\xi(X_{\le n})}>K\})>\epsilon/2$, which implies $D_n(\mu||\xi)\ge\frac{\epsilon\ln K}{2}$. Since $K$ is arbitrary, we have $D_\infty(\mu||\xi)=\infty$.

Case 4. $\xi(\sigma)=0$ and $\mu(\sigma)>0$ for some $\sigma\in\{0,1\}^*$.
In this case, we have $D_{|\sigma|}(\mu||\xi)\ge\mu(\sigma)\ln\frac{\mu(\sigma)}{\xi(\sigma)}=\infty$. Thus, $D_\infty(\mu||\xi)=\infty$. Since $\mu\not\ll\xi$, we also have $D(\mu||\xi)=\infty$.
4 Rate of convergence
Let $\mu$ be a computable model measure on $\{0,1\}^{\mathbb{N}}$. Then, for any computable measure $\xi$ that dominates $\mu$, we have $D_\infty(\mu||\xi)<\infty$ by Theorem 3.2. Hence, any sufficiently general prediction converges to the conditional model measure almost surely. In this section, we discuss the rate of this convergence. The main result here is the Martin-Löf randomness of the KL divergence, from which we show that the convergence rate is almost the same for any sufficiently general prediction.
4.1 Martin-Löf randomness of KL divergence
We review Martin-Löf random left-c.e. reals, which we use to analyze the convergence rate. For details, see, for instance, [9, Chapter 9].
A set $U\subseteq\mathbb{R}$ is a c.e. open set if there exists a computable sequence $(a_n,b_n)_{n\in\mathbb{N}}$ of open intervals with rational endpoints such that $U=\bigcup_n(a_n,b_n)$. Let $\lambda$ be the Lebesgue measure on $\mathbb{R}$. A ML-test with respect to $\lambda$ is a sequence $(U_n)_n$ of uniformly c.e. open sets with $\lambda(U_n)\le2^{-n}$ for all $n\in\mathbb{N}$. A real $\alpha\in\mathbb{R}$ is called ML-random if $\alpha\not\in\bigcap_n U_n$ for every ML-test $(U_n)_n$.
An example of a left-c.e. ML-random real is the halting probability. The halting probability $\Omega_U$ of a prefix-free Turing machine $U$ is defined by $\Omega_U=\sum_{\sigma\in\mathrm{dom}(U)}2^{-|\sigma|}$. Then, $\Omega_U$ is a left-c.e. ML-random real for each universal prefix-free Turing machine $U$. This $\Omega_U$ is known as Chaitin’s omega. Conversely, any left-c.e. ML-random real in $(0,1)$ is the halting probability of some universal machine (see [9, Theorems 9.2.2 and 9.2.3]).
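A left-c.e. real such as $\Omega_U$ is approximated from below by running ever more programs for ever more steps. The sketch below does this for a toy prefix-free machine of our own design; it is of course not universal, so its halting probability is not ML-random, but the shape of the increasing computable approximation is the same.

```python
def toy_halts(program: str, steps: int) -> bool:
    """A toy prefix-free machine: it halts exactly on programs of the
    form 1^k 0 and takes 2**k steps to do so.  Its halting probability
    is sum_k 2^{-(k+1)} = 1, approximated from below stage by stage."""
    if not program.endswith("0") or "0" in program[:-1]:
        return False
    k = len(program) - 1
    return steps >= 2 ** k

def omega_approx(stage: int) -> float:
    """Stage-s lower bound: run all programs of length <= stage for
    `stage` steps and add 2^{-|p|} for each one that has halted."""
    total = 0.0
    for length in range(1, stage + 1):
        for code in range(2 ** length):
            program = format(code, f"0{length}b")
            if toy_halts(program, stage):
                total += 2.0 ** -length
    return total

print([round(omega_approx(s), 4) for s in (2, 4, 8, 16)])  # increasing to 1
```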
Theorem 4.1. Let $\mu$ be a computable model measure on $\{0,1\}^{\mathbb{N}}$. Then, $D_\infty(\mu||\xi)$ is a finite left-c.e. ML-random real for all sufficiently general computable measures $\xi$.
We can discuss the convergence rate using this Martin-Löf randomness because all ML-random left-c.e. reals have essentially the same rate of convergence, as follows:
Theorem 4.2 [1], see also [16].
Let $\alpha,\beta$ be left-c.e. reals with increasing computable approximations $(\alpha_s),(\beta_s)$. If $\beta$ is ML-random, then
$$\begin{align*}\lim_{s\to\infty}\frac{\alpha-\alpha_s}{\beta-\beta_s}\ \text{ exists}\end{align*}$$
and is independent of the approximations. Furthermore, the limit is zero if and only if $\alpha$ is not ML-random.
This theorem means that the convergence rate of ML-random left-c.e. reals is the same up to a multiplicative constant and much slower than that of non-ML-random left-c.e. reals.
Now we give a proof of Theorem 4.1. First, we construct a computable measure $\nu$ such that $D_\infty(\mu||\nu)$ is ML-random. Then, we show that if a computable measure $\xi$ dominates $\nu$, then $D_\infty(\mu||\xi)-D_\infty(\mu||\nu)$ is a left-c.e. real, which implies the ML-randomness of $D_\infty(\mu||\xi)$ by a result on Solovay reducibility.
Lemma 4.3. Let $\mu$ be a computable measure. Then, there exists a computable measure $\nu$ such that:

- the Radon–Nikodym derivative $\frac{d\mu}{d\nu}$ exists,
- $\frac{d\mu}{d\nu}$ is a constant function on a $\mu$-measure $1$ set and $0$ outside it,
- the constant value is a finite left-c.e. ML-random real.

In particular, $D_\infty(\mu||\nu)$ is a finite left-c.e. ML-random real.
Proof. First, we define the computable measure $\nu$. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence of uniformly computable positive reals such that $s=\sum_{n\in\mathbb{N}}z_n<1$ is a ML-random real. Let $Z^\sigma\in\{0,1\}^{\mathbb{N}}$ be a computable sequence, uniformly in $\sigma$, such that $\sigma\prec Z^\sigma$ and $\mu(Z^\sigma)=0$, whose existence will be shown in Lemma 4.4.
Define measures $\mu_n,\nu$ by
$$\begin{align*}\mu_n(\sigma)= \begin{cases} \mu(\sigma), &\text{ if }|\sigma|\le n,\\ \mu(\tau), &\text{ if }|\sigma|>n,\ \tau=\sigma_{\le n},\ \sigma\prec Z^\tau,\\ 0, &\text{ if }|\sigma|>n,\ \tau=\sigma_{\le n},\ \sigma\not\prec Z^\tau, \end{cases}\end{align*}$$
for all $\sigma\in\{0,1\}^*$ and
$$ \begin{align} \nu=\sum_{n}z_n\mu_n+(1-s)\mu. \end{align} $$
The measure $\mu_n$ coincides with $\mu$ up to depth $n$, but beyond that point it collapses the distribution onto a single predetermined infinite path $Z^\tau$ extending each prefix $\tau$ of length $n$; in other words, all of the mass that $\mu$ assigns to $\tau$ is concentrated along one chosen branch, and every other continuation gets zero. The measure $\nu$ mixes the collapsed measures $\mu_n$ with weights $z_n$ together with a portion of the original measure $\mu$, so it combines $\mu$ with versions that eventually follow a single deterministic path.
Now, we claim that the measure $\nu$ is computable. This is because
$$ \begin{align*} \nu(\sigma) =&\sum_{n<|\sigma|}z_n\mu_n(\sigma)+\sum_{n\ge|\sigma|}z_n\mu_n(\sigma)+(1-s)\mu(\sigma)\\ =&\sum_{n<|\sigma|}z_n\mu_n(\sigma)+\Big(1-\sum_{n<|\sigma|}z_n\Big)\mu(\sigma). \end{align*} $$
Next, we find $\frac{d\mu}{d\nu}$. Because $\mu\ll\nu$, by Remark 3.5(iii), $\frac{d\mu}{d\nu}=\lim_n\frac{\mu(X_{\le n})}{\nu(X_{\le n})}$, $\nu$-almost surely.
Consider $X\in\{0,1\}^{\mathbb{N}}$ such that $\mu(X_{\le n})>0$ for all $n$. Then, $\mu$-almost all such sequences satisfy $X\ne Z^\sigma$ for every $\sigma\in\{0,1\}^*$. For each $n$ and all sufficiently large $k$ depending on $n$, we have $\mu_n(X_{\le k})=0$. Thus, $\lim_k\frac{\mu(X_{\le k})}{\nu(X_{\le k})}=\frac{1}{1-s}$.
If $X=Z^\sigma$ for some $\sigma\in\{0,1\}^*$, then
$$ \begin{align*} \mu(X_{\le n})&\to\mu(X)=\mu(Z^\sigma)=0,\\ \nu(X_{\le n})&\to\nu(X)=\sum\{z_n\mu_n(\sigma) : Z^\sigma=X\}>0, \end{align*} $$
as $n\to\infty$. Hence, $\lim_k\frac{\mu(X_{\le k})}{\nu(X_{\le k})}=0$.
We also observe that the set of $X$ such that $\mu(X_{\le n})=0$ for some $n$ has $\mu$-measure $0$. Because $s$ is a left-c.e. ML-random real, so is $\frac{1}{1-s}$. Hence, the first half of the claim follows.

Finally,
$$\begin{align*}D(\mu||\nu)=\int\ln\frac{d\mu}{d\nu}d\mu=\ln\frac{1}{1-s},\end{align*}$$
which is ML-random by Proposition 4.5.
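The displayed identity for $\nu(\sigma)$ shows that only finitely many weights matter at each depth, which is what makes $\nu$ computable. The following sketch (our own illustration) evaluates $\nu$ for a Bernoulli model measure, taking $Z^\tau$ to be the all-zeros extension of $\tau$ (legitimate here since every single path has Bernoulli measure zero) and simple dyadic weights $z_n$; the actual construction requires $\sum_n z_n$ to be ML-random, which we ignore for illustration.

```python
def bernoulli(p):
    def mu(sigma):
        out = 1.0
        for bit in sigma:
            out *= p if bit == "1" else 1.0 - p
        return out
    return mu

mu = bernoulli(0.7)
z = [2.0 ** -(n + 2) for n in range(64)]   # weights z_n with sum s ~ 1/2

def Z(tau: str, length: int) -> str:
    """Stand-in for Lemma 4.4: extend tau by zeros.  For Bernoulli(0.7)
    every single path has mu-measure 0, so this choice is legitimate."""
    return tau + "0" * (length - len(tau))

def mu_n(n: int, sigma: str) -> float:
    if len(sigma) <= n:
        return mu(sigma)
    tau = sigma[:n]
    return mu(tau) if sigma == Z(tau, len(sigma)) else 0.0

def nu(sigma: str) -> float:
    """nu(sigma) by the identity above: at depth |sigma| only the
    weights z_n with n < |sigma| are needed."""
    head = sum(z[n] * mu_n(n, sigma) for n in range(len(sigma)))
    return head + (1.0 - sum(z[: len(sigma)])) * mu(sigma)

# nu piles extra mass on the collapsed paths tau 0 0 0 ...
print(nu(""), nu("1"), nu("10"), nu("100000"))
```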
Lemma 4.4. For each $\sigma\in\{0,1\}^*$, we can compute a sequence $Z^\sigma\in\{0,1\}^{\mathbb{N}}$ such that $\sigma\prec Z^\sigma$ and $\mu(Z^\sigma)=0$. Furthermore, the construction is uniform in $\sigma$.
We construct $Z^\sigma$ as the limit of an increasing sequence of strings $\sigma=\tau_0\prec\tau_1\prec\tau_2\prec\cdots$.
One might attempt to define $\tau_{k+1}$ from $\tau_k$ with the following properties:

- $\tau_k\prec\tau_{k+1}$,
- $|\tau_{k+1}|=|\tau_k|+1$,
- $\mu(\tau_{k+1})<\frac{2}{3}\cdot\mu(\tau_k)$.
Roughly speaking, one computes the conditional probabilities and takes the smaller one.
However, this simple idea does not work. Since $\mu(\sigma)$ may be $0$ for some $\sigma\in\{0,1\}^*$, the conditional probability may not be computable.
To make the construction uniform, we need the following modified strategy.
Proof. Let $p,q\in(0,1)$ be rational numbers such that
$$\begin{align*}0<p<q<1,\quad pq>\frac{1}{2},\end{align*}$$
for example, $p=\frac{3}{4}$ and $q=\frac{4}{5}$.
Let $\tau_0=\sigma$.
Suppose $\tau_k$ is already defined and satisfies
$$ \begin{align} \mu(\tau_k)\le q^k\max\left\{\mu(\sigma),p^k\right\}. \end{align} $$
Notice that (8) holds for $k=0$.
Now we define $\tau_{k+1}$ so that $\tau_k\prec\tau_{k+1}$, $|\tau_{k+1}|=|\tau_k|+1$, and
$$ \begin{align} \mu(\tau_{k+1})<q^{k+1}\max\left\{\mu(\sigma),p^{k+1}\right\}. \end{align} $$
We claim that such a $\tau_{k+1}$ can be found computably. If neither of the two one-bit extensions of $\tau_k$ satisfied (9), then
$$\begin{align*}\mu(\tau_k)\ge 2 q^{k+1}\max\left\{\mu(\sigma),p^{k+1}\right\}>q^k\max\left\{\mu(\sigma),p^k\right\},\end{align*}$$
which contradicts (8). Hence, one of the two strings extending $\tau_k$ satisfies (9), and it can be found computably.
Finally, the claim follows by letting k tend to infinity in (8).
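The proof is effectively an algorithm. The sketch below (our own illustration) runs the construction with $p=\frac{3}{4}$, $q=\frac{4}{5}$ for a Bernoulli measure, using exact rational arithmetic in place of the approximation argument needed for a general computable measure.

```python
from fractions import Fraction as F

def bernoulli(r):
    def mu(sigma):
        out = F(1)
        for bit in sigma:
            out *= r if bit == "1" else 1 - r
        return out
    return mu

mu = bernoulli(F(7, 10))
p, q = F(3, 4), F(4, 5)    # rationals with 0 < p < q < 1 and p*q > 1/2

def Z_prefix(sigma: str, length: int) -> str:
    """First `length` bits of Z^sigma, following the proof of Lemma 4.4;
    exact rational arithmetic replaces the approximation argument."""
    tau, k = sigma, 0
    while len(tau) < length:
        bound = q ** (k + 1) * max(mu(sigma), p ** (k + 1))
        # At least one one-bit extension satisfies (9); try 0 first.
        tau = tau + "0" if mu(tau + "0") < bound else tau + "1"
        k += 1
    return tau

# The measure of the constructed prefixes decays to 0, so mu(Z^sigma) = 0.
print(Z_prefix("1", 12), float(mu(Z_prefix("1", 12))))
```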
Proposition 4.5. Let $I$ be an open interval in the real line and $f:I\to\mathbb{R}$ be a computable function in $C^1$. If $z\in I$ is ML-random and $f'(z)\ne0$, then $f(z)$ is ML-random. Here, $f'$ is the derivative of $f$.
This fact follows from a more advanced fact called randomness preservation or conservation of randomness [4, Theorem 3.2]. However, we give a direct proof here.
Proof. Without loss of generality, we can assume $f'(z)>0$. Because $f'$ is continuous, there exists a closed interval $[a,b]$ with rational endpoints such that $z\in[a,b]\subseteq I$ and $f'(x)>0$ for every $x\in[a,b]$. Because $f'$ is continuous and $[a,b]$ is a bounded closed set, by the extreme value theorem, we have a positive rational $m<\inf_{x\in[a,b]}f'(x)$.
Suppose $f(z)$ is not ML-random. Then there exists a ML-test $(U_n)_n$ such that $f(z)\in\bigcap_n U_n$. Let $V_n=\{x : f(x)\in U_n\}\cap[a,b]$. Then, $(V_n)_n$ is a sequence of uniformly c.e. open sets. We also have $z\in\bigcap_n V_n$ because $f(z)\in U_n$ for all $n$.
We claim that $\lambda(V_n)\le2^{-n}/m$ for all $n$. When some interval $(c,d)\subseteq[f(a),f(b)]$ is enumerated into $U_n$, the corresponding interval $(f^{-1}(c),f^{-1}(d))\subseteq[a,b]$ is enumerated into $V_n$. By the mean-value theorem, there exists $w\in(f^{-1}(c),f^{-1}(d))$ such that
$$\begin{align*}d-c=f'(w)(f^{-1}(d)-f^{-1}(c))\ge m(f^{-1}(d)-f^{-1}(c)).\end{align*}$$
Hence, the claim follows. Therefore, a suitable subsequence of $(V_n)_n$ forms a ML-test capturing $z$, which contradicts the ML-randomness of $z$.
The last piece of the proof is the following result on Solovay reducibility. For a proof, see [9, Theorem 9.1.4] or [19, Proposition 3.2.27].
Proposition 4.6. The sum of a left-c.e. ML-random real and a left-c.e. real is ML-random.
Proof of Theorem 4.1. Let $\nu$ be the measure constructed in Lemma 4.3. Let $\xi$ be a computable measure dominating $\nu$. Then,
$$\begin{align*}D(\mu||\xi)=\int\ln\frac{d\mu}{d\xi}d\mu =\int\ln\frac{d\mu}{d\nu}d\mu+\int\frac{d\mu}{d\nu}\ln\frac{d\nu}{d\xi}d\nu =D(\mu||\nu)+\alpha D(\nu||\xi),\end{align*}$$
where $\alpha$ is the left-c.e. real such that $\frac{d\mu}{d\nu}=\alpha$ $\mu$-a.s. Here, $D(\mu||\nu)$ is ML-random by Lemma 4.3, and $D(\mu||\nu)$ and $D(\nu||\xi)$ are left-c.e., as in Proposition 3.3. Thus, by Proposition 4.6, $D_\infty(\mu||\xi)$ is ML-random.
4.2 $L^p$-norm of measures
We begin by introducing distances between measures on the finite alphabet $\{0,1\}$. These distances will later be applied to conditional distributions arising from measures on the infinite sequence space $\{0,1\}^{\mathbb{N}}$.
Let $\mu,\xi$ be measures on the discrete space $\{0,1\}$. For $p\ge1$, the distance between $\mu$ and $\xi$ in the $L^p$-norm is
$$\begin{align*}||\mu-\xi||_p=\Big(\sum_{k\in\{0,1\}}|\mu(k)-\xi(k)|^p\Big)^{1/p}.\end{align*}$$
Let
$$\begin{align*}\ell_p(\mu,\xi)=||\mu-\xi||_p^p.\end{align*}$$
Some closely related distances are:
- $\ell _1(\mu ,\xi )=||\mu -\xi ||_1$ is the Manhattan distance.
- $\ell _2(\mu ,\xi )=||\mu -\xi ||_2^2$ is the squared Euclidean distance.
- $\frac {1}{2}\ell _1(\mu ,\xi )=\frac {1}{2}||\mu -\xi ||_1$ is the total variation distance.
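As a concrete illustration of these definitions (the numbers below are invented for the example), the following Python sketch computes $\ell_p$ and the total variation for two measures on $\{0,1\}$, each represented by its probability of the outcome $1$.

```python
# Illustrative sketch: ell_p and total variation for two measures on {0,1}.
# A measure on {0,1} is represented by its probability of the outcome 1.

def lp_norm(mu1: float, xi1: float, p: float) -> float:
    """||mu - xi||_p over the two-point space {0,1}."""
    pairs = [(1 - mu1, 1 - xi1), (mu1, xi1)]
    return sum(abs(m - x) ** p for m, x in pairs) ** (1 / p)

def ell_p(mu1: float, xi1: float, p: float) -> float:
    """ell_p(mu, xi) = ||mu - xi||_p^p."""
    return lp_norm(mu1, xi1, p) ** p

mu1, xi1 = 0.8, 0.6
print(ell_p(mu1, xi1, 1))        # Manhattan distance: 0.4
print(ell_p(mu1, xi1, 2))        # squared Euclidean distance: 0.08
print(0.5 * ell_p(mu1, xi1, 1))  # total variation distance: 0.2
```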
We now extend these notions to measures on the sequence space $\{0,1\}^{\mathbb {N}}$, in the same way as was previously done for the KL divergence. For measures $\mu ,\xi $ on $\{0,1\}^{\mathbb {N}}$, we write:
- $\ell _{p,\sigma }(\mu ,\xi )=\ell _p(\mu (\cdot |\sigma ),\xi (\cdot |\sigma ))$,
- $L_{p,n}(\mu ,\xi )=\sum _{k=1}^n E_{X\sim \mu }[\ell _{p,X_{<k}}(\mu ,\xi )]$,
- $L_{p,\infty }(\mu ,\xi )=\lim _{n\to \infty }L_{p,n}(\mu ,\xi )$.
If $\mu $, $\xi $, and p are computable and $L_{p,\infty }(\mu ,\xi )$ is finite, then $L_{p,\infty }(\mu ,\xi )$ is left-c.e.
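For finite n, the quantity $L_{p,n}$ can be computed directly by unrolling the expectation over cylinders, since $E_{X\sim \mu }[\ell _{p,X_{<k}}(\mu ,\xi )]=\sum _{|\sigma |=k-1}\mu (\sigma )\,\ell _{p,\sigma }(\mu ,\xi )$. A minimal sketch follows; the two i.i.d. measures are chosen only for the demonstration (this $\xi$ does not dominate $\mu$, and indeed $L_{2,n}$ grows linearly in n).

```python
from itertools import product

# Sketch: L_{p,n}(mu, xi) for measures on {0,1}^N given by their values on
# cylinders. cond(m, sigma) is the conditional probability m(1 | sigma).

def cond(measure, sigma: str) -> float:
    m = measure(sigma)
    return measure(sigma + "1") / m if m > 0 else 0.0

def ell_p_sigma(mu, xi, sigma: str, p: float) -> float:
    a, b = cond(mu, sigma), cond(xi, sigma)
    return abs(a - b) ** p + abs((1 - a) - (1 - b)) ** p

def L_p_n(mu, xi, p: float, n: int) -> float:
    total = 0.0
    for k in range(1, n + 1):
        for bits in product("01", repeat=k - 1):
            sigma = "".join(bits)
            total += mu(sigma) * ell_p_sigma(mu, xi, sigma, p)
    return total

# mu: i.i.d. Bernoulli(0.8); xi: i.i.d. Bernoulli(0.5).
mu = lambda s: 0.8 ** s.count("1") * 0.2 ** s.count("0")
xi = lambda s: 0.5 ** len(s)
print(L_p_n(mu, xi, p=2, n=10))  # 2 * 0.3^2 per step, so 0.18 * 10 = 1.8
```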
Let $\mu $ be a computable measure on $\{0,1\}^{\mathbb {N}}$. We ask at which p the left-c.e. reals $D_{\infty }(\mu ||\xi )$ and $L_{p,\infty }(\mu ,\xi )$ have the same rate of convergence, which mainly depends on $\mu $.
In the theory of algorithmic randomness, Solovay reducibility measures the convergence rate of left-c.e. reals. Instead of the original definition by Solovay, we use the following characterization by Downey, Hirschfeldt, and Nies [Reference Downey, Hirschfeldt and Nies10] (see also [Reference Downey and Hirschfeldt9, Theorem 9.1.8]). For two left-c.e. reals $\alpha ,\beta $, we say that $\alpha $ is Solovay reducible to $\beta $, denoted by $\alpha \le _S \beta $, if there exist a constant $c\in \mathbb {N}$ and a left-c.e. real $\gamma $ such that $c\beta =\alpha +\gamma $. Roughly speaking, $\alpha \le _S\beta $ means that the convergence rate of $\beta $ is not faster than that of $\alpha $. The induced equivalence relation, denoted by $\equiv _S$, is defined by $\alpha \equiv _S \beta \iff (\alpha \le _S \beta \;\;\text {and}\;\; \beta \le _S \alpha )$. If $\alpha $ is ML-random and $\alpha \le _S \beta $, then $\beta $ is ML-random by Proposition 4.6.
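The characterization can be made tangible with a toy computation. In the sketch below, all reals are computable stand-ins chosen only for the demonstration (genuine left-c.e. reals cannot be exhibited this way): given $c\beta =\alpha +\gamma $ and increasing approximations $(b_n)$ of $\beta$ and $(g_n)$ of $\gamma$, the sequence $a_n=cb_n-g_n$ approximates $\alpha$ with $\alpha -a_n\le c(\beta -b_n)$, which is the sense in which $\alpha$ converges no slower than $\beta$.

```python
# Toy illustration of alpha <=_S beta via c*beta = alpha + gamma: from
# below-approximations (b_n) of beta and (g_n) of gamma we get an
# approximation a_n = c*b_n - g_n of alpha with alpha - a_n <= c*(beta - b_n).
# The concrete values are computable stand-ins chosen just for the demo.

c = 3
beta, gamma = 0.9, 1.1
alpha = c * beta - gamma  # = 1.6

for n in range(1, 6):
    b_n = beta * (1 - 2 ** -n)   # increasing rational approximation of beta
    g_n = gamma * (1 - 2 ** -n)  # increasing rational approximation of gamma
    a_n = c * b_n - g_n          # induced increasing approximation of alpha
    assert alpha - a_n <= c * (beta - b_n) + 1e-12
    print(n, round(alpha - a_n, 6), round(c * (beta - b_n), 6))
```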
Definition 4.7. We define $R(\mu )$ to be the set of positive computable reals p such that $L_{p,\infty }(\mu ,\xi )<\infty $ and $D_{\infty }(\mu ||\xi )\equiv _S L_{p,\infty }(\mu ,\xi )$ for all computable measures $\xi $ dominating $\mu $.
In what follows, we determine $R(\mu )$ for Dirac measures $\mu $ and separated measures $\mu $. If $R(\mu )$ is a single point set, we write $R(\mu )=p$ for $R(\mu )=\{p\}$.
The rough rate of convergence of left-c.e. reals can be represented by the effective Hausdorff dimension. Let K be the prefix-free Kolmogorov complexity, that is, $K(\sigma )=\min \{|\tau |\ :\ U(\tau )=\sigma \}$, where U is a fixed universal prefix-free Turing machine. The Levin–Schnorr theorem states that $X\in \{0,1\}^{\mathbb {N}}$ is ML-random if and only if $K(X\restriction n)>n-O(1)$, where we identify a real in the unit interval with its binary expansion. The effective Hausdorff dimension of $X\in \{0,1\}^{\mathbb {N}}$ is defined by
$$\begin{align*}\mathrm{dim}(X)=\liminf_n \frac{K(X\restriction n)}{n}.\end{align*}$$
In particular, $\mathrm {dim}(X)=1$ for each ML-random sequence X. See [Reference Downey and Hirschfeldt9, Chapter 13] for details.
Theorem 4.8 (Theorem 3.2 in [Reference Tadaki25]).
Let $(a_n)_n$ be a sequence of uniformly computable positive reals such that $\sum _n a_n$ is finite and is ML-random. Then, the following hold:
- (i) $\mathrm {dim}(\sum _n (a_n)^p)=1/p$ for each computable $p\ge 1$.
- (ii) $\sum _n (a_n)^p=\infty $ for each $p\in (0,1)$.
The original statement by Tadaki is about the halting probability but the statement also holds for any sequence of uniformly computable positive reals whose sum is finite and ML-random by almost the same proof.
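The dichotomy in Theorem 4.8 can be observed numerically, at least qualitatively. In the sketch below, a pseudo-random $(a_n)$ stands in for a sequence whose sum is ML-random (true ML-randomness is of course not testable): $\sum _n a_n^p$ keeps growing with the cutoff for $p<1$, while it stabilizes quickly for $p\ge 1$.

```python
import random

# a_n = u_n / n^2 with u_n uniform in (0,1): positive, summable, and a
# pseudo-random stand-in for a sequence whose sum is ML-random.
random.seed(1)
a = [random.random() / (n * n) for n in range(1, 200_001)]

for p in (0.5, 1.0, 2.0):
    print(p, sum(x ** p for x in a))
# p = 0.5: the partial sums diverge (they grow like the log of the cutoff);
# p = 1, 2: the partial sums have already converged to several digits.
```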
4.3 Case of Dirac measures
From now on, we discuss the rate of convergence more concretely. First, we consider the case in which the model measure $\mu $ is a Dirac measure, which means that the model is deterministic.
Let $\mu $ be a computable Dirac measure; that is, $\mu =\mathbf {1}_A$ for some $A\in \{0,1\}^{\mathbb {N}}$. Because A is an atom of the computable measure $\mu $, the sequence A is computable (see, for example, [Reference Downey and Hirschfeldt9, Lemma 6.12.7]). The goal is to evaluate the error of $\xi $,
$$\begin{align*}1-\xi(A_n|A_{<n}),\end{align*}$$
for each $n\in \mathbb {N}$ for general computable prediction measures $\xi $.
Proposition 4.9. Let $A\in \{0,1\}^{\mathbb {N}}$ be a computable sequence and $\mu =\mathbf {1}_A$. Then, $R(\mu )=1$. In particular, $L_{1,\infty }(\mu ,\xi )$ is finite and is a left-c.e. ML-random real for all sufficiently general computable prediction measures $\xi $.
Lemma 4.10. Let $A\in \{0,1\}^{\mathbb {N}}$ be a computable sequence and $\mu =\mathbf {1}_A$. Let $\xi $ be a computable measure dominating $\mu $. Then,
$$\begin{align*}L_{1,\infty}(\mu,\xi)=2\sum_{n=1}^\infty(1-\xi(A_n|A_{<n})).\end{align*}$$
Proof. For each $\sigma \in \{0,1\}^*$, we have
$$\begin{align*}\ell_{1,\sigma}=|\mu(0|\sigma)-\xi(0|\sigma)|+|\mu(1|\sigma)-\xi(1|\sigma)|.\end{align*}$$
Since $\mu =\mathbf {1}_A$, we have
$$\begin{align*}E_{X\sim \mu}[\ell_{1,X_{<n}}(\mu,\xi)] =|\mu(0|A_{<n})-\xi(0|A_{<n})|+|\mu(1|A_{<n})-\xi(1|A_{<n})|\end{align*}$$
for each $n\in \mathbb {N}$. Since $\mu (A_n|A_{<n})=1$ and $\mu (\overline {A_n}|A_{<n})=0$, where $\overline {k}=1-k$, we have
$$\begin{align*}L_{1,\infty}(\mu,\xi) =\sum_{n=1}^\infty E_{X\sim \mu}[\ell_{1,X_{<n}}(\mu,\xi)] =\sum_{n=1}^\infty(1-\xi(A_n|A_{<n})+\xi(\overline{A_n}|A_{<n})).\end{align*}$$
Finally, notice that $\xi (\overline {A_n}|A_{<n})=1-\xi (A_n|A_{<n})$. Hence, the claim follows.
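As a sanity check of Lemma 4.10, consider the mixture $\xi =(1-\lambda )\mathbf {1}_A+\lambda \rho $, where $\rho $ is the uniform (Bernoulli(1/2)) measure; then $\xi (\sigma )\ge (1-\lambda )\mu (\sigma )$, so $\xi $ dominates $\mu =\mathbf {1}_A$. The sketch below (with $\lambda =1/4$, an arbitrary choice) computes $2\sum _n(1-\xi (A_n|A_{<n}))$ and shows that it converges; note that $\xi (A_{<n})$ does not depend on which computable A is used.

```python
# L_{1,infty}(mu, xi) for mu = 1_A and the dominating mixture
# xi = (1 - lam) * 1_A + lam * uniform. The length-k cylinder around A has
# xi-measure (1 - lam) + lam * 2^{-k}.

lam = 0.25

def xi_cyl(k: int) -> float:
    return (1 - lam) + lam * 2.0 ** -k

L1 = 2 * sum(1 - xi_cyl(n) / xi_cyl(n - 1) for n in range(1, 60))
print(L1)  # about 0.55 here; the tail vanishes geometrically, so the sum is finite
```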
Lemma 4.11. Let $A\in \{0,1\}^{\mathbb {N}}$ be a computable sequence and $\mu =\mathbf {1}_A$. Then, ${1\in R(\mu )}$.
Proof. Let $\xi $ be a computable measure dominating $\mu $.

First, we demonstrate that $L_{1,\infty }(\mu ,\xi )<\infty $. By the inequality
$$\begin{align*}\ln(1-x)\le -x\end{align*}$$
for all $x<1$, we have
$$ \begin{align} 1-\xi(A_n|A_{<n})\le -\ln\xi(A_n|A_{<n})=d_{A_{<n}}(\mu||\xi). \end{align} $$
From this and by Lemma 4.10, we have
$$\begin{align*}L_{1,\infty}(\mu,\xi)\le 2 D_\infty(\mu||\xi)<\infty,\end{align*}$$
where the last inequality follows from Theorem 3.2.

Let $f(n)$ be a computable function from $\mathbb {N}$ to $\mathbb {R}$ such that
$$\begin{align*}\ell_{1,A_{<n}}(\mu,\xi)+f(n)=2d_{A_{<n}}(\mu||\xi).\end{align*}$$
Then, $f(n)\ge 0$ for all n by (10). Hence,
$$\begin{align*}L_{1,\infty}(\mu,\xi)+\sum_n f(n)=2D_\infty(\mu||\xi),\end{align*}$$
which implies $L_{1,\infty }(\mu ,\xi )\le _S D_\infty (\mu ||\xi )$.
Next, we prove the converse relation. For sufficiently large n, we have
$$\begin{align*}\ell_{1,A_{<n}}(\mu,\xi)>2(\ln2)(1-\xi(A_n|A_{<n}))\ge-\ln\xi(A_n|A_{<n})=d_{A_{<n}}(\mu||\xi),\end{align*}$$
where we used $0<\ln 2<1$ for the first inequality and $\ln (1-x)\ge -2(\ln 2)x$ for all $x\in [0,1/2]$ for the second inequality. Also note that, since $L_{1,\infty }(\mu ,\xi )<\infty $ by the above, we have $1-\xi (A_n|A_{<n})\to 0$ as $n\to \infty $. Thus, there exists a left-c.e. real $\alpha $ such that $L_{1,\infty }(\mu ,\xi )=D_\infty (\mu ||\xi )+\alpha $. Hence, $D_\infty (\mu ||\xi )\le _S L_{1,\infty }(\mu ,\xi )$.
Lemma 4.12. Let $A\in \{0,1\}^{\mathbb {N}}$ be a computable sequence and $\mu =\mathbf {1}_A$. Then, ${p\not \in R(\mu )}$ for each positive computable real $p\ne 1$.
Proof. Let $\xi $ be a computable measure on $\{0,1\}^{\mathbb {N}}$ dominating the measure $\nu $ constructed in Lemma 4.3. Then, $D_\infty (\mu ||\xi )$ is ML-random by Theorem 4.1, and hence $L_{1,\infty }(\mu ,\xi )$ is ML-random by Lemma 4.11. We also have
$$ \begin{align*} L_{p,\infty}(\mu,\xi) &=\sum_{n=1}^\infty \ell_{p,A_{<n}}(\mu,\xi) =\sum_{n=1}^\infty\sum_{a\in\{0,1\}}|\mu(a|A_{<n})-\xi(a|A_{<n})|^p\\ &=2\sum_{n=1}^\infty|\mu(A_n|A_{<n})-\xi(A_n|A_{<n})|^p. \end{align*} $$
Now, by Theorem 4.8(ii), $L_{p,\infty }(\mu ,\xi )=\infty $ for each computable $p\in (0,1)$. Similarly, by Theorem 4.8(i), $L_{p,\infty }(\mu ,\xi )$ is finite but not ML-random for each computable $p>1$, and hence it is not Solovay equivalent to the left-c.e. ML-random real $D_\infty (\mu ||\xi )$. Hence, $p\not \in R(\mu )$ for each positive computable real $p\ne 1$.
Proof of Proposition 4.9.
The claim $R(\mu )=1$ follows from Lemmas 4.11 and 4.12. Since $1\in R(\mu )$, we have $L_{1,\infty }(\mu ,\xi )<\infty $ and $D_\infty (\mu ||\xi )\equiv _S L_{1,\infty }(\mu ,\xi )$ for all computable measures $\xi $ dominating $\mu $. By Theorem 4.1, there exists a computable measure $\nu $ such that $D_\infty (\mu ||\xi )$ is a left-c.e. ML-random real for all computable measures $\xi $ dominating $\nu $. Thus, $L_{1,\infty }(\mu ,\xi )$ is ML-random for all computable measures $\xi $ dominating both $\mu $ and $\nu $.
When the model measure is a Dirac measure, the rate of convergence can be expressed more concretely by time-bounded Kolmogorov complexity. Let $h:\mathbb {N}\to \mathbb {N}$ be a total computable function, and let $M:\subseteq \{0,1\}^*\to \mathbb {N}$ be a prefix-free machine. The Kolmogorov complexity relative to M with time bound h is
$$\begin{align*}K^h_M(\sigma)=\min\{|\tau|\ :\ M(\tau)=\sigma \text{ in at most } h(|\sigma|)\text{ steps}\}.\end{align*}$$
We write $K^h(\sigma )$ to mean $K^h_U(\sigma )$ for a fixed universal prefix-free machine U.
Proposition 4.13. Let $A\in \{0,1\}^{\mathbb {N}}$ be a computable sequence.
- (i) For every total computable prediction $\xi $ dominating $\mu =\mathbf {1}_A$, there exists a computable function $h:\mathbb {N}\to \mathbb {N}$ such that
$$\begin{align*}K^h(n)\le-\log(1-\xi(A_n|A_{<n}))+O(1).\end{align*}$$
- (ii) For every total computable function $h:\mathbb {N}\to \mathbb {N}$ and all sufficiently general computable prediction measures $\xi $, we have
$$\begin{align*}-\log(1-\xi(A_n|A_{<n}))\le K^h(n)+O(1).\end{align*}$$

Here, $\log $ is the logarithm with base $2$.
From this theorem, we know that the error $1-\xi (A_n|A_{<n})$ is essentially the same as $2^{-K^h(n)}$ up to a multiplicative constant. We use this formulation because of the non-optimality of the time-bounded Kolmogorov complexity.
Proof. (i) By Proposition 4.9, we have
$$\begin{align*}\sum_n(1-\xi(A_n|A_{<n}))<\infty.\end{align*}$$
By the KC-theorem [Reference Downey and Hirschfeldt9, Theorem 3.6.1], there exist a prefix-free machine ${M:\subseteq \{0,1\}^*\to \mathbb {N}}$ and a computable sequence $(\sigma _n)_n$ of strings such that
$$\begin{align*}M(\sigma_n)=n,\ \ |\sigma_n|\le-\log(1-\xi(A_n|A_{<n}))+O(1).\end{align*}$$
Let $\tau \in \{0,1\}^*$ be a string such that $U(\tau \sigma )\simeq M(\sigma )$ for all $\sigma \in \{0,1\}^*$. Then, the function $n\mapsto U(\tau \sigma _n)$ is a total computable function. Therefore, there exists a total computable function $h : \mathbb {N} \to \mathbb {N}$ such that, for every $n\in \mathbb {N}$, the computation of $U(\tau \sigma _n)$ halts within at most $h(n)$ steps. By this definition of h, we obtain
$$\begin{align*}K^h(n) \leq |\tau| + |\sigma_n|.\end{align*}$$
(ii) We define a computable prediction measure $\nu $ by
$$\begin{align*}\nu=\sum_n 2^{-K^h(n)}\mathbf{1}_{A_{<n}\overline{A_n}0^{\mathbb{N}}}+(1-s)\mathbf{1}_{A},\end{align*}$$
where $s=\sum _n 2^{-K^h(n)}<1$ and $\overline {k}=1-k$ for $k\in \{0,1\}$.
We claim that this measure $\nu $ is computable. We show that $\nu (\sigma )$ is computable uniformly in $\sigma \in \{0,1\}^*$. If $\sigma \prec A$, then
$$\begin{align*}\nu(\sigma)=\sum_{n>|\sigma|}2^{-K^h(n)}+(1-s)=1-\sum_{n\le|\sigma|}2^{-K^h(n)}.\end{align*}$$
If $\sigma =A_{<k}\overline {A_k}0^i$ for some $k,i\in \mathbb {N}$, then
$$\begin{align*}\nu(\sigma)=2^{-K^h(k)}.\end{align*}$$
If $\sigma =A_{<k}\overline {A_k}0^i1\tau $ for some $k,i\in \mathbb {N}$ and $\tau \in \{0,1\}^*$, then
$$\begin{align*}\nu(\sigma)=0.\end{align*}$$
In each case, $\nu (\sigma )$ is computable from $\sigma $. Furthermore, which case applies is decidable.
Let $\xi $ be a computable measure dominating $\nu $. Then, there exists $c\in \mathbb {N}$ such that $\nu (\sigma )\le c\xi (\sigma )$ for all $\sigma \in \{0,1\}^*$. Then,
$$\begin{align*}1-\xi(A_n|A_{<n}) =1-\frac{\xi(A_{\le n})}{\xi(A_{<n})} =\frac{\xi(A_{<n}\overline{A_n})}{\xi(A_{<n})} \ge\frac{\nu(A_{<n}\overline{A_n})}{c} =\frac{2^{-K^h(n)}}{c}.\end{align*}$$
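To make the construction of $\nu $ concrete, the sketch below implements its three cases, with two assumptions made purely for the demonstration: $A=000\cdots $ and a computable weight function $w(n)=4^{-n}$ standing in for $2^{-K^h(n)}$ (so that $s=\sum _n w(n)=1/3<1$). The additivity check confirms that the case analysis defines a measure.

```python
# Sketch of nu from the proof of Proposition 4.13, with stand-ins:
# A = 0^N and w(n) = 4^{-n} in place of 2^{-K^h(n)}, n = 1, 2, ...
A_bit = lambda n: "0"    # the n-th bit of the computable sequence A
w = lambda n: 4.0 ** -n  # sum_{n>=1} w(n) = 1/3 = s < 1

def nu(sigma: str) -> float:
    for j, b in enumerate(sigma, start=1):
        if b != A_bit(j):                  # first deviation from A at position j
            tail = sigma[j:]
            # sigma = A_{<j} \bar{A_j} 0^i       -> nu(sigma) = w(j)
            # sigma = A_{<j} \bar{A_j} 0^i 1 tau -> nu(sigma) = 0
            return w(j) if set(tail) <= {"0"} else 0.0
    # sigma is a prefix of A: nu(sigma) = 1 - sum_{n <= |sigma|} w(n)
    return 1 - sum(w(n) for n in range(1, len(sigma) + 1))

# Additivity check nu(sigma) = nu(sigma 0) + nu(sigma 1), as a measure needs.
for sigma in ("", "0", "00", "01", "001", "0010"):
    assert abs(nu(sigma) - nu(sigma + "0") - nu(sigma + "1")) < 1e-12
print(nu(""), nu("0"), nu("01"), nu("001"))  # 1.0 0.75 0.0625 0.015625
```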
4.4 Case of separated measures
Now, we discuss the convergence rate of general computable predictions when the computable model measure is separated. In this case, the convergence rate is much slower than that for the Dirac measures.
We call a measure separated if its conditional probabilities are bounded away from $0$ and $1$. A formal definition is as follows.
Definition 4.14 (See before Theorem 196 in [Reference Shen, Uspensky and Vereshchagin22]).
A measure $\mu $ on $\{0,1\}^{\mathbb {N}}$ is called separated (from $0$ and $1$) if
$$\begin{align*}\inf_{\sigma\in\{0,1\}^*,\ k\in\{0,1\}}\mu(k|\sigma)>0.\end{align*}$$
Remark 4.15. Li–Vitányi’s book called this notion “conditionally bounded away from zero” [Reference Li and Vitányi15, Definition 5.2.3].
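For a concrete instance of a separated measure, consider a computable Markov measure whose conditional probabilities always lie in $[0.3,0.7]$; the transition rule below is invented for the example. The sketch checks the separation bound empirically over all strings up to length 12.

```python
from itertools import product

# A separated measure: mu(1 | sigma) depends only on the last bit of sigma
# and is always 0.3 or 0.7, so inf_{sigma,k} mu(k | sigma) = 0.3 > 0.

def mu_cond_1(sigma: str) -> float:
    return 0.7 if sigma.endswith("1") else 0.3

inf_seen = min(
    min(p, 1 - p)
    for L in range(13)
    for bits in product("01", repeat=L)
    for p in [mu_cond_1("".join(bits))]
)
print(inf_seen)  # 0.3
```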
Proposition 4.16. Let $\mu $ be a computable separated measure. Then, $R(\mu )=2$. In particular, $L_{2,\infty }(\mu ,\xi )$ is finite and is a left-c.e. ML-random real for all sufficiently general computable prediction measures $\xi $.
Lemma 4.17. Let $\mu $ be a computable separated measure. Then, $2\in R(\mu )$.
In the following proof, we use a version of Pinsker’s inequality and a reverse Pinsker inequality. A Pinsker inequality bounds the squared total variation from above by the KL divergence (see, for example, Verdú [Reference Verdú26, (51)]). A reverse inequality does not hold in general, but it does under separation assumptions (see, for instance, [Reference Csiszar and Talata8, Lemma 6.3]). For a more comprehensive survey, see the work of Sason [Reference Sason21].
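Both directions can be checked numerically on the two-point space. The sketch below evaluates $d(\mu ||\xi )/\ell _1(\mu ,\xi )^2$ over a grid of Bernoulli pairs separated by $\alpha =0.2$ (an arbitrary choice); the ratio stays bounded away from $0$ (Pinsker) and from $\infty $ (reverse Pinsker under separation).

```python
import math

# Ratio d(mu||xi) / ell_1(mu, xi)^2 for Bernoulli measures whose success
# probabilities lie in [alpha, 1 - alpha].

def d_kl(m1: float, x1: float) -> float:
    """KL divergence in nats between Bernoulli(m1) and Bernoulli(x1)."""
    return m1 * math.log(m1 / x1) + (1 - m1) * math.log((1 - m1) / (1 - x1))

alpha = 0.2
grid = [alpha + i * (1 - 2 * alpha) / 200 for i in range(201)]
ratios = [
    d_kl(m1, x1) / (2 * abs(m1 - x1)) ** 2
    for m1 in grid for x1 in grid if m1 != x1
]
print(min(ratios), max(ratios))  # bounded below (>= 1/8) and bounded above
```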
Proof. Let $\xi $ be a computable measure dominating $\mu $. By Pinsker's inequality and a reverse Pinsker inequality, there are $a, b\in \mathbb {N}$ such that
$$\begin{align*}(\ell_{1,\sigma}(\mu,\xi))^2\le a\cdot d_\sigma(\mu||\xi)\le b \cdot(\ell_{1,\sigma}(\mu,\xi))^2.\end{align*}$$
Now we look at the relation between $(\ell _{1,\sigma }(\mu ,\xi ))^2$ and $\ell _{2,\sigma }(\mu ,\xi )$. We use the inequalities
$$\begin{align*}x^2+y^2\le(x+y)^2\le 2(x^2+y^2)\end{align*}$$
for $x,y\ge 0$ to deduce
$$ \begin{align} \ell_{2,\sigma}(\mu,\xi)\le a\cdot d_{\sigma}(\mu||\xi)\le 2b\cdot \ell_{2,\sigma}(\mu,\xi). \end{align} $$
The first inequality implies
$$\begin{align*}L_{2,\infty}(\mu,\xi)\le a D_\infty(\mu||\xi)<\infty,\end{align*}$$
where finiteness follows from Theorem 3.2. The first inequality in (11) also implies the existence of a computable function $f:\{0,1\}^*\to \mathbb {R}$ such that
$$\begin{align*}\ell_{2,\sigma}(\mu,\xi)+f(\sigma)=a d_\sigma(\mu||\xi),\end{align*}$$
and thus the existence of a left-c.e. real $\gamma $ such that
$$\begin{align*}L_{2,\infty}(\mu,\xi)+\gamma=a D_\infty(\mu||\xi).\end{align*}$$
Hence, $L_{2,\infty }(\mu ,\xi )\le _S D_\infty (\mu ||\xi )$. Similarly, the second inequality in (11) implies $D_\infty (\mu ||\xi )\le _S L_{2,\infty }(\mu ,\xi )$. Hence, we have $L_{2,\infty }(\mu ,\xi )\equiv _S D_\infty (\mu ||\xi )$, and thus $2\in R(\mu )$.
Lemma 4.18. Let $\mu $ be a computable separated measure. Then, $p\not \in R(\mu )$ for each positive computable real $p\ne 2$.
Proof. By Theorem 4.1, there exists a computable measure $\xi $ such that $\xi $ dominates $\mu $ and $D_\infty (\mu ||\xi )$ is a finite left-c.e. ML-random real. By Lemma 4.17, $D_\infty (\mu ||\xi )\equiv _S L_{2,\infty }(\mu ,\xi )$, which implies that $L_{2,\infty }(\mu ,\xi )$ is a finite left-c.e. ML-random real by Proposition 4.6. By Theorem 4.8(ii), we have $L_{p,\infty }(\mu ,\xi )=\infty $ for each $p\in (0,2)$. In particular, $p\not \in R(\mu )$ for each $p\in (0,2)$.
Let $p>2$ be a computable real. We construct a computable measure $\nu $ such that:
- (i) $\nu $ dominates $\mu $,
- (ii) $\mathrm {dim}(L_{2,\infty }(\mu ,\nu ))=\frac {1}{2}$,
- (iii) $\mathrm {dim}(L_{p,\infty }(\mu ,\nu ))=\frac {1}{p}$.
Suppose such a measure $\nu $ exists and $p\in R(\mu )$. By $2\in R(\mu )$ and (i), we have $D_\infty (\mu ||\nu )\equiv _S L_{2,\infty }(\mu ,\nu )$. By $p\in R(\mu )$ and (i), we have $D_\infty (\mu ||\nu )\equiv _S L_{p,\infty }(\mu ,\nu )$. Since Solovay equivalence implies equality of the effective Hausdorff dimensions, we have $\mathrm {dim}(L_{2,\infty }(\mu ,\nu ))=\mathrm {dim}(L_{p,\infty }(\mu ,\nu ))$, which contradicts (ii) and (iii). Thus, $p\not \in R(\mu )$.
The construction of $\nu $ is as follows. Let $\alpha $ be a rational such that $0<\alpha <\inf \{\mu (a|\sigma )\ :\ a\in \{0,1\},\ \sigma \in \{0,1\}^*\}$. Since $\mu $ is separated, such an $\alpha $ exists. Let $(z_n)_n$ be a computable sequence of positive rationals such that $z_n<\frac {\alpha }{2}$ and $\sum _{n=0}^\infty z_n$ is a finite left-c.e. ML-random real. Fix a sufficiently small rational $\epsilon>0$. Consider a computable function $\sigma \in \{0,1\}^*\mapsto a_\sigma \in \{0,1\}$ such that $\mu (a_\sigma |\sigma )>\frac {1}{2}-\epsilon $. We define a computable measure $\nu $ as follows:
$$ \begin{align*} \nu(a|\sigma)= \begin{cases} \mu(a|\sigma)-z_{|\sigma|} &\text{ if }a=a_\sigma,\\ \mu(a|\sigma)+z_{|\sigma|} &\text{ if }a\ne a_\sigma. \end{cases} \end{align*} $$
(i) First, we evaluate $\nu (a|\sigma )/\mu (a|\sigma )$. If $a=a_\sigma $, then
$$\begin{align*}\frac{\nu(a|\sigma)}{\mu(a|\sigma)}=1-\frac{z_{|\sigma|}}{\mu(a_\sigma|\sigma)} \ge1-\frac{z_{|\sigma|}}{1/2-\epsilon}.\end{align*}$$
If $a\ne a_\sigma $, then
$$\begin{align*}\frac{\nu(a|\sigma)}{\mu(a|\sigma)}=1+\frac{z_{|\sigma|}}{\mu(a|\sigma)}\ge1.\end{align*}$$
Thus, we have
$$ \begin{align*} \nu(\sigma) =\prod_{n=1}^{|\sigma|} \nu(\sigma_n|\sigma_{<n}) \ge\prod_{n=1}^{|\sigma|} \mu(\sigma_n|\sigma_{<n}) \Big(1-\frac{z_{n-1}}{1/2-\epsilon}\Big) \ge\frac{\mu(\sigma)}{c} \end{align*} $$
for some constant $c\in \mathbb {N}$, since $\sum _n z_n<\infty $.
(ii), (iii). Notice that
$$\begin{align*}L_{1,\infty}(\mu,\nu)=2\sum_{n=0}^\infty z_n\end{align*}$$
is a finite left-c.e. ML-random real, and that
$$\begin{align*}L_{q,\infty}(\mu,\nu)=2\sum_{n=0}^\infty z_n^q\end{align*}$$
for any $q\ge 1$ (both conditional probabilities differ from those of $\mu $ by exactly $z_{|\sigma |}$, contributing twice). Thus, the claims follow by Theorem 4.8.
Proof of Proposition 4.16.
The claim $R(\mu )=2$ follows from Lemmas 4.17 and 4.18. Since $2\in R(\mu )$, we have $L_{2,\infty }(\mu ,\xi )<\infty $ and $D_\infty (\mu ||\xi )\equiv _S L_{2,\infty }(\mu ,\xi )$ for all computable measures $\xi $ dominating $\mu $. By Theorem 4.1, there exists a computable measure $\nu $ such that $D_\infty (\mu ||\xi )$ is a left-c.e. ML-random real for all computable measures $\xi $ dominating $\nu $. Thus, $L_{2,\infty }(\mu ,\xi )$ is ML-random for all computable measures $\xi $ dominating both $\mu $ and $\nu $.
Acknowledgments
The author appreciates the anonymous reviewers’ efforts and helpful feedback. In particular, the proof of Theorem 3.2 was shortened by one of the reviewers.
Funding
The author is supported by Research Project Grant (B) by Institute of Science and Technology Meiji University, and JSPS KAKENHI (Grant Numbers 22K03408, 21K18585, 21K03340, and 21H03392). This work was also supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.